A new Apple research paper introduces Stochastic KV Routing, a method for reducing the memory overhead of the Key-Value (KV) cache in transformer models. Rather than only compressing the cache along the sequence (time) dimension, the method shares KV caches across layers. This depth-wise optimization lowers serving costs without sacrificing model performance. Because the redundant per-layer memory footprint shrinks during generation, practitioners can serve at higher throughput.
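
To make the memory argument concrete, here is a rough back-of-the-envelope sketch of how KV cache size scales with depth and what sharing one cache among several layers would save. It is not the paper's routing algorithm: the function name `kv_cache_bytes`, the `share_group` parameter (how many consecutive layers share one cache), and the model dimensions are all hypothetical, chosen only to illustrate the arithmetic.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2, share_group=1):
    """Estimate KV cache size in bytes.

    share_group is a hypothetical knob: the number of consecutive layers
    that share a single KV cache (1 means every layer keeps its own).
    """
    if share_group < 1:
        raise ValueError("share_group must be >= 1")
    # Number of distinct caches actually stored (ceiling division).
    distinct_caches = -(-num_layers // share_group)
    # Factor 2 accounts for storing both keys and values.
    return (2 * distinct_caches * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Illustrative (assumed) configuration: 32 layers, 8 KV heads,
# head dimension 128, 8192-token context, 16-bit elements.
baseline = kv_cache_bytes(32, 8, 128, 8192)
shared = kv_cache_bytes(32, 8, 128, 8192, share_group=2)

print(f"per-layer caches: {baseline / 2**30:.2f} GiB")
print(f"pairs of layers sharing: {shared / 2**30:.2f} GiB")
```

Under these assumed dimensions, halving the number of distinct caches halves the cache footprint per sequence, which is the headroom that can go toward larger batches or longer contexts. Whether quality is preserved depends on how the sharing is trained, which this sketch does not model.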