Apple researchers developed Stochastic KV Routing to reduce the memory footprint of transformer language models. Rather than compressing the Key-Value (KV) cache along the sequence (temporal) dimension, as traditional approaches do, the method shares KV caches across layers, so fewer layers need to store their own keys and values. This depth-wise optimization is reported to cut serving costs without sacrificing model performance. For practitioners, that means high-throughput models can be deployed with significantly lower VRAM requirements.
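To make the memory argument concrete, the following is a minimal back-of-the-envelope sketch of how sharing KV caches across layers shrinks the cache. It is not the Stochastic KV Routing algorithm itself: it only applies the standard KV cache size formula (keys plus values, per layer, per head, per token). The function name, the `layers_per_cache` parameter, and the model dimensions are all hypothetical values chosen for illustration.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int,
    bytes_per_elem: int = 2,   # 2 bytes per element, e.g. fp16/bf16
    layers_per_cache: int = 1, # hypothetical knob: layers that reuse one cache
) -> int:
    """Approximate KV cache size in bytes.

    With layers_per_cache == 1 every layer keeps its own cache (the
    baseline). Larger values model depth-wise sharing, where a group of
    layers reads from a single stored cache.
    """
    if layers_per_cache < 1 or num_layers % layers_per_cache != 0:
        raise ValueError("layers_per_cache must be a positive divisor of num_layers")

    distinct_caches = num_layers // layers_per_cache
    # Factor 2 accounts for storing both keys and values.
    return (
        2 * distinct_caches * num_kv_heads * head_dim
        * seq_len * batch_size * bytes_per_elem
    )


# Illustrative (assumed) model shape: 32 layers, 8 KV heads of dimension 128,
# one sequence of 8192 tokens stored in 16-bit precision.
baseline = kv_cache_bytes(32, 8, 128, 8192, 1)
shared = kv_cache_bytes(32, 8, 128, 8192, 1, layers_per_cache=2)

print(f"baseline: {baseline / 2**20:.0f} MiB")
print(f"shared:   {shared / 2**20:.0f} MiB")
print(f"savings:  {1 - shared / baseline:.0%}")
```

Under these assumptions the cache shrinks in direct proportion to the number of layers that share it, independent of sequence length. That is why a depth-wise reduction can in principle be combined with sequence-wise (temporal) compression rather than competing with it.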