Apple researchers introduced Stochastic KV Routing to reduce the memory footprint of transformer language models. The method optimizes the depth dimension by sharing caches across layers rather than relying on temporal compression. This approach cuts serving costs without sacrificing performance. Practitioners can now implement more efficient autoregressive generation in memory-constrained environments.