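A new Apple research paper introduces Stochastic KV Routing, a method for reducing the memory footprint of the key-value (KV) cache in transformers. Rather than compressing the cache along the temporal (sequence) dimension, it targets the depth dimension by sharing caches across layers. A standard KV cache grows linearly with both sequence length and layer count, so sharing across layers shrinks the layer-dependent factor and lowers serving costs for large models. Practitioners may be able to maintain high throughput with a smaller cache than a standard per-layer KV cache requires.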
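To make the depth-dimension argument concrete, here is a back-of-the-envelope sketch of how KV cache size scales with layer count. It uses the standard size formula for a per-layer cache (keys plus values, per layer, per KV head, per token). The `layers_per_cache` sharing factor and the model dimensions are illustrative assumptions, not figures from the paper, and the sketch does not model the stochastic routing itself.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    seq_len: int,
    batch_size: int = 1,
    bytes_per_elem: int = 2,   # 2 bytes per element for fp16/bf16
    layers_per_cache: int = 1, # hypothetical sharing factor; 1 = standard per-layer cache
) -> int:
    """Estimate KV cache size in bytes.

    A standard cache stores one key tensor and one value tensor per layer.
    If groups of `layers_per_cache` layers share a single cache (an
    assumption made for illustration), the number of distinct caches drops
    by that factor.
    """
    if layers_per_cache < 1 or num_layers % layers_per_cache != 0:
        raise ValueError("layers_per_cache must evenly divide num_layers")
    num_caches = num_layers // layers_per_cache
    # Factor of 2 accounts for storing both keys and values.
    return 2 * num_caches * num_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


GIB = 2**30

# Illustrative model shape: 32 layers, 8 KV heads of dimension 128, 8192-token context.
baseline = kv_cache_bytes(32, 8, 128, 8192)
shared = kv_cache_bytes(32, 8, 128, 8192, layers_per_cache=4)

print(f"per-layer cache: {baseline / GIB:.2f} GiB")  # 1.00 GiB
print(f"4 layers/cache:  {shared / GIB:.2f} GiB")    # 0.25 GiB
```

In both cases the cache still grows linearly with `seq_len`; sharing across layers only reduces the constant in front of that growth, which is the distinction between depth-wise sharing and temporal compression.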