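The Trellis framework now incorporates RadixAttention to manage the key-value (KV) cache. RadixAttention keeps the keys and values computed for earlier requests in a radix tree indexed by token sequence, so a new request that shares a prefix with a cached one reuses those entries instead of recomputing them. This removes redundant prefill computation and avoids holding duplicate copies of shared prefixes in memory, which improves throughput for long-sequence generation, particularly when requests share a prompt or multi-turn context. Because reuse returns the same keys and values that recomputation would produce, developers can deploy these models with lower VRAM requirements without sacrificing output quality.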
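To make the prefix-reuse idea concrete, here is a minimal sketch in Python. It is not Trellis's API: the names `Node`, `PrefixCache`, `match_prefix`, and `insert` are hypothetical, the tree is a token-level trie (an uncompressed radix tree) for brevity, and the cached tensors are replaced by placeholder strings. It only illustrates how a shared prompt prefix can be matched and reused instead of recomputed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    # Placeholder for the cached key/value tensors of one token position.
    # (Assumption: a real system would store handles to GPU memory here.)
    kv: Optional[str] = None
    children: Dict[int, "Node"] = field(default_factory=dict)


class PrefixCache:
    """Toy prefix cache: a token-level trie, i.e. an uncompressed radix tree."""

    def __init__(self) -> None:
        self.root = Node()

    def match_prefix(self, tokens: List[int]) -> int:
        """Return how many leading tokens already have cached KV entries."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens: List[int]) -> int:
        """Cache KV entries for `tokens`; return how many were newly computed."""
        node, computed = self.root, 0
        for pos, tok in enumerate(tokens):
            child = node.children.get(tok)
            if child is None:
                # Cache miss: this is where the model would compute K and V.
                child = Node(kv=f"kv[{pos}:{tok}]")
                node.children[tok] = child
                computed += 1
            node = child
        return computed


cache = PrefixCache()
system_prompt = [101, 7, 8, 9]  # hypothetical token IDs shared by both requests

first = cache.insert(system_prompt + [20, 21])               # nothing cached yet
reused = cache.match_prefix(system_prompt + [30, 31, 32])    # shared prefix hit
second = cache.insert(system_prompt + [30, 31, 32])          # only the suffix is new
print(first, reused, second)  # prints "6 4 3"
```

In this toy run the second request reuses the four cached prompt tokens and computes entries only for its three new tokens. A production cache would also need an eviction policy and reference counting for entries in use, both of which are omitted here.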