A 4x increase in text generation speed defines DiffusionGemma, a new model from Google DeepMind. It replaces traditional autoregressive sampling with a diffusion-based approach to produce tokens in parallel. This shift reduces latency for high-throughput applications. Practitioners can now generate complex sequences without the linear time penalty of standard LLMs.