A 4x increase in text generation speed defines DiffusionGemma, a new model from Google DeepMind. It replaces traditional autoregressive sampling with a diffusion-based approach to produce tokens in parallel. This architecture reduces latency for high-throughput tasks. Developers can now generate complex sequences faster without sacrificing the quality of the underlying language model.