DiffusionGemma, a new approach from Google DeepMind, is built around one headline claim: a 4x increase in text generation speed. It replaces traditional autoregressive sampling, which emits one token per model call, with a diffusion-based process that generates tokens in parallel. Because fewer sequential steps are needed, the shift reduces latency for high-throughput tasks. Developers can deploy faster inference pipelines without sacrificing the quality of the underlying language model.
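To make the difference between the two decoding styles concrete, here is a minimal toy sketch in Python. It is not DiffusionGemma's algorithm or API. `toy_model`, `autoregressive_decode`, `parallel_denoise_decode`, and the `TARGET` sentence are all hypothetical names invented for illustration, and the "model" simply returns a fixed answer. The sketch only counts how many sequential model calls each strategy needs, assuming a masked-denoising style of diffusion in which several positions are filled in per step.

```python
import math

MASK = "<mask>"
# Hypothetical fixed "answer" that the toy model always predicts.
TARGET = "the quick brown fox jumps over the lazy dog again".split()


def toy_model(tokens):
    """Stand-in for one forward pass: returns a prediction for every position."""
    return list(TARGET[:len(tokens)])


def autoregressive_decode(length):
    """Generate left to right, keeping one new token per model call."""
    tokens, calls = [], 0
    while len(tokens) < length:
        prediction = toy_model(tokens + [MASK])  # one sequential forward pass
        calls += 1
        tokens.append(prediction[len(tokens)])   # commit only the next token
    return tokens, calls


def parallel_denoise_decode(length, steps):
    """Start fully masked and commit a batch of positions per model call."""
    tokens, calls = [MASK] * length, 0
    per_step = math.ceil(length / steps)         # positions committed per step
    while MASK in tokens:
        prediction = toy_model(tokens)           # one forward pass over all positions
        calls += 1
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Simplification: commit the leftmost masked positions. A real system
        # would more likely pick positions by model confidence.
        for i in masked[:per_step]:
            tokens[i] = prediction[i]
    return tokens, calls
```

Calling `autoregressive_decode(10)` and `parallel_denoise_decode(10, 3)` yields the same ten tokens, but the first makes ten sequential model calls and the second makes three. The toy ignores the cost of each call and the effect of the step count on output quality, so the ratio of call counts should not be read as a measured speedup.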