A 4x increase in text generation speed defines DiffusionGemma, a new model from Google DeepMind. It replaces traditional autoregressive sampling with a diffusion-based process to generate tokens in parallel. This approach reduces latency for high-throughput tasks. Practitioners can now deploy faster inference without sacrificing the quality of the output.