DiffusionGemma, a new model from Google DeepMind, delivers a 4x increase in text generation speed. Instead of sampling tokens autoregressively, one at a time from left to right, it uses a diffusion-based approach to predict tokens. Because generation no longer requires one sequential step per token, inference latency drops. Developers can generate high-quality text significantly faster than with standard autoregressive LLM architectures.
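To make the contrast concrete, here is a minimal toy sketch of why a diffusion-style decoder can need fewer sequential model calls than an autoregressive one. It is not DiffusionGemma's actual algorithm or API: `fake_denoiser`, `autoregressive_decode`, and `diffusion_style_decode` are hypothetical names, the "model" returns placeholder tokens with random confidences, and the sequence length and step count are chosen only for illustration. The sketch assumes a masked-diffusion style of decoding, in which every position starts masked and each pass commits the most confident predictions.

```python
import random

MASK = "<mask>"


def fake_denoiser(seq, rng):
    """Hypothetical stand-in for one model forward pass.

    Returns a (token, confidence) guess for every position in a single call.
    """
    return [(f"tok{i}", rng.random()) for i in range(len(seq))]


def autoregressive_decode(length, rng):
    """Generate left to right: each pass yields exactly one new token."""
    seq, passes = [], 0
    while len(seq) < length:
        guesses = fake_denoiser(seq + [MASK], rng)
        passes += 1
        seq.append(guesses[-1][0])  # keep only the guess for the next position
    return seq, passes


def diffusion_style_decode(length, steps, rng):
    """Start fully masked and commit several positions per pass."""
    seq, passes = [MASK] * length, 0
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        if not masked:
            break
        guesses = fake_denoiser(seq, rng)  # one pass scores every position
        passes += 1
        # Spread the remaining masked positions over the remaining steps.
        k = -(-len(masked) // (steps - step))  # ceiling division
        most_confident = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:k]
        for i in most_confident:
            seq[i] = guesses[i][0]
    return seq, passes


if __name__ == "__main__":
    rng = random.Random(0)
    _, ar_passes = autoregressive_decode(12, rng)
    _, diff_passes = diffusion_style_decode(12, 3, rng)
    print(f"autoregressive passes: {ar_passes}")    # 12, one per token
    print(f"diffusion-style passes: {diff_passes}")  # 3, one per denoising step
```

The pass counts here follow directly from the toy parameters (12 tokens, 3 steps) and say nothing about real wall-clock latency, which also depends on the cost of each pass, caching, and hardware.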