DiffusionGemma, a new approach from Google DeepMind, delivers a 4x increase in text generation speed. Instead of sampling tokens one at a time with traditional autoregressive decoding, it predicts them with a diffusion-based process, which reduces latency during inference. Developers can generate high-quality text sequences faster without sacrificing the linguistic coherence of the original Gemma models.
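
To make the contrast between the two decoding styles concrete, here is a minimal toy sketch in Python. It is not DiffusionGemma's actual algorithm or API: `toy_model`, `autoregressive_decode`, and `diffusion_decode` are hypothetical names, the "model" simply reads the answer from a known target, and the unmasking schedule (reveal the most confident positions at each step) is an assumed, simplified form of masked-diffusion decoding. The only point is to count model calls: one per token for autoregressive sampling, versus a fixed number of parallel refinement steps for the diffusion-style loop.

```python
import random

MASK = None  # placeholder for a position that has not been decoded yet


def toy_model(tokens, target, rng):
    """Hypothetical stand-in for a trained network (assumption, not a real API).

    Returns a (predicted_token, confidence) pair for every position in a
    single call. The prediction is read from `target` and the confidence is
    random, so this only models the shape of the interface.
    """
    return [(target[i], rng.random()) for i in range(len(tokens))]


def autoregressive_decode(target, rng):
    """Decode left to right: one model call per generated token."""
    tokens, calls = [], 0
    for _ in range(len(target)):
        preds = toy_model(tokens + [MASK], target, rng)
        calls += 1
        tokens.append(preds[-1][0])  # keep only the newest position
    return tokens, calls


def diffusion_decode(target, steps, rng):
    """Start fully masked; each step fills in the most confident positions."""
    tokens, calls = [MASK] * len(target), 0
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        if not masked:
            break
        preds = toy_model(tokens, target, rng)  # all positions in one call
        calls += 1
        # Spread the remaining masked positions over the remaining steps.
        k = -(-len(masked) // (steps - step))  # ceiling division
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:k]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens, calls


if __name__ == "__main__":
    rng = random.Random(0)
    target = [f"tok{i}" for i in range(16)]
    _, ar_calls = autoregressive_decode(target, rng)
    _, diff_calls = diffusion_decode(target, steps=4, rng=rng)
    print("autoregressive calls:", ar_calls)
    print("diffusion-style calls:", diff_calls)
```

With 16 tokens and 4 steps, the sketch makes 16 model calls in the autoregressive loop and 4 in the diffusion-style loop. That ratio follows from the parameters chosen here and is not a reproduction of the reported speedup. The sketch also assumes every model call costs the same, which a real system need not satisfy.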