A small auxiliary draft model now proposes several tokens ahead of Gemma 4, and the main model verifies the whole batch in a single forward pass, accelerating text generation by 3x. Because the main model only accepts drafts it would have produced itself, this multi-token prediction approach reduces latency for open-model deployments without sacrificing the accuracy of the Google model family.
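
The draft-and-verify loop can be sketched with toy deterministic "models"; the function names and the stand-in token rules below are illustrative assumptions, not the real Gemma 4 API:

```python
# Minimal sketch of draft-and-verify (speculative) decoding.
# Both "models" here are toy deterministic rules, not real networks.

def draft_model(prefix, k):
    """Cheap draft model: propose the next k tokens (toy rule: count up)."""
    out, last = [], prefix[-1]
    for _ in range(k):
        last += 1
        out.append(last)
    return out

def main_model_next(prefix):
    """Expensive main model's next token (toy rule that sometimes disagrees:
    after token 3 it emits 10 instead of 4)."""
    return 10 if prefix[-1] == 3 else prefix[-1] + 1

def speculative_step(prefix, k=4):
    """One round: draft k tokens, verify them in order against the main
    model, accept the longest agreeing run, then append one main-model
    token (the correction on mismatch, or a bonus token if all agree)."""
    proposal = draft_model(prefix, k)
    accepted, context = [], list(prefix)
    for tok in proposal:
        expected = main_model_next(context)
        if tok != expected:
            # First mismatch: take the main model's token and stop early.
            accepted.append(expected)
            return prefix + accepted
        accepted.append(tok)
        context.append(tok)
    # Every draft accepted; the same verification pass yields one more token.
    accepted.append(main_model_next(context))
    return prefix + accepted

print(speculative_step([1], k=4))   # drafts 2,3,4,5; main model rejects 4
```

In a real deployment the verification of all k drafted tokens happens in one batched forward pass of the main model, which is where the latency win comes from; the toy loop above verifies sequentially only to keep the logic visible.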