Google researchers used frozen Multi-Token Prediction (MTP) to accelerate Gemini Nano on Pixel devices. Rather than producing one token per decoding step, the technique predicts several future tokens at once, which reduces inference overhead. The approach maintains model accuracy while lowering latency, so developers can deploy more responsive on-device LLMs without sacrificing the quality of the generated text.
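As a rough illustration of why predicting several tokens per step can lower latency without changing the output, here is a minimal toy sketch in Python. It is not Google's implementation. `base_next`, `draft_tokens`, and `mtp_decode` are hypothetical names, the "model" is a simple arithmetic function, and the sketch assumes that "frozen" means the base model is left unchanged while extra prediction heads propose tokens that the base model then verifies.

```python
VOCAB = 50  # toy vocabulary size (assumption for illustration)


def base_next(context):
    """Toy stand-in for the frozen base model: deterministic greedy next token."""
    return (sum(context) * 31 + len(context) * 7) % VOCAB


def draft_tokens(context, k):
    """Toy stand-in for MTP heads: propose the next k tokens in one shot.

    The proposal is deliberately wrong whenever the current length is
    3 mod 4, to simulate imperfect draft heads.
    """
    ctx = list(context)
    guesses = []
    for _ in range(k):
        tok = base_next(ctx)
        if len(ctx) % 4 == 3:
            tok = (tok + 1) % VOCAB  # simulated draft error
        guesses.append(tok)
        ctx.append(tok)
    return guesses


def greedy_decode(prompt, n):
    """Baseline: one base-model step per generated token."""
    out = list(prompt)
    for _ in range(n):
        out.append(base_next(out))
    return out, n  # n sequential steps


def mtp_decode(prompt, n, k=4):
    """Draft k tokens per pass, then keep only what the base model agrees with.

    Each pass is counted as one sequential step, on the assumption that a
    real model would score all k drafted positions in a single forward pass.
    """
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < n:
        draft = draft_tokens(out, k)
        passes += 1
        ctx = list(out)
        for tok in draft:
            target = base_next(ctx)
            ctx.append(target)  # always keep the base model's own token
            if tok != target:
                break  # first mismatch: discard the rest of the draft
        out = ctx
    return out[: len(prompt) + n], passes


if __name__ == "__main__":
    prompt = [1, 2, 3]
    baseline, baseline_steps = greedy_decode(prompt, 12)
    fast, fast_steps = mtp_decode(prompt, 12, k=4)
    print("same output:", baseline == fast)
    print("sequential steps:", baseline_steps, "vs", fast_steps)
```

Because every kept token is the one the base model would have chosen for that exact prefix, the generated sequence matches plain greedy decoding. The saving comes only from needing fewer sequential passes, and it depends on how often the drafted tokens are accepted.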