Google researchers used frozen Multi-Token Prediction (MTP) to accelerate Gemini Nano on Pixel devices. The technique lets the model predict several future tokens in one step instead of one token at a time, which reduces inference latency. The approach maintains output quality while lowering compute overhead. Developers can now deploy more complex on-device LLM workflows without sacrificing the responsiveness of mobile user interfaces.
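
To make the idea of predicting several tokens per step concrete, here is a minimal toy sketch in Python. It is not Gemini Nano's implementation, and the summary above does not describe the actual decoding procedure. The sketch assumes a draft-then-verify scheme: extra prediction heads propose `k` tokens, and a frozen base model keeps only the proposals it agrees with. All names (`base_next`, `draft_heads`, `generate_mtp`) and the toy "next token is last + 1" rule are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

Token = int


def base_next(context: List[Token]) -> Token:
    """Toy stand-in for the frozen base model: next token is (last + 1) mod 10."""
    return (context[-1] + 1) % 10


def draft_heads(context: List[Token], k: int) -> List[Token]:
    """Toy stand-in for k extra prediction heads (hypothetical).

    Head i guesses the token i positions ahead. To imitate imperfect heads,
    any guess that should be 7 is deliberately replaced with 0.
    """
    last = context[-1]
    drafts = []
    for i in range(1, k + 1):
        guess = (last + i) % 10
        drafts.append(0 if guess == 7 else guess)
    return drafts


def generate_greedy(prompt: List[Token], n_new: int) -> Tuple[List[Token], int]:
    """Baseline: one token per decoding step."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(base_next(out))
    return out, n_new


def generate_mtp(prompt: List[Token], n_new: int, k: int) -> Tuple[List[Token], int]:
    """Draft k tokens per step, keep the prefix the base model agrees with.

    In a real system the verification of all k drafts would happen in one
    batched forward pass; here it is a plain loop, and `steps` counts how
    many such passes would have been needed.
    """
    out = list(prompt)
    target = len(prompt) + n_new
    steps = 0
    while len(out) < target:
        steps += 1
        ctx = list(out)
        for draft in draft_heads(out, k):
            true_token = base_next(ctx)
            ctx.append(true_token)      # the base model's token is always kept
            if draft != true_token:     # first disagreement ends this step
                break
        out = ctx
    return out[:target], steps


if __name__ == "__main__":
    greedy_tokens, greedy_steps = generate_greedy([0], 12)
    mtp_tokens, mtp_steps = generate_mtp([0], 12, k=4)
    print("same output:", greedy_tokens == mtp_tokens)
    print("decoding steps:", greedy_steps, "vs", mtp_steps)
```

Under these assumptions the multi-token path produces the same token sequence as one-at-a-time decoding but in fewer steps, which is the intuition behind the latency reduction described above. Real speedups depend on how often the extra heads agree with the base model.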