A new Nemotron-Labs research paper proposes generating text with diffusion models rather than traditional autoregressive decoding. Because a diffusion model can produce many tokens in parallel instead of one at a time, the approach substantially reduces latency, especially for long sequences. The results are promising on speed, but the model still struggles to produce fully coherent text. For practitioners, the work opens up non-sequential sampling as a way around the token-by-token bottleneck.
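
To make the latency argument concrete, here is a toy sketch that counts model forward passes under the two decoding styles. It is not the paper's method. The confidence-based unmasking schedule is an assumption borrowed from masked-diffusion-style decoders in general, and `toy_predict`, `autoregressive_decode`, and `parallel_unmask_decode` are hypothetical names in which a random stand-in replaces the real model.

```python
import random

MASK = None  # placeholder for a position that has not been generated yet


def toy_predict(seq, vocab, rng):
    """Stand-in for one model forward pass (hypothetical, not a real model).

    Returns a (token, confidence) guess for every position in seq.
    """
    return [(rng.choice(vocab), rng.random()) for _ in seq]


def autoregressive_decode(length, vocab, rng):
    """Left-to-right decoding: one forward pass per generated token."""
    seq, passes = [], 0
    for _ in range(length):
        token, _ = toy_predict(seq + [MASK], vocab, rng)[-1]
        seq.append(token)
        passes += 1
    return seq, passes


def parallel_unmask_decode(length, vocab, rng, steps):
    """Assumed diffusion-style schedule: start fully masked, then on each
    pass commit the most confident predictions among the masked positions."""
    seq, passes = [MASK] * length, 0
    for step in range(steps):
        masked = [i for i, tok in enumerate(seq) if tok is MASK]
        if not masked:
            break
        preds = toy_predict(seq, vocab, rng)
        passes += 1
        # Commit an even share of the remaining positions (ceiling division).
        quota = -(-len(masked) // (steps - step))
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:quota]
        for i in best:
            seq[i] = preds[i][0]
    return seq, passes


if __name__ == "__main__":
    rng = random.Random(0)
    vocab = ["the", "cat", "sat", "on", "a", "mat"]
    _, ar_passes = autoregressive_decode(32, vocab, rng)
    _, par_passes = parallel_unmask_decode(32, vocab, rng, steps=4)
    print(f"autoregressive passes: {ar_passes}")
    print(f"parallel passes:       {par_passes}")
```

In this sketch the autoregressive loop needs one pass per token, while the parallel loop needs at most `steps` passes regardless of length, which is where the latency gain for long sequences would come from. The tradeoff mirrors the coherence caveat above: tokens committed in the same pass are chosen without seeing each other.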