A new architecture from Nemotron-Labs replaces traditional autoregressive sampling with a diffusion process for text generation. Because the model predicts multiple tokens in parallel rather than one token at a time, it can substantially reduce latency compared with standard autoregressive LLMs. The researchers report that diffusion models can match the quality of autoregressive models while increasing throughput. For practitioners, this opens up non-sequential generation orders, rather than strict left-to-right decoding, as a way to optimize inference speed.
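To make the contrast between sequential and parallel token prediction concrete, here is a minimal toy sketch. It is not the Nemotron-Labs method: `toy_model`, the confidence-based unmasking schedule, and all parameter names are assumptions made for illustration, loosely in the style of masked-diffusion decoding. The sketch only counts forward passes and says nothing about output quality.

```python
import random

MASK = None  # placeholder for a position that has not been generated yet


def toy_model(tokens, vocab, rng):
    """Hypothetical stand-in for a real network.

    Returns a (token, confidence) proposal for every position in one "forward pass".
    """
    return [(rng.choice(vocab), rng.random()) for _ in tokens]


def autoregressive_decode(length, vocab, seed=0):
    """Left-to-right decoding: each forward pass commits exactly one new token."""
    rng = random.Random(seed)
    tokens, passes = [], 0
    while len(tokens) < length:
        token, _ = toy_model(tokens + [MASK], vocab, rng)[-1]
        tokens.append(token)
        passes += 1
    return tokens, passes


def diffusion_style_decode(length, vocab, steps, seed=0):
    """Parallel decoding: start fully masked, commit several positions per pass."""
    rng = random.Random(seed)
    tokens = [MASK] * length
    passes = 0
    for step in range(steps):
        masked = [i for i, t in enumerate(tokens) if t is MASK]
        if not masked:
            break
        proposals = toy_model(tokens, vocab, rng)
        passes += 1
        # Assumed schedule: spread the remaining masked positions evenly
        # over the remaining steps (ceiling division).
        k = -(-len(masked) // (steps - step))
        # Commit the k most confident proposals; positions may be filled in any order.
        most_confident = sorted(masked, key=lambda i: proposals[i][1], reverse=True)[:k]
        for i in most_confident:
            tokens[i] = proposals[i][0]
    return tokens, passes


if __name__ == "__main__":
    vocab = ["a", "b", "c"]
    _, ar_passes = autoregressive_decode(16, vocab)
    _, diff_passes = diffusion_style_decode(16, vocab, steps=4)
    print("autoregressive passes:", ar_passes)
    print("diffusion-style passes:", diff_passes)
```

With 16 tokens and 4 denoising steps, the autoregressive loop uses 16 forward passes and the parallel loop uses 4. Fewer passes does not by itself guarantee lower wall-clock latency, since the cost of each pass depends on the real model.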