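Researchers at Berkeley Artificial Intelligence Research (BAIR) developed Adaptive Parallel Reasoning (APR) to make inference-time scaling more efficient. The method allocates compute dynamically: depending on task difficulty, it explores multiple reasoning paths in parallel rather than one after another. Because independent paths run concurrently, wall-clock latency is lower than with sequential chain-of-thought, where every reasoning step waits on the previous one. Developers can therefore spend more total inference compute on hard problems without a proportional increase in end-to-end response time.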
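The idea of difficulty-dependent fan-out can be illustrated with a small sketch. This is not the BAIR implementation: the names `estimate_difficulty`, `paths_for`, `explore_path`, and `solve` are hypothetical, the difficulty signal is a toy word-count proxy, and each "reasoning path" is a simulated delay standing in for a model rollout. It only shows why running paths concurrently keeps wall-clock time close to that of a single path.

```python
import asyncio

# Hypothetical difficulty signal. The source does not say how difficulty is
# measured, so word count is used here purely as a toy proxy.
def estimate_difficulty(task: str) -> float:
    return min(len(task.split()) / 20.0, 1.0)

def paths_for(difficulty: float, max_paths: int = 8) -> int:
    # Map a difficulty in [0, 1] to a number of parallel paths in [1, max_paths].
    return max(1, round(difficulty * max_paths))

async def explore_path(task: str, path_id: int,
                       step_seconds: float = 0.05) -> tuple[int, str]:
    # Placeholder for one model rollout; the sleep simulates generation latency.
    await asyncio.sleep(step_seconds)
    return path_id, f"candidate {path_id} for: {task}"

async def solve(task: str) -> list[tuple[int, str]]:
    # Easy tasks get one path; harder tasks fan out to several.
    n = paths_for(estimate_difficulty(task))
    # gather() runs the paths concurrently and returns results in call order,
    # so total wait time is roughly one path's latency, not n times that.
    return list(await asyncio.gather(*(explore_path(task, i) for i in range(n))))
```

Calling `asyncio.run(solve("2 + 2"))` in this sketch explores a single path, while a task of twenty or more words fans out to eight. A real system would replace the sleep with model calls and would need a step that selects or merges the candidate answers, which is omitted here.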