Researchers at Berkeley AI Research developed Adaptive Parallel Reasoning to make inference-time compute scaling more efficient. Rather than exploring reasoning paths one after another, the system allocates compute dynamically and processes multiple paths simultaneously, which reduces latency on complex queries. Practitioners can balance accuracy against speed by tuning the parallelization factor to the difficulty of the task.
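The core idea of trading a parallelization factor against latency can be sketched as follows. This is a minimal illustration, not the authors' implementation: `explore_path`, its toy scoring rule, and `parallel_reason` are hypothetical stand-ins for whatever model calls and scoring a real system would use.

```python
from concurrent.futures import ThreadPoolExecutor

def explore_path(path_id: int) -> tuple[int, float]:
    # Stand-in for one reasoning path; a real system would invoke a
    # language model here. This toy rule makes path 2 score highest.
    score = 1.0 / (1 + abs(path_id - 2))
    return path_id, score

def parallel_reason(parallelization_factor: int) -> int:
    """Launch `parallelization_factor` reasoning paths concurrently
    and return the id of the highest-scoring path."""
    with ThreadPoolExecutor(max_workers=parallelization_factor) as pool:
        results = list(pool.map(explore_path, range(parallelization_factor)))
    best_path, _ = max(results, key=lambda r: r[1])
    return best_path
```

Raising the factor widens the search (more paths considered per query) at the cost of more compute in flight; lowering it favors speed on easy tasks where a single path usually suffices.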