In new research from Apple, self-consistency across multiple reasoning paths serves as a proxy for how much thinking a query actually needs. The team uses this signal to allocate compute budgets according to query complexity, so simple tasks do not consume processing they do not need. Practitioners can apply the same idea to make inference more compute-optimal by dynamically adjusting latent-space thinking budgets.
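To make the idea concrete, here is a minimal sketch of how agreement among sampled answers could drive a thinking budget. It is not the method from the research: the function names `agreement_score` and `allocate_budget`, the linear mapping, and the budget range are all assumptions chosen for illustration.

```python
from collections import Counter


def agreement_score(answers):
    """Return the fraction of sampled answers that match the most common one."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(answers)


def allocate_budget(answers, min_budget=0, max_budget=1024):
    """Map disagreement among sampled answers to a thinking budget.

    Hypothetical linear rule: full agreement yields min_budget, and the
    budget grows toward max_budget as the samples disagree more.
    """
    disagreement = 1.0 - agreement_score(answers)
    return min_budget + round(disagreement * (max_budget - min_budget))


if __name__ == "__main__":
    easy = ["4", "4", "4", "4"]      # all cheap reasoning paths agree
    hard = ["a", "b", "a", "c"]      # paths disagree, so the query looks harder
    print(allocate_budget(easy))     # 0
    print(allocate_budget(hard))     # 512
```

With `k` samples the disagreement can be at most `1 - 1/k`, so this rule never quite reaches `max_budget`. A real system would likely calibrate the mapping on held-out queries rather than assume it is linear.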