Self-consistency among reasoning paths serves as the primary proxy for deciding when an LLM needs more thinking time: high agreement across sampled paths signals an easy query, while persistent disagreement signals a hard one. Building on this signal, researchers at Apple developed a method that allocates compute budgets according to query complexity, avoiding wasted tokens on simple tasks. Practitioners can now run more compute-optimal inference without sacrificing accuracy on complex reasoning.
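As a rough illustration of the general idea (not the Apple method itself), here is a minimal sketch of majority-vote self-consistency with early stopping: keep sampling reasoning paths until the majority answer is stable, so easy queries exit cheaply and only high-disagreement queries consume the full budget. The `sample_answer` callable, the thresholds, and the parameter names are all hypothetical placeholders.

```python
from collections import Counter
from typing import Callable

def adaptive_self_consistency(
    sample_answer: Callable[[str], str],  # placeholder: one stochastic reasoning path -> final answer
    prompt: str,
    max_samples: int = 16,     # assumed hard compute budget per query
    min_samples: int = 3,      # draw a few paths before judging agreement
    agreement: float = 0.8,    # assumed threshold: stop once this fraction of paths agree
) -> tuple[str, int]:
    """Sample reasoning paths until the majority answer stabilizes.

    Agreement among sampled answers is the complexity proxy: easy queries
    exit after a few consistent samples, hard ones use the full budget.
    """
    answers: list[str] = []
    for _ in range(max_samples):
        answers.append(sample_answer(prompt))
        if len(answers) >= min_samples:
            top, count = Counter(answers).most_common(1)[0]
            if count / len(answers) >= agreement:
                return top, len(answers)  # confident early exit saves tokens
    # Budget exhausted: fall back to the plain majority vote.
    return Counter(answers).most_common(1)[0][0], len(answers)
```

In this sketch the per-query cost adapts automatically: a simple arithmetic question whose first three samples agree costs 3 generations, while an ambiguous multi-step problem may spend all 16, which is the compute-optimal behavior the paragraph describes.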