Self-consistency across sampled reasoning paths can serve as a proxy for whether an LLM actually needs to "think": when independently sampled answers agree, the query likely does not require intermediate chain-of-thought reasoning and can be handled by direct generation. Apple researchers have used this signal to allocate compute-optimal inference budgets, and practitioners can reduce latency by skipping unnecessary intermediate reasoning for simple queries.
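The routing idea can be sketched in a few lines. This is a minimal illustration, not the researchers' actual method: it assumes self-consistency is measured as majority-vote agreement over a handful of sampled answers, and the 0.8 threshold and function names are hypothetical choices for the example.

```python
from collections import Counter

def consistency(answers):
    """Fraction of sampled answers that agree with the majority answer."""
    counts = Counter(answers)
    top_count = counts.most_common(1)[0][1]
    return top_count / len(answers)

def route(answers, threshold=0.8):
    """Pick direct generation when sampled answers agree strongly;
    otherwise spend the budget on full chain-of-thought reasoning.
    (Threshold is illustrative, not from the source.)"""
    return "direct" if consistency(answers) >= threshold else "chain_of_thought"

# High agreement across samples -> simple query, skip chain-of-thought.
print(route(["42", "42", "42", "42", "41"]))  # -> direct
# Low agreement -> the model is uncertain, so reason step by step.
print(route(["17", "23", "17", "9", "31"]))   # -> chain_of_thought
```

In practice the `answers` list would come from sampling the model several times at nonzero temperature on the same prompt; the agreement score then gates whether the more expensive reasoning pass runs at all.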