Self-consistency among multiple reasoning paths serves as a proxy for thinking necessity in Apple's latest research. The team identifies when models should use latent space reasoning versus direct generation to optimize compute. This approach prevents wasteful processing on simple queries. Practitioners can now better align inference costs with query complexity for efficient deployment.