The Defensibility Index replaces raw human-label agreement as the evaluation target for rule-governed AI systems. Standard agreement metrics penalize decisions that are logically consistent with the stated policy but diverge from annotator majorities, a failure mode known as the "Agreement Trap," which mislabels genuine ambiguity as model error. To address this, researchers increasingly use token log-probabilities to estimate reasoning stability. This shift lets practitioners distinguish actual model failures from valid, defensible policy interpretations.
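As a rough illustration of the logprob-based approach, the sketch below computes a stability score from the token log-probabilities of a model's decision across paraphrases of the same case. The data, the `sequence_confidence` helper, and the specific stability formula are all hypothetical, intended only to show the general idea, not the Defensibility Index's actual definition.

```python
from statistics import mean, pstdev

# Hypothetical illustration: estimate "reasoning stability" from token
# log-probabilities. Each inner list holds the logprobs of the decision
# tokens a model emitted for one paraphrase of the same case.
# (Values are synthetic; a real pipeline would read them from the
# model API's returned logprobs.)
samples = [
    [-0.10, -0.25, -0.05],   # paraphrase 1
    [-0.12, -0.30, -0.07],   # paraphrase 2
    [-0.15, -0.22, -0.09],   # paraphrase 3
]

def sequence_confidence(logprobs):
    """Mean token logprob, i.e. log of the geometric-mean token probability."""
    return mean(logprobs)

confidences = [sequence_confidence(s) for s in samples]

# Low spread across paraphrases suggests a stable, defensible decision;
# high spread suggests the model's reasoning shifts with surface wording.
stability = 1.0 / (1.0 + pstdev(confidences))

print(f"per-sample confidence: {[round(c, 3) for c in confidences]}")
print(f"stability score: {stability:.3f}")
```

A score near 1 means the model commits to its decision with nearly identical confidence regardless of phrasing, which is the kind of consistency an agreement metric would miss.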