The Defensibility Index replaces simple human-label agreement as the yardstick for evaluating rule-governed AI. Agreement-based metrics penalize logically consistent decisions that happen to diverge from a single human label, creating an "Agreement Trap." To escape it, researchers now use token logprobs to estimate reasoning stability, which lets developers distinguish genuine model errors from valid policy interpretations.
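As a minimal sketch of how token logprobs could feed a stability estimate: collect the per-token logprobs of the model's decision span across several paraphrases of the same policy question, then treat low variance in per-run confidence as high stability. The function name, the `1 / (1 + spread)` normalization, and the toy logprob values below are assumptions for illustration, not the published Defensibility Index formula.

```python
from statistics import mean, pstdev

def reasoning_stability(logprob_runs):
    """Estimate decision stability from token logprobs.

    logprob_runs: one list of token logprobs per paraphrased prompt,
    covering only the tokens of the model's decision (e.g. "deny").
    Per-run confidence is the mean token logprob; stability is
    1 / (1 + population-std of confidences), so values near 1.0 mean
    the model commits equally strongly regardless of phrasing.
    (Hypothetical scoring scheme, assumed for this sketch.)
    """
    confidences = [mean(run) for run in logprob_runs]
    spread = pstdev(confidences)
    return {
        "mean_confidence": mean(confidences),
        "spread": spread,
        "stability": 1.0 / (1.0 + spread),
    }

# Toy logprobs for a decision span across three paraphrases.
consistent = [[-0.10, -0.20], [-0.12, -0.18], [-0.11, -0.21]]
unstable = [[-0.10, -0.20], [-2.50, -3.00], [-0.30, -1.90]]

print(reasoning_stability(consistent)["stability"])
print(reasoning_stability(unstable)["stability"])
```

A run whose confidence barely moves under paraphrase scores near 1.0, while large swings push the score toward 0; a stable-but-wrong decision can then be flagged as a candidate policy interpretation rather than a random error.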