The Defensibility Index replaces simple human-agreement metrics with a measure better suited to evaluating rule-governed AI. The researchers argue that traditional agreement labels penalize logically consistent decisions, a failure mode they call the "Agreement Trap." To address it, the team introduces the Probabilistic Defensibility Signal, which estimates reasoning stability from token log-probabilities. Together, these let practitioners distinguish genuine model errors from policy ambiguity.
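
The source does not spell out how the Probabilistic Defensibility Signal is computed, so the following is a minimal sketch under stated assumptions: the function names (`defensibility_signal`, `classify_disagreement`), the aggregation (geometric mean of token probabilities, i.e., exponentiated mean logprob), and the 0.7 threshold are all illustrative choices, not the authors' method.

```python
import math
from typing import Sequence

def defensibility_signal(token_logprobs: Sequence[float]) -> float:
    """Aggregate per-token log-probabilities into a stability score in (0, 1].

    Uses the length-normalized mean logprob, exponentiated back to a
    probability scale (the geometric mean of token probabilities).
    Higher values suggest the decision was stable rather than borderline.
    The aggregation choice is an assumption made for illustration.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must be non-empty")
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_logprob)

def classify_disagreement(signal: float, threshold: float = 0.7) -> str:
    """Sort human-model disagreements into two buckets (threshold assumed).

    A high signal on a disagreeing answer hints at policy ambiguity (the
    model is confident and consistent); a low signal hints at a genuine
    model error (the decision itself was unstable).
    """
    return "policy-ambiguity" if signal >= threshold else "model-error"

# Example: per-token logprobs as returned by common LLM APIs
logprobs = [-0.05, -0.12, -0.30, -0.08]
s = defensibility_signal(logprobs)
print(f"signal={s:.3f} -> {classify_disagreement(s)}")
```

The geometric-mean aggregation is used here because it length-normalizes the sequence confidence, so long responses are not automatically scored as less stable than short ones; any other aggregation over token logprobs could be substituted.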