The Defensibility Index replaces simple human-label agreement as the evaluation standard for rule-governed AI. Current metrics penalize valid decisions that happen to differ from a single human label, creating an "Agreement Trap." This research introduces Probabilistic Defensibility Signals, which measure the stability of a model's reasoning. With these signals, practitioners can distinguish genuine model errors from cases where the policy itself is ambiguous.
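One way to picture the idea is a minimal sketch, assuming a stability signal computed as the entropy of a model's repeated decisions on the same case. The function names (`defensibility_signal`, `classify_disagreement`), the `ambiguity_threshold` value, and the entropy-based formulation are illustrative assumptions, not the paper's actual definition of the index:

```python
from collections import Counter
import math

def defensibility_signal(samples):
    """Shannon entropy (bits) of repeated model decisions for one case.

    Illustrative assumption: low entropy = stable reasoning; high
    entropy = the policy is ambiguous for this input.
    """
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def classify_disagreement(samples, human_label, ambiguity_threshold=0.8):
    """Separate genuine model errors from policy ambiguity (hypothetical).

    If repeated decisions are unstable, flag the policy as ambiguous
    rather than penalizing the model; if they are stable but differ
    from the single human label, flag a stable divergence to review.
    """
    entropy = defensibility_signal(samples)
    majority, _ = Counter(samples).most_common(1)[0]
    if entropy >= ambiguity_threshold:
        return "policy_ambiguity"
    if majority != human_label:
        return "stable_divergence"  # candidate error or defensible call
    return "agreement"

# A stable divergence from the lone human label is not scored the
# same as an unstable (ambiguous) one:
print(classify_disagreement(["deny"] * 9 + ["allow"], "allow"))
print(classify_disagreement(["allow"] * 5 + ["deny"] * 5, "allow"))
```

A simple-agreement metric would penalize both cases identically; the sketch separates them, which is the distinction the signals are meant to capture.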