A new arXiv paper applies interpretability to diagnose why AI data annotators disagree on safety policies. The researchers separate operational failures, where annotators simply err, from genuine policy ambiguity and value pluralism. This distinction lets developers target quality control at the former and policy clarification at the latter. It also reduces reliance on costly, manual surveys of annotator reasoning for model alignment.
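
To make the triage concrete, here is a minimal sketch of how contested items could be routed once disagreement causes are distinguished. This is not the paper's method: the inputs (per-annotator labels, a hypothetical per-annotator self-consistency score, and a hypothetical per-item cluster-alignment score) and all thresholds are illustrative assumptions.

```python
import numpy as np

def triage_disagreement(labels, consistency, split_alignment,
                        consistency_floor=0.7, alignment_floor=0.7):
    """Classify one contested item given its annotator labels.

    labels:          0/1 safety labels, one per annotator (hypothetical input)
    consistency:     per-annotator self-consistency on similar items, 0..1
                     (hypothetical diagnostic signal)
    split_alignment: how well this item's split matches stable annotator
                     clusters across the dataset, 0..1 (hypothetical signal)
    Names and thresholds are illustrative, not from the paper.
    """
    labels = np.asarray(labels)
    consistency = np.asarray(consistency)

    # Unanimous items are not disagreements at all.
    if labels.min() == labels.max():
        return "agreement"

    # Minority voters who are unreliable elsewhere suggest an operational
    # failure: route the item to quality control / re-annotation.
    minority = labels != np.round(labels.mean())
    if consistency[minority].mean() < consistency_floor:
        return "operational_failure"

    # Reliable annotators splitting along stable cluster lines suggests
    # value pluralism; an unstructured split suggests policy ambiguity.
    # Either way, the item routes to policy work rather than QC.
    if split_alignment >= alignment_floor:
        return "value_pluralism"
    return "policy_ambiguity"

# Example: one dissenting annotator with low reliability elsewhere.
print(triage_disagreement([1, 1, 0],
                          consistency=[0.9, 0.9, 0.3],
                          split_alignment=0.2))
# -> "operational_failure"
```

The design point this sketch captures is the paper's central claim: once the cause of a disagreement is identified, each item gets a different remedy, so noisy labels no longer have to be adjudicated by blanket manual review.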