A new study on arXiv identifies three distinct causes of annotator disagreement: operational failures, policy ambiguity, and value pluralism. Researchers use interpretability tools to distinguish these sources rather than relying on self-reporting. This allows developers to target specific fixes, such as updating policy wording or improving quality control, to refine safety guardrails.