Inconsistent methodologies and undisclosed internal data currently make most AI benchmark results impossible to compare. To fix this, proponents argue for shifting measurement authority from AI labs to third-party auditors, an arrangement that mirrors safety standards in other high-stakes industries. On this view, independent oversight would let safety frameworks and release decisions rest on transparent, standardized data rather than on company-controlled metrics.