Capability benchmarks often accelerate the very risks they are meant to monitor. A new proposal on the AI Alignment Forum argues for prioritizing behavioral evaluations over raw performance metrics. According to the proposal, this shift would help researchers identify dangerous tendencies before they manifest as high-level skills. In practice, that means distinguishing between what a model can do and how it behaves.