Capability evaluations often accelerate the very research they aim to monitor. Contributors to the AI Alignment Forum argue that measuring what a model can do creates dangerous externalities. On this view, researchers should instead prioritize behavioral evaluations, which assess how a model tends to act, to identify risks. This shift would help safety practitioners detect hazardous tendencies before models reach critical autonomy thresholds.