Current AI evaluations prioritize capabilities such as coding and scientific reasoning over behavioral safety. Contributors to the AI Alignment Forum argue that capability-centric metrics inadvertently accelerate risky research. Shifting the focus toward behavioral evaluation, which looks at how a model acts rather than what it can do, helps forecast specific risks. Practitioners must therefore prioritize safety-oriented benchmarks to prevent dangerous autonomy in future LLMs.