Capability evaluations primarily measure how well AI systems perform on coding or scientific tasks. While these metrics help forecast risk timelines, they can inadvertently accelerate the development of dangerous capabilities. AI Alignment Forum contributors argue for a shift toward evaluating specific model behaviors instead. This pivot aims to decouple safety monitoring from capability research.