Capability evaluations often accelerate the very research they are meant to monitor. A post on the AI Alignment Forum argues that measuring model behaviors, rather than raw performance, is the safer approach, because it lets researchers forecast risks without inadvertently speeding up development. Practitioners should therefore prioritize behavioral metrics, so that safety testing does not feed back into capability gains and create a dangerous loop.