Untargeted training, such as teaching a model to speak like a pirate, often accidentally fixes misaligned behaviors in model organisms. This fragility renders many current safety benchmarks useless for developing robust alignment techniques. Researchers must now build more resilient organisms to ensure safety interventions actually work on future, more sophisticated misaligned systems.