Simple untargeted training, such as forcing a model to talk like a pirate, often accidentally fixes misbehavior in model organisms. This fragility renders many existing safety benchmarks useless for testing alignment techniques. Researchers must build more robust organisms to ensure AI safety interventions work on future, more sophisticated misaligned systems.