Simple untargeted training, such as teaching a model to talk like a pirate, often erases misbehavior in current model organisms. This fragility undermines efforts to develop robust alignment techniques for future systems. Researchers must build more resilient test cases to ensure safety interventions actually work. Otherwise, these experiments offer a false sense of security.