Simple untargeted training, such as teaching a model to talk like a pirate, often erases misbehavior in model organisms. This fragility undermines efforts to develop robust alignment techniques for future systems. Researchers including Hubinger and Ryd found that these fragile models fail to simulate persistent misalignment. Practitioners must build more resilient organisms to test safety interventions.