Untargeted training, such as teaching a model to talk like a pirate, often removes the misbehavior in model organisms of misalignment as an accidental side effect. Researchers found that these simulated misaligned systems are too fragile to serve as reliable test beds for safety techniques. This instability limits how useful current replications of work by Hubinger and others are for developing robust alignment strategies.