Replications using Llama-3.3-70B and Llama-3.1-8B show that backdoor robustness depends heavily on the optimizer and the distillation method used. These results often contradict the original Sleeper Agents paper, particularly its findings on CoT distillation. The inconsistency suggests that model behavior is messier than previously assumed, and that researchers need more rigorous ablations to verify safety benchmarks.
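
As a rough illustration of what a more rigorous ablation could look like, the sketch below enumerates a full grid over the three factors named above (model, optimizer, distillation method) and groups runs that differ in exactly one factor, so that any change in backdoor robustness can be attributed to that factor alone. The optimizer and distillation labels are hypothetical placeholders, not the settings used in the replications, and the helper names `ablation_grid` and `one_factor_groups` are invented for this example.

```python
from itertools import product

# Factor levels for the ablation. The model names come from the text;
# the optimizer and distillation labels are HYPOTHETICAL placeholders.
FACTORS = {
    "model": ["Llama-3.3-70B", "Llama-3.1-8B"],
    "optimizer": ["optimizer_a", "optimizer_b"],
    "distillation": ["cot_distilled", "no_cot"],
}


def ablation_grid(factors):
    """Return every combination of factor levels as a list of config dicts."""
    names = list(factors)
    return [
        dict(zip(names, combo))
        for combo in product(*(factors[name] for name in names))
    ]


def one_factor_groups(grid, factor):
    """Group configs that are identical except for the value of `factor`."""
    groups = {}
    for cfg in grid:
        # The key is everything in the config except the factor under study.
        key = tuple((k, v) for k, v in cfg.items() if k != factor)
        groups.setdefault(key, []).append(cfg)
    return list(groups.values())


grid = ablation_grid(FACTORS)
optimizer_groups = one_factor_groups(grid, "optimizer")
```

Each entry of `optimizer_groups` holds runs that share a model and a distillation setting and differ only in optimizer, which is the comparison needed before attributing a robustness difference to the optimizer.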