Nine autonomous Claude instances outperformed human researchers on an open alignment problem in a controlled experiment. When Anthropic tried to transfer the winning method to production models, however, the performance gains vanished. The result highlights the gap between an isolated research win and a method that holds up at deployment scale. Practitioners should expect similar volatility when migrating agentic discoveries to live environments.