Nine autonomous Claude instances beat human researchers on an open alignment problem during a controlled trial. The victory proved fleeting. Anthropic failed to replicate these gains when transferring the winning method to production models. This gap highlights a persistent struggle to move experimental alignment breakthroughs into stable, real-world deployments for practitioners.