Tens of thousands of lines of code have been generated on MirrorCode by models leveraging rapid test-based feedback, and Anthropic has similarly reported models hill-climbing alignment tasks through autoresearch loops. Together, these results suggest that models handle long-horizon tasks far more reliably when progress is immediately verifiable at each step. Practitioners aiming to scale alignment automation should therefore prioritize tight, automated feedback loops.
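A minimal sketch of such a tight test-based feedback loop, assuming a hypothetical `model_propose(spec, feedback)` call that stands in for any code-generation model (here simulated with canned candidates). The structure is the point: every candidate is scored against executable tests, and the concrete failures are fed straight back into the next proposal.

```python
def run_tests(func, cases):
    """Return the list of failing (input, expected, got) triples."""
    failures = []
    for arg, expected in cases:
        try:
            got = func(arg)
        except Exception as exc:  # a crash is also a test failure
            got = repr(exc)
        if got != expected:
            failures.append((arg, expected, got))
    return failures

def model_propose(spec, feedback):
    """Stand-in for a real model call (hypothetical; simulated here).

    The first attempt contains an off-by-one bug; once it sees concrete
    test failures, it returns the corrected version.
    """
    if not feedback:
        return lambda n: n * (n - 1) // 2  # buggy triangular-number impl
    return lambda n: n * (n + 1) // 2      # corrected after failures

def feedback_loop(spec, cases, max_iters=5):
    """Iterate propose -> test until all tests pass or the budget runs out."""
    feedback = []
    for _ in range(max_iters):
        candidate = model_propose(spec, feedback)
        feedback = run_tests(candidate, cases)
        if not feedback:  # all tests pass: progress was verifiable
            return candidate
    return None

cases = [(1, 1), (4, 10), (10, 55)]
solution = feedback_loop("nth triangular number", cases)
```

The loop terminates only on an externally checkable signal (all tests green), which is exactly the kind of immediate verifiability the paragraph above argues for.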