Experiments with thirteen open-weight models on Omni-MATH and Codeforces show that multi-turn improvements often come from resampling or format correction rather than from the feedback itself. A student-teacher protocol separates the effect of external guidance from that of unguided self-refinement. The findings suggest that natural-language feedback yields smaller gains than previously assumed. Practitioners should prioritize test-time computation over complex feedback loops.
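
To make the comparison concrete, the following is a minimal sketch of how a multi-turn gain might be split into the part matched by control conditions and the part left over for feedback. It is not the evaluation code behind these results: the `Outcome` fields, the `decompose_gain` helper, the attribution rule (subtracting the stronger of the two controls), and the four toy records are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """Correctness of one problem under each condition (hypothetical schema)."""
    first_try: bool      # single attempt, no second turn
    resample: bool       # fresh independent attempts at a matched budget
    self_refine: bool    # second turn with no external feedback
    with_feedback: bool  # second turn after teacher feedback


def accuracy(outcomes, field):
    """Fraction of problems solved under the named condition."""
    return sum(getattr(o, field) for o in outcomes) / len(outcomes)


def decompose_gain(outcomes):
    """Split the naive multi-turn gain using the two control conditions.

    Assumption: feedback is credited only with what exceeds the stronger
    control. Other attribution rules are possible.
    """
    base = accuracy(outcomes, "first_try")
    resample = accuracy(outcomes, "resample")
    refine = accuracy(outcomes, "self_refine")
    feedback = accuracy(outcomes, "with_feedback")
    return {
        "naive_multi_turn_gain": feedback - base,
        "matched_by_resampling": resample - base,
        "matched_by_self_refinement": refine - base,
        "attributable_to_feedback": feedback - max(resample, refine),
    }


# Toy records, invented purely to exercise the function; not experimental data.
toy = [
    Outcome(first_try=False, resample=True, self_refine=False, with_feedback=True),
    Outcome(first_try=True, resample=True, self_refine=True, with_feedback=True),
    Outcome(first_try=False, resample=False, self_refine=True, with_feedback=True),
    Outcome(first_try=False, resample=False, self_refine=False, with_feedback=False),
]
report = decompose_gain(toy)
for name, value in report.items():
    print(f"{name}: {value:+.2f}")
```

On the toy records, half of the naive gain is already matched by either control, which is the kind of gap this accounting is meant to expose. A real analysis would also need matched compute budgets across conditions and confidence intervals on each accuracy.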