Thirteen open-weight models were tested on Omni-MATH and Codeforces to isolate the effect of natural-language feedback. The researchers found that multi-turn accuracy gains often stem from resampling or format correction rather than from the model actually learning from the feedback. This suggests that apparent agent improvement is frequently an artifact of extra test-time computation. Practitioners should check whether their feedback loops deliver gains beyond what additional attempts alone would provide.
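One way to make that scrutiny concrete is to compare a feedback loop against a feedback-free resampling baseline given the same number of attempts. The sketch below is illustrative only and is not the study's evaluation code: the `ProblemResult` record, its field names, and the `feedback_gain` helper are hypothetical, and it assumes per-attempt correctness has already been logged for both conditions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ProblemResult:
    """Hypothetical per-problem log; field names are assumptions."""
    # Correctness of each turn when feedback is given after a failed turn.
    feedback_turns: List[bool]
    # Correctness of independent samples drawn without any feedback.
    resample_turns: List[bool]


def solved_within(attempts: Sequence[bool], k: int) -> bool:
    """True if any of the first k attempts was correct."""
    return any(attempts[:k])


def feedback_gain(results: Sequence[ProblemResult], k: int) -> Tuple[float, float, float]:
    """Return (feedback accuracy, resampling accuracy, difference) at budget k.

    A difference near zero suggests the multi-turn gain is explained by
    extra attempts rather than by the feedback itself.
    """
    if not results:
        raise ValueError("results must not be empty")
    n = len(results)
    fb_acc = sum(solved_within(r.feedback_turns, k) for r in results) / n
    rs_acc = sum(solved_within(r.resample_turns, k) for r in results) / n
    return fb_acc, rs_acc, fb_acc - rs_acc


# Toy data, invented purely for illustration.
toy = [
    ProblemResult([False, True], [False, True]),
    ProblemResult([True, False], [True, False]),
    ProblemResult([False, False], [False, False]),
    ProblemResult([False, True], [False, False]),
]
fb, rs, gain = feedback_gain(toy, k=2)
print(f"feedback={fb:.2f} resample={rs:.2f} gain={gain:.2f}")
```

On this toy data only one of the four problems is solved with feedback but not by resampling, so the measured gain is the part of the multi-turn improvement that extra attempts cannot account for. A fuller check along the same lines would also separate out answers that were fixed only by format correction.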