Almost none of 22 multimodal models tested via ProactiveBench asked users for missing visual information. Instead, these systems preferred guessing over requesting clarification. Researchers found that a basic reinforcement learning approach corrects this behavior. This fix reduces hallucinations for developers building vision-integrated applications that require high precision in data interpretation.