Almost none of 22 multimodal models tested via ProactiveBench asked for missing visual information before answering. Instead, these systems preferred guessing over seeking clarification. Researchers found a simple reinforcement learning approach fixes this behavior. This shift reduces hallucinations and improves reliability for developers building vision-integrated applications.