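Researchers at Peking University developed CiteVQA, a benchmark for measuring "attribution hallucination": cases where a model gives a correct answer but cites irrelevant text from the document as its evidence. The researchers report that this flaw appears even in leading models such as GPT and Gemini when they answer questions about documents. In legal and medical work, where practitioners rely on cited passages to verify an answer, such errors pose serious risks. According to its authors, CiteVQA is the first benchmark to measure these citation failures systematically.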