A Peking University preprint reports that LLMs achieved a 0% end-to-end success rate when reproducing numerical results from experimental physics papers. While the models understood the methodology, they failed at the data-analysis and simulation steps. This exposes a critical gap in research reliability: practitioners cannot yet trust LLMs for autonomous scientific verification.