A Peking University preprint found that LLMs achieved a 0% end-to-end success rate when reproducing numerical results from experimental physics papers. The models understood the methodology but consistently failed at the data analysis and simulation steps. This gap between comprehension and execution points to a critical reliability deficit: practitioners cannot trust current models for autonomous scientific reproduction without rigorous human verification.