Five hundred investment bankers reviewed task outputs from models such as GPT-5.4 and Claude Opus 4.6 and rated none of them client-ready. The outputs were either imprecise or factually incorrect. Even so, more than half of the participants would still use these tools to produce initial drafts. This gap between draft-quality and client-ready work suggests that high-stakes finance still requires human oversight.