Researchers at Carnegie Mellon University built a benchmark testing AI agents against Google's V8 engine. Claude Mythos outperformed GPT-5.5 in creating real exploits, though it cost twelve times more to run. This proves LLMs can autonomously weaponize software vulnerabilities. Security teams must now accelerate automated patching to counter agentic threats.