Researchers at Carnegie Mellon University built a benchmark that measures how well AI agents exploit vulnerabilities in V8, Google's JavaScript engine. Claude Mythos outperformed GPT-5.5 by a wide margin, though it cost twelve times more. The results suggest that LLMs can now automate complex vulnerability exploitation, at least under benchmark conditions. Practitioners should prioritize robust patching as autonomous exploit generation becomes a reality.