Researchers at Carnegie Mellon University built a benchmark that tests AI agents against vulnerabilities in Google's V8 JavaScript engine. Claude Mythos outperformed GPT-5.5 at developing real exploits, though it cost twelve times more to run. The result suggests that LLMs can now automate complex security breaches. Practitioners should prioritize hardened browser environments to mitigate these autonomous threats.