AI Agents Show Alarming Ability to Develop Real Browser Exploits Autonomously
A new benchmark from Carnegie Mellon University reveals that AI agents, including Claude Mythos and GPT-5.5, can autonomously develop real browser exploits, raising significant security concerns.

Researchers at Carnegie Mellon University have developed a groundbreaking benchmark that assesses the capabilities of AI agents in exploiting genuine vulnerabilities within Google's V8 engine. This engine, a critical component of the Google Chrome browser, has been the target of numerous cyber attacks in the past. The new benchmark aims to measure how far AI agents can push the boundaries of vulnerability exploitation.
The results show that Claude Mythos and GPT-5.5, two prominent AI models, are capable of autonomously developing real browser exploits. However, there's a notable difference in their performance and cost. Mythos leads GPT-5.5 by a wide margin in terms of exploit development capabilities.
Despite its superior performance, Mythos comes with a hefty price tag, costing twelve times as much as GPT-5.5. The ability of AI agents to autonomously develop browser exploits highlights a concerning trend in cybersecurity. As AI models become increasingly sophisticated, their potential misuse in developing exploits could have significant implications for web security.
The findings underscore the need for heightened awareness and proactive measures to mitigate the risks associated with AI-driven exploit development. The research conducted by Carnegie Mellon University serves as a crucial step towards understanding the dual-use potential of AI in cybersecurity. While AI can be leveraged to enhance security measures, its capabilities also pose new challenges that the cybersecurity community must address.
The development of benchmarks like this one is essential in assessing and preparing for the evolving landscape of AI-driven cybersecurity threats. As the digital landscape continues to evolve, the findings of this study serve as a reminder of the importance of ongoing research into the capabilities and limitations of AI agents in cybersecurity. By shedding light on the potential risks, researchers and developers can work towards creating more secure and resilient digital environments.
Source: The Decoder