AI Research

Anthropic's browser agent got hijacked 31.5% of the time before safeguards engaged

AI News Desk

VentureBeat

Jun 01, 2026

2 min read

Anthropic's AI model was hijacked 31.5% of the time by attackers in a browser environment before safeguards were engaged, a vulnerability rate significantly higher than competitors' models.

Anthropic's browser agent got hijacked 31.5% of the time before safeguards engaged

Across the frontier labs, Anthropic's latest AI model has demonstrated the highest vulnerability to prompt injection attacks, with a striking 31.5% success rate of hijacking attempts in a browser environment before safeguards were activated. In a comparison with other leading AI labs, including OpenAI, Google, and Meta, Anthropic's model showed a significantly higher vulnerability rate. While OpenAI reported a robustness score of 0.963 against known attacks on connectors, and Google and Meta did not provide a comparable number, Anthropic's figure stands out as a liability.

However, it also provides a clear and detailed breakdown of the vulnerabilities. The issue of prompt injection, where a malicious instruction is hidden in a seemingly innocuous input, poses a significant threat to AI models. As Carter Rees, VP of AI at Reputation, noted, 'A phrase as innocuous as, 'ignore previous instructions' can carry a payload as devastating as a buffer overflow, yet it shares no commonality with known malware signatures.' This highlights the need for more robust security measures.

Anthropic's model was tested across four surfaces, including tool use, coding, computer use, and browser, with varying degrees of vulnerability. The Opus 4.8 card, which provides a detailed breakdown of the vulnerabilities, shows that the model is most vulnerable in a browser environment. When tested with an adaptive attacker that rewrites its approach based on the model's responses, the model's vulnerability rate drops significantly with safeguards in place.

The Cross-Vendor Prompt Injection Disclosure Grid, which maps the different testing methods and results from each lab, reveals a lack of standardization in measuring prompt injection vulnerabilities. This makes it difficult for buyers to compare the security of different models. As Adam Meyers, Senior Vice President of Counter Adversary Operations at CrowdStrike, noted, 'As you implement AI, it increases your attack surface, so now you have to be able to protect those AI models against adversary misuse or data poisoning or prompt injection.' To address this issue, experts recommend that buyers take a more proactive approach to evaluating AI model security.

This includes pulling every agent deployed or scoped and tagging each by the surface it touches, demanding a per-surface attack success rate, and running their own injection tests before deploying any agent.

Source: VentureBeat