When AI agents browse the web, read documents, and execute actions on your behalf, they introduce a new attack surface that most security teams are only beginning to understand. A sweeping new disclosure from Anthropic makes the scale of the problem concrete: in red-team testing, its newest browser-capable model was successfully hijacked through prompt injection 31.5 percent of the time — before safeguards engaged.
Prompt injection is deceptively simple: a malicious instruction is hidden inside content that the agent reads — a webpage, a document, an API response. When the agent processes it, the hidden command overrides the original user intent. The result can range from exfiltrating sensitive records to triggering actions that nobody authorized. Unlike traditional software vulnerabilities, there's no patch — the attack exploits the model's core capability: following instructions.
Anthropic's 244-page safety disclosure, released May 28, is unusually detailed. Unlike OpenAI's report (covering only one surface: connectors) or Google and Meta's shorter disclosures, Anthropic broke down prompt injection risk by surface area — browser, tool calls, document ingestion, and API integrations — with the browser being the most vulnerable. The cross-industry comparison matters: there is currently no standard methodology for measuring prompt injection susceptibility, so each lab's numbers are not directly comparable.
Carter Rees, VP of AI at Reputation, framed the issue clearly: "Prompt injection breaks the assumption that every instruction the AI follows came from a trusted source." That assumption has underpinned AI agent deployment strategies, and its failure has direct implications for any organization that has deployed or is planning to deploy autonomous AI workflows.
CrowdStrike's Adam Meyers put it bluntly: as AI is integrated into operations, the attack surface expands, and responsibility for managing that exposure now falls to buyers. Frontier labs can publish disclosures, but they can't control how enterprises deploy agents or what content those agents are allowed to ingest.
The practical guidance emerging from this moment: organizations should treat AI agents with the same rigor they apply to privileged service accounts — least-privilege access, audit logs, sandboxed execution, and a clear incident response protocol for when an agent does something unexpected.
Why It Matters
The 31.5% figure will likely become a reference point in enterprise AI risk discussions for the rest of 2026. As agentic AI moves from proof-of-concept into production, security hygiene around what agents can read, write, and execute is no longer optional. The frontier labs have published their disclosures — now the accountability shifts to the organizations deploying the tools.