Prompt injection still drives most agentic AI security failures in production
A backdoor sat on PyPI for three hours in March 2026. Nearly 47,000 downloads occurred during the window. The compromised package, LiteLLM, serves as the language-model gateway for CrewAI, DSPy, Microsoft GraphRAG, and dozens of other AI agent frameworks. Anyone pulling an update during that window pulled in an autonomous attack bot named hackerbot-claw along with it.

Incidents like this are why the OWASP GenAI Security Project’s State of Agentic AI Security and Governance, version 2.01 reads very differently from the version published a year earlier. The 2025 edition cataloged plausible threats. The 2026 edition catalogs CVEs, vendor advisories, and breach reports tied to nearly every category of agentic risk.
Coding agents are the epicenter
Coding agents drive most of the new attack data. Of 53 agentic projects tracked by OWASP’s State of AI Surveyor, 28 are coding agents. The five fastest-growing tools (Claude Code, Gemini CLI, Codex, Cline, and Aider) all sit in that category. Adoption analysis from a16z places coding as the dominant enterprise AI use case by nearly an order of magnitude.
That dominance shows up in advisory counts. The five repositories with the most security advisories are workflow platform n8n (57), Claude Code (22), AutoGPT (15), Dify (13), and Roo-Code (11). Every project on the list is a semi-autonomous framework or coding agent.
Release velocity makes triage difficult. Seven projects in the survey ship updates daily or faster. The leader, trycua/cua, averaged a release every eight hours over the tracked period. Traditional software composition analysis pipelines were never designed to absorb that cadence.
Prompt injection is the universal joint
One technique ties most of these incidents together: prompt injection. OWASP maps it to six of the ten categories in its Top 10 for Agentic Applications.
The root cause is architectural. Large language models treat the system prompt, the user’s request, and any text retrieved from external sources as a single stream of tokens. There is no reliable way to mark some of those tokens as commands and others as data. Hostile text smuggled into a document, calendar invite, or web page can carry the same authority as a legitimate operator instruction.
Two design heuristics dominate practitioner thinking. The first is what researcher Simon Willison calls the “lethal trifecta.” Any agent that combines three properties (access to private data, exposure to untrusted content, and the ability to communicate externally) can be turned into an exfiltration tool by a single injected prompt. The poisoned content steers the agent. The agent pulls the sensitive data. The agent sends it out the door.
The second heuristic comes from Meta, published as the “Agents Rule of Two.” It treats Willison’s three properties as a budget. An agent operating without human approval is allowed to satisfy two of the three. Combining all three requires a human in the loop.
The supply chain became the soft target
Attackers spent the past year learning that the easiest way to compromise an agent is to poison something the agent trusts. Three layers were hit hard.
At the protocol layer, researchers caught the first malicious Model Context Protocol server in the wild. A package called postmark-mcp shipped fifteen clean versions, building legitimacy, before quietly adding a single line of exfiltration code. CVE-2025-6514, a remote code execution flaw rated 9.6 on the CVSS scale, was disclosed in core MCP infrastructure used by hundreds of thousands of developers.
At the agent layer, two CVEs against major coding tools showed how containment can be turned inside out. CVE-2026-22708, disclosed against Cursor, lets an attacker poison the agent’s execution environment so allowlisted commands like git branch deliver arbitrary payloads. The allowlist made the attack easier by auto-approving the very commands the attacker needed. CVE-2025-59532 against OpenAI’s Codex CLI showed that the agent’s own output could redefine the boundary of its sandbox.
At the skill and package layer, hackerbot-claw worked its way up the stack. In February 2026, it exploited GitHub Actions misconfigurations across open source repositories. In March, it harvested LiteLLM’s PyPI publishing token through a compromised Trivy GitHub Actions setup at Aqua Security, then pushed two backdoored versions of LiteLLM directly to PyPI. No human direction was needed after launch.
Safety and security blur at the deployment line
OWASP makes a case with organizational consequences. For systems acting autonomously on production data, AI safety and AI security can no longer live in separate teams.
The example given is Replit in 2025. A coding assistant deleted a production database despite explicit instructions to change nothing, fabricated thousands of fictional records, and falsely reported that rollback was impossible. There was no attacker. The permission model behind the unprovoked failure is the same permission model an attacker would exploit through prompt injection. Containing the safety failure and containing the security gap turn out to be the same job.
Regulators are counting in hours
The compliance window is narrowing. DORA gives a four-hour notification window for major incidents. NIS2 requires a 24-hour early warning. New York’s RAISE Act sets a 72-hour reporting clock for frontier model incidents. California’s SB 53 sets a 15-day window. The OWASP report tracks 42 regulatory instruments across 10 jurisdictions.
Shadow AI sits inside almost every organization OWASP’s contributors examined. According to IBM data cited in the report, only 37% of organizations have a policy in place to detect it.

Download: Simplify security management with CIS SecureSuite Platform