Agent Threat Rules: Open detection rule format for AI agent security threats

AI agents run inside coding assistants, MCP servers, and multi-agent frameworks, and the access that makes them useful also opens paths to prompt injection, tool poisoning, and credential theft. Public CVE feeds carry agent-execution flaws that reach production faster than the tooling built to catch them. Agent Threat Rules, or ATR, is an open detection format aimed at this category of attack.

Agent Threat Rules

ATR rules are YAML documents that conform to a versioned schema. Each one declares the attack pattern it matches, the input field it inspects, such as LLM input, tool-call arguments, or SKILL.md content, and the test cases that prove it works. A reference engine written in TypeScript and a Python wrapper called pyATR evaluate the rules, and both ship under the MIT license.

The project carries more than 400 rules across categories that include prompt injection, agent manipulation, skill compromise, and context exfiltration. The format draws on Sigma, the rule standard for SIEM detection, and on YARA, the pattern language for malware signatures.

Benchmark recall across corpora

ATR reports version-pinned benchmark numbers for each test corpus. Against NVIDIA garak’s in-the-wild jailbreak corpus, it records 98.0% recall. Against the broader garak set covering all probe families, recall drops to 38.5%. On hackaprompt it reaches 66.0%.

Several corpora produce low single-digit numbers, and the project records them. AdvBench shows 1.3% recall, HarmBench 2.5%, and JailbreakBench 5.0%. Two academic adversarial sets, PromptBench and PromptInject, register 0.0%.

Adam Lin, the maintainer, addressed how rules that pass their own tests can still miss in aggregate. “PromptBench and PromptInject both register 0.0% recall in our latest version-pinned measurements (data/measurements/ in the repo). AdvBench / HarmBench / JailbreakBench register 1.3%, 2.5%, 5.0% recall respectively. Every rule in those evaluations passed its own true-positive/true-negative tests.”

The split comes from what the regex layer can match. Structured attack patterns sit inside its reach. Paraphrased and semantically rephrased attacks sit outside it. The project documents this as a coverage gap and recommends pairing ATR with credential brokering, sandbox execution, and human review for high-risk actions.

Production use and governance

Four organizations run ATR or have merged it into their tooling. Microsoft’s Agent Governance Toolkit carries a rule pack that auto-syncs weekly from ATR. Cisco AI Defense runs a rule pack in production. MISP at CIRCL merged a threat-intel cluster, and Gen Digital, the parent of Norton, Avast, and AVG, merged a rule pack. Adopters self-declare by pull request, and entries appear without maintainer pre-approval.

The rule set maps to outside frameworks. It covers 10 of 10 OWASP Agentic Top 10 categories and 78 of 85 SAFE-MCP techniques, a rate of 91.8%. Individual rules carry references to specific CVEs, including recent ones affecting Microsoft Semantic Kernel, Spring AI, LiteLLM, and Claude Code.

Agent Threat Rules is available for free on GitHub.

Must read:

Subscribe to the Help Net Security ad-free monthly newsletter to stay informed on the essential open-source cybersecurity tools. Subscribe here!

Don't miss