Agent Threat Rules (ATR) Initiative Launched to Standardize AI Agent Security Detection

The rapid proliferation of AI agents within coding assistants, multi-agent frameworks, and other complex systems presents a growing attack surface. These agents, while powerful, are susceptible to a range of threats including prompt injection, tool poisoning, and credential theft due to their extensive access privileges. Recognizing the speed at which these agent-execution flaws are emerging and outpacing existing security tooling, the Agent Threat Rules (ATR) initiative has introduced a standardized, open detection format.

ATR rules are defined using a versioned schema in YAML format. Each rule explicitly declares the attack pattern it aims to detect, specifies the input field it inspects (such as LLM input, tool-call arguments, or SKILL.md content), and includes test cases to validate its efficacy. The project provides a reference engine implemented in TypeScript, alongside a Python wrapper named pyATR, both distributed under the permissive MIT license, facilitating broad adoption and integration.

The ATR project currently boasts over 400 rules categorized into key threat areas including prompt injection, agent manipulation, skill compromise, and context exfiltration. This new format draws inspiration from established security standards like Sigma, a rule standard widely used for SIEM detection, and YARA, a popular pattern language for malware signature analysis, aiming to combine their strengths for the unique challenges of AI agent security.

Performance benchmarks reveal varying levels of recall across different adversarial corpora. Against NVIDIA's garak in-the-wild jailbreak corpus, ATR achieves a recall rate of 98.0%. However, recall drops to 38.5% on the broader garak set and 66.0% on the hackaprompt dataset. More challenging academic adversarial sets, such as PromptBench and PromptInject, show 0.0% recall, while AdvBench, HarmBench, and JailbreakBench register low single-digit recall rates (1.3%, 2.5%, and 5.0% respectively).

Project maintainer Adam Lin highlighted a critical nuance: rules that pass individual tests may still exhibit low aggregate recall. This discrepancy arises because the current regex-based matching layer struggles with semantically rephrased or paraphrased attacks, which fall outside its detection capabilities. The ATR project acknowledges this coverage gap and recommends augmenting ATR with complementary security measures like credential brokering, sandbox execution, and human oversight for high-risk operations.

Despite these challenges, ATR is gaining traction in production environments. Four organizations have already integrated ATR into their security tooling. Microsoft's Agent Governance Toolkit automatically syncs weekly ATR rule packs, Cisco AI Defense runs an ATR rule pack in production, MISP at CIRCL has merged a threat-intelligence cluster, and Gen Digital has incorporated an ATR rule pack. Adopters can self-declare their integration via pull requests, with new entries added without maintainer pre-approval.

The ATR rule set demonstrates broad coverage, mapping to 10 out of 10 OWASP Agentic Top 10 categories and 91.8% (78 out of 85) of SAFE-MCP techniques. Individual rules also include references to specific CVEs affecting popular frameworks like Microsoft Semantic Kernel, Spring AI, LiteLLM, and Claude Code, further enhancing their utility for vulnerability management.

Agent Threat Rules is publicly available on GitHub, offering a crucial step towards a more standardized and effective approach to securing the rapidly evolving landscape of AI agents. Its open nature and growing adoption signal a community-driven effort to address the unique security challenges posed by these powerful new technologies.