Microsoft Updates AI Agent Failure Taxonomy Amidst Rapid Open-Source Adoption

Microsoft's AI Red Team has released version 2.0 of its Taxonomy of Failure Modes in Agentic AI Systems, a significant update reflecting the rapidly evolving threat landscape surrounding artificial intelligence agents. The initial taxonomy, published in April 2025, aimed to provide a common vocabulary for novel security challenges posed by AI systems. However, the past twelve months have seen an explosion in the adoption of open-source agentic frameworks and the deployment of AI agents in production environments, necessitating a revision to capture emerging attack vectors and vulnerabilities.

The driving forces behind this update include the mainstream adoption of open-source agentic frameworks, the maturation and increased vulnerability surface of the Model Context Protocol (MCP) ecosystem, the transition of computer-use agents from research to production, and extensive empirical data gathered from twelve months of red team engagements. The rapid proliferation of frameworks like OpenClaw, which garnered hundreds of thousands of stars on GitHub shortly after its release, highlighted the urgent need for robust security practices. An audit of OpenClaw revealed numerous vulnerabilities, including a critical remote code execution flaw (CVE-2026-25253) and the widespread exposure of API keys and credentials.

Version 2.0 introduces seven new failure mode categories to address these evolving threats. These include 'Agentic Supply Chain Compromise,' where malicious natural-language instructions injected through third-party integrations alter agent behavior without touching code; 'Goal Hijacking,' where adversarial instructions subtly redirect an agent's objective without full compromise; and 'Inter-Agent Trust Escalation,' a form of privilege escalation in multi-agent systems mirroring traditional confused deputy problems but mediated by natural language.

Further new categories address specific operational realities. 'Computer Use Agent (CUA) Visual Attack' details how agents interacting with graphical interfaces can be manipulated through visually innocuous content containing adversarial instructions. 'Session Context Contamination' describes how adversaries can bias an agent's reasoning over extended interactions by injecting malicious data early in a session. 'MCP / Plugin Abuse' specifically targets vulnerabilities within the standardized protocols for connecting AI models to external tools, such as tool description poisoning and server-side instruction injection.

Finally, 'Capability / Architecture Disclosure' addresses the risk of agents revealing sensitive internal details like tool schemas or system prompt structures, which can transform black-box probing into white-box exploitation. These new categories underscore the shift from theoretical risks to observed patterns in real-world deployments.

The updated taxonomy also expands on mitigation strategies and operational findings. Red team engagements confirmed some initial predictions while falsifying others, and surfaced entirely unanticipated failure modes. The report emphasizes that the security community must adapt to these new attack surfaces, which often involve manipulating natural language instructions and exploiting trust relationships within complex AI ecosystems, rather than traditional binary vulnerabilities.

Microsoft's AI Red Team stresses the importance of a shared vocabulary to effectively defend against these novel threats. The updated taxonomy aims to equip security professionals, developers, and researchers with the necessary framework to identify, understand, and mitigate the risks associated with increasingly sophisticated agentic AI systems. The ongoing evolution of AI capabilities necessitates continuous reassessment and adaptation of security paradigms.