Prompt Injection Remains Unsolved, Posing Significant Risk to Generative AI

Prompt injection continues to be a persistent and fundamental architectural problem that could significantly impede the development and secure deployment of artificial intelligence systems, according to Ariel Fogel, an AI security researcher and contributor to the Open Worldwide Application Security Project (OWASP). Speaking at Infosecurity Europe 2026, Fogel highlighted that despite widespread awareness of prompt injection among AI and security professionals, a truly effective solution at the core level remains elusive.

The inherent challenge stems from how large language models (LLMs) process information. These models treat all inputs—system instructions, user queries, and data retrieved by agents—as a single, undifferentiated sequence of tokens. This makes it exceptionally difficult to establish and enforce reliable privilege boundaries between these different types of input, leaving them vulnerable to manipulation.

Fogel emphasized that the danger posed by prompt injection has escalated dramatically as AI agents are increasingly equipped with tools and the ability to take autonomous actions. Previously, a successful prompt injection might have resulted in an inaccurate or nonsensical output. However, in the context of agentic AI workflows, a malicious input can now trigger a cascade of real-world actions, transforming a theoretical vulnerability into an active compromise.

"Most organizations are deploying agents faster than they can govern them," Fogel stated, underscoring the rapid pace of AI adoption. This speed and scale, he argued, render traditional security controls, such as sandboxing, allow-lists, and manual review processes, insufficient. These methods, designed for human operators, often fail when the executor is an automated agent.

He further illustrated how some defenses can inadvertently facilitate exploitation. In certain prompt injection scenarios, allow-lists have been observed to streamline attacks because the necessary commands for the agent were already pre-approved. In other instances, an agent's own output has been used to redefine its sandbox boundaries, effectively dismantling the intended containment measures from within.

Fogel acknowledged recent efforts to address the issue, referencing concepts like Simon Willison's 'Lethal Trifecta' (private data access, untrusted content exposure, and external communication) and Meta's 'Rule of Two' as helpful heuristics for reducing the potential impact. However, he cautioned that these frameworks do not provide complete defenses, noting that attacks have already been demonstrated even when only two of the 'trifecta' properties are present.

To effectively manage the heightened risks, Fogel urged a shift from purely preventative strategies to a focus on constraining the actions of an injected agent. This requires implementing controls that operate at machine speed and scale, including continuous behavioral monitoring, real-time containment mechanisms, robust incident response protocols involving both safety and security teams, and enhanced identity management practices like ephemeral credentials and cryptographic attestation for traceability.

"Monitoring infrastructure that operates on the same speed as agents is essential to catch and contain attacks that can unfold in minutes or hours," Fogel concluded. Until models and runtimes can enforce strict privilege separations, defenders must combine rapid detection, automated containment, tighter identity and session design, and cross-disciplinary incident playbooks to navigate the evolving threat landscape of agentic AI.