GuardFall Technique Exploits Decades-Old Shell Injection Flaws in Open-Source AI Coding Agents

Researchers have uncovered a critical security vulnerability, dubbed GuardFall, that exploits a decades-old shell injection flaw present in numerous open-source AI coding and computer-use agents. The bypass technique, detailed by Adversa AI, was found to affect ten out of eleven popular agents tested, with only one, "Continue," demonstrating robust defenses against it. This vulnerability allows attackers to bypass the safety checks designed to prevent AI agents from executing dangerous commands, potentially leading to unauthorized code execution and system compromise.

The core of the GuardFall vulnerability lies in how AI agents process commands before execution. Most agents employ a blocklist to filter out malicious commands based on plain text patterns. However, they fail to account for how the underlying shell, such as bash, interprets and rewrites these commands before running them. This discrepancy means that a command flagged as safe by the filter might be transformed into a destructive one by the shell. For instance, a filter might not recognize r''m as rm because of the inserted empty quotes, yet bash will strip these quotes and execute the rm command, leading to unintended file deletion.

This bypass is not a new bug but rather a dangerous convention and a class of problems stemming from the fundamental difference between how text is filtered and how the shell ultimately interprets it. Consequently, there is no single CVE to track or patch, making a universal fix challenging. The exploit requires two conditions to align: first, the AI agent must generate a malicious command, often disguised within seemingly routine tasks like build files or documentation responses, rather than an explicit destructive command. Second, the agent must be running with auto-execute flags enabled or its container sandbox disabled, configurations common in automated development pipelines.

Among the affected agents are opencode, Goose, Cline, Roo-Code, Aider, Plandex, Open Interpreter, OpenHands, SWE-agent, and the Hermes project, which first documented the issue. Collectively, these tools boast hundreds of thousands of GitHub stars, indicating widespread adoption. Adversa AI successfully demonstrated a full end-to-end attack against the production Plandex binary, with similar exploitability observed in eight other agents. While no public exploitation has been reported, the research highlights a significant latent risk.

The "Continue" agent stands out for its defense mechanism, which involves parsing commands the way bash would before execution. It breaks down commands into shell-interpretable pieces and maintains a hard blocklist of destructive commands. While this approach proved effective against most payloads, even "Continue" showed some weakness in its command-line auto-run mode, though the most severe threats were still blocked. Adversa AI suggests that implementing similar protective measures is a feasible task for experienced engineers, estimated to take around two days.

To mitigate the risks associated with GuardFall, users are advised to implement several immediate measures. Running AI agents with the $HOME directory pointed to a disposable folder can prevent access to sensitive secrets like ~/.ssh and ~/.aws. Disabling auto-execute flags such as --auto-exec or --auto-run unless absolutely necessary is also crucial. Additionally, avoiding agent execution on pull requests from forks and treating configuration files within repositories as untrusted code can further reduce exposure.

GuardFall follows a series of recent findings that expose AI coding agents to security risks, including Adversa's own TrustFall research and other bypass techniques targeting agents like Claude Code and Gemini CLI. Attacks such as AutoJack and Agentjacking have also demonstrated how malicious content can be used to trick agents into executing commands with elevated privileges. The recurring theme is the persistent challenge of ensuring that untrusted text processed by AI agents is accurately understood and safely executed by the underlying shell environments.

The widespread use of open-source AI coding agents in software development pipelines necessitates a robust approach to security. The GuardFall vulnerability underscores the need for AI tool developers to move beyond simple blocklists and implement more sophisticated command validation mechanisms that mirror the actual execution environment. As AI agents become more integrated into critical development workflows, addressing these fundamental security flaws is paramount to preventing widespread compromise.