Anthropic Silently Patches Claude Code Sandbox Bypass Vulnerability

Anthropic has silently patched a sandbox bypass vulnerability in its AI coding assistant, Claude Code, that could have allowed attackers to escape the tool's security sandbox and exfiltrate sensitive data. The flaw, discovered by a security researcher, could be chained with a prompt injection attack to bypass the intended isolation of the AI agent. Anthropic applied the fix without issuing a public advisory or assigning a CVE ID, a move that has drawn criticism from the security community.

The vulnerability resided in Claude Code's sandboxing mechanism, which is designed to prevent the AI from executing arbitrary commands or accessing the host system beyond its permitted scope. By exploiting the sandbox bypass, an attacker who first achieved prompt injection — tricking the AI into following malicious instructions embedded in user-supplied content — could then escape the sandbox and run arbitrary code on the underlying system. This chain of exploits could lead to data exfiltration, credential theft, or further compromise of the development environment.

Prompt injection attacks against large language models have become a growing concern as AI coding assistants gain widespread adoption. These attacks work by embedding hidden instructions in input data that the model processes, causing it to override its safety guidelines. In the case of Claude Code, a successful prompt injection could direct the AI to attempt commands that the sandbox is supposed to block. Normally the sandbox would stop those commands, but the bypass removed that last line of defense.

The researcher who reported the vulnerability noted that the issue was particularly dangerous because Claude Code is often used in development environments with access to source code repositories, API keys, and other sensitive assets. An attacker who could chain prompt injection with the sandbox bypass could potentially steal proprietary code, inject backdoors, or pivot to other internal systems.

Anthropic's decision to patch the vulnerability silently — without a public disclosure or CVE — has sparked debate about responsible disclosure practices for AI products. Critics argue that users of Claude Code were left unaware of the risk and unable to assess their exposure, while defenders note that silent patches can prevent attackers from reverse-engineering the fix before users update. The company has not commented on the matter beyond applying the patch.

This incident highlights the unique security challenges posed by AI coding assistants, which combine powerful automation with broad system access. As these tools become integral to software development workflows, vendors face increasing pressure to adopt transparent vulnerability management practices. The Claude Code sandbox bypass serves as a reminder that AI agents, like any software, require rigorous security testing and clear disclosure policies to maintain user trust.

For now, users of Claude Code are advised to ensure they are running the latest version, as the patch has been silently rolled out. No evidence of in-the-wild exploitation has been reported, but the potential for abuse was significant. The broader industry will be watching how Anthropic and other AI vendors handle future vulnerabilities in their rapidly evolving products.
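One way to confirm that an installation includes the fix is to compare its reported version against the patched release. The snippet below is a minimal sketch that assumes the fix shipped in v2.1.88, the release cited in the researcher's disclosure, and that the installed tool reports a dotted three-part version string. The names PATCHED_VERSION, parse_version, and is_patched are hypothetical helpers for illustration, not part of Claude Code.

```python
import re

# Assumption: v2.1.88 is the first release containing the fix,
# per the researcher's public disclosure.
PATCHED_VERSION = (2, 1, 88)


def parse_version(text: str) -> tuple:
    """Extract the first MAJOR.MINOR.PATCH triple found in text."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version_text: str) -> bool:
    """Return True if the reported version is at or above the patched release."""
    # Tuples compare element by element, so (2, 10, 0) > (2, 1, 88).
    return parse_version(version_text) >= PATCHED_VERSION


if __name__ == "__main__":
    # Hypothetical version strings; real output formatting may differ.
    for reported in ("2.1.87", "2.1.88", "2.10.0"):
        print(reported, "patched" if is_patched(reported) else "needs update")
```

Comparing integer tuples rather than raw strings avoids the common mistake of treating "2.10.0" as older than "2.1.88".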

Researcher Aonan Guan has now publicly detailed the second sandbox bypass, a SOCKS5 hostname null-byte injection patched in Claude Code v2.1.88 on March 31. Chained with prompt injection, it could be used to exfiltrate cloud tokens, GitHub credentials, and internal API data. Guan criticized Anthropic for not issuing a CVE or advisory specific to Claude Code, noting that the identifier for the earlier bug (CVE-2025-66479) was assigned only to the upstream sandbox-runtime library, which left Claude Code users unaware of the risk. Anthropic stated that the fix was a public commit in the sandbox-runtime repository and that Guan's report was closed as a duplicate of an internal finding.
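The general bug class behind a hostname null-byte injection can be illustrated with a short sketch. This is not the actual sandbox-runtime code, and the precise mechanics of the reported flaw are not described here. The example assumes a hypothetical proxy that checks a requested hostname against a suffix allowlist and then hands it to a component that treats the null byte as a string terminator. The allowlist, hostnames, and function names are all invented for illustration.

```python
import re

# Hypothetical allowlist of permitted domain suffixes.
ALLOWED_SUFFIXES = (b".example.com",)

# A DNS label: 1 to 63 letters, digits, or hyphens, not starting or ending with a hyphen.
_LABEL = re.compile(rb"(?!-)[A-Za-z0-9-]{1,63}(?<!-)")


def naive_is_allowed(hostname: bytes) -> bool:
    """Flawed check: inspects only the suffix of the raw bytes."""
    return hostname.endswith(ALLOWED_SUFFIXES)


def c_string_view(hostname: bytes) -> bytes:
    """What a NUL-terminated consumer (for example a C resolver) would see."""
    return hostname.split(b"\x00", 1)[0]


def strict_is_allowed(hostname: bytes) -> bool:
    """Safer check: reject anything that is not a well-formed hostname first."""
    if not hostname or len(hostname) > 253:
        return False
    # fullmatch rejects NUL bytes and any other character outside the label set.
    if not all(_LABEL.fullmatch(label) for label in hostname.split(b".")):
        return False
    return naive_is_allowed(hostname)


if __name__ == "__main__":
    crafted = b"attacker.test\x00.example.com"
    print(naive_is_allowed(crafted))    # suffix check passes on the raw bytes
    print(c_string_view(crafted))       # but the terminated view is another host
    print(strict_is_allowed(crafted))   # strict validation rejects the input
```

The point of the sketch is that the check and the consumer disagree about where the hostname ends. Validating the full byte string before applying any allowlist removes that disagreement.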

A separate, newer article details a more complex attack chain involving prompt injection and a flawed permission model in Claude Code's GitHub Actions. It describes how unauthenticated attackers could exfiltrate secrets, steal OIDC tokens, and push malicious code by exploiting a vulnerability in the checkWritePermissions function and chaining it with other workflow misconfigurations. Whereas the report above focuses on a sandbox bypass, that article covers the exploitability of the CI/CD workflow itself, including a direct attack on Anthropic's own repository.