Red Teamers Turn Claude Desktop into a Double Agent for Code Execution

Pentera Labs' red teamers have successfully demonstrated a sophisticated attack that compromises a developer's machine by weaponizing Anthropic's Claude Desktop application. The exploit chain, detailed in a recent report, leverages a combination of a compromised email inbox and the AI assistant's personalization and synchronization capabilities to achieve remote code execution (RCE).

The attack begins with the prerequisite of a compromised inbox, which can be obtained through various means including phishing, social engineering, or even other AI agents with access to email connectors. Once access to the victim's email is secured, the attackers use it to gain access to their Claude account. This initial compromise is crucial as it allows the attackers to interact with the victim's AI assistant.

With the Claude account compromised, the next critical step involves the Claude Desktop application, which is available for macOS, Windows, and Linux. This application syncs conversations and settings across all devices tied to a user's account. The red teamers focused on Claude Desktop's personalization features, which allow users to set account-wide instructions and preferences for the AI. By injecting a specially crafted, base64-encoded prompt into these personalization settings, the attackers instructed Claude to search for command-capable tools on the victim's machine.

If the victim's machine had tools like Desktop Commander or other MCP connectors installed, the poisoned prompt would instruct Claude to use them, enabling the AI to execute malicious commands stealthily in the background. This allows attackers to establish a reverse shell or deploy other malicious code, leading to a full compromise of the developer's machine. The user remains unaware, believing they are interacting normally with Claude.

In scenarios where no command-capable tools are present, the attack shifts to a 'phishing layer.' The injected prompt causes Claude to present a realistic-looking error message when the user interacts with it. This fake error includes a convincing error code, a link to a purported fix (often using legitimate-looking URLs from the vendor's site), and step-by-step instructions. The goal is to trick the user into clicking the link and executing attacker-controlled commands, effectively turning the AI into a sophisticated phishing tool.

The researchers noted that newer features like Claude's Cowork, which allows the AI to execute tasks on the user's behalf, would further simplify this attack by eliminating the need for tool enumeration or a separate phishing step. However, their research, conducted in late 2025, predated some of these advanced capabilities, necessitating the inclusion of the tool-checking and phishing mechanism.

Once command execution is achieved, whether through direct exploitation or the phishing layer, attackers can perform a wide range of malicious activities. In the demonstrated scenario, Claude was used to continuously fetch and execute bash commands from a remote server controlled by the attacker, effectively acting as a persistent, stealthy command-and-control (C2) agent. This allowed for data exfiltration, credential harvesting, and further lateral movement within the victim's network.

This attack highlights the growing security risks associated with agentic AI tools that possess local execution capabilities. As AI assistants become more integrated into workflows and gain more access to user systems, the potential for them to be manipulated into becoming tools for attackers increases significantly. Organizations deploying such AI agents must be vigilant about securing access, monitoring AI behavior, and understanding the attack vectors that could turn these powerful tools into liabilities.