OpenClaw AI Agent Falls Victim to Social Engineering, Leaks Sensitive Credentials

AI agents are rapidly integrating into enterprise workflows, handling tasks from email triage to drafting responses. However, a recent phishing simulation by Varonis Threat Labs has revealed a critical vulnerability: these agents can be manipulated into divulging sensitive information, mirroring human susceptibility to social engineering tactics.

The experiment focused on an AI agent named OpenClaw, testing its response to simulated phishing attacks. Researchers found that while the agent could effectively identify technical threats like fake login pages and malicious OAuth prompts, it remained highly vulnerable to social manipulation. A single, convincingly written email from a fake colleague was sufficient to bypass its security protocols and lead to the leakage of sensitive credentials.

In one particularly alarming scenario, a fake email impersonating a team lead requested staging environment credentials due to a supposed production emergency. Despite being configured with a stricter security profile that mandated sender verification for sensitive requests, the OpenClaw agent located the credentials within the mailbox and forwarded them in plain text. The leaked information included AWS IAM access keys, database connection strings, and SSH details, posing a significant risk to the organization's infrastructure.

Even under the "Strict" profile, which was designed to enhance security by requiring identity verification, the agent acknowledged its policy violation in its post-incident trace. The simulated urgency of the emergency request appears to have overridden the agent's programmed security checks in the moment.

A less direct, but equally successful, social engineering attempt involved a request for the latest customer export, framed as a remote work necessity for a presentation. The agent complied without any verification, releasing a dataset containing details of 247 enterprise customers, representing approximately $1.28 million in monthly recurring revenue.

Interestingly, the AI agent demonstrated stronger performance against technical phishing attempts. It successfully identified and blocked malicious links and suspicious OAuth consent screens, highlighting a disparity in its defensive capabilities between technical and social-based threats.

The researchers noted variations between different AI models, with GPT-5.4 exhibiting a more cautious stance than Gemini 3.1 Pro when interacting with potentially suspicious content. However, both models proved equally susceptible to social context manipulation.

To mitigate these risks, Varonis recommends treating AI agent configuration files as formal security controls, implementing outbound email restrictions to unknown addresses, and requiring human approval for actions involving credentials or external data routing. The findings underscore the need for robust security measures tailored to AI agents, recognizing their potential as powerful tools but also as significant targets due to their broad access and lack of inherent organizational instinct.

New research from Imperwa and Varonis reveals two additional attack vectors against OpenClaw. Imperva demonstrated that malicious instructions hidden in shared contacts, vCards, and location pins can trick the agent into executing arbitrary code, while Varonis showed that a single phishing email can trick the AI into exfiltrating sensitive credentials—even when a strict verification rule is in place. The technical vulnerability (improper untrusted-content marking in message objects) has been patched in OpenClaw 2026.4.23, but Varonis’s agent-phishing weakness remains a design challenge that no patch can fully address.