AI Skill Scanners Fail to Detect Malicious Agent Skills, Researchers Find

The burgeoning ecosystem of AI agent skills, designed to extend the capabilities of artificial intelligence systems, is facing a significant security challenge. Researchers at Trail of Bits have demonstrated that prominent security scanners, including those from ClawHub, Cisco, and skills.sh, are failing to detect malicious skills intended to compromise user systems. These findings highlight a critical gap in the security of the AI skill supply chain, where vulnerabilities can be exploited through both code and natural language prompts.

Malicious skills pose a substantial threat, capable of stealing credentials, exfiltrating sensitive data, and hijacking AI agents. The ease with which these malicious skills can be distributed through public marketplaces exacerbates the problem. While some distribution channels, like curated marketplaces, may have procedural controls, public platforms often operate on a "ship-first, secure-later" model, leading to a proliferation of fake and harmful skills.

Trail of Bits tested several prominent skill scanners, including ClawHub's integrated solution (VirusTotal and a guard model), Cisco's agent skill scanner, and the scanners integrated into skills.sh. The researchers found that simple evasion techniques were sufficient to bypass these defenses. One particularly effective method involved prepending a large number of newlines to the malicious skill file, which caused the OpenClaw scanner to truncate the file and miss the harmful content. Another bypass confused the VirusTotal scanner, demonstrating the limitations of static analysis against evolving threats.

Further testing on skills.sh and Cisco's scanner revealed that these systems, which often rely on pattern matching and LLM-based analysis, struggled with more complex evasion tactics. By embedding malicious instructions within seemingly benign file types like .docx (which are essentially ZIP archives), the researchers were able to trick the scanners into classifying overtly malicious skills as safe. This highlights the challenge of securing arbitrary file types and the need for more robust dynamic analysis techniques.

The static nature of most current skill scanners is a fundamental weakness, according to the researchers. Adversaries can repeatedly tweak their attacks, making small modifications until they find a way to bypass the detection mechanisms. This "unlimited bites at the apple" scenario means that even if a scanner is updated to fix a specific bypass, new variations can quickly emerge.

The implications of these findings are significant for organizations adopting AI agents and leveraging external skills. The current security measures are insufficient to protect against determined attackers, potentially leading to widespread compromise of sensitive data and systems. The report underscores the urgent need for more sophisticated and dynamic security solutions tailored to the unique challenges of AI skill security.

While the report identifies significant shortcomings, it also notes some positive aspects, such as OpenClaw's strict approach to skill packaging, which limits the types of files that can be included. However, this alone is not enough to ensure security. The research team emphasizes that the security industry must develop more resilient detection methods that can keep pace with the rapid evolution of AI technologies and the threats they enable.

As AI agents become more integrated into critical business processes, securing the associated skill supply chain is paramount. The findings from Trail of Bits serve as a critical warning, urging developers and security professionals to re-evaluate current security practices and invest in more advanced threat detection and prevention strategies for AI-driven systems.

This new report details specific techniques used to bypass AI skill scanners from ClawHub, Cisco, and Vercel, including code obfuscation with excessive newlines, hiding payloads in compiled bytecode or archive files, and employing prompt injection to mislead LLM-based scanners. These methods highlight the limitations of current scanning approaches and underscore the supply chain risks inherent in AI agent ecosystems.