North Korea-Linked macOS Backdoor Gaslight Uses Prompt Injection to Evade AI Triage Tools

SentinelLabs, the research arm of SentinelOne, has uncovered a novel macOS backdoor that targets not the sandbox or virtual machine analyzing it, but the AI-assisted triage tools used by malware analysts. Dubbed macOS.Gaslight and attributed with high confidence to North Korean threat actors, the Rust-based implant embeds 38 fabricated system messages designed to derail automated analysis, marking a significant evolution in adversarial manipulation of defensive AI systems.

The technique is a form of prompt injection aimed at the analyst's tools rather than the malware's execution environment. The fabricated messages are formatted as Markdown-fenced blocks that mimic the internal scaffolding of an AI triage tool, warning of token expiry, memory and disk errors, repeated failures, and bogus injection flaws. The goal is to push the AI agent into aborting or refusing its analysis, effectively blinding defenders to the malware's true nature. Earlier versions of this trick used a single injected block, but Gaslight stacks 38 into a cascade, increasing the likelihood of triggering a refusal.

Behind the prompt injection lies a full-featured infostealer and backdoor. The implant provides operators with an interactive shell and is built to harvest browser data from Chrome, Brave, Firefox, and Safari, as well as terminal histories, installed application lists, and a copy of the macOS login keychain. Much of this collection runs through a Python module that the malware can stage on demand, adding flexibility to its data theft capabilities.

To maintain stealth in transit, Gaslight uses Telegram's Bot API for its command-and-control channel, with traffic encrypted and protected by certificate pinning to defeat network inspection. SentinelLabs flagged two additional novel features: the malware can pull a standalone Python interpreter from a public open-source project at runtime, and it is designed to scrub its own Telegram bot token from any logs or crash output, denying defenders a key detection clue.

Attribution was possible partly through Apple's own XProtect, which flagged the file under a signature family that SentinelLabs has tied to North Korean operators. While most of the implant's tradecraft is familiar, the prompt injection stands out as a novel evasion technique. "Anyone building such tooling should treat the contents of the samples they triage as adversarial input, never as instructions, and be prepared to keep hostile content out of the model entirely," SentinelLabs wrote. "As LLM-assisted analysis becomes routine, defenders should expect more samples built to exploit it."

The discovery of Gaslight underscores a growing trend: attackers are increasingly targeting the AI tools that defenders rely on. By manipulating automated analysis systems into misclassifying malware as benign, threat actors can extend the lifespan of their implants and reduce the likelihood of detection. This is particularly concerning for macOS, which has historically seen fewer targeted attacks but is now facing sophisticated nation-state operations that combine advanced evasion techniques with traditional backdoor functionality.

For organizations using AI-powered security tools, the Gaslight campaign serves as a stark reminder that these systems must be hardened against adversarial input. SentinelLabs recommends that defenders treat all sample content as untrusted and ensure that AI models are designed to ignore or sanitize embedded instructions. As the line between human and machine analysis continues to blur, the arms race between attackers and AI-driven defenses is only intensifying.