DarkMoon: Open-Source AI Platform Automates Penetration Testing

Penetration testing, traditionally a labor-intensive and costly endeavor, is undergoing a transformation with the advent of AI-driven automation. Manual engagements can span weeks, with expert consultants commanding high daily rates, and the quality of results can vary significantly between testers. To address these challenges, a growing number of projects are now entrusting security assessments to AI agents capable of independent planning and execution. DarkMoon, a newly released open-source platform, stands at the forefront of this movement, designed to conduct security assessments from start to finish and deliver comprehensive, evidence-backed reports.

DarkMoon distinguishes itself by separating its reasoning capabilities from its execution tools. At its core, an orchestrator named OpenCode interacts with a large language model (LLM) to formulate assessment plans. These plans are then delegated to a control layer built upon the Model Context Protocol (MCP). This MCP layer maintains an allow-list of approved security utilities, which are executed within isolated Docker containers. The platform comes pre-equipped with over fifty security tools, including popular options like Nuclei, sqlmap, BloodHound, and NetExec. Specialized sub-agents are available for distinct areas such as web applications, Active Directory, Kubernetes, and various network protocols.
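
The allow-list gate described above can be illustrated with a minimal sketch. This is not DarkMoon's actual code: the `ALLOWED_TOOLS` mapping, the image names, and the `build_container_command` helper are all hypothetical, chosen only to show how a control layer might refuse anything outside an approved set before a container is ever started.

```python
# Hypothetical allow-list: tool name -> container image.
# Image names are placeholders, not DarkMoon's real images.
ALLOWED_TOOLS = {
    "nuclei": "example/nuclei:latest",
    "sqlmap": "example/sqlmap:latest",
}


class ToolNotAllowed(Exception):
    """Raised when the model requests a tool outside the allow-list."""


def build_container_command(tool: str, args: list[str]) -> list[str]:
    """Return a `docker run` argv for an approved tool, or raise."""
    image = ALLOWED_TOOLS.get(tool)
    if image is None:
        raise ToolNotAllowed(f"{tool!r} is not on the allow-list")
    # An argv list (no shell string) means model-supplied arguments
    # are never interpreted by a shell.
    return ["docker", "run", "--rm", image, tool, *args]
```

In this sketch a request for `bash` or any other unlisted binary fails before anything runs, which is the property the article attributes to the MCP layer.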

The assessment process follows a structured sequence. Initially, DarkMoon discovers open ports and services, fingerprints the technology stack of the target environment, and models the attack surface. Subsequently, it dispatches sub-agents tailored to the identified technologies. A reactive feedback loop lets findings influence subsequent actions: for instance, detecting a WordPress site early in the scan can trigger a dedicated CMS agent, while the later discovery of a GraphQL endpoint can prompt the engagement of a GraphQL-specific agent. The platform's coverage aligns with established standards and frameworks, including ISO 27001, NIST SP 800-115, and MITRE ATT&CK.
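
A reactive loop of this kind can be sketched as a work queue in which each agent's output may enqueue further agents. The mapping `AGENT_FOR_TECH`, the agent names, and the `run_agent` callback below are assumptions made for illustration; the article does not describe how DarkMoon implements its dispatch.

```python
from collections import deque
from typing import Callable, Iterable

# Hypothetical mapping from a detected technology to a sub-agent name.
AGENT_FOR_TECH = {
    "wordpress": "cms-agent",
    "graphql": "graphql-agent",
}


def run_assessment(
    initial_findings: Iterable[str],
    run_agent: Callable[[str], Iterable[str]],
) -> list[str]:
    """Dispatch sub-agents as technologies are discovered.

    `run_agent` executes one agent and returns any newly detected
    technologies, which are fed back into the queue.
    """
    queue = deque(initial_findings)
    dispatched: list[str] = []
    seen: set[str] = set()
    while queue:
        tech = queue.popleft()
        agent = AGENT_FOR_TECH.get(tech)
        if agent is None or agent in seen:
            continue  # no matching agent, or it has already run
        seen.add(agent)
        dispatched.append(agent)
        queue.extend(run_agent(agent))  # findings drive later steps
    return dispatched
```

Tracking which agents have already run keeps the loop from cycling when two agents keep rediscovering each other's technologies.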

Crucially, the LLM within DarkMoon never executes arbitrary commands directly. According to Mehdi Boutayeb, DarkMoon's lead maintainer, all actions are routed through the MCP server, which enforces an explicit allow-list of authorized tools and workflows. This design is intended to keep the AI's actions constrained and auditable and to limit unintended or malicious behavior. The user defines the scope of each assessment at the outset by specifying targets, domains, IP ranges, or applications. The orchestrator builds its understanding solely from assets within this authorized boundary and adheres to approved methodologies.
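
Scope enforcement of this sort can be sketched as a check that runs before any tool touches a target. The `in_scope` function and its parameters are hypothetical and cover only IP ranges and domains; how DarkMoon itself represents scope is not described in the article.

```python
import ipaddress


def in_scope(
    target: str,
    allowed_networks: list[str],
    allowed_domains: list[str],
) -> bool:
    """Return True if `target` falls inside the authorized boundary.

    IP targets are matched against CIDR ranges; hostnames must equal
    an allowed domain or be a subdomain of one.
    """
    try:
        addr = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP literal, so treat the target as a hostname.
        host = target.lower().rstrip(".")
        return any(host == d or host.endswith("." + d) for d in allowed_domains)
    return any(addr in ipaddress.ip_network(n) for n in allowed_networks)
```

Matching on `"." + d` instead of a bare suffix keeps a lookalike such as `evilexample.test` from passing as a subdomain of `example.test`.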

Cost is a significant consideration for users of automated security tools. Boutayeb estimates that a typical web application assessment using Anthropic's Claude Opus model incurs approximately ten dollars in API charges. Larger engagements, such as those involving Active Directory or multi-host infrastructures, naturally consume more resources due to the continuous reasoning and planning involved. DarkMoon supports a variety of LLM providers, including OpenAI, Anthropic, and OpenRouter, as well as local models via Ollama or llama.cpp. Boutayeb currently recommends Claude Opus for its balance of reasoning quality, planning stability, and long-context performance.

However, the choice of LLM can introduce complexities related to vendor safety systems. Recent Anthropic models incorporate classifiers that may interrupt or refuse offensive security tasks. In the project's testing, Claude Opus 4.8 encountered these limitations, while Claude Opus 4.6 completed assessments without interruption. The project advises operators to use Opus 4.6 for stability and mentions Anthropic’s Cyber Verification Program as an option for approved organizations. Smaller models are not yet supported for these autonomous runs.

DarkMoon prioritizes findings that are strongly supported by evidence. Weak signals or ambiguous indicators are downgraded to 'Unconfirmed' status. Confirmed findings are accompanied by detailed evidence, including executed commands, raw outputs, HTTP request/response pairs, and execution traces. Boutayeb stresses that "the LLM is never treated as the source of truth. The evidence collected from the target environment remains the source of truth." This approach aims to keep analysts able to validate each result, reduce manual triage effort, and ensure that all conclusions are traceable and reproducible.
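
The evidence rule can be illustrated with a small data model in which a finding's status is derived from what was collected and not from what the model asserted. The `Finding` class and its fields are hypothetical and simplified; the threshold used here (at least one executed command plus non-empty raw output) is an assumption, not DarkMoon's documented criterion.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """A reported issue together with the evidence that backs it."""

    title: str
    commands: list[str] = field(default_factory=list)     # what was run
    raw_outputs: list[str] = field(default_factory=list)  # what came back

    def status(self) -> str:
        # Assumed rule: a finding is confirmed only when it has both
        # an executed command and some non-empty output from the target.
        has_output = any(o.strip() for o in self.raw_outputs)
        return "Confirmed" if self.commands and has_output else "Unconfirmed"
```

Because the status is computed from the stored evidence, a finding that the model merely suspects, with nothing collected to support it, cannot be reported as confirmed.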

DarkMoon is freely available on GitHub, offering a powerful, cost-effective, and auditable solution for automating penetration testing. Its architecture emphasizes control, determinism, and evidence-based reporting, positioning it as a significant development in the evolving landscape of AI-powered cybersecurity tools.