DarkMoon: Open-source AI pentesting platform

Penetration testing has long run on expert time, with specialists spending days probing a network or web application by hand. Manual engagements stretch across weeks, expert consultants cost thousands of dollars a day, and results vary with the tester. Automation promises to narrow those gaps, and a growing set of projects now hands the work to AI agents that plan and execute on their own. DarkMoon, an open-source platform, sits in that group. It runs a security assessment end to end and delivers an evidence-backed report at the finish.

A reasoning layer kept apart from execution

DarkMoon separates the model that thinks from the tools that act. An orchestrator called OpenCode talks to a large language model, plans each move, and delegates any real action to a control layer built on the Model Context Protocol. That MCP layer exposes an allow-list of approved tools and runs them inside an isolated Docker container holding more than fifty security utilities, among them Nuclei, sqlmap, BloodHound, and NetExec. Specialized sub-agents cover web applications, Active Directory, Kubernetes, and network protocols.

An assessment follows a set sequence. The platform discovers open ports and services, fingerprints the technology stack, models the attack surface, then dispatches the sub-agents that match what it found. A reactive loop feeds each result back in, so a WordPress site detected early can trigger a CMS agent, and a GraphQL endpoint surfaced later can pull in a GraphQL agent. Coverage aligns with established methodologies, among them ISO 27001, NIST SP 800-115, and MITRE ATT&CK modeling.
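The reactive loop can be pictured as a work queue in which each agent's output may enqueue further agents. The sketch below is illustrative only and is not DarkMoon's code: the signal names, the agent names, and the `run_reactive_loop` function are all hypothetical, and the agents are simulated with a lookup table in place of real tool runs.

```python
from collections import deque

# Hypothetical mapping from a detected technology to the sub-agent that handles it.
AGENT_FOR_SIGNAL = {
    "wordpress": "cms_agent",
    "graphql": "graphql_agent",
}


def run_reactive_loop(initial_signals, agent_outputs):
    """Dispatch agents for each signal and feed their results back into the queue.

    agent_outputs simulates what each agent discovers (agent name -> new signals).
    Returns the agents in the order they were dispatched.
    """
    queue = deque(initial_signals)
    seen = set()
    dispatched = []
    while queue:
        signal = queue.popleft()
        if signal in seen:
            continue  # never process the same signal twice
        seen.add(signal)
        agent = AGENT_FOR_SIGNAL.get(signal)
        if agent is None:
            continue  # no specialised agent registered for this signal
        dispatched.append(agent)
        # Results from this agent become new input for the loop.
        queue.extend(agent_outputs.get(agent, []))
    return dispatched


if __name__ == "__main__":
    # A WordPress site found early leads to a GraphQL endpoint found later.
    print(run_reactive_loop(["wordpress"], {"cms_agent": ["graphql"]}))
```

The `seen` set keeps the loop from re-dispatching on a signal it has already handled, which is one simple way to guarantee termination in a feedback design like this.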

“The LLM never executes arbitrary commands directly,” Mehdi Boutayeb, the lead maintainer of DarkMoon, told Help Net Security. Every action passes through the MCP server, which he said “exposes only an explicit allow-list of authorized tools and workflows.”
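As a rough illustration of what an allow-list gate means in practice, the sketch below refuses any tool name that has not been registered. It does not reproduce DarkMoon's MCP server: `ALLOWED_TOOLS`, `ToolNotAllowed`, and `build_command` are hypothetical names, and the function only builds a command line without running anything.

```python
# Hypothetical registry: tool name -> executable to invoke.
# Only entries listed here can ever be turned into a command.
ALLOWED_TOOLS = {
    "nuclei": "nuclei",
    "sqlmap": "sqlmap",
}


class ToolNotAllowed(Exception):
    """Raised when the model requests a tool that is not on the allow-list."""


def build_command(tool: str, args: list[str]) -> list[str]:
    """Return an argv list for an approved tool, or raise ToolNotAllowed."""
    if tool not in ALLOWED_TOOLS:
        raise ToolNotAllowed(tool)
    # Arguments stay a list and are never joined into a shell string,
    # so the model's text is not interpreted by a shell.
    return [ALLOWED_TOOLS[tool], *args]


if __name__ == "__main__":
    print(build_command("nuclei", ["-u", "https://example.test"]))
    try:
        build_command("bash", ["-c", "id"])
    except ToolNotAllowed as exc:
        print(f"rejected: {exc}")
```

The model proposes a tool name and arguments, and the gate decides whether a command exists at all. Anything outside the registry fails before execution is even attempted.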

Keeping the agent inside its scope

Scope comes from the user at the start of each run, given as targets, domains, IP ranges, or applications. The orchestrator builds its picture only from assets found inside that authorized boundary and hands execution to approved methodologies. New tools stay unavailable until someone installs them, registers them in the MCP server, and exposes them to the orchestration layer. The design goal, in Boutayeb’s words: “The objective is to make execution deterministic, auditable and constrained rather than allowing unrestricted autonomous behaviour.”
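A scope check of this kind can be approximated with Python's standard `ipaddress` module. The sketch below is an assumption about how such a boundary might be enforced, not a description of DarkMoon's implementation: `in_scope` is a hypothetical helper, and treating subdomains of an authorized domain as in scope is a policy choice made here for illustration.

```python
import ipaddress


def in_scope(target: str, domains: list[str], networks: list[str]) -> bool:
    """Return True if target is an authorized hostname or falls in an authorized IP range."""
    try:
        addr = ipaddress.ip_address(target)
    except ValueError:
        # Not an IP literal, so treat it as a hostname.
        host = target.lower().rstrip(".")
        # Assumption: an authorized domain also covers its subdomains.
        return any(host == d or host.endswith("." + d) for d in domains)
    return any(addr in ipaddress.ip_network(n) for n in networks)


if __name__ == "__main__":
    domains = ["example.test"]
    networks = ["10.0.0.0/24"]
    for target in ["10.0.0.5", "10.0.1.5", "api.example.test", "evilexample.test"]:
        print(target, in_scope(target, domains, networks))
```

Matching on `"." + d` and not on a bare suffix keeps a lookalike such as `evilexample.test` from passing as part of `example.test`.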

What a scan costs

Cost is among the first questions a buyer asks. A typical web application assessment using Claude Opus runs about ten dollars in API charges, according to Boutayeb. He noted that larger engagements such as Active Directory or multi-host infrastructures consume more, since the model keeps reasoning over fresh evidence, planning steps, and attack paths. DarkMoon supports OpenAI, Anthropic, OpenRouter, and local models through Ollama or llama.cpp. In Boutayeb’s assessment, “Claude is currently the model we recommend,” for its balance of reasoning quality, planning stability, and long-context performance.

The model choice carries a wrinkle tied to vendor safety systems. Recent Anthropic models ship classifiers able to interrupt, refuse, or quietly downgrade offensive-security tasks, even on authorized engagements. In DarkMoon’s own testing, Claude Opus 4.8 hit those limits partway through an assessment, and Claude Opus 4.6 ran the assessment end to end with no interruptions. The project points operators toward Opus 4.6 as the steadiest choice and mentions Anthropic’s Cyber Verification Program as an optional route for approved organizations. Small models in the lower parameter ranges remain unsupported for these autonomous runs.

“Basically, it can be completely free to run if you stay local, or a few dollars per assessment if you want the extra reasoning quality of a frontier model. Each user picks their own balance between cost and capability,” said Boutayeb.

Evidence before a finding counts

DarkMoon promotes a finding only when evidence supports it. Weak signals such as generic HTTP 200 responses, reflected payloads, and ambiguous indicators get downgraded and labeled Unconfirmed. Confirmed findings carry the executed commands, raw outputs, HTTP request and response pairs, and execution traces that back them.
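The promotion rule can be sketched as a small classifier over the evidence attached to a finding. The field names and the exact rule below are assumptions chosen to mirror the description above, not DarkMoon's data model: `Finding` and `classify` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """Hypothetical record of a candidate finding and its supporting evidence."""
    title: str
    commands: list = field(default_factory=list)     # commands that were executed
    raw_outputs: list = field(default_factory=list)  # unmodified tool output
    weak_signal_only: bool = False  # e.g. only a generic HTTP 200 or a reflected payload


def classify(finding: Finding) -> str:
    """Label a finding Confirmed only when executed commands and raw output back it."""
    has_evidence = bool(finding.commands) and bool(finding.raw_outputs)
    if finding.weak_signal_only or not has_evidence:
        return "Unconfirmed"
    return "Confirmed"


if __name__ == "__main__":
    reflected = Finding("Possible XSS", weak_signal_only=True)
    proven = Finding(
        "SQL injection",
        commands=["<executed command>"],
        raw_outputs=["<raw tool output>"],
    )
    print(classify(reflected), classify(proven))
```

The model's own opinion never appears as an input to `classify`. Only what was run and what came back decides the label, which is the property the design is after.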

On accuracy, Boutayeb said: “The LLM is never treated as the source of truth. The evidence collected from the target environment remains the source of truth.” He added that the goal keeps analyst validation in place, reduces manual triage, and leaves every conclusion traceable and reproducible.

DarkMoon is available for free on GitHub.
