Pentest Agent Suite Open-Sourced: 50 Autonomous Bug-Bounty Agents Across 7 AI Coding Platforms

A fully autonomous bug-bounty framework called Pentest Agent Suite has been open-sourced, delivering 50 specialized security agents, 26 slash commands, 19 CLI tools, and a cross-IDE installer across seven major AI coding platforms — Claude Code, OpenAI Codex, Google Gemini, Cursor, Windsurf, VS Code Copilot, and OpenClaw. The project, published on GitHub by researcher H-mmer, ships as a cohesive security platform with persistent memory, live bug-bounty platform integration, and a FAISS-backed semantic writeup search engine that agents query in real time to surface prior art before testing a vulnerability class.

Pentest Agent Suite is organized around three layers: 50 specialized agents, a dual-server MCP (Model Context Protocol) infrastructure, and a comprehensive rules library. The bounty-platforms MCP server integrates 16 programs — including HackerOne (full API), Bugcrowd, Intigriti, Immunefi, and YesWeHack — exposing seven tools: list_platforms, get_program_scope, sync_program, draft_report, and submit_report. The writeup-search MCP server auto-detects three modes: FAISS semantic search, SQLite keyword search, and a zero-dependency local fallback querying the bundled rules/payloads.md — 2,605 lines spanning XSS, SSRF, SQLi, IDOR, OAuth, SSTI, JWT, LFI, prototype pollution, NoSQLi, and DeFi attack patterns.

The framework's headline feature is the 7-Question Gate, a validation pipeline run by the validator agent on every finding — the first "NO" triggers an automatic KILL, DOWNGRADE, or CHAIN REQUIRED verdict. No finding can reach /submit without a /validate PASS and a /quality score of 7 or higher, enforced by hard gates in the /report and /submit commands. The /autopilot command implements an anti-shallow depth engine that mandates multi-layer stacked-encoding in every payload attempt and refuses to declare an attack surface exhausted until a full exhaustion matrix is complete — configurable via --paranoid, --normal, or --yolo checkpoint modes.

A persistent brain.py tracks every endpoint per target, enforces circuit-breaker logic (5× consecutive 403/429 responses trigger a 60-second auto-backoff), and syncs cross-engagement knowledge via incremental hash-based diffing. The installer (python3 -m tools.installer) generates native configuration formats for each supported tool and writes them to the appropriate IDE directories. IDEs without native subagent support — Cursor, Windsurf, and OpenClaw — receive content translated into skill files and rules, with Claude-specific prose stripped and path variables rewritten to absolute references.

The agent roster spans 19 HackerOne weakness specialists (xss-hunter, sqli-hunter, ssrf-hunter, rce-hunter, oauth-hunter, llm-ai-hunter), an 8-agent SAST pipeline, infrastructure and recon agents (cloud-recon, js-analyzer, graphql-audit, waf-profiler), and a web3-auditor for Solidity and DeFi patterns. Five deep methodology skills accompany the hunters — each distilled from hundreds of real paid reports — including hunt-rce (RSC CVE-2025-55182, runc Leaky Vessels, BentoML pickle), hunt-xss (DOMPurify mXSS, n8n MCP OAuth XSS GHSA-537j-gqpc-p7fq), and hunt-llm-ai (aligned to OWASP LLM Top 10 v2025 and the Agentic AI Top 10).

Cost tracking runs via CC hooks: the SubagentStop event fires cost_hook.py, logging agent name and session cost to cost-tracking.json, with live spend visible in the statusline. A PreToolUse scope hook (scope_hook.py) matches every Bash command against scope.yaml using exact and wildcard patterns, blocking out-of-scope execution before the tool call fires. CVSS scoring is enforced programmatically — cvss_version_guard.py mandates CVSS 3.1 for HackerOne and CVSS 4.0 for all other platforms.

The framework is available at GitHub and is licensed exclusively for authorized security testing under responsible disclosure. A bundled rag-builder/ utility can construct local FAISS writeup indexes from a 146-repository seed list covering CTF archives, bug-bounty reports, and payload collections — all destructive operations gated behind an explicit --execute flag. Requirements include Python 3.10+, uv, and standard recon tooling: nmap, httpx, subfinder, nuclei, ffuf, katana, and sqlmap.