T3MP3ST Security Framework Turns AI Agents into Autonomous Red-Teamers

T3MP3ST, a new open-source security framework, turns general-purpose AI coding agents into autonomous red-teaming operators. Developed by researcher elder-plinius, it acts as an orchestration layer that coordinates multiple AI agent instances through a reconnaissance-to-exploit-to-report kill chain. The framework requires no new API keys, cloud infrastructure, or additional billing, an approach the developers call "keyless warfare." Instead, it uses AI agent sessions already running on the user's machine as the operational brain for security missions.

T3MP3ST enforces egress-scope containment: its networked tools automatically refuse to interact with off-scope public hosts, a design meant to keep testing inside controlled, authorized environments. Users point the framework at an authorized target through a web-based "War Room" interface or a command-line interface (CLI), which starts a multi-stage attack simulation.
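
To make the containment idea concrete, here is a minimal sketch of a scope guard that a networked tool could call before sending any traffic. It is not T3MP3ST's actual code: the names `ALLOWED_HOSTS`, `ALLOWED_NETWORKS`, `in_scope`, and `guarded_request` are hypothetical, and the default-deny allowlist is an assumed design. The sketch also ignores DNS resolution, which a real guard would have to handle.

```python
import ipaddress

# Hypothetical scope: hosts and networks the operator is authorized to test.
ALLOWED_HOSTS = {"target.example.test"}
ALLOWED_NETWORKS = [ipaddress.ip_network("10.10.0.0/16")]


def in_scope(host: str) -> bool:
    """Return True only for an allowed hostname or an IP inside an allowed network."""
    host = host.strip().lower().rstrip(".")
    if host in ALLOWED_HOSTS:
        return True
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return False  # unlisted hostname: refuse by default
    return any(addr in net for net in ALLOWED_NETWORKS)


def guarded_request(host: str) -> str:
    """Stand-in for a networked tool call; refuses before any traffic is sent."""
    if not in_scope(host):
        raise PermissionError(f"refusing off-scope host: {host}")
    return f"would contact {host}"
```

Under these assumptions, `guarded_request("10.10.3.4")` proceeds, while `guarded_request("8.8.8.8")` raises `PermissionError` before any connection is attempted.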

In benchmark evaluations, T3MP3ST achieved a 90.1% pass@1 score on XBOW's 104-challenge XBEN suite, above XBOW's self-reported score of approximately 85%. Each solve was graded against a committed flag oracle for reproducibility. On Cybench, a 40-task academic benchmark, the framework's single-agent ReAct loop solved 23 of the 40 tasks (57.5%) without hints.
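
For readers unfamiliar with how such scores are produced, the sketch below shows one plausible way to grade first attempts against a flag oracle committed before the run and to compute pass@1 from the result. The article does not describe T3MP3ST's grading code, so everything here is an assumption: the `ORACLE` contents, the challenge ids, the choice to store SHA-256 hashes of flags, and the function names are all hypothetical.

```python
import hashlib
import hmac


def sha256_hex(text: str) -> str:
    """Hex-encoded SHA-256 digest of a string."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical oracle: challenge id -> hash of the expected flag,
# committed to the repository before any run takes place.
ORACLE = {
    "demo-001": sha256_hex("FLAG{example-one}"),
    "demo-002": sha256_hex("FLAG{example-two}"),
}


def is_solved(challenge_id: str, submitted_flag: str) -> bool:
    """A solve counts only if the submitted flag hashes to the committed value."""
    expected = ORACLE.get(challenge_id)
    if expected is None:
        return False
    return hmac.compare_digest(expected, sha256_hex(submitted_flag.strip()))


def pass_at_1(first_attempts: dict) -> float:
    """Fraction of oracle challenges solved on the first and only attempt."""
    solved = sum(is_solved(cid, flag) for cid, flag in first_attempts.items())
    return solved / len(ORACLE)
```

With this toy oracle, one correct flag out of two challenges gives a pass@1 of 0.5. The same arithmetic applied to the Cybench figure, 23 solves over 40 tasks, gives 0.575.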

T3MP3ST was also tested against a held-out set of 10 real-world CVEs disclosed in 2026, spanning seven programming languages. A single agent identified the exact file, line, and CWE classification for 8 of the 10 vulnerabilities, and the broader T3MP3ST toolset surfaced all 10. Because these bugs were disclosed after the AI models' training data cutoff, simple memorization is an unlikely explanation, which suggests the framework is discovering vulnerabilities rather than recalling them.

The framework's architecture maps an 8-operator kill chain (Recon, Scanner, Exploiter, Infiltrator, Exfiltrator, Ghost, Coordinator, and Analyst) onto established models such as MITRE ATT&CK tactics and the Cyber Kill Chain. The recon engine and single-agent exploit loop are currently stable and benchmarked, but downstream operators such as the Exploiter and Infiltrator are still classified as experimental, and full end-to-end coordinated swarm exploitation has yet to be validated at scale.
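
The operator roster can be pictured as a small lookup table, as in the sketch below. The operator names and the stable or experimental labels come from the description above. The ATT&CK tactic assigned to each operator is an illustrative guess and not T3MP3ST's documented mapping, and the `Operator` and `Maturity` types are hypothetical. Operators whose status is not stated are marked as unspecified.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Maturity(Enum):
    STABLE = "stable"
    EXPERIMENTAL = "experimental"
    UNSPECIFIED = "unspecified"  # status not stated in the article


@dataclass(frozen=True)
class Operator:
    name: str
    attack_tactic: Optional[str]  # ASSUMED mapping, for illustration only
    maturity: Maturity


# Order follows the kill chain as listed; tactic names are real ATT&CK
# tactics, but which operator maps to which tactic is a guess.
KILL_CHAIN = [
    Operator("Recon", "Reconnaissance", Maturity.STABLE),
    Operator("Scanner", "Discovery", Maturity.UNSPECIFIED),
    Operator("Exploiter", "Initial Access", Maturity.EXPERIMENTAL),
    Operator("Infiltrator", "Lateral Movement", Maturity.EXPERIMENTAL),
    Operator("Exfiltrator", "Exfiltration", Maturity.UNSPECIFIED),
    Operator("Ghost", "Defense Evasion", Maturity.UNSPECIFIED),
    Operator("Coordinator", "Command and Control", Maturity.UNSPECIFIED),
    Operator("Analyst", None, Maturity.UNSPECIFIED),  # reporting role, no tactic
]


def by_maturity(level: Maturity) -> List[str]:
    """Names of operators at the given maturity level, in kill-chain order."""
    return [op.name for op in KILL_CHAIN if op.maturity is level]
```

A table like this makes it easy to gate a run on maturity, for example by skipping every operator returned by `by_maturity(Maturity.EXPERIMENTAL)`.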

Security researchers have described T3MP3ST's release as a significant development in autonomous red-teaming, in line with a broader industry trend toward AI-driven security tooling. It follows related advances such as Anthropic's Mythos model, which has shown substantial improvements in vulnerability-led generation and source-code security analysis. The T3MP3ST developers emphasize that the framework is intended strictly for authorized testing, research, and educational purposes, and that it is released under the AGPL-3.0 license without warranty. Using it against systems without explicit permission is illegal, and the operator bears the legal responsibility.