Praxen: Open-Source Tool Verifies AI Agent Behavior Against Declared Policies

As organizations increasingly rely on AI agents to automate critical workflows, a new open-source tool called Praxen aims to solve a fundamental security problem: proving that software agents do exactly what they are told. Released by Exabeam, Praxen is the reference implementation of Agent Behavior Verification (ABV), a control model that assigns each agent an authorized role and then continuously checks that the agent’s actual operations stay within that role.

The core mechanism is a document called the Worker Remit, a Markdown policy that declares an agent’s mission, authorized tools, approved channels, counterparties, and explicitly forbidden actions. Praxen reads evidence from source code, deployment state, behavioral logs, and governance documents, then reports every divergence between declared intent and observed behavior. Findings are delivered as a self-contained HTML report, a machine-readable JSON file, and a plain-text summary, with all data processed locally to avoid exposure.
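The comparison of declared intent against observed behavior can be illustrated with a minimal Python sketch. It is not Praxen's code or remit schema: the `WorkerRemit` and `Observation` classes, the `find_divergences` function, and the finding labels are all hypothetical names chosen for illustration, and the sketch assumes observations have already been parsed from logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerRemit:
    """Hypothetical in-memory form of a remit; not Praxen's actual schema."""
    mission: str
    authorized_tools: frozenset
    approved_channels: frozenset
    forbidden_actions: frozenset


@dataclass(frozen=True)
class Observation:
    """One observed operation, assumed to be parsed from a behavioral log."""
    tool: str
    channel: str
    action: str


def find_divergences(remit, observations):
    """Return (label, observation) pairs for every operation outside the remit."""
    findings = []
    for obs in observations:
        if obs.action in remit.forbidden_actions:
            findings.append(("forbidden_action", obs))
        if obs.tool not in remit.authorized_tools:
            findings.append(("unauthorized_tool", obs))
        if obs.channel not in remit.approved_channels:
            findings.append(("unapproved_channel", obs))
    return findings


remit = WorkerRemit(
    mission="Triage inbound support tickets",
    authorized_tools=frozenset({"ticket_api", "kb_search"}),
    approved_channels=frozenset({"helpdesk"}),
    forbidden_actions=frozenset({"issue_refund"}),
)
observed = [
    Observation(tool="ticket_api", channel="helpdesk", action="add_comment"),
    Observation(tool="payments_api", channel="helpdesk", action="issue_refund"),
]
divergences = find_divergences(remit, observed)
```

In this example the first operation stays inside the remit, while the second yields two findings: a forbidden action and an unauthorized tool.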

Praxen runs a suite of named checks that go beyond simple policy enforcement. These cover policy-implementation divergence, credential exposure, configuration gaps, capability drift, supply-chain risk, half-wired controls, empty stub files in security-relevant paths, secondary prompt discovery, and compound signal reasoning that chains individual findings into a higher-severity attack path. Every finding is tagged against multiple frameworks: the OWASP Top 10 for LLM Applications 2025, the OWASP Top 10 for Agentic AI Applications 2026, the OWASP Secure MCP Server Development Guide 2026, and the RAISE Framework, which assigns a maturity score across six categories.
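How individual findings might be chained into a higher-severity result can be sketched as follows. This is an assumption-laden illustration, not Praxen's reasoning logic: the `Finding` class, the `COMPOUND_RULES` table, the check names, and the one-step severity escalation are all hypothetical.

```python
from dataclasses import dataclass

SEVERITY_ORDER = ["low", "medium", "high", "critical"]


@dataclass(frozen=True)
class Finding:
    check: str
    component: str
    severity: str


# Hypothetical rule table: checks that, found together on one component,
# are treated as a single compound finding.
COMPOUND_RULES = [
    (frozenset({"credential_exposure", "capability_drift"}), "compound_attack_path"),
]


def escalate(severity):
    """Raise a severity by one step, capped at the top of the scale."""
    idx = SEVERITY_ORDER.index(severity)
    return SEVERITY_ORDER[min(idx + 1, len(SEVERITY_ORDER) - 1)]


def chain_findings(findings):
    """Emit a compound finding wherever all checks in a rule hit one component."""
    by_component = {}
    for f in findings:
        by_component.setdefault(f.component, []).append(f)

    compounds = []
    for component, group in by_component.items():
        present = {f.check for f in group}
        for required, name in COMPOUND_RULES:
            if required <= present:
                worst = max(
                    (f.severity for f in group if f.check in required),
                    key=SEVERITY_ORDER.index,
                )
                compounds.append(Finding(name, component, escalate(worst)))
    return compounds


findings = [
    Finding("credential_exposure", "agent-a", "medium"),
    Finding("capability_drift", "agent-a", "high"),
    Finding("credential_exposure", "agent-b", "high"),
]
compounds = chain_findings(findings)
```

Here only `agent-a` matches the rule, so one compound finding is produced, one step above the worst contributing severity.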

A key architectural decision is the separation of pre-deployment verification from runtime monitoring, which Exabeam calls Agent Behavior Analytics (ABA). Verification answers whether a team built the agent it intended, while analytics answers whether the agent behaves as intended in production. Steve Wilson, Chief AI Officer at Exabeam, told Help Net Security that the company intends the Worker Remit to serve both stages. “Our aim is a single policy,” he said. “The remit provides a structured, human-readable definition of an agent’s intended role, permissions, responsibilities, constraints, and approval requirements.” Wilson added that the remit “provides a natural foundation for that analysis because it captures the organization’s explicit expectations for the agent.” The two capabilities are separate for now, but, Wilson said, “over time, we expect them to become increasingly connected as part of a broader Behavior Intelligence strategy for AI agents.”

Because Praxen relies on a coding agent (it has been tested against Claude Code), two runs against the same evidence can produce different findings. Wilson said the major results are “highly stable,” with smaller movements in severity counts or maturity scoring at the margins. Every finding cites the underlying files and artifacts, allowing independent review. Exabeam maintains a frozen regression suite to measure consistency, and recommends that teams “run the analysis multiple times, report the median result and range, and union the material findings across runs.”
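That recommendation translates into a few lines of Python. The sketch below assumes each run has already been reduced to a numeric score and a set of finding identifiers. The `aggregate_runs` function, the dictionary keys, and the sample IDs are hypothetical and do not reflect Praxen's JSON output format.

```python
from statistics import median


def aggregate_runs(runs):
    """Combine repeated runs: median score, score range, union of findings.

    Each run is a dict with a numeric 'score' and a set of finding IDs
    under 'findings' (both keys are assumptions for this sketch).
    """
    if not runs:
        raise ValueError("at least one run is required")
    scores = [run["score"] for run in runs]
    all_findings = set().union(*(run["findings"] for run in runs))
    return {
        "median_score": median(scores),
        "score_range": (min(scores), max(scores)),
        "material_findings": sorted(all_findings),
    }


runs = [
    {"score": 3, "findings": {"F-001", "F-002"}},
    {"score": 4, "findings": {"F-001", "F-003"}},
    {"score": 2, "findings": {"F-001", "F-002"}},
]
summary = aggregate_runs(runs)
```

Taking the union rather than the intersection means a finding that surfaces in only one run is still kept for review, which errs on the side of over-reporting.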

Another practical challenge is handling evidence sets that exceed a model’s context window. Praxen begins with a discovery pass across source code, configuration files, dependency manifests, tool and MCP definitions, memory artifacts, and logs, prioritizing the most behaviorally relevant material. Large logs are sampled. Findings are written incrementally, and the analysis state is checkpointed into a structured manifest so that if the AI session exceeds its context window, the report can be reconstructed. Coverage is recorded directly, so sampled evidence carries a marker and missing evidence registers as its own signal. “Context-window limits are a real constraint for every AI-powered analysis platform,” Wilson said. “The goal is to make them visible, measurable, and recoverable so users can trust the results they receive.”
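The checkpointing and coverage-marking pattern described here can be sketched in Python. This is an illustration under stated assumptions, not Praxen's manifest format: the function names, the manifest keys, the status labels, and the evenly spaced sampling strategy are all hypothetical.

```python
import json
import os
import tempfile
from pathlib import Path


def load_manifest(path):
    """Resume from an existing manifest, or start an empty one."""
    path = Path(path)
    if not path.exists():
        return {"findings": [], "coverage": {}}
    return json.loads(path.read_text(encoding="utf-8"))


def checkpoint(manifest, path):
    """Write the manifest atomically so an interrupted session leaves a valid file."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(manifest, fh, indent=2)
    os.replace(tmp_name, path)


def record_evidence(manifest, name, lines, max_lines=1000):
    """Return the lines to analyze and record how much of the source was covered."""
    if lines is None:
        # Missing evidence is recorded as its own signal rather than skipped.
        manifest["coverage"][name] = {"status": "missing"}
        return []
    total = len(lines)
    if total <= max_lines:
        manifest["coverage"][name] = {
            "status": "complete", "lines_read": total, "lines_total": total,
        }
        return list(lines)
    # Evenly spaced sample across the file (one possible sampling strategy).
    step = total / max_lines
    sample = [lines[int(i * step)] for i in range(max_lines)]
    manifest["coverage"][name] = {
        "status": "sampled", "lines_read": max_lines, "lines_total": total,
    }
    return sample
```

A session would call `record_evidence` for each source, append findings to `manifest["findings"]` as they are produced, and call `checkpoint` after each one, so a later session can pick up from `load_manifest` instead of starting over.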

Praxen is available for free on GitHub and is designed to run before deployment and on each release. It requires Python 3.9 or later and a coding agent. The tool represents a maturing approach to AI agent security, moving from ad hoc scanning toward systematic, policy-driven verification that mirrors how enterprises manage human access and permissions. As AI agents proliferate across industries, tools like Praxen add a layer of accountability, helping organizations answer a basic question: is my agent doing what I told it to do?