MCP-for-Stata: Command injection via log_file_name parameter in Stata command wrapper
Description
Summary
The log_file_name parameter in the stata_do API and CLI is directly interpolated into a Stata command string without sanitization. The security guard (GuardValidator) only scans the do-file content but does not validate this parameter. An attacker can inject arbitrary Stata commands (including shell, python, erase, etc.) by crafting a malicious log_file_name containing quotes, newlines, or Stata command separators.
Details
In src/stata_mcp/stata/stata_do/do.py, both _execute_unix_like and _execute_windows construct a Stata command string using Python f-strings:
commands = f"""
capture log close
{self.generate_log_command(log_file, is_replace)}
...
do "{dofile_path}"
...
"""
The generate_log_command method returns:
log_cmd = f'log using "{log_file.as_posix()}", {replace_clause} {log_type} name({log_type}_log)'
Where log_file is constructed from user-supplied log_name:
def generate_log_file(self, log_name: str, extension='log'):
    return self.log_file_path / f"{log_name}.{extension}"
The log_name parameter comes directly from user input (via MCP tool stata_do or CLI stata-mcp tool do) without any validation. Since the path is embedded inside double quotes in a Stata command string, an attacker can break out of the string context and inject arbitrary commands.
Additionally, generate_log_file does not prevent path traversal via log_name, allowing arbitrary file write outside the intended log directory.
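The traversal issue can be illustrated with a minimal sketch of the same path join. The log directory below is a hypothetical value, and `generate_log_file` is rewritten here as a standalone function for illustration, not copied from the project:

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical log directory; the real value comes from the server's configuration.
LOG_DIR = PurePosixPath("/srv/stata-mcp/logs")


def generate_log_file(log_name: str, extension: str = "log") -> PurePosixPath:
    # Same join as the method quoted above, with no validation of log_name.
    return LOG_DIR / f"{log_name}.{extension}"


benign = generate_log_file("run1")
escaped = generate_log_file("../../../tmp/evil")
absolute = generate_log_file("/tmp/evil")

print(benign)                            # /srv/stata-mcp/logs/run1.log
print(escaped)                           # /srv/stata-mcp/logs/../../../tmp/evil.log
print(posixpath.normpath(str(escaped)))  # /tmp/evil.log
print(absolute)                          # /tmp/evil.log
```

Two separate behaviors are visible here: `..` components are kept verbatim by the join and only collapse when the path is later normalized or opened, and an absolute right-hand operand discards the log directory entirely.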
Proof of Concept
When calling stata_do via MCP tool with:
{
"dofile_path": "test.do",
"log_file_name": "'; shell echo pwned > /tmp/pwned.txt; '"
}
The generated Stata commands become:
log using "<log_dir>/'; shell echo pwned > /tmp/pwned.txt; '.log", replace text name(text_log)
Stata interprets this as multiple commands, with shell echo pwned > /tmp/pwned.txt; executed as an arbitrary shell command.
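The string construction can be reproduced outside Stata with a short sketch. The function below mirrors the f-string quoted in the Details section; the `/logs` directory, the `append` fallback for `replace_clause`, and the newline payload are assumptions added for illustration. The sketch only shows what text is handed to Stata, not how Stata parses it:

```python
from pathlib import PurePosixPath

# Hypothetical log directory standing in for <log_dir>.
LOG_DIR = PurePosixPath("/logs")


def generate_log_command(log_file: PurePosixPath, is_replace: bool = True,
                         log_type: str = "text") -> str:
    # Mirrors the advisory's f-string; the "append" branch is an assumption.
    replace_clause = "replace" if is_replace else "append"
    return f'log using "{log_file.as_posix()}", {replace_clause} {log_type} name({log_type}_log)'


# Payload taken from the advisory's proof of concept.
advisory_payload = "'; shell echo pwned > /tmp/pwned.txt; '"
advisory_cmd = generate_log_command(LOG_DIR / f"{advisory_payload}.log")
print(advisory_cmd)

# Illustrative payload (not from the advisory): a double quote closes the
# string and newlines put the injected text on a line of its own.
newline_payload = 'x"\nshell echo pwned\n"'
newline_cmd = generate_log_command(LOG_DIR / f"{newline_payload}.log")
for line in newline_cmd.splitlines():
    print(repr(line))
```

In both cases the attacker-controlled text reaches the command string unchanged, which is the property the fix removes.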
Impact
- Remote Code Execution via shell command injection
- Arbitrary file write/overwrite via path traversal in log_name
- Complete bypass of the security guard, as the guard only validates do-file content, not wrapper parameters
Remediation / Fix
- Apply strict allowlist validation to log_name (only alphanumeric, underscore, dot, hyphen; max 128 chars)
- Resolve and verify the constructed log path remains within the intended log directory
- Consider generating safe internal filenames (e.g., UUIDs) instead of accepting user-defined log names for command construction
- Apply similar sanitization to dofile_path before embedding it into Stata command strings
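The second and third recommendations might look like the following sketch. `safe_log_path` and `internal_log_name` are hypothetical helper names, not part of the project or the published patch, and the containment check is deliberately strict (the log file must be a direct child of the log directory):

```python
import uuid
from pathlib import Path


def safe_log_path(log_dir: Path, log_name: str, extension: str = "log") -> Path:
    # Resolve both sides so "..", absolute names, and symlinks are accounted for.
    base = log_dir.resolve()
    candidate = (base / f"{log_name}.{extension}").resolve()
    # Require the resolved file to sit directly inside the log directory.
    if candidate.parent != base:
        raise ValueError("log path escapes the log directory")
    return candidate


def internal_log_name() -> str:
    # Server-generated name: 32 lowercase hex characters, no user input involved.
    return uuid.uuid4().hex


if __name__ == "__main__":
    logs = Path("logs")  # hypothetical log directory
    print(safe_log_path(logs, internal_log_name()))
```

A containment check of this kind addresses only the path traversal half of the issue; it does not reject quotes or newlines, so it complements the allowlist validation rather than replacing it.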
References
- Issue: #74
- Fix commit: https://github.com/SepineTam/stata-mcp/commit/e6f945941ae0c7cf5e74a428e0b3dc82b396382f
Patches
e6f945941ae0 · fix: validate log_file_name to prevent Stata command injection
1 file changed · +16 −3
src/stata_mcp/stata/stata_do/do.py · +16 −3 · modified
@@ -9,13 +9,16 @@

 import logging
 import os
+import re
 import subprocess
 import tempfile
 from pathlib import Path
 from typing import Dict, List, Literal, Optional

 from ...utils import get_nowtime

+LOG_FILE_NAME_PATTERN = re.compile(r"^[A-Za-z0-9_.-]{1,128}$")
+

 class StataDo:
     def __init__(self,
@@ -79,7 +82,8 @@ def execute_dofile(
         """
         nowtime = get_nowtime()
         log_name = log_file_name or nowtime
-        log_file = self.log_file_path / f"{log_name}.log"
+        self._validate_log_name(log_name)
+        log_file = self.generate_log_file(log_name)

         if self.is_unix:
             args = dofile_path, log_name, is_replace, enable_smcl
@@ -129,6 +133,15 @@ def generate_log_command(log_file: Path, is_replace: bool = True, log_type: Lite
         log_cmd = f'log using "{log_file.as_posix()}", {replace_clause} {log_type} name({log_type}_log)'
         return log_cmd

+    @staticmethod
+    def _validate_log_name(log_name: str) -> None:
+        if not LOG_FILE_NAME_PATTERN.fullmatch(log_name):
+            raise ValueError(
+                "Invalid log_file_name. Use 1-128 characters from A-Z, a-z, 0-9, underscore, dot, or hyphen."
+            )
+        if any(part in {"", ".", ".."} for part in Path(log_name).parts):
+            raise ValueError("Invalid log_file_name. Path traversal is not allowed.")
+
     def generate_log_file(self, log_name: str, extension: Literal['smcl', 'log'] = 'log'):
         return self.log_file_path / f"{log_name}.{extension}"

@@ -225,7 +238,7 @@ def _execute_windows(self, dofile_path: Path, log_file: Path, is_replace: bool =
         cmd = [self.STATA_CLI, "/e", "do", batch_file.as_posix()]
         result = subprocess.run(
             cmd,
-            shell=True,
+            shell=True,  # Windows Stata requires shell execution for proper path resolution and startup
             capture_output=True,
             text=True,
             cwd=self.cwd
@@ -360,7 +373,7 @@ def _execute_windows_with_monitors(self, dofile_path: Path, log_file: Path, is_r
         cmd = [self.STATA_CLI, "/e", "do", str(batch_file)]
         proc = subprocess.Popen(
             cmd,
-            shell=True,
+            shell=True,  # Windows Stata requires shell execution for proper path resolution and startup
             stdout=subprocess.PIPE,
             stderr=subprocess.PIPE,
             text=True,
Vulnerability mechanics
Root cause
"The `log_file_name` parameter is directly interpolated into Stata command strings without proper sanitization or validation."
Attack vector
An attacker can exploit this vulnerability by providing a specially crafted `log_file_name` when calling the `stata_do` API or CLI. This malicious input can contain quotes, newlines, or Stata command separators, allowing the attacker to break out of the intended string context. The injected commands, such as `shell`, `python`, or `erase`, can then be executed by Stata, leading to remote code execution or arbitrary file writes. This bypasses the security guard, which only inspects do-file content, not wrapper parameters [ref_id=1].
Affected code
The vulnerability resides in the `stata_do` functionality within `src/stata_mcp/stata/stata_do/do.py`. Specifically, the `_execute_unix_like` and `_execute_windows` methods construct Stata command strings by interpolating the user-supplied `log_file_name` directly. The `generate_log_file` method is also implicated as it constructs the log file path from this unvalidated input [ref_id=1].
What the fix does
The patch introduces strict allowlist validation for the `log_file_name` parameter, permitting only alphanumeric characters, underscores, dots, and hyphens within a specified length limit [patch_id=4835175]. It also adds checks to reject path traversal attempts. This prevents malicious input from being interpolated into Stata commands, thereby closing the command injection and arbitrary file write vulnerabilities.
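The effect of the validation can be checked in isolation. The sketch below copies the pattern and the two checks from the patch into a standalone function; `validate_log_name` and `is_accepted` are illustrative names (the patch defines a static method `_validate_log_name` on `StataDo`), and the sample inputs are assumptions chosen to exercise each check:

```python
import re
from pathlib import Path

# Pattern as it appears in the fix commit.
LOG_FILE_NAME_PATTERN = re.compile(r"^[A-Za-z0-9_.-]{1,128}$")


def validate_log_name(log_name: str) -> None:
    # Allowlist check: rejects quotes, spaces, slashes, newlines, and long names.
    if not LOG_FILE_NAME_PATTERN.fullmatch(log_name):
        raise ValueError("Invalid log_file_name.")
    # Component check: rejects names such as ".." that pass the allowlist.
    if any(part in {"", ".", ".."} for part in Path(log_name).parts):
        raise ValueError("Invalid log_file_name. Path traversal is not allowed.")


def is_accepted(log_name: str) -> bool:
    try:
        validate_log_name(log_name)
    except ValueError:
        return False
    return True


samples = [
    "run_2024-01.v2",                           # ordinary name
    "'; shell echo pwned > /tmp/pwned.txt; '",  # payload from the advisory
    "../evil",                                  # slash is outside the allowlist
    "..",                                       # passes the regex, fails the component check
    "a\n",                                      # trailing newline
    "a" * 129,                                  # over the length limit
]
for sample in samples:
    print(repr(sample[:40]), is_accepted(sample))
```

Using `fullmatch` rather than `match` matters for the trailing-newline case: with `match`, the `$` anchor would also succeed just before a final newline, whereas `fullmatch` requires the whole string to be consumed.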
Preconditions
- Input: The attacker must control the `log_file_name` parameter passed to the `stata_do` API or CLI.
Reproduction
When calling `stata_do` via MCP tool with:

```json
{
  "dofile_path": "test.do",
  "log_file_name": "'; shell echo pwned > /tmp/pwned.txt; '"
}
```

The generated Stata commands become:

```stata
log using "<log_dir>/'; shell echo pwned > /tmp/pwned.txt; '.log", replace text name(text_log)
```

Stata interprets this as multiple commands, with `shell echo pwned > /tmp/pwned.txt;` executed as an arbitrary shell command [ref_id=1].
Generated on Jun 4, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.