Regular Expression Denial of Service (ReDoS) in huggingface/transformers
Description
A vulnerability in the preprocess_string() function of the transformers.testing_utils module in huggingface/transformers version v4.48.3 allows for a Regular Expression Denial of Service (ReDoS) attack. The regular expression used to process code blocks in docstrings contains nested quantifiers, leading to exponential backtracking when processing input with a large number of newline characters. An attacker can exploit this by providing a specially crafted payload, causing high CPU usage and potential application downtime, effectively resulting in a Denial of Service (DoS) scenario.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
CVE-2025-2099 describes a ReDoS vulnerability in Hugging Face Transformers v4.48.3's preprocess_string() function, exploitable via crafted input causing high CPU and potential DoS.
Vulnerability
Analysis
The vulnerability is a Regular Expression Denial of Service (ReDoS) in the preprocess_string() function of the transformers.testing_utils module in Hugging Face Transformers version v4.48.3 [1][2]. The root cause is a complex regular expression with nested quantifiers used to process code blocks within docstrings. This pattern causes exponential backtracking when processing input containing a large number of newline characters [2][4]. The advisory database entry (PYSEC-2025-40) confirms the issue was present in this specific version [3].
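The backtracking behaviour described above can be observed with a small timing sketch. The two patterns are copied from the fix commit's diff; the payload shape (an opened doctest block followed only by newlines, with no closing fence) and the helper names `make_payload` and `time_split` are assumptions made for illustration, not the reported proof of concept. With `re.DOTALL` set, `.` also matches a newline, so the nested `(?:.*?\n)*?` group can divide a run of newlines in many different ways before the match finally fails.

```python
import re
import time

# Patterns copied from the fix commit's diff (before and after the change).
VULNERABLE = re.compile(
    r"(```(?:python|py)\s*\n\s*>>> )((?:.*?\n)*?.*?```)",
    flags=re.MULTILINE | re.DOTALL,
)
PATCHED = re.compile(r"(```(?:python|py)\s*\n\s*>>> )(.*?```)", flags=re.DOTALL)


def make_payload(newlines: int) -> str:
    """Hypothetical payload: opened doctest block, many newlines, no closing fence."""
    return "```python\n>>> " + "\n" * newlines


def time_split(pattern: re.Pattern, text: str) -> float:
    """Return the wall-clock seconds one split() call takes on `text`."""
    start = time.perf_counter()
    pattern.split(text)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Keep the counts small: the old pattern's runtime is expected to grow
    # very quickly with each extra newline, while the new one stays flat.
    for n in (8, 12, 16):
        payload = make_payload(n)
        print(n, time_split(VULNERABLE, payload), time_split(PATCHED, payload))
```

On well-formed input both patterns should split a docstring identically, so the sketch only isolates the cost of the failing match. Raising the newline count much beyond the values shown may make the old pattern run for a very long time.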
Exploitation
An attacker can exploit this by passing a specially crafted payload to any component that calls preprocess_string(). Because the function is part of the testing utilities, exploitation likely requires the application to process docstrings or test inputs containing a malicious string with numerous newlines. The references mention no authentication requirement, so the attack surface could be broad if untrusted input reaches the function [2][4]. No network access is needed beyond whatever it takes to deliver the payload.
Impact
Successful exploitation causes high CPU usage and potential application downtime, resulting in a Denial of Service (DoS) scenario [2]. No data confidentiality or integrity impact is described; the attack is purely on availability. The references do not provide a CVSS score; the severity stems from the potential for resource exhaustion [2].
Mitigation
A patch has been merged via Pull Request #36648, which cleans up the regex by removing overly complex non-capturing groups, removing unnecessary flags, and simplifying the pattern to avoid exponential runtime [4]. Users are advised to update to a fixed version (4.50.0 or later, according to the GitHub Security Advisory) or apply the patch manually. There is no mention of the vulnerability being exploited in the wild or being added to CISA's KEV list.
- GitHub - huggingface/transformers: 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
- NVD - CVE-2025-2099
- advisory-database/vulns/transformers/PYSEC-2025-40.yaml at main · pypa/advisory-database
- Cleanup the regex used for doc preprocessing by Rocketknight1 · Pull Request #36648 · huggingface/transformers
AI Insight generated on May 20, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
| transformers (PyPI) | < 4.50.0 | 4.50.0 |
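The range in the table can be checked against an installed copy with a short script. This is a minimal sketch using only the standard library; the names `parse_release`, `is_affected`, and `installed_transformers_is_affected` are hypothetical helpers, and the version parsing is deliberately simplified (pre-release suffixes such as `.dev0` or `rc1` are ignored).

```python
import re
from importlib import metadata

# First patched release, taken from the "Patched versions" column above.
PATCHED = (4, 50, 0)


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release segment, e.g. '4.48.3' -> (4, 48, 3)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_affected(version: str) -> bool:
    """True when `version` falls in the advisory's '< 4.50.0' range."""
    release = parse_release(version)
    # Pad short versions so that '4.50' compares as (4, 50, 0).
    release += (0,) * (len(PATCHED) - len(release))
    return release < PATCHED


def installed_transformers_is_affected():
    """Check the installed distribution; returns None if it is not installed."""
    try:
        return is_affected(metadata.version("transformers"))
    except metadata.PackageNotFoundError:
        return None
```

For stricter comparisons that account for pre-releases, a dedicated version-parsing library would be a better fit than this tuple comparison.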
Affected products
- Range: =4.48.3
- huggingface/huggingface/transformers (v5), Range: unspecified
Patches
- 8cb522b4190b: Cleanup the regex used for doc preprocessing (#36648)
1 file changed · +2 −2
src/transformers/testing_utils.py (+2 −2, modified)

    @@ -2732,8 +2732,8 @@ def preprocess_string(string, skip_cuda_tests):
         cuda stuff is detective (with a heuristic), this method will return an empty string so no doctest will be run for
         `string`.
         """
    -    codeblock_pattern = r"(```(?:python|py)\s*\n\s*>>> )((?:.*?\n)*?.*?```)"
    -    codeblocks = re.split(re.compile(codeblock_pattern, flags=re.MULTILINE | re.DOTALL), string)
    +    codeblock_pattern = r"(```(?:python|py)\s*\n\s*>>> )(.*?```)"
    +    codeblocks = re.split(codeblock_pattern, string, flags=re.DOTALL)
         is_cuda_found = False
         for i, codeblock in enumerate(codeblocks):
             if "load_dataset(" in codeblock and "# doctest: +IGNORE_RESULT" not in codeblock:
Vulnerability mechanics
Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
- github.com/advisories/GHSA-qq3j-4f4f-9583 (ghsa, ADVISORY)
- nvd.nist.gov/vuln/detail/CVE-2025-2099 (ghsa, ADVISORY)
- github.com/huggingface/transformers/commit/8cb522b4190bd556ce51be04942720650b1a3e57 (ghsa, WEB)
- github.com/huggingface/transformers/pull/36648 (ghsa, WEB)
- github.com/pypa/advisory-database/tree/main/vulns/transformers/PYSEC-2025-40.yaml (ghsa, WEB)
- huntr.com/bounties/97b780f3-ffca-424f-ad5d-0e1c57a5bde4 (ghsa, WEB)