Regular Expression Denial of Service (ReDoS) in huggingface/transformers
Description
A Regular Expression Denial of Service (ReDoS) vulnerability was discovered in the Hugging Face Transformers library, specifically within the DonutProcessor class's token2json() method. This vulnerability affects versions 4.50.3 and earlier, and is fixed in version 4.52.1. The issue arises from the regex pattern <s_(.*?)> which can be exploited to cause excessive CPU consumption through crafted input strings due to catastrophic backtracking. This vulnerability can lead to service disruption, resource exhaustion, and potential API service vulnerabilities, impacting document processing tasks using the Donut model.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
A ReDoS vulnerability in Hugging Face Transformers' DonutProcessor.token2json() allows crafted input to cause excessive CPU consumption via catastrophic regex backtracking, fixed in version 4.52.1.
Vulnerability Overview
A Regular Expression Denial of Service (ReDoS) vulnerability exists in the Hugging Face Transformers library, specifically within the DonutProcessor class's token2json() method. The vulnerability affects versions 4.50.3 and earlier, and is fixed in version 4.52.1 [2]. The root cause is the regex pattern <s_(.*?)> which, when processing crafted input strings, can lead to catastrophic backtracking, causing excessive CPU consumption and potential service disruption [1][2].
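The sketch below times the pre-fix pattern against an assumed worst-case payload: many repetitions of `<s_` with no closing `>`. The payload shape and the sizes are illustrative assumptions and are not taken from the original report. `(.*?)` extends one character at a time and never finds `>`, so the engine re-scans the rest of the line from every `<s_` start position. For this particular pattern, the cost therefore grows roughly quadratically with input length rather than exponentially.

```python
import re
import time

# Pattern used by DonutProcessor.token2json() before the fix.
VULNERABLE = re.compile(r"<s_(.*?)>", re.IGNORECASE)


def time_search(n: int) -> float:
    """Time one search over n copies of "<s_" with no closing ">" (assumed payload)."""
    crafted = "<s_" * n
    start = time.perf_counter()
    match = VULNERABLE.search(crafted)
    elapsed = time.perf_counter() - start
    # No ">" exists, so the search fails only after trying every "<s_" start position.
    assert match is None
    return elapsed


if __name__ == "__main__":
    for n in (500, 1_000, 2_000, 4_000):
        print(f"n={n:>5}: {time_search(n):.4f} s")
```

Each doubling of `n` should roughly quadruple the reported time. Exact numbers depend on the Python version and the hardware.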
Attack Surface and Exploitation
The vulnerability is triggered by providing specially crafted input strings to the token2json() method during document processing tasks using the Donut model. The attack does not require authentication beyond normal access to the method; an attacker can send malicious input to a service using a vulnerable Transformers version, potentially causing a denial of service through CPU exhaustion [1][2]. The fix, implemented in commit ebbe9b12dd75b69f92100d684c47f923ee262a93, replaces the vulnerable regex with a manual two-part parsing approach to eliminate the backtracking risk [3][4].
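Here is a standalone sketch of the two-part lookup. It is modeled on the patched `token2json()` loop but pulled out into hypothetical helper functions; `find_start_token_vulnerable` and `find_start_token_two_part` are illustrative names, not Transformers APIs. Part one finds the literal `<s_` prefix. Part two locates the first `>` with plain string methods, so neither step can backtrack.

```python
import re


def find_start_token_vulnerable(tokens: str):
    """Pre-fix behaviour: one lazy regex (quadratic on crafted input)."""
    match = re.search(r"<s_(.*?)>", tokens, re.IGNORECASE)
    if match is None:
        return None
    return match.group(), match.group(1)


def find_start_token_two_part(tokens: str):
    """Post-fix behaviour, mirroring the two-part logic of commit ebbe9b1.

    Returns (start_token, key), or None when no complete "<s_...>" tag exists.
    """
    # Part 1: find the literal "<s_" prefix; a literal search is linear.
    potential_start = re.search(r"<s_", tokens, re.IGNORECASE)
    if potential_start is None:
        return None
    start_token = tokens[potential_start.start():]
    # Part 2: find the first ">" with plain substring methods, no backtracking.
    if ">" not in start_token:
        return None
    start_token = start_token[: start_token.index(">") + 1]
    key = start_token[len("<s_"):-len(">")]
    return start_token, key
```

On single-line input the two helpers return the same `(start_token, key)` pair. They can differ when a tag spans a line break, because `.` in the original regex does not match `\n`, while `str.index` ignores line breaks.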
Impact
Successful exploitation can lead to severe performance degradation, resource exhaustion, and service disruption for applications using the DonutProcessor for document processing. In server environments, this becomes an availability problem at the API level: excessive CPU consumption from crafted requests may block other requests or crash the service [2].
Mitigation
Users should update the Transformers library to version 4.52.1 or later to address this vulnerability. For those unable to upgrade immediately, avoiding the use of the token2json() method with untrusted input is a temporary workaround. The issue is not known to be listed in CISA's Known Exploited Vulnerabilities catalog as of the publication date.
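One way to check whether an environment is still exposed is to compare the installed version against 4.52.1, the first patched release listed in the advisory. The helpers here (`release_tuple`, `is_patched`, `installed_transformers_is_patched`) are hypothetical names. The parser is deliberately simplified: it keeps only the numeric `major.minor.patch` part, so it approximates PEP 440 ordering rather than replacing `packaging.version`.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First release containing the fix, per the advisory's "Patched versions" column.
PATCHED_RELEASE = (4, 52, 1)


def release_tuple(ver: str) -> tuple:
    """Return the numeric release part, e.g. "4.52.1" -> (4, 52, 1).

    Simplified on purpose: suffixes such as "rc1" or ".dev0" are dropped, so
    "4.52.1rc1" is treated like 4.52.1. Use packaging.version for strict PEP 440.
    """
    m = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", ver)
    if m is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(g) if g is not None else 0 for g in m.groups())


def is_patched(ver: str) -> bool:
    """True if the version is at or above the first patched release."""
    return release_tuple(ver) >= PATCHED_RELEASE


def installed_transformers_is_patched():
    """True/False for the installed package, or None if transformers is absent."""
    try:
        return is_patched(version("transformers"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("transformers patched:", installed_transformers_is_patched())
```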
- [1] GitHub - huggingface/transformers: 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
- [2] NVD - CVE-2025-3933
- [3] Fix donut backtracking (#37788) · huggingface/transformers@ebbe9b1
- [4] Fix donut backtracking by Rocketknight1 · Pull Request #37788 · huggingface/transformers
AI Insight generated on May 19, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
| transformers (PyPI) | < 4.52.1 | 4.52.1 |
Affected products
- Range: <=4.50.3
- huggingface/huggingface/transformers v5 · Range: unspecified
Patches
ebbe9b12dd75 Fix donut backtracking (#37788)
1 file changed · +8 −4
src/transformers/models/donut/processing_donut.py · +8 −4 · modified

```diff
@@ -156,14 +156,18 @@ def token2json(self, tokens, is_inner_value=False, added_vocab=None):
         output = {}

         while tokens:
-            start_token = re.search(r"<s_(.*?)>", tokens, re.IGNORECASE)
-            if start_token is None:
+            # We want r"<s_(.*?)>" but without ReDOS risk, so do it manually in two parts
+            potential_start = re.search(r"<s_", tokens, re.IGNORECASE)
+            if potential_start is None:
                 break
-            key = start_token.group(1)
+            start_token = tokens[potential_start.start() :]
+            if ">" not in start_token:
+                break
+            start_token = start_token[: start_token.index(">") + 1]
+            key = start_token[len("<s_") : -len(">")]
             key_escaped = re.escape(key)

             end_token = re.search(rf"</s_{key_escaped}>", tokens, re.IGNORECASE)
-            start_token = start_token.group()
             if end_token is None:
                 tokens = tokens.replace(start_token, "")
             else:
```
Vulnerability mechanics
Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
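The backtracking cost can also be counted directly. The function below is a simplified cost model, offered as an illustration rather than a description of the internals of CPython's `sre` engine. For each position where `<s_` occurs, it counts how many characters the lazy `(.*?)` consumes while looking for `>`. Counting stops at the first `>`, a newline, or the end of the string. For `n` repetitions of `<s_`, the total is `3 * n * (n - 1) / 2`, which is quadratic. By contrast, each step of the patched lookup is a single linear scan: `re.search` for the literal `<s_`, then `in`/`str.index` for `>`.

```python
def lazy_scan_cost(tokens: str) -> int:
    """Characters consumed by (.*?) while evaluating r"<s_(.*?)>" (simplified model)."""
    cost = 0
    for i in range(len(tokens)):
        # Candidate start: a case-insensitive "<s_" at position i.
        if tokens[i:i + 3].lower() != "<s_":
            continue
        j = i + 3
        # (.*?) grows one character at a time until ">" or a newline/end of input.
        while j < len(tokens) and tokens[j] not in ">\n":
            j += 1
            cost += 1
        if j < len(tokens) and tokens[j] == ">":
            return cost  # the first successful match ends the search
    return cost


if __name__ == "__main__":
    for n in (100, 200, 400):
        print(n, lazy_scan_cost("<s_" * n))
```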
References
- github.com/advisories/GHSA-37mw-44qp-f5jm · GHSA · ADVISORY
- nvd.nist.gov/vuln/detail/CVE-2025-3933 · GHSA · ADVISORY
- github.com/huggingface/transformers/commit/ebbe9b12dd75b69f92100d684c47f923ee262a93 · GHSA · WEB
- github.com/huggingface/transformers/pull/37788 · GHSA · WEB
- huntr.com/bounties/25282953-5827-4384-bb6f-5790d275721b · GHSA · WEB