VYPR
Moderate severity · NVD Advisory · Published Jul 11, 2025 · Updated Jul 11, 2025

Regular Expression Denial of Service (ReDoS) in huggingface/transformers

CVE-2025-3933

Description

A Regular Expression Denial of Service (ReDoS) vulnerability was discovered in the Hugging Face Transformers library, specifically within the DonutProcessor class's token2json() method. This vulnerability affects versions 4.50.3 and earlier, and is fixed in version 4.52.1. The issue arises from the regex pattern <s_(.*?)> which can be exploited to cause excessive CPU consumption through crafted input strings due to catastrophic backtracking. This vulnerability can lead to service disruption, resource exhaustion, and potential API service vulnerabilities, impacting document processing tasks using the Donut model.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

A ReDoS vulnerability in Hugging Face Transformers' DonutProcessor.token2json() allows crafted input to cause excessive CPU consumption via catastrophic regex backtracking, fixed in version 4.52.1.

Vulnerability Overview

A Regular Expression Denial of Service (ReDoS) vulnerability exists in the Hugging Face Transformers library, specifically within the DonutProcessor class's token2json() method. The vulnerability affects versions 4.50.3 and earlier, and is fixed in version 4.52.1 [2]. The root cause is the regex pattern <s_(.*?)> which, when processing crafted input strings, can lead to catastrophic backtracking, causing excessive CPU consumption and potential service disruption [1][2].
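To make the mechanism concrete, the following minimal sketch times the original pattern on one input shape that appears consistent with the description: many `<s_` openers and no closing `>`. The payload shape, the helper names `make_payload` and `time_search`, and the sizes are assumptions for illustration and are not taken from the advisory. On such input, each `<s_` occurrence triggers a scan to the end of the string, so the work should grow roughly quadratically with input length.

```python
import re
import time

# Pattern quoted in the advisory (pre-fix behaviour).
VULNERABLE_PATTERN = re.compile(r"<s_(.*?)>", re.IGNORECASE)


def make_payload(repeats: int) -> str:
    """Hypothetical crafted input: many openers, never a closing '>'."""
    return "<s_" * repeats


def time_search(text: str) -> float:
    """Return the seconds spent in one failed search over `text`."""
    started = time.perf_counter()
    match = VULNERABLE_PATTERN.search(text)
    elapsed = time.perf_counter() - started
    assert match is None  # no '>' is present, so the search must fail
    return elapsed


if __name__ == "__main__":
    # Doubling the input size should increase the time by clearly more than 2x.
    for repeats in (1_000, 2_000, 4_000):
        print(repeats, f"{time_search(make_payload(repeats)):.4f}s")
```

Absolute timings depend on the machine and Python version; the point of the sketch is the growth rate, not the numbers.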

Attack Surface and Exploitation

The vulnerability is triggered by providing specially crafted input strings to the token2json() method during document processing tasks using the Donut model. The attack does not require authentication beyond normal access to the method; an attacker can send malicious input to a service using a vulnerable Transformers version, potentially causing a denial of service through CPU exhaustion [1][2]. The fix, implemented in commit ebbe9b12dd75b69f92100d684c47f923ee262a93, replaces the vulnerable regex with a manual two-part parsing approach to eliminate the backtracking risk [3][4].
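The two-part parsing approach can be sketched as a standalone helper. `find_start_token` is a hypothetical name used only for illustration; in the actual fix the logic lives inline in `token2json()`.

```python
import re
from typing import Optional, Tuple


def find_start_token(tokens: str) -> Optional[Tuple[str, str]]:
    """Return (start_token, key) for the first "<s_...>" tag, or None.

    Part 1 locates the literal opener "<s_"; part 2 finds the next ">"
    with plain string operations, so no lazy quantifier is involved.
    """
    potential_start = re.search(r"<s_", tokens, re.IGNORECASE)
    if potential_start is None:
        return None
    rest = tokens[potential_start.start():]
    if ">" not in rest:
        return None
    start_token = rest[: rest.index(">") + 1]
    key = start_token[len("<s_"):-len(">")]
    return start_token, key


if __name__ == "__main__":
    print(find_start_token("<s_name>John</s_name>"))
    print(find_start_token("<s_" * 10_000))
```

Each step (`re.search` for a fixed literal, the `in` test, and `str.index`) is a single linear pass, so an input with many openers and no `>` is rejected after one scan rather than one scan per opener.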

Impact

Successful exploitation can lead to severe performance degradation, resource exhaustion, and service disruption for applications using the DonutProcessor for document processing. In server environments, excessive CPU consumption may block other requests or crash the service, making any API that exposes the method unavailable [2].

Mitigation

Users should update the Transformers library to version 4.52.1 or later to address this vulnerability. For those unable to upgrade immediately, avoiding the use of the token2json() method with untrusted input is a temporary workaround. The issue is not known to be listed in CISA's Known Exploited Vulnerabilities catalog as of the publication date.
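A deployment could check the installed version at startup along the following lines. This is a minimal sketch: `release_tuple` and `is_patched` are hypothetical helpers, and the comparison looks only at the leading numeric components, so it ignores pre-release suffixes. A production check would more likely rely on a dedicated version-parsing library.

```python
import re
from importlib import metadata

PATCHED = (4, 52, 1)  # fixed release named in the advisory


def release_tuple(version: str) -> tuple:
    """Leading numeric components, e.g. '4.52.0.dev0' -> (4, 52, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_patched(version: str) -> bool:
    """True if the numeric release is at or above the fixed release."""
    return release_tuple(version) >= PATCHED


if __name__ == "__main__":
    try:
        installed = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        print("transformers is not installed")
    else:
        status = "patched" if is_patched(installed) else "vulnerable, upgrade to 4.52.1 or later"
        print(f"transformers {installed}: {status}")
```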

AI Insight generated on May 19, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected packages

Versions sourced from the GitHub Security Advisory.

Package              Affected versions   Patched versions
transformers (PyPI)  < 4.52.1            4.52.1

Patches

ebbe9b12dd75

Fix donut backtracking (#37788)

1 file changed · +8 −4
  • src/transformers/models/donut/processing_donut.py (+8 −4, modified)
    @@ -156,14 +156,18 @@ def token2json(self, tokens, is_inner_value=False, added_vocab=None):
             output = {}
     
             while tokens:
    -            start_token = re.search(r"<s_(.*?)>", tokens, re.IGNORECASE)
    -            if start_token is None:
    +            # We want r"<s_(.*?)>" but without ReDOS risk, so do it manually in two parts
    +            potential_start = re.search(r"<s_", tokens, re.IGNORECASE)
    +            if potential_start is None:
                     break
    -            key = start_token.group(1)
    +            start_token = tokens[potential_start.start() :]
    +            if ">" not in start_token:
    +                break
    +            start_token = start_token[: start_token.index(">") + 1]
    +            key = start_token[len("<s_") : -len(">")]
                 key_escaped = re.escape(key)
     
                 end_token = re.search(rf"</s_{key_escaped}>", tokens, re.IGNORECASE)
    -            start_token = start_token.group()
                 if end_token is None:
                     tokens = tokens.replace(start_token, "")
                 else:
    

Vulnerability mechanics

Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References

5
