Regular Expression Denial of Service (ReDoS) in huggingface/transformers
Description
A Regular Expression Denial of Service (ReDoS) vulnerability was discovered in the Hugging Face Transformers library, specifically within the normalize_numbers() method of the EnglishNormalizer class. It affects versions up to and including 4.52.4 and is fixed in version 4.53.0. The issue arises from the method's handling of numeric strings: crafted input containing long sequences of digits causes excessive CPU consumption. This impacts text-to-speech and number normalization tasks, potentially causing service disruption and resource exhaustion in any API that passes untrusted text to the normalizer.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
ReDoS in Hugging Face Transformers EnglishNormalizer allows crafted digit strings to cause CPU exhaustion, fixed in 4.53.0.
Vulnerability Overview
CVE-2025-6051 is a Regular Expression Denial of Service (ReDoS) vulnerability in the Hugging Face transformers library, specifically in the normalize_numbers() method of the EnglishNormalizer class [1][2]. It affects versions up to and including 4.52.4. The root cause is a set of backtracking-prone regular expressions, such as [0-9]+(st|nd|rd|th), [0-9]+\.[0-9]+, and [0-9\,]*[0-9]+, in which a greedy digit quantifier gives back characters one at a time whenever the rest of the pattern fails to match. Crafted input containing long sequences of digits therefore forces the engine to retry a very large number of split points [3][4]. The fix rewrites these patterns with possessive quantifiers (for example [0-9]++) and drops the redundant trailing quantifier ([0-9,]*[0-9] instead of [0-9\,]*[0-9]+). Because the standard re module supports possessive quantifiers and atomic groups only from Python 3.11 onward, a follow-up commit imports re on Python 3.11+ and falls back to the third-party regex library on older interpreters [3][4].
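The backtracking behaviour described above can be observed with a small timing sketch. It uses the ordinal pattern from the removed and added lines of the patch; the helper names (`time_search`, `compare`) and the input sizes are illustrative assumptions, not part of the library. The possessive variant is only compiled on Python 3.11+, where the standard `re` module accepts it.

```python
import re
import sys
import time

# Pre-fix ordinal pattern, copied from the line removed by the patch.
VULNERABLE = re.compile(r"[0-9]+(st|nd|rd|th)")

# Post-fix pattern. Possessive quantifiers need Python 3.11+ in the stdlib
# re module, so skip it on older interpreters instead of failing to compile.
PATCHED = (
    re.compile(r"[0-9]++(st|nd|rd|th)") if sys.version_info >= (3, 11) else None
)


def time_search(pattern, text):
    """Return (match, seconds) for one pattern.search() call."""
    start = time.perf_counter()
    match = pattern.search(text)
    return match, time.perf_counter() - start


def compare(lengths=(500, 1000, 2000)):
    """Time both patterns on digit-only strings that cannot match."""
    rows = []
    for n in lengths:
        text = "1" * n  # no ordinal suffix, so every attempt ends in failure
        _, before = time_search(VULNERABLE, text)
        after = time_search(PATCHED, text)[1] if PATCHED else None
        rows.append((n, before, after))
    return rows


if __name__ == "__main__":
    for n, before, after in compare():
        after_text = "n/a" if after is None else f"{after:.6f}s"
        print(f"{n:>5} digits  pre-fix {before:.6f}s  possessive {after_text}")
```

On a failing input the greedy form retries the suffix after giving back each digit, while the possessive form never gives digits back; the exact timings depend on the interpreter and machine, so none are quoted here.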
Exploitation Context
An attacker can trigger the ReDoS by providing a specially crafted numeric string, such as a long sequence of digits with interleaved commas or decimal points, to any application that invokes the normalize_numbers() method. The attack does not require authentication or special privileges beyond the ability to send input to an affected endpoint. The vulnerability is particularly relevant for services performing text-to-speech or number normalization tasks, where user-supplied text is processed [2]. Because the EnglishNormalizer is part of the widely used transformers library, any downstream service or API that uses this normalizer on untrusted input is potentially vulnerable.
Impact
Successful exploitation leads to excessive CPU consumption, which can result in service disruption, resource exhaustion, and denial of service for other users or processes sharing the same compute resources. A CVSS v4.0 score is still pending full assessment, but the potential for remote exploitation with low complexity makes this a significant concern for online services [2].
Mitigation
The Hugging Face team addressed the vulnerability in transformers version 4.53.0 by rewriting the vulnerable regular expressions with possessive quantifiers, which stop the engine from backtracking into a run of digits. These patterns run on the standard re module under Python 3.11+ and on the regex library under earlier versions [3][4]. All users are strongly recommended to upgrade to version 4.53.0 or later. Applications that cannot upgrade immediately should consider sanitizing or limiting the length of numeric input strings passed to normalize_numbers() as a temporary workaround.
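The length-limiting workaround could look roughly like the following sketch. The helper names (`longest_numeric_run`, `check_numeric_input`) and the limit of 64 characters are assumptions chosen for illustration; neither is part of transformers, and the right limit depends on the application. The scan deliberately avoids regular expressions so the guard itself cannot backtrack.

```python
# Characters that the number-normalization patterns treat as part of a number.
NUMERIC_CHARS = frozenset("0123456789,.")

# Hypothetical limit; tune it to the longest number your service should accept.
MAX_NUMERIC_RUN = 64


def longest_numeric_run(text: str) -> int:
    """Length of the longest unbroken run of digits, commas, and dots."""
    longest = current = 0
    for ch in text:
        if ch in NUMERIC_CHARS:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


def check_numeric_input(text: str, limit: int = MAX_NUMERIC_RUN) -> str:
    """Return text unchanged, or raise if it holds an oversized numeric run."""
    run = longest_numeric_run(text)
    if run > limit:
        raise ValueError(f"numeric run of {run} characters exceeds limit {limit}")
    return text
```

A caller would apply the guard before normalization, for example by passing `check_numeric_input(user_text)` to normalize_numbers() on an EnglishNormalizer instance. This is a single linear pass over the input, so its cost stays proportional to the text length.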
- [1] GitHub - huggingface/transformers: 🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.
- [2] NVD - CVE-2025-6051
- [3] Import regex/re correctly · huggingface/transformers@ba8eaba
- [4] Fix ReDOS in tokenizer digit substitution (#38844) · huggingface/transformers@54a0216
AI Insight generated on May 19, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
| transformers (PyPI) | < 4.53.0 | 4.53.0 |
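To check an environment against the range in the table, a sketch along these lines may help. The helper names are hypothetical, and the version parsing is a deliberate simplification: it compares only the leading numeric release components, so pre-releases such as a 4.53.0 release candidate are treated as patched, which is an assumption rather than something the advisory states.

```python
from importlib import metadata

# First patched release according to the advisory table.
FIRST_PATCHED = (4, 53, 0)


def release_tuple(version: str) -> tuple:
    """Extract leading numeric components, e.g. '4.52.4' -> (4, 52, 4)."""
    numbers = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # reached a non-numeric segment such as 'dev0'
        numbers.append(int(digits))
        if len(digits) != len(piece):
            break  # segment had a suffix such as 'rc1'; stop here
    while len(numbers) < 3:
        numbers.append(0)  # pad so that '4.53' compares equal to '4.53.0'
    return tuple(numbers)


def is_affected(version: str) -> bool:
    """True if the release components fall below the first patched release."""
    return release_tuple(version)[:3] < FIRST_PATCHED


def installed_status():
    """Return (version, affected) for the installed package, or None."""
    try:
        version = metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None
    return version, is_affected(version)
```

Calling `installed_status()` returns `None` when transformers is not installed in the current environment.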
Affected products
- Range: <=4.52.4
- huggingface/huggingface/transformers (v5), Range: unspecified
Patches
54a02160eb03 Fix ReDOS in tokenizer digit substitution (#38844)
1 file changed · +14 −7
src/transformers/models/clvp/number_normalizer.py (modified, +14 −7)

    @@ -15,7 +15,14 @@
     """English Normalizer class for CLVP."""

    -import re
    +import sys
    +
    +
    +if sys.version_info >= (3, 11):
    +    # Atomic grouping support was only added to the core RE in Python 3.11
    +    import re
    +else:
    +    import regex as re


     class EnglishNormalizer:
    @@ -199,12 +206,12 @@ def normalize_numbers(self, text: str) -> str:
             This method is used to normalize numbers within a text such as converting the numbers to words, removing commas, etc.
             """
    -        text = re.sub(re.compile(r"([0-9][0-9\,]+[0-9])"), self._remove_commas, text)
    -        text = re.sub(re.compile(r"£([0-9\,]*[0-9]+)"), r"\1 pounds", text)
    -        text = re.sub(re.compile(r"\$([0-9\.\,]*[0-9]+)"), self._expand_dollars, text)
    -        text = re.sub(re.compile(r"([0-9]+\.[0-9]+)"), self._expand_decimal_point, text)
    -        text = re.sub(re.compile(r"[0-9]+(st|nd|rd|th)"), self._expand_ordinal, text)
    -        text = re.sub(re.compile(r"[0-9]+"), self._expand_number, text)
    +        text = re.sub(r"([0-9][0-9,]+[0-9])", self._remove_commas, text)
    +        text = re.sub(r"£([0-9,]*[0-9])", r"\1 pounds", text)
    +        text = re.sub(r"\$([0-9.,]*[0-9])", self._expand_dollars, text)
    +        text = re.sub(r"([0-9]++\.[0-9]+)", self._expand_decimal_point, text)
    +        text = re.sub(r"[0-9]++(st|nd|rd|th)", self._expand_ordinal, text)
    +        text = re.sub(r"[0-9]+", self._expand_number, text)
             return text

         def expand_abbreviations(self, text: str) -> str:
ba8eaba98656 Import regex/re correctly
1 file changed · +14 −7
src/transformers/models/clvp/number_normalizer.py (modified, +14 −7)

    @@ -15,7 +15,14 @@
     """English Normalizer class for CLVP."""

    -import regex as re
    +import sys
    +
    +
    +if sys.version_info >= (3, 11):
    +    # Atomic grouping support was only added to the core RE in Python 3.11
    +    import re
    +else:
    +    import regex as re


     class EnglishNormalizer:
    @@ -199,12 +206,12 @@ def normalize_numbers(self, text: str) -> str:
             This method is used to normalize numbers within a text such as converting the numbers to words, removing commas, etc.
             """
    -        text = re.sub(re.compile(r"([0-9][0-9\,]+[0-9])"), self._remove_commas, text)
    -        text = re.sub(re.compile(r"£([0-9\,]*[0-9])"), r"\1 pounds", text)
    -        text = re.sub(re.compile(r"\$([0-9\.\,]*[0-9])"), self._expand_dollars, text)
    -        text = re.sub(re.compile(r"([0-9]++\.[0-9]+)"), self._expand_decimal_point, text)
    -        text = re.sub(re.compile(r"[0-9]++(st|nd|rd|th)"), self._expand_ordinal, text)
    -        text = re.sub(re.compile(r"[0-9]+"), self._expand_number, text)
    +        text = re.sub(r"([0-9][0-9,]+[0-9])", self._remove_commas, text)
    +        text = re.sub(r"£([0-9,]*[0-9])", r"\1 pounds", text)
    +        text = re.sub(r"\$([0-9.,]*[0-9])", self._expand_dollars, text)
    +        text = re.sub(r"([0-9]++\.[0-9]+)", self._expand_decimal_point, text)
    +        text = re.sub(r"[0-9]++(st|nd|rd|th)", self._expand_ordinal, text)
    +        text = re.sub(r"[0-9]+", self._expand_number, text)
             return text

         def expand_abbreviations(self, text: str) -> str:
Vulnerability mechanics
Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
- github.com/advisories/GHSA-rcv9-qm8p-9p6j (ghsa, ADVISORY)
- nvd.nist.gov/vuln/detail/CVE-2025-6051 (ghsa, ADVISORY)
- github.com/huggingface/transformers/commit/54a02160eb030da9be18231c77791f2eb3a52216 (ghsa, WEB)
- github.com/huggingface/transformers/commit/ba8eaba9865618253f997784aa565b96206426f0 (ghsa, WEB)
- github.com/huggingface/transformers/pull/38844 (ghsa, WEB)
- huntr.com/bounties/af929523-7b59-418a-bf55-301830b2ac9d (ghsa, WEB)