Regular Expression Denial of Service (ReDoS) in huggingface/transformers
Description
The huggingface/transformers library, versions prior to 4.53.0, is vulnerable to Regular Expression Denial of Service (ReDoS) in the AdamWeightDecay optimizer. The vulnerability arises from the _do_use_weight_decay method, which processes user-controlled regular expressions in the include_in_weight_decay and exclude_from_weight_decay lists. Malicious regular expressions can cause catastrophic backtracking during the re.search call, leading to 100% CPU utilization and a denial of service. This issue can be exploited by attackers who can control the patterns in these lists, potentially causing the machine learning task to hang and rendering services unresponsive.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
HuggingFace Transformers prior to 4.53.0 is vulnerable to ReDoS in the AdamWeightDecay optimizer due to user-controlled regex patterns, leading to denial of service.
Vulnerability Overview
The vulnerability is a Regular Expression Denial of Service (ReDoS) in the AdamWeightDecay optimizer within the huggingface/transformers library, versions prior to 4.53.0. The bug resides in the _do_use_weight_decay method, which processes user-controlled regular expressions from the include_in_weight_decay and exclude_from_weight_decay lists. Malicious regex patterns can cause catastrophic backtracking during the re.search call, leading to 100% CPU utilization and denial of service [2].
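To make the mechanism concrete, here is a minimal sketch of the pre-fix matching loop, reconstructed from the lines removed in the fix commit. The standalone function name `uses_weight_decay_prefix` and its `include`/`exclude` parameters are hypothetical stand-ins for the optimizer's method and attributes, and any other checks the real method performs are omitted.

```python
import re


def uses_weight_decay_prefix(param_name, include=None, exclude=None):
    """Hypothetical standalone version of the pre-4.53.0 matching loop."""
    if include:
        for r in include:
            # Each list entry is interpreted as a regular expression.
            if re.search(r, param_name) is not None:
                return True
    if exclude:
        for r in exclude:
            # Same here: a caller-supplied pattern reaches re.search unchanged.
            if re.search(r, param_name) is not None:
                return False
    return True


# Benign patterns behave as intended.
benign_exclude = ["LayerNorm", "bias"]
decay_kernel = uses_weight_decay_prefix("encoder/dense/kernel", exclude=benign_exclude)
decay_bias = uses_weight_decay_prefix("encoder/dense/bias", exclude=benign_exclude)
```

Because every list entry is handed to `re.search` as-is, whoever controls the lists also controls the regular expression that runs against each parameter name.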
Exploitation
An attacker who can influence the patterns in the weight decay lists (for example, through configuration or model loading) can trigger the ReDoS. No special network position is required if the attacker can supply these patterns. The vulnerability does not require authentication: supplying a single crafted regex string that triggers exponential backtracking is sufficient [2].
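The growth in matching time can be observed with a small experiment. The sketch below uses `(a+)+$`, a textbook backtracking-prone pattern chosen purely for illustration (it is not taken from the advisory), and keeps the inputs short so the script finishes quickly. The input strings are hypothetical parameter names.

```python
import re
import time

# A textbook backtracking-prone pattern, used here only as an illustration.
EVIL_PATTERN = r"(a+)+$"

timings = []
for n in (10, 14, 18):
    # The trailing "!" guarantees the match fails, forcing the engine to
    # try every way of splitting the run of "a" characters.
    param_name = "a" * n + "!"
    start = time.perf_counter()
    result = re.search(EVIL_PATTERN, param_name)
    elapsed = time.perf_counter() - start
    timings.append((n, result, elapsed))
    print(f"n={n:2d}  match={result}  seconds={elapsed:.4f}")
```

Each additional character roughly doubles the work for this pattern, so inputs only a few dozen characters long can keep a CPU core busy for a very long time.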
Impact
Successful exploitation results in a denial of service, making the machine learning task hang and rendering the service unresponsive. This can affect any application using the vulnerable version of Transformers with the AdamWeightDecay optimizer [2].
Mitigation
The issue has been fixed in Transformers version 4.53.0. The fix replaces the regex-based search with a simple substring check, eliminating the ReDoS vector [3]. A similar regex issue in the Marian tokenizer was also resolved [4]. Users are advised to upgrade to the latest version.
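The effect of the substring check can be sketched as follows. The function mirrors the added lines in the fix commit, but the standalone name `uses_weight_decay_patched`, its parameters, and the parameter names are hypothetical.

```python
def uses_weight_decay_patched(param_name, include=None, exclude=None):
    """Hypothetical standalone version of the patched matching loop."""
    if include:
        for r in include:
            # Plain substring test: no regex engine is involved.
            if r in param_name:
                return True
    if exclude:
        for r in exclude:
            if r in param_name:
                return False
    return True


# Literal entries keep working as before.
literal_hit = uses_weight_decay_patched("encoder/LayerNorm/gamma", exclude=["LayerNorm"])

# Entries that relied on regex syntax are now compared literally,
# so this exclusion no longer applies.
regex_style = uses_weight_decay_patched("layer_12/bias", exclude=[r"layer_\d+/bias"])

# A backtracking-prone pattern is just an ordinary string now.
evil_inert = uses_weight_decay_patched("a" * 50 + "!", exclude=[r"(a+)+$"])
```

One consequence worth checking after upgrading: list entries are now treated as literal substrings, so a configuration that depended on regex metacharacters would need to be rewritten as plain substrings.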
AI Insight generated on May 19, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
| transformers (PyPI) | < 4.53.0 | 4.53.0 |
Affected products
- Range: <4.53.0
- huggingface/huggingface/transformers v5, Range: unspecified
Patches
d37f7517972f: Two ReDOS fixes (#39013)
2 files changed · +7 −8

src/transformers/models/marian/tokenization_marian.py (+5 −5, modified)

    @@ -13,7 +13,6 @@
     # limitations under the License.
     import json
     import os
    -import re
     import warnings
     from pathlib import Path
     from shutil import copyfile
    @@ -104,7 +103,6 @@ class MarianTokenizer(PreTrainedTokenizer):

         vocab_files_names = VOCAB_FILES_NAMES
         model_input_names = ["input_ids", "attention_mask"]
    -    language_code_re = re.compile(">>.+<<")  # type: re.Pattern

         def __init__(
             self,
    @@ -186,9 +184,11 @@ def _convert_token_to_id(self, token):

         def remove_language_code(self, text: str):
             """Remove language codes like >>fr<< before sentencepiece"""
    -        match = self.language_code_re.match(text)
    -        code: list = [match.group(0)] if match else []
    -        return code, self.language_code_re.sub("", text)
    +        code = []
    +        if text.startswith(">>") and (end_loc := text.find("<<")) != -1:
    +            code.append(text[: end_loc + 2])
    +            text = text[end_loc + 2 :]
    +        return code, text

         def _tokenize(self, text: str) -> list[str]:
             code, text = self.remove_language_code(text)

src/transformers/optimization_tf.py (+2 −3, modified)

    @@ -14,7 +14,6 @@
     # ==============================================================================
     """Functions and classes related to optimization (weight updates)."""

    -import re
     from typing import Callable, Optional, Union

     import tensorflow as tf
    @@ -296,12 +295,12 @@ def _do_use_weight_decay(self, param_name):

             if self._include_in_weight_decay:
                 for r in self._include_in_weight_decay:
    -                if re.search(r, param_name) is not None:
    +                if r in param_name:
                         return True

             if self._exclude_from_weight_decay:
                 for r in self._exclude_from_weight_decay:
    -                if re.search(r, param_name) is not None:
    +                if r in param_name:
                         return False
             return True
47c34fba5c30: Just don't use RE at all
1 file changed · +5 −5

src/transformers/models/marian/tokenization_marian.py (+5 −5, modified)

    @@ -18,7 +18,6 @@
     from shutil import copyfile
     from typing import Any, Optional, Union

    -import regex as re
     import sentencepiece

     from ...tokenization_utils import PreTrainedTokenizer
    @@ -104,7 +103,6 @@ class MarianTokenizer(PreTrainedTokenizer):

         vocab_files_names = VOCAB_FILES_NAMES
         model_input_names = ["input_ids", "attention_mask"]
    -    language_code_re = re.compile(">>.++<<")  # type: re.Pattern

         def __init__(
             self,
    @@ -186,9 +184,11 @@ def _convert_token_to_id(self, token):

         def remove_language_code(self, text: str):
             """Remove language codes like >>fr<< before sentencepiece"""
    -        match = self.language_code_re.match(text)
    -        code: list = [match.group(0)] if match else []
    -        return code, self.language_code_re.sub("", text)
    +        code = []
    +        if text.startswith(">>") and (end_loc := text.find("<<")) != -1:
    +            code.append(text[: end_loc + 2])
    +            text = text[end_loc + 2 :]
    +        return code, text

         def _tokenize(self, text: str) -> list[str]:
             code, text = self.remove_language_code(text)
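The regex-free language-code handling from the Marian tokenizer patch can be tried in isolation. The sketch below copies the patched method body into a free function (dropping `self`, which the body does not use); the sample inputs are made up for illustration. It needs Python 3.8 or later because of the `:=` operator.

```python
def remove_language_code(text: str):
    """Remove a leading language code like >>fr<< (free-function sketch)."""
    code = []
    # Only a code at the very start of the text is recognized; str.find
    # scans the input once, so there is no backtracking to exploit.
    if text.startswith(">>") and (end_loc := text.find("<<")) != -1:
        code.append(text[: end_loc + 2])
        text = text[end_loc + 2 :]
    return code, text


with_code = remove_language_code(">>fr<< Hello")
without_code = remove_language_code("Hello >>fr<<")
# A long input with no closing "<<" is returned unchanged after one scan.
long_input = remove_language_code(">>" + "x" * 100_000)
```

Both `str.startswith` and `str.find` run in time linear in the input length, which is why this approach removes the ReDoS vector rather than merely narrowing it.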
Vulnerability mechanics
Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
- github.com/advisories/GHSA-4w7r-h757-3r74 (source: ghsa, type: ADVISORY)
- nvd.nist.gov/vuln/detail/CVE-2025-6921 (source: ghsa, type: ADVISORY)
- github.com/huggingface/transformers/commit/47c34fba5c303576560cb29767efb452ff12b8be (source: ghsa, type: WEB)
- github.com/huggingface/transformers/commit/d37f7517972f67e3f2194c000ed0f87f064e5099 (source: ghsa, type: WEB)
- huntr.com/bounties/287d15a7-6e7c-45d2-8c05-11e305776f1f (source: ghsa, type: WEB)