VYPR
Moderate severity · NVD Advisory · Published Sep 23, 2025 · Updated Sep 23, 2025

Regular Expression Denial of Service (ReDoS) in huggingface/transformers

CVE-2025-6921

Description

The huggingface/transformers library, versions prior to 4.53.0, is vulnerable to Regular Expression Denial of Service (ReDoS) in the AdamWeightDecay optimizer. The vulnerability arises from the _do_use_weight_decay method, which processes user-controlled regular expressions in the include_in_weight_decay and exclude_from_weight_decay lists. Malicious regular expressions can cause catastrophic backtracking during the re.search call, leading to 100% CPU utilization and a denial of service. This issue can be exploited by attackers who can control the patterns in these lists, potentially causing the machine learning task to hang and rendering services unresponsive.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

HuggingFace Transformers prior to 4.53.0 is vulnerable to ReDoS in the AdamWeightDecay optimizer due to user-controlled regex patterns, leading to denial of service.

Vulnerability Overview

The vulnerability is a Regular Expression Denial of Service (ReDoS) in the TensorFlow AdamWeightDecay optimizer (src/transformers/optimization_tf.py) of the huggingface/transformers library, versions prior to 4.53.0. The bug is in the _do_use_weight_decay method. That method passes each user-controlled string in the include_in_weight_decay and exclude_from_weight_decay lists to re.search as a regular expression, which is then matched against the parameter name. A malicious pattern can cause catastrophic backtracking in that re.search call, driving CPU utilization to 100% and causing a denial of service [2].
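The pre-fix matching logic can be reproduced outside TensorFlow to show where the patterns end up. The sketch below is a standalone copy of the lines removed in the optimization_tf.py diff. The function name legacy_use_weight_decay and its keyword arguments are hypothetical stand-ins for the optimizer's _include_in_weight_decay and _exclude_from_weight_decay attributes, and any checks the real method performs before this point are omitted. Every list entry is handed to re.search as a regular expression, include entries take precedence over exclude entries, and regex syntax such as alternation is honoured.

```python
import re
from typing import Optional, Sequence


def legacy_use_weight_decay(
    param_name: str,
    include_in_weight_decay: Optional[Sequence[str]] = None,
    exclude_from_weight_decay: Optional[Sequence[str]] = None,
) -> bool:
    """Hypothetical standalone copy of the pre-4.53.0 matching logic.

    Each entry is treated as a regular expression and passed to re.search,
    which is where an attacker-chosen pattern can backtrack catastrophically.
    """
    if include_in_weight_decay:
        for r in include_in_weight_decay:
            # User-controlled pattern reaches the regex engine here.
            if re.search(r, param_name) is not None:
                return True

    if exclude_from_weight_decay:
        for r in exclude_from_weight_decay:
            if re.search(r, param_name) is not None:
                return False
    return True


# Typical benign usage: skip decay for LayerNorm weights and biases.
exclude = ["LayerNorm", "layer_norm", "bias"]
print(legacy_use_weight_decay("bert/encoder/layer_0/output/dense/kernel:0", exclude_from_weight_decay=exclude))  # True
print(legacy_use_weight_decay("bert/encoder/layer_0/output/LayerNorm/gamma:0", exclude_from_weight_decay=exclude))  # False
# Because entries are regexes, alternation works before the fix:
print(legacy_use_weight_decay("dense/bias:0", exclude_from_weight_decay=["LayerNorm|bias"]))  # False
```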

Exploitation

An attacker who can influence the patterns in the weight decay lists, for example through a user-supplied training configuration or, depending on the application, through model loading, can trigger the ReDoS. No special network position is needed. The only precondition is the ability to supply these patterns, so no authentication is required wherever an application accepts them from untrusted users. A single crafted regex string that triggers exponential backtracking against a parameter name is enough [2].
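Exploitation requires control of a pattern only, not of the parameter names. As an illustration, the sketch below uses a hypothetical attacker-supplied entry, (.+)+#. Its nested quantifiers let the engine divide the name between the inner and outer + in exponentially many ways. Because # never appears in the name, re.search has to try every division before it gives up. The parameter name is an illustrative TensorFlow-style example. Only short prefixes of it are timed, so the demo finishes quickly. Each extra character should roughly double the time, which means a full-length name would effectively hang the process.

```python
import re
import time

# Hypothetical attacker-supplied entry for exclude_from_weight_decay.
# Nested quantifiers plus a character absent from the name ("#") force
# re.search to try every way of splitting the name before failing.
EVIL_PATTERN = r"(.+)+#"

# A realistic, model-defined parameter name (illustrative assumption).
PARAM_NAME = "bert/encoder/layer_11/attention/output/dense/kernel:0"


def time_search(pattern: str, text: str) -> float:
    """Return wall-clock seconds spent in one re.search call."""
    start = time.perf_counter()
    re.search(pattern, text)
    return time.perf_counter() - start


# Only short prefixes are timed; the full name would effectively hang.
for length in (10, 12, 14, 16):
    elapsed = time_search(EVIL_PATTERN, PARAM_NAME[:length])
    print(f"{length:2d} chars: {elapsed:.4f} s")
```

Exact timings depend on the machine and the Python version. The rate of growth is what matters.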

Impact

Successful exploitation results in a denial of service: the machine learning task hangs and the service becomes unresponsive. Any application that runs a vulnerable version of Transformers with the AdamWeightDecay optimizer is affected if untrusted input can reach the include_in_weight_decay or exclude_from_weight_decay lists [2].

Mitigation

The issue is fixed in Transformers 4.53.0. The fix replaces the regex-based re.search call with a plain substring check (r in param_name), which eliminates the ReDoS vector [3]. As a result, entries in the weight decay lists are now matched as literal substrings rather than as regular expressions. A similar regex issue in the Marian tokenizer was also resolved [4]. Users are advised to upgrade to 4.53.0 or later.
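The patched logic from the 4.53.0 diff replaces re.search with Python's in operator. That operator performs a plain substring search, with no regex engine and no backtracking. The sketch below is a standalone rendering under a hypothetical name (patched_use_weight_decay). It also shows the behavioural change: a regex-style entry such as "LayerNorm|bias" is now matched literally, so it no longer excludes bias parameters. For deployments that cannot upgrade yet, one possible stopgap is to pass every untrusted entry through re.escape before it reaches the optimizer. This is an assumption on our part, not part of the upstream fix. re.search on an escaped string matches exactly when the plain substring check does. The helper name escape_patterns is also hypothetical.

```python
import re
from typing import Iterable, List, Optional, Sequence


def patched_use_weight_decay(
    param_name: str,
    include_in_weight_decay: Optional[Sequence[str]] = None,
    exclude_from_weight_decay: Optional[Sequence[str]] = None,
) -> bool:
    """Hypothetical standalone copy of the 4.53.0 matching logic.

    Entries are literal substrings; no regular expression is compiled.
    """
    if include_in_weight_decay:
        for r in include_in_weight_decay:
            if r in param_name:
                return True

    if exclude_from_weight_decay:
        for r in exclude_from_weight_decay:
            if r in param_name:
                return False
    return True


def escape_patterns(patterns: Iterable[str]) -> List[str]:
    """Hypothetical stopgap for pre-4.53.0 installs: make entries literal."""
    return [re.escape(p) for p in patterns]


# Plain names behave as before.
print(patched_use_weight_decay("dense/bias:0", exclude_from_weight_decay=["bias"]))  # False
# Regex syntax is no longer interpreted, so this entry matches nothing here.
print(patched_use_weight_decay("dense/bias:0", exclude_from_weight_decay=["LayerNorm|bias"]))  # True
# A crafted pattern is harmless once escaped: it is searched literally.
safe = escape_patterns(["(.+)+#"])
print(re.search(safe[0], "bert/encoder/layer_11/attention/output/dense/kernel:0"))  # None
```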

AI Insight generated on May 19, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected packages

Versions sourced from the GitHub Security Advisory.

Package | Affected versions | Patched versions
transformers (PyPI) | < 4.53.0 | 4.53.0
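To check whether an environment falls in the affected range, you can compare the installed version against 4.53.0. The sketch below uses only the standard library (importlib.metadata). The helper names are hypothetical. The version parsing is deliberately simplified and is not a full PEP 440 implementation; the packaging library's Version class is the more robust choice. To err on the side of caution, the sketch treats development and pre-release builds of 4.53.0 as unpatched, because they sort before the final release.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

FIRST_PATCHED = (4, 53, 0)  # per the advisory: affected < 4.53.0


def is_patched(ver: str) -> bool:
    """Simplified check: True if `ver` is 4.53.0 or later."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?(.*)", ver)
    if m is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    release = (int(m.group(1)), int(m.group(2)), int(m.group(3) or 0))
    if release != FIRST_PATCHED:
        return release > FIRST_PATCHED
    # Exactly 4.53.0: accept the final release, post-releases and local
    # builds; treat dev/alpha/beta/rc builds as unpatched to be cautious.
    suffix = m.group(4)
    return suffix == "" or suffix.startswith(("+", ".post", "post"))


def installed_transformers_is_patched() -> Optional[bool]:
    """Return None if transformers is not installed in this environment."""
    try:
        return is_patched(version("transformers"))
    except PackageNotFoundError:
        return None


print(installed_transformers_is_patched())
```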

Affected products (2)

Patches (2)
d37f7517972f

Two ReDOS fixes (#39013)

2 files changed · +7 -8
  • src/transformers/models/marian/tokenization_marian.py (+5 -5, modified)
    @@ -13,7 +13,6 @@
     # limitations under the License.
     import json
     import os
    -import re
     import warnings
     from pathlib import Path
     from shutil import copyfile
    @@ -104,7 +103,6 @@ class MarianTokenizer(PreTrainedTokenizer):
     
         vocab_files_names = VOCAB_FILES_NAMES
         model_input_names = ["input_ids", "attention_mask"]
    -    language_code_re = re.compile(">>.+<<")  # type: re.Pattern
     
         def __init__(
             self,
    @@ -186,9 +184,11 @@ def _convert_token_to_id(self, token):
     
         def remove_language_code(self, text: str):
             """Remove language codes like >>fr<< before sentencepiece"""
    -        match = self.language_code_re.match(text)
    -        code: list = [match.group(0)] if match else []
    -        return code, self.language_code_re.sub("", text)
    +        code = []
    +        if text.startswith(">>") and (end_loc := text.find("<<")) != -1:
    +            code.append(text[: end_loc + 2])
    +            text = text[end_loc + 2 :]
    +        return code, text
     
         def _tokenize(self, text: str) -> list[str]:
             code, text = self.remove_language_code(text)
    
  • src/transformers/optimization_tf.py (+2 -3, modified)
    @@ -14,7 +14,6 @@
     # ==============================================================================
     """Functions and classes related to optimization (weight updates)."""
     
    -import re
     from typing import Callable, Optional, Union
     
     import tensorflow as tf
    @@ -296,12 +295,12 @@ def _do_use_weight_decay(self, param_name):
     
             if self._include_in_weight_decay:
                 for r in self._include_in_weight_decay:
    -                if re.search(r, param_name) is not None:
    +                if r in param_name:
                         return True
     
             if self._exclude_from_weight_decay:
                 for r in self._exclude_from_weight_decay:
    -                if re.search(r, param_name) is not None:
    +                if r in param_name:
                         return False
             return True
     
    
47c34fba5c30

Just don't use RE at all

1 file changed · +5 -5
  • src/transformers/models/marian/tokenization_marian.py (+5 -5, modified)
    @@ -18,7 +18,6 @@
     from shutil import copyfile
     from typing import Any, Optional, Union
     
    -import regex as re
     import sentencepiece
     
     from ...tokenization_utils import PreTrainedTokenizer
    @@ -104,7 +103,6 @@ class MarianTokenizer(PreTrainedTokenizer):
     
         vocab_files_names = VOCAB_FILES_NAMES
         model_input_names = ["input_ids", "attention_mask"]
    -    language_code_re = re.compile(">>.++<<")  # type: re.Pattern
     
         def __init__(
             self,
    @@ -186,9 +184,11 @@ def _convert_token_to_id(self, token):
     
         def remove_language_code(self, text: str):
             """Remove language codes like >>fr<< before sentencepiece"""
    -        match = self.language_code_re.match(text)
    -        code: list = [match.group(0)] if match else []
    -        return code, self.language_code_re.sub("", text)
    +        code = []
    +        if text.startswith(">>") and (end_loc := text.find("<<")) != -1:
    +            code.append(text[: end_loc + 2])
    +            text = text[end_loc + 2 :]
    +        return code, text
     
         def _tokenize(self, text: str) -> list[str]:
             code, text = self.remove_language_code(text)
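You can check the Marian change the same way. The sketch below sets the old regex-based remove_language_code (the re variant from the first diff) beside the new string-based version. Both are written as hypothetical standalone functions. The new code inspects only a leading >>code<< prefix and stops at the first <<. The old greedy .+ ran to the last <<, and re.sub stripped matches anywhere in the text. The two therefore agree on ordinary input such as ">>fr<< Bonjour" but differ on edge cases. The old pattern, applied by re.sub, can also rescan to the end of the string from every >> it meets, so input made of many >> with no << costs roughly quadratic time. The find-based version scans the string once.

```python
import re
from typing import List, Tuple

# Pre-fix pattern from the first diff.
LANGUAGE_CODE_RE = re.compile(">>.+<<")


def old_remove_language_code(text: str) -> Tuple[List[str], str]:
    """Regex-based behaviour before the fix (standalone copy)."""
    match = LANGUAGE_CODE_RE.match(text)
    code: List[str] = [match.group(0)] if match else []
    # sub() removes every match anywhere in the text, not just the prefix.
    return code, LANGUAGE_CODE_RE.sub("", text)


def new_remove_language_code(text: str) -> Tuple[List[str], str]:
    """String-based behaviour after the fix (standalone copy)."""
    code: List[str] = []
    if text.startswith(">>") and (end_loc := text.find("<<")) != -1:
        code.append(text[: end_loc + 2])
        text = text[end_loc + 2 :]
    return code, text


for sample in [">>fr<< Bonjour", ">>fr<< a << b", "Hello >>fr<<"]:
    print(repr(sample), old_remove_language_code(sample), new_remove_language_code(sample))
```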
    

Vulnerability mechanics

Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References (5)
