VYPR
High severity · NVD Advisory · Published Mar 6, 2020 · Updated Aug 4, 2024

CVE-2020-7212

Description

The _encode_invalid_chars function in util/url.py in the urllib3 library 1.25.2 through 1.25.7 for Python allows a denial of service (CPU consumption) because of an inefficient algorithm. The percent_encodings array contains all matches of percent encodings. It is not deduplicated. For a URL of length N, the size of percent_encodings may be up to O(N). The next step (normalize existing percent-encoded bytes) also takes up to O(N) for each step, so the total time is O(N^2). If percent_encodings were deduplicated, the time to compute _encode_invalid_chars would be O(kN), where k is at most 484 ((10+6*2)^2).
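
The behavior described above can be sketched in a few lines of Python. This is a minimal illustration written from the description only, not the library's source: PERCENT_RE and both function names are assumptions made for this example.

```python
import re

# Assumed stand-in for the library's percent-encoding matcher.
PERCENT_RE = re.compile(r"%[0-9a-fA-F]{2}")

# Each of the two hex positions accepts 22 characters (0-9, a-f, A-F),
# so at most 22 * 22 = 484 distinct percent-encodings exist.
HEX_DIGIT_SPELLINGS = "0123456789abcdefABCDEF"
MAX_DISTINCT_ENCODINGS = len(HEX_DIGIT_SPELLINGS) ** 2


def normalize_without_dedup(component):
    """One full-string replace per match: up to O(N) matches, O(N) each."""
    percent_encodings = PERCENT_RE.findall(component)
    for enc in percent_encodings:
        if not enc.isupper():
            # str.replace scans the whole component every time.
            component = component.replace(enc, enc.upper())
    return component


def normalize_with_dedup(component):
    """One full-string replace per distinct match: at most 484 passes."""
    for enc in set(PERCENT_RE.findall(component)):
        if not enc.isupper():
            component = component.replace(enc, enc.upper())
    return component
```

Both functions return the same string; they differ only in how many times the component is rescanned. With the deduplicated variant the number of passes is bounded by MAX_DISTINCT_ENCODINGS regardless of input length.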

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

The urllib3 library 1.25.2 - 1.25.7 has an O(N^2) CPU consumption denial-of-service vulnerability in URL percent-encoding.

Vulnerability Description

CVE-2020-7212 is a denial-of-service (DoS) vulnerability in the urllib3 library for Python, affecting versions 1.25.2 through 1.25.7. The flaw resides in the _encode_invalid_chars function within util/url.py. The root cause is an inefficient algorithm: the function builds a list of all percent-encoding matches (percent_encodings) without deduplication. For a URL of length N, this list can be of size O(N), and the subsequent normalization step also runs in O(N) per element, resulting in a total time complexity of O(N^2) [1][2].

Exploitation Prerequisites

An attacker can exploit this flaw by providing a specially crafted URL containing a large number of percent-encoded characters. The function's quadratic time complexity means that processing such a URL can consume excessive CPU resources, leading to a denial-of-service condition. No authentication or special privileges are required, and the attack can be carried out remotely via any application that uses urllib3 to process untrusted URLs [2][3].
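
To illustrate the input shape, the sketch below builds a hypothetical payload and counts how many full-string replace passes each strategy would perform. build_payload and replace_passes are names invented for this example, and the regular expression is an assumed stand-in for the library's matcher. The count is a rough work model, not a measurement.

```python
import re

# Assumed stand-in for the library's percent-encoding matcher.
PERCENT_RE = re.compile(r"%[0-9a-fA-F]{2}")


def build_payload(repeats):
    """Hypothetical crafted path: one lowercase percent-encoding repeated."""
    return "/" + "%aa" * repeats


def replace_passes(component, dedupe):
    """Count the replace passes a normalization loop would run."""
    matches = PERCENT_RE.findall(component)
    candidates = set(matches) if dedupe else matches
    # A pass is triggered for every candidate that is not already uppercase.
    return sum(1 for enc in candidates if not enc.isupper())


payload = build_payload(10_000)
naive = replace_passes(payload, dedupe=False)    # one pass per match
deduped = replace_passes(payload, dedupe=True)   # one pass per distinct match

# Each pass scans the whole component, so the work is roughly
# passes * len(payload) character comparisons.
print(naive * len(payload), deduped * len(payload))
```

Under this model, doubling the number of repeats roughly quadruples the first figure and only doubles the second.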

Impact

Successful exploitation results in high CPU consumption, potentially causing the affected service to become unresponsive. This is a classic algorithmic complexity attack. The impact is limited to CPU exhaustion; data integrity and confidentiality are not directly affected. The CVSS 3.1 base score is 7.5 (High), with a network attack vector and low attack complexity [3].

Mitigation Status

The issue was fixed in urllib3 version 1.25.8. The fix introduces a deduplication step for percent-encodings, reducing the time complexity to O(kN), where k is at most 484. Users are strongly advised to upgrade to urllib3 1.25.8 or later. This vulnerability is not listed in CISA's Known Exploited Vulnerabilities catalog [4].
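
A quick way to check a version string against the affected range is sketched below. parse_release and is_affected are hypothetical helpers written for this example. They compare only the leading numeric components, so pre-release or local version suffixes are ignored, which is an assumption that a full version parser would not make.

```python
import re

AFFECTED_MIN = (1, 25, 2)   # first affected release
FIXED_IN = (1, 25, 8)       # first patched release


def parse_release(version):
    """Extract up to three leading numeric components, padding with zeros."""
    match = re.match(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", version.strip())
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    return tuple(int(part) if part else 0 for part in match.groups())


def is_affected(version):
    """True if the version falls in >= 1.25.2, < 1.25.8."""
    return AFFECTED_MIN <= parse_release(version) < FIXED_IN


# Example: on Python 3.8+ the installed version can be read with
# importlib.metadata.version("urllib3") and passed to is_affected().
print(is_affected("1.25.7"), is_affected("1.25.8"))
```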

AI Insight generated on May 21, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected packages

Versions sourced from the GitHub Security Advisory.

Package | Affected versions | Patched versions
urllib3 (PyPI) | >= 1.25.2, < 1.25.8 | 1.25.8

Patches

a74c9cfbaed9

Percent-encode invalid characters with request target (#1586)

https://github.com/urllib3/urllib3 · Seth Michael Larson · Apr 28, 2019 · via ghsa
3 files changed · +76 -12
  • CHANGES.rst +3 -0 modified
    @@ -6,6 +6,9 @@ dev (master)
     
     * Change ``is_ipaddress`` to not detect IPvFuture addresses. (Pull #1583)
     
    +* Change ``parse_url`` to percent-encode invalid characters within the
    +  path, query, and target components. (Pull #1586)
    +
     
     1.25.1 (2019-04-24)
     -------------------
    
  • src/urllib3/util/url.py +48 -10 modified
    @@ -6,6 +6,7 @@
     from ..packages import six, rfc3986
     from ..packages.rfc3986.exceptions import RFC3986Exception, ValidationError
     from ..packages.rfc3986.validators import Validator
    +from ..packages.rfc3986 import abnf_regexp, normalizers, compat, misc
     
     
     url_attrs = ['scheme', 'auth', 'host', 'port', 'path', 'query', 'fragment']
    @@ -17,6 +18,9 @@
     # Regex for detecting URLs with schemes. RFC 3986 Section 3.1
     SCHEME_REGEX = re.compile(r"^(?:[a-zA-Z][a-zA-Z0-9+\-]*:|/)")
     
    +PATH_CHARS = abnf_regexp.UNRESERVED_CHARS_SET | abnf_regexp.SUB_DELIMITERS_SET | {':', '@', '/'}
    +QUERY_CHARS = FRAGMENT_CHARS = PATH_CHARS | {'?'}
    +
     
     class Url(namedtuple('Url', url_attrs)):
         """
    @@ -136,6 +140,37 @@ def split_first(s, delims):
         return s[:min_idx], s[min_idx + 1:], min_delim
     
     
    +def _encode_invalid_chars(component, allowed_chars, encoding='utf-8'):
    +    """Percent-encodes a URI component without reapplying
    +    onto an already percent-encoded component. Based on
    +    rfc3986.normalizers.encode_component()
    +    """
    +    if component is None:
    +        return component
    +
    +    # Try to see if the component we're encoding is already percent-encoded
    +    # so we can skip all '%' characters but still encode all others.
    +    percent_encodings = len(normalizers.PERCENT_MATCHER.findall(
    +                            compat.to_str(component, encoding)))
    +
    +    uri_bytes = component.encode('utf-8', 'surrogatepass')
    +    is_percent_encoded = percent_encodings == uri_bytes.count(b'%')
    +
    +    encoded_component = bytearray()
    +
    +    for i in range(0, len(uri_bytes)):
    +        # Will return a single character bytestring on both Python 2 & 3
    +        byte = uri_bytes[i:i+1]
    +        byte_ord = ord(byte)
    +        if ((is_percent_encoded and byte == b'%')
    +                or (byte_ord < 128 and byte.decode() in allowed_chars)):
    +            encoded_component.extend(byte)
    +            continue
    +        encoded_component.extend('%{0:02x}'.format(byte_ord).encode().upper())
    +
    +    return encoded_component.decode(encoding)
    +
    +
     def parse_url(url):
         """
         Given a url, return a parsed :class:`.Url` namedtuple. Best-effort is
    @@ -160,8 +195,6 @@ def parse_url(url):
             return Url()
     
         is_string = not isinstance(url, six.binary_type)
    -    if not is_string:
    -        url = url.decode("utf-8")
     
         # RFC 3986 doesn't like URLs that have a host but don't start
         # with a scheme and we support URLs like that so we need to
    @@ -171,11 +204,6 @@ def parse_url(url):
         if not SCHEME_REGEX.search(url):
             url = "//" + url
     
    -    try:
    -        iri_ref = rfc3986.IRIReference.from_string(url, encoding="utf-8")
    -    except (ValueError, RFC3986Exception):
    -        six.raise_from(LocationParseError(url), None)
    -
         def idna_encode(name):
             if name and any([ord(x) > 128 for x in name]):
                 try:
    @@ -188,8 +216,18 @@ def idna_encode(name):
                     raise LocationParseError(u"Name '%s' is not a valid IDNA label" % name)
             return name
     
    -    has_authority = iri_ref.authority is not None
    -    uri_ref = iri_ref.encode(idna_encoder=idna_encode)
    +    try:
    +        split_iri = misc.IRI_MATCHER.match(compat.to_str(url)).groupdict()
    +        iri_ref = rfc3986.IRIReference(
    +            split_iri['scheme'], split_iri['authority'],
    +            _encode_invalid_chars(split_iri['path'], PATH_CHARS),
    +            _encode_invalid_chars(split_iri['query'], QUERY_CHARS),
    +            _encode_invalid_chars(split_iri['fragment'], FRAGMENT_CHARS)
    +        )
    +        has_authority = iri_ref.authority is not None
    +        uri_ref = iri_ref.encode(idna_encoder=idna_encode)
    +    except (ValueError, RFC3986Exception):
    +        return six.raise_from(LocationParseError(url), None)
     
         # rfc3986 strips the authority if it's invalid
         if has_authority and uri_ref.authority is None:
    @@ -209,7 +247,7 @@ def idna_encode(name):
                 *validator.COMPONENT_NAMES
             ).validate(uri_ref)
         except ValidationError:
    -        six.raise_from(LocationParseError(url), None)
    +        return six.raise_from(LocationParseError(url), None)
     
         # For the sake of backwards compatibility we put empty
         # string values for path if there are any defined values
    
  • test/test_util.py +25 -2 modified
    @@ -135,8 +135,15 @@ def test_invalid_host(self, location):
             'http://user\\@google.com',
             'http://google\\.com',
             'user\\@google.com',
    -        'http://google.com#fragment#',
             'http://user@user@google.com/',
    +
    +        # Invalid IDNA labels
    +        u'http://\uD7FF.com',
    +        u'http://❤️',
    +
    +        # Unicode surrogates
    +        u'http://\uD800.com',
    +        u'http://\uDC00.com',
         ])
         def test_invalid_url(self, url):
             with pytest.raises(LocationParseError):
    @@ -149,6 +156,15 @@ def test_invalid_url(self, url):
             ('HTTPS://Example.Com/?Key=Value', 'https://example.com/?Key=Value'),
             ('Https://Example.Com/#Fragment', 'https://example.com/#Fragment'),
             ('[::Ff%etH0%Ff]/%ab%Af', '[::ff%25etH0%Ff]/%AB%AF'),
    +
    +        # Invalid characters for the query/fragment getting encoded
    +        ('http://google.com/p[]?parameter[]=\"hello\"#fragment#',
    +         'http://google.com/p%5B%5D?parameter%5B%5D=%22hello%22#fragment%23'),
    +
    +        # Percent encoding isn't applied twice despite '%' being invalid
    +        # but the percent encoding is still normalized.
    +        ('http://google.com/p%5B%5d?parameter%5b%5D=%22hello%22#fragment%23',
    +         'http://google.com/p%5B%5D?parameter%5B%5D=%22hello%22#fragment%23')
         ])
         def test_parse_url_normalization(self, url, expected_normalized_url):
             """Assert parse_url normalizes the scheme/host, and only the scheme/host"""
    @@ -214,7 +230,14 @@ def test_parse_url_normalization(self, url, expected_normalized_url):
     
             # Uppercase IRI
             (u'http://Königsgäßchen.de/straße',
    -         Url('http', host='xn--knigsgchen-b4a3dun.de', path='/stra%C3%9Fe'))
    +         Url('http', host='xn--knigsgchen-b4a3dun.de', path='/stra%C3%9Fe')),
    +
    +        # Unicode Surrogates
    +        (u'http://google.com/\uD800', Url('http', host='google.com', path='%ED%A0%80')),
    +        (u'http://google.com?q=\uDC00',
    +         Url('http', host='google.com', path='', query='q=%ED%B0%80')),
    +        (u'http://google.com#\uDC00',
    +         Url('http', host='google.com', path='', fragment='%ED%B0%80')),
         ]
     
         @pytest.mark.parametrize(
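
For readers who want to try the function from the url.py hunk above outside the library, here is a Python 3 only adaptation. It is a sketch: PERCENT_MATCHER and PATH_CHARS are stand-ins written from the RFC 3986 grammar rather than imported from the vendored rfc3986 package, and the function name drops the leading underscore to mark it as illustrative. Note that the version in this diff only counts percent-encoding matches; it does not contain a per-match normalization loop.

```python
import re
import string

# Assumed stand-in for normalizers.PERCENT_MATCHER.
PERCENT_MATCHER = re.compile(r"%[A-Fa-f0-9]{2}")

# RFC 3986 unreserved and sub-delims characters, plus ':', '@', '/'.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PATH_CHARS = UNRESERVED | SUB_DELIMS | {":", "@", "/"}


def encode_invalid_chars(component, allowed_chars):
    """Percent-encode bytes outside allowed_chars without double-encoding."""
    if component is None:
        return component

    # If every '%' starts a valid percent-encoding, treat the component
    # as already encoded and leave its '%' characters alone.
    percent_encodings = len(PERCENT_MATCHER.findall(component))
    uri_bytes = component.encode("utf-8", "surrogatepass")
    is_percent_encoded = percent_encodings == uri_bytes.count(b"%")

    encoded = bytearray()
    for byte_ord in uri_bytes:
        char = chr(byte_ord)
        if (is_percent_encoded and char == "%") or (
            byte_ord < 128 and char in allowed_chars
        ):
            encoded.append(byte_ord)
        else:
            encoded.extend("%{0:02X}".format(byte_ord).encode("ascii"))
    return encoded.decode("ascii")


print(encode_invalid_chars("/p[]", PATH_CHARS))
```

The inputs used in the checks for this sketch mirror cases from the test/test_util.py hunk, such as the bracket characters and the lone surrogate.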
    

Vulnerability mechanics

Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References

7
