CVE-2020-7212
Description
The _encode_invalid_chars function in util/url.py in the urllib3 library 1.25.2 through 1.25.7 for Python allows a denial of service (CPU consumption) because of an inefficient algorithm. The percent_encodings list contains every percent-encoding match and is not deduplicated. For a URL of length N, the size of percent_encodings may be up to O(N). The next step (normalizing existing percent-encoded bytes) takes up to O(N) per element, so the total time is O(N^2). If percent_encodings were deduplicated, the time to compute _encode_invalid_chars would be O(kN), where k is at most 484 ((10+6*2)^2).
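A worked count makes these bounds concrete. Assume, hypothetically, a payload made of m repeated copies of the lowercase escape %aa, and count one unit of work per character scanned by each normalization pass:

$$
\begin{aligned}
N &= 3m, \qquad |\text{percent\_encodings}| = m = N/3,\\
T_{\text{naive}} &\approx \frac{N}{3}\cdot N = \frac{N^2}{3},\\
k &\le 22^2 = 484 \quad (\text{each hex digit: } 10 \text{ digits} + 6 \text{ lowercase} + 6 \text{ uppercase letters} = 22),\\
T_{\text{dedup}} &\le k\,N = 484\,N.
\end{aligned}
$$

For N = 300,000 this is roughly 3×10^10 character visits for the non-deduplicated loop, versus at most about 1.45×10^8 after deduplication (and only 3×10^5 for this particular payload, where k = 1).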
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
urllib3 versions 1.25.2 through 1.25.7 contain an O(N^2) CPU-consumption denial-of-service vulnerability in URL percent-encoding.
Vulnerability Description
CVE-2020-7212 is a denial-of-service (DoS) vulnerability in the urllib3 library for Python, affecting versions 1.25.2 through 1.25.7. The flaw resides in the _encode_invalid_chars function within util/url.py. The root cause is an inefficient algorithm: the function builds a list of all percent-encoding matches (percent_encodings) without deduplication. For a URL of length N, this list can be of size O(N), and the subsequent normalization step also runs in O(N) per element, resulting in a total time complexity of O(N^2) [1][2].
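The pattern described above can be modeled in a few lines. This is a hypothetical sketch, not urllib3's source: PERCENT_RE, normalize_naive, and normalize_dedup are illustrative names, and the loop body assumes that "normalizing" means uppercasing each escape with str.replace. Each function also returns how many full-string replace scans it performed, which is the quantity that grows quadratically.

```python
import re

# Hypothetical stand-in for the percent-encoding regex: "%" plus two hex digits.
PERCENT_RE = re.compile(r"%[a-fA-F0-9]{2}")


def normalize_naive(component):
    """Uppercase escapes once per *match*; duplicates are not removed.

    Each str.replace scans the whole string (O(N)), and there can be O(N)
    matches, so total work grows as O(N^2).
    """
    scans = 0
    for enc in PERCENT_RE.findall(component):  # findall runs once, before the loop
        if not enc.isupper():
            component = component.replace(enc, enc.upper())
            scans += 1
    return component, scans


def normalize_dedup(component):
    """Same replacements, but once per *distinct* escape: O(k*N) with k <= 484."""
    scans = 0
    for enc in set(PERCENT_RE.findall(component)):
        if not enc.isupper():
            component = component.replace(enc, enc.upper())
            scans += 1
    return component, scans


if __name__ == "__main__":
    payload = "%aa" * 2000  # 6000 characters, 2000 identical escapes
    print(normalize_naive(payload)[1])  # 2000
    print(normalize_dedup(payload)[1])  # 1
```

With the naive loop, doubling the number of repeated escapes doubles both the number of scans and the length of each scan, which is the quadratic growth the CVE describes. The deduplicated loop still scans the whole string, but at most once per distinct escape.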
Exploitation Prerequisites
An attacker can exploit this flaw by providing a specially crafted URL containing a large number of percent-encoded characters. The function's quadratic time complexity means that processing such a URL can consume excessive CPU resources, leading to a denial-of-service condition. No authentication or special privileges are required, and the attack can be carried out remotely via any application that uses urllib3 to process untrusted URLs [2][3].
Impact
Successful exploitation causes high CPU consumption and can make the affected service unresponsive. This is a classic algorithmic complexity attack. The impact is limited to availability: confidentiality and integrity are not directly affected. The CVSS 3.1 base score is 7.5 (High), with a network attack vector and low attack complexity [3].
Mitigation Status
The issue was fixed in urllib3 1.25.8. Per the CVE description, deduplicating percent_encodings bounds the cost at O(kN), where k is at most 484. Users are strongly advised to upgrade to urllib3 1.25.8 or later. At the time this insight was generated, the vulnerability was not listed in CISA's Known Exploited Vulnerabilities catalog [4].
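Deduplication is the remedy the CVE description names. Another way to keep normalization linear, shown here only as an illustrative alternative and not as a description of the 1.25.8 patch, is a single regex substitution pass with a callback. The names are hypothetical; is_fully_percent_encoded mirrors the is_percent_encoded comparison (match count versus '%' count) in the _encode_invalid_chars diff.

```python
import re

# Hypothetical stand-in for the percent-encoding regex.
PERCENT_RE = re.compile(r"%[a-fA-F0-9]{2}")


def normalize_single_pass(component):
    """Uppercase every %XX escape in one left-to-right pass.

    re.subn visits each match once while building the output, so the work
    is O(N) no matter how many (repeated) escapes the input contains.
    Returns (normalized_string, number_of_escapes).
    """
    return PERCENT_RE.subn(lambda match: match.group(0).upper(), component)


def is_fully_percent_encoded(component):
    """True when every '%' in the component starts a valid %XX escape."""
    _, count = normalize_single_pass(component)
    return count == component.count("%")


if __name__ == "__main__":
    print(normalize_single_pass("%aa" * 3))    # ('%AA%AA%AA', 3)
    print(is_fully_percent_encoded("/a%20b"))  # True
    print(is_fully_percent_encoded("/100%"))   # False
```

Returning the match count from the same pass avoids a second findall over the input.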
AI Insight generated on May 21, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
| urllib3 (PyPI) | >= 1.25.2, < 1.25.8 | 1.25.8 |
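To check whether a given release string falls in the affected range from the table, a tuple comparison is enough. This is a sketch: parse_version and is_affected are hypothetical helpers, and only the leading dotted-numeric part of a version is compared, so pre-release suffixes such as rc1 are ignored (packaging.version implements full PEP 440 ordering).

```python
import re

AFFECTED_MIN = (1, 25, 2)  # first affected release (inclusive), per the table
PATCHED = (1, 25, 8)       # first patched release


def parse_version(text):
    """Parse the leading dotted-numeric part of a version string into a tuple.

    Suffixes are dropped, e.g. "1.25.8rc1" -> (1, 25, 8); this simplification
    treats a pre-release as equal to its final release.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_affected(version):
    """True if version is in the range >= 1.25.2, < 1.25.8."""
    return AFFECTED_MIN <= parse_version(version) < PATCHED


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("urllib3")
    except PackageNotFoundError:
        print("urllib3 is not installed")
    else:
        status = "affected" if is_affected(installed) else "outside the affected range"
        print(f"urllib3 {installed}: {status}")
```

Comparing integer tuples rather than strings keeps 1.25.10 correctly ordered after 1.25.8.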
Affected products
- urllib3/urllib3 (description)
Patches
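a74c9cfbaed9: Percent-encode invalid characters with request target (#1586)
This commit is where _encode_invalid_chars was added (its CHANGES.rst entry sits in the dev section directly above 1.25.1). It predates the 1.25.8 fix and does not itself contain the per-match normalization loop described in the CVE.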
3 files changed · +76 −12
CHANGES.rst +3 −0 modified

    @@ -6,6 +6,9 @@ dev (master)
     * Change ``is_ipaddress`` to not detect IPvFuture addresses. (Pull #1583)
    +* Change ``parse_url`` to percent-encode invalid characters within the
    +  path, query, and target components. (Pull #1586)
    +
     1.25.1 (2019-04-24)
     -------------------
src/urllib3/util/url.py +48 −10 modified

    @@ -6,6 +6,7 @@
     from ..packages import six, rfc3986
     from ..packages.rfc3986.exceptions import RFC3986Exception, ValidationError
     from ..packages.rfc3986.validators import Validator
    +from ..packages.rfc3986 import abnf_regexp, normalizers, compat, misc
     
     
     url_attrs = ['scheme', 'auth', 'host', 'port', 'path', 'query', 'fragment']
    @@ -17,6 +18,9 @@
     # Regex for detecting URLs with schemes. RFC 3986 Section 3.1
     SCHEME_REGEX = re.compile(r"^(?:[a-zA-Z][a-zA-Z0-9+\-]*:|/)")
     
    +PATH_CHARS = abnf_regexp.UNRESERVED_CHARS_SET | abnf_regexp.SUB_DELIMITERS_SET | {':', '@', '/'}
    +QUERY_CHARS = FRAGMENT_CHARS = PATH_CHARS | {'?'}
    +
     
     class Url(namedtuple('Url', url_attrs)):
         """
    @@ -136,6 +140,37 @@ def split_first(s, delims):
         return s[:min_idx], s[min_idx + 1:], min_delim
     
     
    +def _encode_invalid_chars(component, allowed_chars, encoding='utf-8'):
    +    """Percent-encodes a URI component without reapplying
    +    onto an already percent-encoded component. Based on
    +    rfc3986.normalizers.encode_component()
    +    """
    +    if component is None:
    +        return component
    +
    +    # Try to see if the component we're encoding is already percent-encoded
    +    # so we can skip all '%' characters but still encode all others.
    +    percent_encodings = len(normalizers.PERCENT_MATCHER.findall(
    +                            compat.to_str(component, encoding)))
    +
    +    uri_bytes = component.encode('utf-8', 'surrogatepass')
    +    is_percent_encoded = percent_encodings == uri_bytes.count(b'%')
    +
    +    encoded_component = bytearray()
    +
    +    for i in range(0, len(uri_bytes)):
    +        # Will return a single character bytestring on both Python 2 & 3
    +        byte = uri_bytes[i:i+1]
    +        byte_ord = ord(byte)
    +        if ((is_percent_encoded and byte == b'%')
    +                or (byte_ord < 128 and byte.decode() in allowed_chars)):
    +            encoded_component.extend(byte)
    +            continue
    +        encoded_component.extend('%{0:02x}'.format(byte_ord).encode().upper())
    +
    +    return encoded_component.decode(encoding)
    +
    +
     def parse_url(url):
         """
         Given a url, return a parsed :class:`.Url` namedtuple. Best-effort is
    @@ -160,8 +195,6 @@ def parse_url(url):
             return Url()
     
         is_string = not isinstance(url, six.binary_type)
    -    if not is_string:
    -        url = url.decode("utf-8")
     
         # RFC 3986 doesn't like URLs that have a host but don't start
         # with a scheme and we support URLs like that so we need to
    @@ -171,11 +204,6 @@ def parse_url(url):
         if not SCHEME_REGEX.search(url):
             url = "//" + url
     
    -    try:
    -        iri_ref = rfc3986.IRIReference.from_string(url, encoding="utf-8")
    -    except (ValueError, RFC3986Exception):
    -        six.raise_from(LocationParseError(url), None)
    -
         def idna_encode(name):
             if name and any([ord(x) > 128 for x in name]):
                 try:
    @@ -188,8 +216,18 @@ def idna_encode(name):
                     raise LocationParseError(u"Name '%s' is not a valid IDNA label" % name)
             return name
     
    -    has_authority = iri_ref.authority is not None
    -    uri_ref = iri_ref.encode(idna_encoder=idna_encode)
    +    try:
    +        split_iri = misc.IRI_MATCHER.match(compat.to_str(url)).groupdict()
    +        iri_ref = rfc3986.IRIReference(
    +            split_iri['scheme'], split_iri['authority'],
    +            _encode_invalid_chars(split_iri['path'], PATH_CHARS),
    +            _encode_invalid_chars(split_iri['query'], QUERY_CHARS),
    +            _encode_invalid_chars(split_iri['fragment'], FRAGMENT_CHARS)
    +        )
    +        has_authority = iri_ref.authority is not None
    +        uri_ref = iri_ref.encode(idna_encoder=idna_encode)
    +    except (ValueError, RFC3986Exception):
    +        return six.raise_from(LocationParseError(url), None)
     
         # rfc3986 strips the authority if it's invalid
         if has_authority and uri_ref.authority is None:
    @@ -209,7 +247,7 @@ def idna_encode(name):
                 *validator.COMPONENT_NAMES
             ).validate(uri_ref)
         except ValidationError:
    -        six.raise_from(LocationParseError(url), None)
    +        return six.raise_from(LocationParseError(url), None)
     
         # For the sake of backwards compatibility we put empty
         # string values for path if there are any defined values
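The encoding rule in _encode_invalid_chars can be exercised outside urllib3 with a simplified Python 3 re-implementation. This is a sketch under stated assumptions: the allowed-character sets are rebuilt from RFC 3986 (unreserved characters plus sub-delimiters) instead of being imported from the vendored rfc3986.abnf_regexp module, and encode_invalid_chars is a hypothetical name. The checks reuse inputs and expected outputs from the test diff.

```python
import re
import string

# Assumption: rebuilt from RFC 3986 sections 2.2/2.3, not imported from urllib3.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PATH_CHARS = UNRESERVED | SUB_DELIMS | {":", "@", "/"}
QUERY_CHARS = FRAGMENT_CHARS = PATH_CHARS | {"?"}

PERCENT_RE = re.compile(r"%[a-fA-F0-9]{2}")


def encode_invalid_chars(component, allowed_chars, encoding="utf-8"):
    """Percent-encode bytes outside allowed_chars, keeping existing %XX escapes.

    Follows the control flow of _encode_invalid_chars in the diff,
    simplified to Python 3 only.
    """
    if component is None:
        return component

    # If every '%' starts a valid %XX escape, treat the component as already
    # encoded and pass '%' through instead of turning it into %25.
    percent_encodings = len(PERCENT_RE.findall(component))
    # 'surrogatepass' lets lone surrogates (e.g. "\ud800") become UTF-8 bytes.
    uri_bytes = component.encode("utf-8", "surrogatepass")
    is_percent_encoded = percent_encodings == uri_bytes.count(b"%")

    encoded = bytearray()
    for byte_ord in uri_bytes:  # iterating bytes yields ints in Python 3
        char = chr(byte_ord)
        if (is_percent_encoded and char == "%") or (
            byte_ord < 128 and char in allowed_chars
        ):
            encoded.append(byte_ord)
        else:
            encoded.extend("%{0:02X}".format(byte_ord).encode("ascii"))
    return encoded.decode(encoding)


if __name__ == "__main__":
    print(encode_invalid_chars("/p[]", PATH_CHARS))           # /p%5B%5D
    print(encode_invalid_chars("fragment#", FRAGMENT_CHARS))  # fragment%23
    print(encode_invalid_chars("/100%", PATH_CHARS))          # /100%25
```

The last case shows why the match count matters: a bare '%' that does not start an escape makes the component "not already encoded", so it is encoded as %25 rather than passed through.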
test/test_util.py +25 −2 modified

    @@ -135,8 +135,15 @@ def test_invalid_host(self, location):
             'http://user\\@google.com',
             'http://google\\.com',
             'user\\@google.com',
    -        'http://google.com#fragment#',
             'http://user@user@google.com/',
    +
    +        # Invalid IDNA labels
    +        u'http://\uD7FF.com',
    +        u'http://❤️',
    +
    +        # Unicode surrogates
    +        u'http://\uD800.com',
    +        u'http://\uDC00.com',
         ])
         def test_invalid_url(self, url):
             with pytest.raises(LocationParseError):
    @@ -149,6 +156,15 @@ def test_invalid_url(self, url):
             ('HTTPS://Example.Com/?Key=Value', 'https://example.com/?Key=Value'),
             ('Https://Example.Com/#Fragment', 'https://example.com/#Fragment'),
             ('[::Ff%etH0%Ff]/%ab%Af', '[::ff%25etH0%Ff]/%AB%AF'),
    +
    +        # Invalid characters for the query/fragment getting encoded
    +        ('http://google.com/p[]?parameter[]=\"hello\"#fragment#',
    +         'http://google.com/p%5B%5D?parameter%5B%5D=%22hello%22#fragment%23'),
    +
    +        # Percent encoding isn't applied twice despite '%' being invalid
    +        # but the percent encoding is still normalized.
    +        ('http://google.com/p%5B%5d?parameter%5b%5D=%22hello%22#fragment%23',
    +         'http://google.com/p%5B%5D?parameter%5B%5D=%22hello%22#fragment%23')
         ])
         def test_parse_url_normalization(self, url, expected_normalized_url):
             """Assert parse_url normalizes the scheme/host, and only the scheme/host"""
    @@ -214,7 +230,14 @@ def test_parse_url_normalization(self, url, expected_normalized_url):
     
             # Uppercase IRI
             (u'http://Königsgäßchen.de/straße',
    -         Url('http', host='xn--knigsgchen-b4a3dun.de', path='/stra%C3%9Fe'))
    +         Url('http', host='xn--knigsgchen-b4a3dun.de', path='/stra%C3%9Fe')),
    +
    +        # Unicode Surrogates
    +        (u'http://google.com/\uD800', Url('http', host='google.com', path='%ED%A0%80')),
    +        (u'http://google.com?q=\uDC00',
    +         Url('http', host='google.com', path='', query='q=%ED%B0%80')),
    +        (u'http://google.com#\uDC00',
    +         Url('http', host='google.com', path='', fragment='%ED%B0%80')),
         ]
     
         @pytest.mark.parametrize(
Vulnerability mechanics
Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
- github.com/advisories/GHSA-hmv2-79q8-fv6g (ghsa, ADVISORY)
- nvd.nist.gov/vuln/detail/CVE-2020-7212 (ghsa, ADVISORY)
- github.com/pypa/advisory-database/tree/main/vulns/urllib3/PYSEC-2020-149.yaml (ghsa, WEB)
- github.com/urllib3/urllib3/blob/master/CHANGES.rst (ghsa, x_refsource_MISC, WEB)
- github.com/urllib3/urllib3/commit/a74c9cfbaed9f811e7563cfc3dce894928e0221a (ghsa, x_refsource_MISC, WEB)
- pypi.org/project/urllib3/1.25.8 (ghsa, WEB)
- pypi.org/project/urllib3/1.25.8/ (mitre, x_refsource_MISC)