Moderate severity · NVD Advisory · Published Jan 29, 2024 · Updated Nov 3, 2025

aiohttp's HTTP parser (the python one, not llhttp) still overly lenient about separators

CVE-2024-23829

Description

aiohttp is an asynchronous HTTP client/server framework for asyncio and Python. Security-sensitive parts of the Python HTTP parser retained minor differences in allowable character sets that must trigger error handling to robustly match frame boundaries of proxies, in order to protect against injection of additional requests. Additionally, validation could trigger exceptions that were not handled consistently with the processing of other malformed input. Being more lenient than internet standards require could, depending on the deployment environment, assist in request smuggling. The unhandled exception could cause excessive resource consumption on the application server and/or its logging facilities. This vulnerability exists due to an incomplete fix for CVE-2023-47627. Version 3.9.2 fixes this vulnerability.
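The leniency the description refers to is easy to reproduce in plain Python: `str.isdecimal()` and `int()` both accept non-ASCII decimal digits, so a Content-Length check built on them can accept a value that a stricter front-end proxy rejects or parses differently. A minimal sketch of the mismatch, using the ASCII-only `DIGITS` pattern the 3.9.2 patch switches to:

```python
# Why aiohttp 3.9.2 pins Content-Length validation to ASCII digits:
# Python's own decimal checks accept non-ASCII digits.
import re

# U+1D7D9, Unicode category Nd ("decimal digit number")
mathematical_one = "\N{MATHEMATICAL DOUBLE-STRUCK DIGIT ONE}"

assert mathematical_one.isdecimal()   # passes an isdecimal()-based check
assert int(mathematical_one) == 1     # and int() happily converts it

# The patched parser uses an ASCII-only pattern instead (DIGITS in the diff):
DIGITS = re.compile(r"\d+", re.ASCII)
assert DIGITS.fullmatch("256") is not None
assert DIGITS.fullmatch(mathematical_one) is None  # now rejected
```

This is exactly the class of disagreement that can assist request smuggling: two parsers in the same request path reading different Content-Length values from the same bytes.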

Affected packages

Versions sourced from the GitHub Security Advisory.

Package          Affected versions   Patched versions
aiohttp (PyPI)   < 3.9.2             3.9.2

Affected products: 1

Patches (1)

33ccdfb0a126

Improve validation in HTTP parser (#8074)

https://github.com/aio-libs/aiohttp · Sam Bull · Jan 28, 2024 · via GHSA
4 files changed · +164 −17
  • aiohttp/http_parser.py  +18 −14  modified
    @@ -69,12 +69,11 @@
     #     tchar = "!" / "#" / "$" / "%" / "&" / "'" / "*" / "+" / "-" / "." /
     #             "^" / "_" / "`" / "|" / "~" / DIGIT / ALPHA
     #     token = 1*tchar
    -METHRE: Final[Pattern[str]] = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")
    -VERSRE: Final[Pattern[str]] = re.compile(r"HTTP/(\d).(\d)")
    -HDRRE: Final[Pattern[bytes]] = re.compile(
    -    rb"[\x00-\x1F\x7F-\xFF()<>@,;:\[\]={} \t\"\\]"
    -)
    -HEXDIGIT = re.compile(rb"[0-9a-fA-F]+")
    +_TCHAR_SPECIALS: Final[str] = re.escape("!#$%&'*+-.^_`|~")
    +TOKENRE: Final[Pattern[str]] = re.compile(f"[0-9A-Za-z{_TCHAR_SPECIALS}]+")
    +VERSRE: Final[Pattern[str]] = re.compile(r"HTTP/(\d)\.(\d)", re.ASCII)
    +DIGITS: Final[Pattern[str]] = re.compile(r"\d+", re.ASCII)
    +HEXDIGITS: Final[Pattern[bytes]] = re.compile(rb"[0-9a-fA-F]+")
     
     
     class RawRequestMessage(NamedTuple):
    @@ -133,6 +132,7 @@ def parse_headers(
             self, lines: List[bytes]
         ) -> Tuple["CIMultiDictProxy[str]", RawHeaders]:
             headers: CIMultiDict[str] = CIMultiDict()
    +        # note: "raw" does not mean inclusion of OWS before/after the field value
             raw_headers = []
     
             lines_idx = 1
    @@ -146,13 +146,14 @@ def parse_headers(
                 except ValueError:
                     raise InvalidHeader(line) from None
     
    +            if len(bname) == 0:
    +                raise InvalidHeader(bname)
    +
                 # https://www.rfc-editor.org/rfc/rfc9112.html#section-5.1-2
                 if {bname[0], bname[-1]} & {32, 9}:  # {" ", "\t"}
                     raise InvalidHeader(line)
     
                 bvalue = bvalue.lstrip(b" \t")
    -            if HDRRE.search(bname):
    -                raise InvalidHeader(bname)
                 if len(bname) > self.max_field_size:
                     raise LineTooLong(
                         "request header name {}".format(
    @@ -161,6 +162,9 @@ def parse_headers(
                         str(self.max_field_size),
                         str(len(bname)),
                     )
    +            name = bname.decode("utf-8", "surrogateescape")
    +            if not TOKENRE.fullmatch(name):
    +                raise InvalidHeader(bname)
     
                 header_length = len(bvalue)
     
    @@ -207,7 +211,6 @@ def parse_headers(
                         )
     
                 bvalue = bvalue.strip(b" \t")
    -            name = bname.decode("utf-8", "surrogateescape")
                 value = bvalue.decode("utf-8", "surrogateescape")
     
                 # https://www.rfc-editor.org/rfc/rfc9110.html#section-5.5-5
    @@ -331,7 +334,8 @@ def get_content_length() -> Optional[int]:
     
                                 # Shouldn't allow +/- or other number formats.
                                 # https://www.rfc-editor.org/rfc/rfc9110#section-8.6-2
    -                            if not length_hdr.strip(" \t").isdecimal():
    +                            # msg.headers is already stripped of leading/trailing wsp
    +                            if not DIGITS.fullmatch(length_hdr):
                                     raise InvalidHeader(CONTENT_LENGTH)
     
                                 return int(length_hdr)
    @@ -559,7 +563,7 @@ def parse_message(self, lines: List[bytes]) -> RawRequestMessage:
                 )
     
             # method
    -        if not METHRE.fullmatch(method):
    +        if not TOKENRE.fullmatch(method):
                 raise BadStatusLine(method)
     
             # version
    @@ -676,8 +680,8 @@ def parse_message(self, lines: List[bytes]) -> RawResponseMessage:
                 raise BadStatusLine(line)
             version_o = HttpVersion(int(match.group(1)), int(match.group(2)))
     
    -        # The status code is a three-digit number
    -        if len(status) != 3 or not status.isdecimal():
    +        # The status code is a three-digit ASCII number, no padding
    +        if len(status) != 3 or not DIGITS.fullmatch(status):
                 raise BadStatusLine(line)
             status_i = int(status)
     
    @@ -818,7 +822,7 @@ def feed_data(
                             if self._lax:  # Allow whitespace in lax mode.
                                 size_b = size_b.strip()
     
    -                        if not re.fullmatch(HEXDIGIT, size_b):
    +                        if not re.fullmatch(HEXDIGITS, size_b):
                                 exc = TransferEncodingError(
                                     chunk[:pos].decode("ascii", "surrogateescape")
                                 )
    
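Two of the regex changes in the hunk above are worth spelling out: the old `VERSRE` left the dot unescaped, so `.` matched any character between the major and minor version, and the old `HDRRE` blocklist omitted `?` and `/`, which the new `TOKENRE` allowlist rejects. A quick sketch under those patterns (the `OLD_`/`NEW_` names are mine; the patterns are copied from the diff):

```python
# Re-derivation of two regex fixes from the patch above.
import re

OLD_VERSRE = re.compile(r"HTTP/(\d).(\d)")            # unescaped "."
NEW_VERSRE = re.compile(r"HTTP/(\d)\.(\d)", re.ASCII)

# The unescaped dot matched ANY separator between major and minor version:
assert OLD_VERSRE.fullmatch("HTTP/1x1") is not None
assert NEW_VERSRE.fullmatch("HTTP/1x1") is None
assert NEW_VERSRE.fullmatch("HTTP/1.1") is not None

# The old HDRRE blocklist forgot "?" and "/"; the new RFC 9110 tchar
# allowlist rejects anything outside the token character set:
OLD_HDRRE = re.compile(rb"[\x00-\x1F\x7F-\xFF()<>@,;:\[\]={} \t\"\\]")
TOKENRE = re.compile(r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+")

assert OLD_HDRRE.search(b"head/er") is None           # slipped through
assert TOKENRE.fullmatch("head/er") is None           # now rejected
assert TOKENRE.fullmatch("X-Custom-Header") is not None
```

Switching from a blocklist to the RFC 9110 token allowlist is the safer design choice: any future delimiter confusion fails closed instead of open.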
  • CHANGES/8074.bugfix.rst  +5 −0  added
    @@ -0,0 +1,5 @@
    +Fixed an unhandled exception in the Python HTTP parser on header lines starting with a colon -- by :user:`pajod`.
    +
    +Invalid request lines with anything but a dot between the HTTP major and minor version are now rejected. Invalid header field names containing question mark or slash are now rejected. Such requests are incompatible with :rfc:`9110#section-5.6.2` and are not known to be of any legitimate use.
    +
    +(BACKWARD INCOMPATIBLE)
    
  • CONTRIBUTORS.txt  +1 −0  modified
    @@ -263,6 +263,7 @@ Pankaj Pandey
     Parag Jain
     Pau Freixes
     Paul Colomiets
    +Paul J. Dorn
     Paulius Šileikis
     Paulus Schoutsen
     Pavel Kamaev
    
  • tests/test_http_parser.py  +140 −3  modified
    @@ -3,7 +3,8 @@
     
     import asyncio
     import re
    -from typing import Any, List
    +from contextlib import nullcontext
    +from typing import Any, Dict, List
     from unittest import mock
     from urllib.parse import quote
     
    @@ -168,11 +169,27 @@ def test_cve_2023_37276(parser: Any) -> None:
             parser.feed_data(text)
     
     
    +@pytest.mark.parametrize(
    +    "rfc9110_5_6_2_token_delim",
    +    r'"(),/:;<=>?@[\]{}',
    +)
    +def test_bad_header_name(parser: Any, rfc9110_5_6_2_token_delim: str) -> None:
    +    text = f"POST / HTTP/1.1\r\nhead{rfc9110_5_6_2_token_delim}er: val\r\n\r\n".encode()
    +    expectation = pytest.raises(http_exceptions.BadHttpMessage)
    +    if rfc9110_5_6_2_token_delim == ":":
    +        # Inserting colon into header just splits name/value earlier.
    +        expectation = nullcontext()
    +    with expectation:
    +        parser.feed_data(text)
    +
    +
     @pytest.mark.parametrize(
         "hdr",
         (
             "Content-Length: -5",  # https://www.rfc-editor.org/rfc/rfc9110.html#name-content-length
             "Content-Length: +256",
    +        "Content-Length: \N{superscript one}",
    +        "Content-Length: \N{mathematical double-struck digit one}",
             "Foo: abc\rdef",  # https://www.rfc-editor.org/rfc/rfc9110.html#section-5.5-5
             "Bar: abc\ndef",
             "Baz: abc\x00def",
    @@ -265,6 +282,20 @@ def test_parse_headers_longline(parser: Any) -> None:
             parser.feed_data(text)
     
     
    +def test_parse_unusual_request_line(parser: Any) -> None:
    +    if not isinstance(response, HttpResponseParserPy):
    +        pytest.xfail("Regression test for Py parser. May match C behaviour later.")
    +    text = b"#smol //a HTTP/1.3\r\n\r\n"
    +    messages, upgrade, tail = parser.feed_data(text)
    +    assert len(messages) == 1
    +    msg, _ = messages[0]
    +    assert msg.compression is None
    +    assert not msg.upgrade
    +    assert msg.method == "#smol"
    +    assert msg.path == "//a"
    +    assert msg.version == (1, 3)
    +
    +
     def test_parse(parser: Any) -> None:
         text = b"GET /test HTTP/1.1\r\n\r\n"
         messages, upgrade, tail = parser.feed_data(text)
    @@ -567,6 +598,45 @@ def test_headers_content_length_err_2(parser: Any) -> None:
             parser.feed_data(text)
     
     
    +_pad: Dict[bytes, str] = {
    +    b"": "empty",
    +    # not a typo. Python likes triple zero
    +    b"\000": "NUL",
    +    b" ": "SP",
    +    b"  ": "SPSP",
    +    # not a typo: both 0xa0 and 0x0a in case of 8-bit fun
    +    b"\n": "LF",
    +    b"\xa0": "NBSP",
    +    b"\t ": "TABSP",
    +}
    +
    +
    +@pytest.mark.parametrize("hdr", [b"", b"foo"], ids=["name-empty", "with-name"])
    +@pytest.mark.parametrize("pad2", _pad.keys(), ids=["post-" + n for n in _pad.values()])
    +@pytest.mark.parametrize("pad1", _pad.keys(), ids=["pre-" + n for n in _pad.values()])
    +def test_invalid_header_spacing(
    +    parser: Any, pad1: bytes, pad2: bytes, hdr: bytes
    +) -> None:
    +    text = b"GET /test HTTP/1.1\r\n" b"%s%s%s: value\r\n\r\n" % (pad1, hdr, pad2)
    +    expectation = pytest.raises(http_exceptions.BadHttpMessage)
    +    if pad1 == pad2 == b"" and hdr != b"":
    +        # one entry in param matrix is correct: non-empty name, not padded
    +        expectation = nullcontext()
    +    if pad1 == pad2 == hdr == b"":
    +        if not isinstance(response, HttpResponseParserPy):
    +            pytest.xfail("Regression test for Py parser. May match C behaviour later.")
    +    with expectation:
    +        parser.feed_data(text)
    +
    +
    +def test_empty_header_name(parser: Any) -> None:
    +    if not isinstance(response, HttpResponseParserPy):
    +        pytest.xfail("Regression test for Py parser. May match C behaviour later.")
    +    text = b"GET /test HTTP/1.1\r\n" b":test\r\n\r\n"
    +    with pytest.raises(http_exceptions.BadHttpMessage):
    +        parser.feed_data(text)
    +
    +
     def test_invalid_header(parser: Any) -> None:
         text = b"GET /test HTTP/1.1\r\n" b"test line\r\n\r\n"
         with pytest.raises(http_exceptions.BadHttpMessage):
    @@ -689,6 +759,34 @@ def test_http_request_bad_status_line(parser: Any) -> None:
         assert r"\n" not in exc_info.value.message
     
     
    +_num: Dict[bytes, str] = {
    +    # dangerous: accepted by Python int()
    +    # unicodedata.category("\U0001D7D9") == 'Nd'
    +    "\N{mathematical double-struck digit one}".encode(): "utf8digit",
    +    # only added for interop tests, refused by Python int()
    +    # unicodedata.category("\U000000B9") == 'No'
    +    "\N{superscript one}".encode(): "utf8number",
    +    "\N{superscript one}".encode("latin-1"): "latin1number",
    +}
    +
    +
    +@pytest.mark.parametrize("nonascii_digit", _num.keys(), ids=_num.values())
    +def test_http_request_bad_status_line_number(
    +    parser: Any, nonascii_digit: bytes
    +) -> None:
    +    text = b"GET /digit HTTP/1." + nonascii_digit + b"\r\n\r\n"
    +    with pytest.raises(http_exceptions.BadStatusLine):
    +        parser.feed_data(text)
    +
    +
    +def test_http_request_bad_status_line_separator(parser: Any) -> None:
    +    # single code point, old, multibyte NFKC, multibyte NFKD
    +    utf8sep = "\N{arabic ligature sallallahou alayhe wasallam}".encode()
    +    text = b"GET /ligature HTTP/1" + utf8sep + b"1\r\n\r\n"
    +    with pytest.raises(http_exceptions.BadStatusLine):
    +        parser.feed_data(text)
    +
    +
     def test_http_request_bad_status_line_whitespace(parser: Any) -> None:
         text = b"GET\n/path\fHTTP/1.1\r\n\r\n"
         with pytest.raises(http_exceptions.BadStatusLine):
    @@ -710,6 +808,31 @@ def test_http_request_upgrade(parser: Any) -> None:
         assert tail == b"some raw data"
     
     
    +def test_http_request_parser_utf8_request_line(parser: Any) -> None:
    +    if not isinstance(response, HttpResponseParserPy):
    +        pytest.xfail("Regression test for Py parser. May match C behaviour later.")
    +    messages, upgrade, tail = parser.feed_data(
    +        # note the truncated unicode sequence
    +        b"GET /P\xc3\xbcnktchen\xa0\xef\xb7 HTTP/1.1\r\n" +
    +        # for easier grep: ASCII 0xA0 more commonly known as non-breaking space
    +        # note the leading and trailing spaces
    +        "sTeP:  \N{latin small letter sharp s}nek\t\N{no-break space}  "
    +        "\r\n\r\n".encode()
    +    )
    +    msg = messages[0][0]
    +
    +    assert msg.method == "GET"
    +    assert msg.path == "/Pünktchen\udca0\udcef\udcb7"
    +    assert msg.version == (1, 1)
    +    assert msg.headers == CIMultiDict([("STEP", "ßnek\t\xa0")])
    +    assert msg.raw_headers == ((b"sTeP", "ßnek\t\xa0".encode()),)
    +    assert not msg.should_close
    +    assert msg.compression is None
    +    assert not msg.upgrade
    +    assert not msg.chunked
    +    assert msg.url.path == URL("/P%C3%BCnktchen\udca0\udcef\udcb7").path
    +
    +
     def test_http_request_parser_utf8(parser: Any) -> None:
         text = "GET /path HTTP/1.1\r\nx-test:тест\r\n\r\n".encode()
         messages, upgrade, tail = parser.feed_data(text)
    @@ -759,9 +882,15 @@ def test_http_request_parser_two_slashes(parser: Any) -> None:
         assert not msg.chunked
     
     
    -def test_http_request_parser_bad_method(parser: Any) -> None:
    +@pytest.mark.parametrize(
    +    "rfc9110_5_6_2_token_delim",
    +    [bytes([i]) for i in rb'"(),/:;<=>?@[\]{}'],
    +)
    +def test_http_request_parser_bad_method(
    +    parser: Any, rfc9110_5_6_2_token_delim: bytes
    +) -> None:
         with pytest.raises(http_exceptions.BadStatusLine):
    -        parser.feed_data(b'G=":<>(e),[T];?" /get HTTP/1.1\r\n\r\n')
    +        parser.feed_data(rfc9110_5_6_2_token_delim + b'ET" /get HTTP/1.1\r\n\r\n')
     
     
     def test_http_request_parser_bad_version(parser: Any) -> None:
    @@ -979,6 +1108,14 @@ def test_http_response_parser_code_not_int(response: Any) -> None:
             response.feed_data(b"HTTP/1.1 ttt test\r\n\r\n")
     
     
    +@pytest.mark.parametrize("nonascii_digit", _num.keys(), ids=_num.values())
    +def test_http_response_parser_code_not_ascii(
    +    response: Any, nonascii_digit: bytes
    +) -> None:
    +    with pytest.raises(http_exceptions.BadStatusLine):
    +        response.feed_data(b"HTTP/1.1 20" + nonascii_digit + b" test\r\n\r\n")
    +
    +
     def test_http_request_chunked_payload(parser: Any) -> None:
         text = b"GET /test HTTP/1.1\r\n" b"transfer-encoding: chunked\r\n\r\n"
         msg, payload = parser.feed_data(text)[0][0]
    
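The "unhandled exception" half of the advisory comes from header lines that begin with a colon: splitting on `:` yields an empty name, and the pre-3.9.2 whitespace check then indexes `bname[0]`, raising an uncaught IndexError. A simplified, hypothetical stand-in for that code path (`split_header` is not an aiohttp function):

```python
# Simplified sketch of aiohttp's pre-3.9.2 header-name handling; the
# split_header helper is hypothetical, not part of aiohttp.
def split_header(line: bytes) -> str:
    bname, _bvalue = line.split(b":", 1)
    try:
        # RFC 9112 §5.1: reject names with leading/trailing SP or HTAB.
        # Pre-fix code ran this check without first ensuring bname != b"".
        if {bname[0], bname[-1]} & {32, 9}:  # {" ", "\t"}
            return "rejected: padded name"
    except IndexError:
        # b":test" -> bname == b"", so bname[0] raises; before 3.9.2 this
        # escaped the parser instead of surfacing as InvalidHeader.
        return "unhandled IndexError"
    return "ok"

assert split_header(b":test") == "unhandled IndexError"
assert split_header(b" Host: x") == "rejected: padded name"
assert split_header(b"Host: example.org") == "ok"
```

The fix in the diff above adds an explicit `if len(bname) == 0: raise InvalidHeader(bname)` before the whitespace check, turning the crash into the same 400-style rejection as other malformed input.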


