Quadratic runtime with malformed PDF missing xref marker in pypdf
Description
pypdf is a pure-Python PDF library capable of splitting, merging, cropping, and transforming the pages of PDF files. An attacker can exploit this vulnerability by crafting a PDF that takes an unexpectedly long time to parse. The quadratic runtime blocks the current process and can keep a single CPU core at 100% utilization. It does not affect memory usage. The issue was addressed in PR 808, and versions 1.27.9 and later include the fix. Users are advised to upgrade. There are no known workarounds for this vulnerability.
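Since upgrading is the only remedy listed, a deployment may want to confirm that the installed release is at least 1.27.9. The following is a minimal sketch, assuming the package is installed under the distribution name `PyPDF2` (as in the table below); the helper names `parse_version`, `is_patched`, and `installed_pypdf2_is_patched` are hypothetical, and the version parsing is deliberately simplified.

```python
from importlib import metadata

# First patched release according to the advisory.
FIXED_VERSION = (1, 27, 9)


def parse_version(text):
    """Turn a dotted version such as '1.27.9' into a tuple of three ints.

    Simplification: non-numeric suffixes (for example 'rc1') are ignored.
    """
    parts = []
    for piece in text.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version_text):
    """Return True if the given version is 1.27.9 or later."""
    return parse_version(version_text) >= FIXED_VERSION


def installed_pypdf2_is_patched():
    """Return True or False, or None if PyPDF2 is not installed."""
    try:
        return is_patched(metadata.version("PyPDF2"))
    except metadata.PackageNotFoundError:
        return None
```

A tool such as `packaging.version` would handle pre-release and post-release tags more faithfully; the tuple comparison here is only meant to illustrate the check.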
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
| PyPDF2 (PyPI) | < 1.27.9 | 1.27.9 |
Affected products: 1
Patches
c6c56f550bb3 MAINT: Quadratic runtime while parsing reduced to linear (#808)
2 files changed · +26 −4
PyPDF2/pdf.py (+4 −4, modified)

    @@ -2082,7 +2082,7 @@ def _pairs(self, array):
         def readNextEndLine(self, stream, limit_offset=0):
             debug = False
             if debug: print(">>readNextEndLine")
    -        line = b_("")
    +        line_parts = []
             while True:
                 # Prevent infinite loops in malformed PDFs
                 if stream.tell() == 0 or stream.tell() == limit_offset:
    @@ -2109,10 +2109,10 @@ def readNextEndLine(self, stream, limit_offset=0):
                     break
                 else:
                     if debug: print(" x is neither")
    -                line = x + line
    -                if debug: print((" RNEL line:", line))
    +                line_parts.append(x)
             if debug: print("leaving RNEL")
    -        return line
    +        line_parts.reverse()
    +        return b"".join(line_parts)
     
         def decrypt(self, password):
             """
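The change above swaps repeated prepending (`line = x + line`) for collecting the pieces in a list and joining them once. The sketch below isolates that pattern outside the library; the function names `build_by_prepending` and `build_by_joining` are hypothetical and are not part of PyPDF2.

```python
def build_by_prepending(chunks):
    """Pre-fix pattern: every step allocates a new bytes object and copies
    everything accumulated so far, so total work grows quadratically."""
    line = b""
    for x in chunks:
        line = x + line
    return line


def build_by_joining(chunks):
    """Post-fix pattern: collect the pieces, reverse once, join once.
    Total work grows linearly with the number of pieces."""
    parts = []
    for x in chunks:
        parts.append(x)
    parts.reverse()
    return b"".join(parts)


# Bytes in the order they are met when scanning "xref" from its end.
chunks = [b"f", b"e", b"r", b"x"]
print(build_by_prepending(chunks))  # b'xref'
print(build_by_joining(chunks))     # b'xref'
```

Both functions return the same bytes; only the amount of copying differs, which is why the patch changes runtime without changing parsing results.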
Tests/test_reader.py (+22 −0, modified)

    @@ -1,5 +1,6 @@
     import io
     import os
    +import time
     
     import pytest
     
    @@ -9,6 +10,14 @@ from PyPDF2.constants import Ressources as RES
     from PyPDF2.errors import PdfReadError
     from PyPDF2.filters import _xobj_to_image
    +from sys import version_info
    +
    +if version_info < ( 3, 0 ):
    +    from cStringIO import StringIO
    +    StreamIO = StringIO
    +else:
    +    from io import BytesIO
    +    StreamIO = BytesIO
     
     TESTS_ROOT = os.path.abspath(os.path.dirname(__file__))
     PROJECT_ROOT = os.path.dirname(TESTS_ROOT)
    @@ -462,3 +471,16 @@ def test_get_destination_age_number():
         for outline in outlines:
             if not isinstance(outline, list):
                 reader.getDestinationPageNumber(outline)
    +
    +
    +def test_do_not_get_stuck_on_large_files_without_start_xref():
    +    """Tests for the absence of a DoS bug, where a large file without an startxref mark
    +    would cause the library to hang for minutes to hours """
    +    start_time = time.time()
    +    broken_stream = StreamIO(b"\0" * 5 * 1000 * 1000)
    +    with pytest.raises(PdfReadError):
    +        PdfFileReader(broken_stream)
    +    parse_duration = time.time() - start_time
    +    # parsing is expected take less than a second on a modern cpu, but include a large
    +    # tolerance to account for busy or slow systems
    +    assert parse_duration < 60
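The regression test above feeds the reader a large buffer with no line breaks and no `startxref` marker, then bounds the elapsed time. The same idea can be tried without PyPDF2 installed. The following is a simplified sketch: `read_previous_line` is a hypothetical stand-in for a backward line reader, not PyPDF2's `readNextEndLine`, and `time_backward_read` is likewise an illustrative helper.

```python
import io
import time


def read_previous_line(stream):
    """Read backwards from the current position to the previous line break.

    Hypothetical, simplified stand-in; it uses the append/reverse/join
    pattern from the fix so the work stays linear in the bytes scanned.
    """
    parts = []
    while stream.tell() > 0:
        stream.seek(-1, io.SEEK_CUR)  # step back one byte
        x = stream.read(1)            # read it (position moves forward again)
        stream.seek(-1, io.SEEK_CUR)  # stay positioned before that byte
        if x in (b"\n", b"\r"):
            break
        parts.append(x)
    parts.reverse()
    return b"".join(parts)


def time_backward_read(size):
    """Scan `size` null bytes (no line breaks) from the end of the buffer.

    Returns the number of bytes collected and the elapsed seconds.
    """
    stream = io.BytesIO(b"\0" * size)
    stream.seek(0, io.SEEK_END)
    start = time.perf_counter()
    line = read_previous_line(stream)
    return len(line), time.perf_counter() - start
```

Calling `time_backward_read` with increasing sizes gives a rough feel for how the scan scales; as in the original test, any time bound used in an assertion should be generous to tolerate slow or busy machines.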
Vulnerability mechanics
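A rough cost model for the pre-fix code, assuming the reader consumes one byte per loop iteration and that each `line = x + line` copies the entire accumulated buffer into a new bytes object (bytes are immutable in Python): after k iterations the buffer holds k bytes, so scanning n bytes without finding a line break copies

```latex
\sum_{k=1}^{n} k \;=\; \frac{n(n+1)}{2} \;=\; O(n^2)
\quad\text{bytes, versus}\quad
O(n)
\quad\text{for append, reverse, and a single join.}
```

For the 5,000,000-byte input used in the regression test, this model gives about 1.25 × 10^13 byte copies before the fix, which is consistent with the "minutes to hours" hang described in the test's docstring. Only one buffer of at most n bytes is live at a time, which matches the advisory's statement that memory usage is not affected.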
References
- github.com/advisories/GHSA-jrm6-h9cq-8gqw (ghsa; ADVISORY)
- nvd.nist.gov/vuln/detail/CVE-2023-36810 (ghsa; ADVISORY)
- github.com/py-pdf/pypdf/commit/c6c56f550bb384e05f0139c796ba1308837d6373 (ghsa; WEB)
- github.com/py-pdf/pypdf/issues/582 (ghsa, x_refsource_MISC; WEB)
- github.com/py-pdf/pypdf/pull/808 (ghsa, x_refsource_MISC; WEB)
- github.com/py-pdf/pypdf/security/advisories/GHSA-jrm6-h9cq-8gqw (ghsa, x_refsource_CONFIRM; WEB)
- lists.debian.org/debian-lts-announce/2023/07/msg00019.html (ghsa; WEB)