pypdf: Inefficient decoding of FlateDecode PNG predictor streams
Description
Crafting a PDF with a /FlateDecode stream and PNG predictor causes long runtimes in pypdf; fixed in version 6.12.2.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
Vulnerability
The vulnerability resides in pypdf's handling of PDF streams that use the /FlateDecode filter with a PNG predictor. Specifically, the decoding algorithm for PNG prediction can be extremely inefficient, leading to excessive runtime when processing crafted PDFs. This affects all versions of pypdf prior to 6.12.2 [1].
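To make the decoding step concrete, here is a minimal sketch of what undoing PNG prediction involves. It is not pypdf's implementation: the function name `decode_png_rows` and its parameters are hypothetical, and only PNG filter types 0 (None), 1 (Sub) and 2 (Up) are handled. Each row starts with one filter byte, and every row is rebuilt from its own bytes plus the previously decoded row, so the work grows with the total number of bytes in the stream.

```python
def decode_png_rows(data: bytes, columns: int, bytes_per_pixel: int = 1) -> bytes:
    """Undo PNG row filters 0 (None), 1 (Sub) and 2 (Up). Illustrative only."""
    row_length = columns * bytes_per_pixel + 1  # one leading filter byte per row
    if len(data) % row_length != 0:
        raise ValueError("data is not a whole number of rows")

    output = bytearray()
    previous = bytes(row_length)  # the row above the first row counts as zeros
    for start in range(0, len(data), row_length):
        row = bytearray(data[start:start + row_length])
        filter_byte = row[0]
        if filter_byte == 0:  # None: bytes are stored as-is
            pass
        elif filter_byte == 1:  # Sub: add the byte one pixel to the left
            for i in range(1 + bytes_per_pixel, row_length):
                row[i] = (row[i] + row[i - bytes_per_pixel]) % 256
        elif filter_byte == 2:  # Up: add the byte directly above
            for i in range(1, row_length):
                row[i] = (row[i] + previous[i]) % 256
        else:
            raise ValueError(f"filter {filter_byte} not covered by this sketch")
        previous = bytes(row)
        output += row[1:]  # drop the filter byte
    return bytes(output)


# Three rows of three bytes each, one row per supported filter type.
encoded = bytes([0, 1, 2, 3,   1, 1, 1, 1,   2, 1, 1, 1])
decoded = decode_png_rows(encoded, columns=3)
```

The sketch already uses `bytearray` and `bytes` for the row buffers, which is the representation the fix moves to.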
Exploitation
An attacker can exploit this by crafting a PDF that includes a stream with the /FlateDecode filter and a PNG predictor. Because pypdf is a library, the trigger is any application or script that decodes the malicious PDF's streams with a vulnerable version of pypdf. No authentication or special privileges are required; the attack vector is remote (via PDF delivery) and the complexity is low [1].
Impact
Successful exploitation results in a denial of service (DoS) condition due to abnormally long runtime when processing the malicious PDF. The impact is limited to availability; there is no information disclosure, modification, or execution of arbitrary code [1].
Mitigation
The issue is fixed in pypdf version 6.12.2, released on 2026-05-26 [2]. Users should upgrade to 6.12.2 or later. Those who cannot upgrade can backport the changes from pull request #3806, which optimizes `_decode_png_prediction` for memory and speed [3].
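As a rough aid for checking a deployment, the following sketch compares the installed pypdf version against the fixed release. The helper names (`parse_release`, `is_patched`, `installed_pypdf_is_patched`) are hypothetical, and the parser is deliberately simple: it reads only the leading digits of the first three dot-separated components, so pre-release or local version suffixes are ignored.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version

FIXED_RELEASE = (6, 12, 2)  # first fixed version according to the advisory


def parse_release(version_string: str) -> tuple[int, ...]:
    """Return (major, minor, patch), ignoring any non-numeric suffix."""
    parts = []
    for piece in version_string.split(".")[:3]:
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        parts.append(int(digits or 0))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def is_patched(version_string: str) -> bool:
    return parse_release(version_string) >= FIXED_RELEASE


def installed_pypdf_is_patched() -> bool | None:
    """True/False for the installed pypdf, or None if pypdf is not installed."""
    try:
        return is_patched(version("pypdf"))
    except PackageNotFoundError:
        return None
```

A result of `False` from `installed_pypdf_is_patched()` suggests upgrading, for example with `pip install --upgrade "pypdf>=6.12.2"`.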
AI Insight generated on Jun 16, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected products: 2
Patches: 1
6755d925fae2 SEC: Optimize _decode_png_prediction regarding memory and speed (#3806)
2 files changed · +20 −5
pypdf/filters.py +5 −5 modified
@@ -262,11 +262,11 @@ def _decode_png_prediction(data: bytes, columns: int, row_length: int) -> bytes:
             logger_warning("Image data is not rectangular. Adding padding.", source=__name__)
             data += b"\x00" * (row_length - remainder)
             assert len(data) % row_length == 0
-        output = []
-        previous_row_data = (0,) * row_length
+        output = bytearray()
+        previous_row_data = bytes(row_length)
         bpp = (row_length - 1) // columns  # recomputed locally to not change params
         for row in range(0, len(data), row_length):
-            row_data: list[int] = list(data[row : row + row_length])
+            row_data = bytearray(data[row : row + row_length])
             filter_byte = row_data[0]

             if filter_byte == 0:
@@ -315,8 +315,8 @@ def _decode_png_prediction(data: bytes, columns: int, row_length: int) -> bytes:
                 raise PdfReadError(
                     f"Unsupported PNG filter {filter_byte!r}"
                 )  # pragma: no cover
-            previous_row_data = tuple(row_data)
-            output.extend(row_data[1:])
+            previous_row_data = bytes(row_data)
+            output += row_data[1:]
         return bytes(output)

     @staticmethod
tests/test_filters.py +15 −0 modified
@@ -1130,3 +1130,18 @@ def test_lzwdecode__invalid_first_code():
         match=r"^LZW code 258 out of range with empty base at table index 258\.$"
     ):
         LZWDecode.decode(data=lzw_data)
+
+
+@pytest.mark.timeout(5)  # Has been 20 seconds before.
+def test_flatedecode__decode_png_prediction__speed():
+    columns = 4096
+    rows = 120000
+    row_length = columns + 1  # +1 for PNG filter byte
+
+    # Build raw PNG-predicted data: every row starts with filter byte 0 (PNG None)
+    raw = bytearray(rows * row_length)
+    for row in range(rows):
+        raw[row * row_length] = 0
+    data = bytes(raw)
+
+    FlateDecode._decode_png_prediction(data=data, columns=columns, row_length=row_length)
Vulnerability mechanics
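Root cause
The PNG predictor decoder held row data in Python lists and tuples of integers instead of `bytearray`/`bytes`. Every decoded byte therefore carried substantial extra memory and copying cost, so a crafted PDF with a large predicted stream could trigger long runtimes. The patch shown here changes only these data types, not the algorithm, which indicates a large per-byte overhead rather than quadratic growth.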
Attack vector
An attacker crafts a PDF containing a stream that uses `/FlateDecode` with a PNG predictor and a very large amount of row data (the regression test uses 120000 rows of 4096 columns). When pypdf processes this stream, `_decode_png_prediction` builds intermediate Python `list` and `tuple` objects for each row and accumulates the output in a list of integers, which adds heavy memory and CPU overhead per decoded byte and leads to long runtimes. This is a resource-exhaustion denial-of-service vector: no authentication or special privileges are required beyond getting a vulnerable pypdf installation to process the crafted PDF.
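The scale involved can be worked out from the numbers in the regression test. The sketch below computes how many bytes the predictor decoder has to walk for that case, and then illustrates, as an assumption about how an attacker would package such data, that highly repetitive input shrinks dramatically under Deflate, so the PDF itself can stay small. The function name `predicted_stream_size` is hypothetical.

```python
import zlib


def predicted_stream_size(rows: int, columns: int, bytes_per_pixel: int = 1) -> int:
    """Bytes in a PNG-predicted stream: one filter byte plus the pixel bytes per row."""
    row_length = columns * bytes_per_pixel + 1
    return rows * row_length


# Dimensions taken from the regression test in the patch.
size = predicted_stream_size(rows=120_000, columns=4096)
print(size)  # 491640000 bytes, roughly 469 MiB

# The test data is all zeros; such data compresses extremely well.
# A 1 MiB sample is used here to keep the demonstration cheap.
sample = bytes(1024 * 1024)
compressed = zlib.compress(sample, 9)
print(len(sample), "->", len(compressed), "bytes after Deflate")
```

With list-based buffers, each of those bytes is additionally represented as an element of a Python list, which takes a pointer-sized slot per element in CPython rather than a single byte.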
Affected code
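The affected code is the `_decode_png_prediction` method in `pypdf/filters.py`. The original implementation converted each row to a `list[int]`, stored the previous row as a `tuple`, and collected the output in a list via repeated `extend` calls, which is far more expensive in memory and time than working on `bytes` and `bytearray` and made large inputs extremely slow to process. No other source file is changed by the patch; the only other modification adds a regression test to `tests/test_filters.py`.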
What the fix does
The patch replaces the `output = []` list with `output = bytearray()` and changes `previous_row_data = (0,) * row_length` to `previous_row_data = bytes(row_length)`. The row data is read as a `bytearray` instead of a `list[int]`, `previous_row_data` is stored as `bytes(row_data)` instead of `tuple(row_data)`, and `output += row_data[1:]` replaces `output.extend(row_data[1:])`. Together these changes eliminate redundant memory allocations. According to the comment in the new regression test, the test case previously took about 20 seconds; it must now finish within a 5-second timeout, which closes the denial-of-service vector [patch_id=6167632].
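The effect of the type change can be reproduced in isolation. The two functions below are a simplified sketch, not the pypdf code: they mirror only the buffer handling before and after the patch and skip the filter logic entirely, which corresponds to rows whose filter byte is 0. The function names are hypothetical.

```python
def accumulate_with_lists(data: bytes, row_length: int) -> bytes:
    """Pre-patch style: list of ints per row, tuple for the previous row."""
    output = []
    previous_row = (0,) * row_length
    for start in range(0, len(data), row_length):
        row = list(data[start:start + row_length])
        previous_row = tuple(row)  # kept only to mirror the original copying
        output.extend(row[1:])     # drop the filter byte
    return bytes(output)


def accumulate_with_bytearray(data: bytes, row_length: int) -> bytes:
    """Post-patch style: bytearray per row, bytes for the previous row."""
    output = bytearray()
    previous_row = bytes(row_length)
    for start in range(0, len(data), row_length):
        row = bytearray(data[start:start + row_length])
        previous_row = bytes(row)  # kept only to mirror the patched copying
        output += row[1:]          # drop the filter byte
    return bytes(output)


# Small input: 4 rows of 1 filter byte + 8 data bytes.
row_length = 9
data = bytes([0] + list(range(8))) * 4
assert accumulate_with_lists(data, row_length) == accumulate_with_bytearray(data, row_length)
```

Both variants return identical bytes; timing them with `timeit` on a larger zero-filled buffer is one way to observe the difference locally, though the exact ratio depends on the interpreter and machine.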
Preconditions
- input: The attacker must provide a PDF whose stream uses the /FlateDecode filter with a PNG predictor (e.g., /Predictor 15).
- network: The victim must open the crafted PDF with a version of pypdf prior to 6.12.2.
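The first precondition can be expressed as a simple check on a stream dictionary. The sketch below works on plain Python dictionaries that mimic the PDF keys `/Filter` and `/DecodeParms`; it does not use pypdf's object model, and the function name `uses_png_predictor` is hypothetical. In the PDF format, `/Predictor` values 10 through 15 select PNG prediction, and the default value 1 means no prediction.

```python
PNG_PREDICTOR_VALUES = range(10, 16)  # 10..15 select PNG prediction


def uses_png_predictor(stream_dict: dict) -> bool:
    """Return True if a /FlateDecode filter is paired with a PNG predictor."""
    filters = stream_dict.get("/Filter", [])
    if isinstance(filters, str):
        filters = [filters]  # a single filter may be given without an array

    parms = stream_dict.get("/DecodeParms", {})
    if isinstance(parms, dict):
        parms = [parms]  # a single parameter dictionary likewise

    for name, parm in zip(filters, parms):
        predictor = (parm or {}).get("/Predictor", 1)  # entries may be null
        if name == "/FlateDecode" and predictor in PNG_PREDICTOR_VALUES:
            return True
    return False


crafted = {"/Filter": "/FlateDecode", "/DecodeParms": {"/Predictor": 15, "/Columns": 4096}}
plain = {"/Filter": "/FlateDecode"}
```

Here `uses_png_predictor(crafted)` is true and `uses_png_predictor(plain)` is false. Streams that match are common in legitimate PDFs, so a match only identifies candidates for the slow path, not malicious files.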
Generated on Jun 16, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References: 4
News mentions: 0
No linked articles in our index yet.