pypdf: Possible infinite loop when processing outlines/bookmarks in writer
Description
CVE-2026-54531 is a denial-of-service vulnerability in pypdf that allows an attacker to cause an infinite loop by merging a crafted PDF with outlines into a writer.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
Vulnerability
The vulnerability exists in pypdf's handling of PDF outlines (bookmarks) when merging a file into a writer. A specially crafted PDF with malicious outline entries can trigger an infinite loop during the merge process. Affected versions are prior to pypdf 6.13.0 [1][4].
Exploitation
An attacker needs to provide a crafted PDF file that contains outlines. The victim must merge this file into a pypdf writer object (e.g., using PdfWriter.append() or similar). No authentication or special privileges are required; the attack can be triggered by processing a malicious PDF [1][2].
Impact
Successful exploitation results in a denial-of-service condition: the application enters an infinite loop, consuming CPU resources and potentially causing a hang or crash. No data confidentiality or integrity is compromised; the impact is limited to availability [1][4].
Mitigation
The vulnerability is fixed in pypdf version 6.13.0, released on 2026-06-05 [3]. Users should upgrade to this version or later. As a workaround, users who cannot upgrade can apply the changes from pull request #3830 [2]. No other workarounds are documented.
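A deployment can check whether the installed pypdf is at or above the fixed release before accepting untrusted PDFs. The following is a minimal sketch using only the standard library; the helper names `parse_release`, `is_patched`, and `installed_pypdf_is_patched` are hypothetical, and the parser is deliberately simple (it reads only the leading numeric release segment and treats a pre-release such as `6.13.0rc1` like `6.13.0`).

```python
from __future__ import annotations

from importlib import metadata

# First release containing the fix, according to this advisory.
FIXED_VERSION = (6, 13, 0)


def parse_release(version: str) -> tuple[int, ...]:
    """Return the leading numeric release segment, e.g. '6.13.0' -> (6, 13, 0)."""
    parts: list[int] = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # non-numeric piece such as 'post1'
        parts.append(int(digits))
        if digits != piece:
            break  # numeric prefix followed by a suffix such as 'rc1'
    return tuple(parts)


def is_patched(version: str) -> bool:
    """True if the version string is at or above FIXED_VERSION."""
    release = parse_release(version)
    # Pad short versions such as '6.13' with zeros before comparing.
    padded = release + (0,) * (len(FIXED_VERSION) - len(release))
    return padded >= FIXED_VERSION


def installed_pypdf_is_patched() -> bool | None:
    """Check the installed pypdf; None means pypdf is not installed."""
    try:
        return is_patched(metadata.version("pypdf"))
    except metadata.PackageNotFoundError:
        return None
```

Comparing integer tuples rather than raw strings matters here: as strings, "6.2.0" would sort after "6.13.0".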
AI Insight generated on Jun 16, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected products: 2
Patches: 1
68822ded066f SEC: Avoid infinite loops for outlines and text extraction (#3830)
4 files changed · +158 −33
pypdf/_page.py +14 −10 modified
@@ -1864,20 +1864,24 @@ def _layout_mode_fonts(self) -> dict[str, Font]:
         """
         # Font retrieval logic adapted from pypdf.PageObject._extract_text()
-        objr: Any = self
+        obj: Any = self
         fonts: dict[str, Font] = {}
-        while objr is not None:
-            try:
-                resources_dict: Any = objr[PG.RESOURCES]
-            except KeyError:
-                resources_dict = {}
+        visited: set[int] = set()
+        while True:
+            obj_id = id(obj)
+            if obj_id in visited:
+                logger_warning("Detected cycle in /Parent hierarchy when retrieving fonts.", source=__name__)
+                break
+            visited.add(obj_id)
+
+            resources_dict: Any = obj.get(PG.RESOURCES, {})
             if "/Font" in resources_dict and self.pdf is not None:
                 for font_name in resources_dict["/Font"]:
                     fonts[font_name] = Font.from_font_resource(resources_dict["/Font"][font_name])
-            try:
-                objr = objr["/Parent"].get_object()
-            except KeyError:
-                objr = None
+
+            if "/Parent" not in obj:
+                break
+            obj = obj["/Parent"].get_object()
         return fonts
pypdf/_writer.py +38 −23 modified
@@ -2788,7 +2788,7 @@ def merge(
         _ro = reader.root_object
         if import_outline and CO.OUTLINES in _ro:
             outline = self._get_filtered_outline(
-                _ro.get(CO.OUTLINES, None), srcpages, reader
+                node=_ro.get(CO.OUTLINES, None), pages=srcpages, reader=reader
             )
             self._insert_filtered_outline(
                 outline, outline_item_typ, None
@@ -3053,54 +3053,69 @@ def _insert_filtered_annotations(
     def _get_filtered_outline(
         self,
+        *,
         node: Any,
         pages: dict[int, PageObject],
         reader: PdfReader,
+        visited: Optional[set[int]] = None,
     ) -> list[Destination]:
         """
         Extract outline item entries that are part of the specified page set.
 
-        Args:
-            node:
-            pages:
-            reader:
-
         Returns:
             A list of destination objects.
 
         """
-        new_outline = []
+        if visited is None:
+            visited = set()
+        new_outline: list[Destination] = []
         if node is None:
-            node = NullObject()
+            return new_outline
         node = node.get_object()
         if is_null_or_none(node):
             node = DictionaryObject()
+
         if node.get("/Type", "") == "/Outlines" or "/Title" not in node:
+            node_id = id(node)
+            if node_id in visited:
+                logger_warning("Detected cycle in outlines.", source=__name__)
+                return []
+            visited.add(node_id)
+
             node = node.get("/First", None)
             if node is not None:
                 node = node.get_object()
-                new_outline += self._get_filtered_outline(node, pages, reader)
+                new_outline += self._get_filtered_outline(node=node, pages=pages, reader=reader, visited=visited)
         else:
-            v: Union[None, IndirectObject, NullObject]
-            while node is not None:
+            cloned_page: Union[None, IndirectObject, NullObject]
+            while True:
                 node = node.get_object()
-                o = cast("Destination", reader._build_outline_item(node))
-                v = self._get_cloned_page(cast("PageObject", o["/Page"]), pages, reader)
-                if v is None:
-                    v = NullObject()
-                o[NameObject("/Page")] = v
+                node_id = id(node)
+                if node_id in visited:
+                    logger_warning("Detected cycle in outlines.", source=__name__)
+                    break
+                visited.add(node_id)
+
+                destination = cast("Destination", reader._build_outline_item(node))
+                cloned_page = self._get_cloned_page(cast("PageObject", destination["/Page"]), pages, reader)
+                if cloned_page is None:
+                    cloned_page = NullObject()
+                destination[NameObject("/Page")] = cloned_page
                 if "/First" in node:
-                    o._filtered_children = self._get_filtered_outline(
-                        node["/First"], pages, reader
+                    destination._filtered_children = self._get_filtered_outline(
+                        node=node["/First"], pages=pages, reader=reader, visited=visited
                     )
                 else:
-                    o._filtered_children = []
+                    destination._filtered_children = []
                 if (
-                    not isinstance(o["/Page"], NullObject)
-                    or len(o._filtered_children) > 0
+                    not isinstance(cloned_page, NullObject)
+                    or len(destination._filtered_children) > 0
                 ):
-                    new_outline.append(o)
-                node = node.get("/Next", None)
+                    new_outline.append(destination)
+
+                if "/Next" not in node:
+                    break
+                node = node["/Next"]
         return new_outline
 
     def _clone_outline(self, dest: Destination) -> TreeObject:
tests/test_text_extraction.py +29 −0 modified
@@ -649,3 +649,32 @@ def test_text_state_params__unicode_decode_error(encoding):
     # Assertions: 'replace' mode changes invalid UTF-8 bytes to '\xfffd'.
     assert parameters.text == "\ufffd"
     assert parameters._decoded_value == "\ufffd"
+
+
+@pytest.mark.timeout(5)
+def test_page_object__layout_mode_fonts__cyclic(caplog) -> None:
+    writer = PdfWriter()
+
+    font = DictionaryObject({
+        NameObject("/Type"): NameObject("/Font"),
+        NameObject("/Subtype"): NameObject("/Type1"),
+        NameObject("/BaseFont"): NameObject("/Helvetica"),
+    })
+    fonts = {"/F1": Font.from_font_resource(font)}
+    page = writer.add_blank_page(width=10, height=10)
+    dictionary2 = DictionaryObject(DictionaryObject({
+        NameObject("/Resources"): DictionaryObject({
+            NameObject("/Font"): DictionaryObject({
+                NameObject("/F1"): font
+            })
+        })
+    }))
+    reference2 = writer._add_object(dictionary2)
+    dictionary3 = DictionaryObject({NameObject("/Parent"): reference2})
+    reference3 = writer._add_object(dictionary3)
+    page[NameObject("/Parent")] = reference3
+    dictionary2[NameObject("/Parent")] = page.indirect_reference
+    page.pdf = writer
+
+    assert page._layout_mode_fonts() == fonts
+    assert caplog.messages == ["Detected cycle in /Parent hierarchy when retrieving fonts."]
tests/test_writer.py +77 −0 modified
@@ -3218,3 +3218,80 @@ def test_encrypt__incremental():
 
     with pytest.raises(NotImplementedError):
         writer.encrypt(user_password="dummy")
+
+
+@pytest.mark.timeout(5)
+def test_get_filtered_outline__first__cyclic(caplog) -> None:
+    writer = PdfWriter()
+    reader = PdfReader(RESOURCE_ROOT / "crazyones.pdf")
+
+    dictionary1 = DictionaryObject({
+        NameObject("/Type"): NameObject("/Outlines")
+    })
+    reference1 = writer._add_object(dictionary1)
+    dictionary2 = DictionaryObject({
+        NameObject("/Type"): NameObject("/Outlines")
+    })
+    reference2 = writer._add_object(dictionary2)
+    dictionary3 = DictionaryObject({
+        NameObject("/First"): reference2,
+        NameObject("/Type"): NameObject("/Outlines")
+    })
+    reference3 = writer._add_object(dictionary3)
+    dictionary1[NameObject("/First")] = reference3
+    dictionary2[NameObject("/First")] = reference1
+
+    assert writer._get_filtered_outline(node=dictionary1, pages={}, reader=reader) == []
+    assert caplog.messages == ["Detected cycle in outlines."]
+
+
+@pytest.mark.timeout(5)
+def test_get_filtered_outline__next_first__cyclic(caplog) -> None:
+    writer = PdfWriter()
+    reader = PdfReader(RESOURCE_ROOT / "crazyones.pdf")
+
+    dictionary1 = DictionaryObject({
+        NameObject("/Title"): TextStringObject("test")
+    })
+    _reference1 = writer._add_object(dictionary1)
+    dictionary2 = DictionaryObject({
+        NameObject("/Type"): NameObject("/Outlines")
+    })
+    reference2 = writer._add_object(dictionary2)
+    dictionary1[NameObject("/Next")] = reference2
+    dictionary2[NameObject("/First")] = reference2
+
+    assert writer._get_filtered_outline(node=dictionary1, pages={}, reader=reader) == []
+    assert caplog.messages == ["Detected cycle in outlines."]
+
+
+@pytest.mark.timeout(5)
+def test_get_filtered_outline__next_next__cyclic(caplog) -> None:
+    writer = PdfWriter()
+    reader = PdfReader(RESOURCE_ROOT / "crazyones.pdf")
+
+    dictionary1 = DictionaryObject({
+        NameObject("/Title"): TextStringObject("test")
+    })
+    reference1 = writer._add_object(dictionary1)
+    dictionary2 = DictionaryObject({
+        NameObject("/Title"): TextStringObject("test")
+    })
+    reference2 = writer._add_object(dictionary2)
+    dictionary3 = DictionaryObject({
+        NameObject("/Next"): reference2,
+        NameObject("/Title"): TextStringObject("test")
+    })
+    reference3 = writer._add_object(dictionary3)
+    dictionary1[NameObject("/Next")] = reference3
+    dictionary2[NameObject("/Next")] = reference1
+
+    assert writer._get_filtered_outline(node=dictionary1, pages={}, reader=reader) == []
+    assert caplog.messages == ["Detected cycle in outlines."]
+
+
+def test_get_filtered_outline__node_is_none() -> None:
+    writer = PdfWriter()
+    reader = PdfReader(RESOURCE_ROOT / "crazyones.pdf")
+
+    assert writer._get_filtered_outline(node=None, pages={}, reader=reader) == []
Vulnerability mechanics
Root cause
"Absence of cycle detection in linked-list traversals of PDF outline (/First, /Next) and page /Parent chains allows crafted cyclic references to produce an infinite loop."
Attack vector
An attacker crafts a PDF whose outline or page tree contains cycles in the `/First`, `/Next`, or `/Parent` pointers. Before the fix, pypdf followed these pointers without cycle detection in two places: `PdfWriter.merge()` walks `/First` and `/Next` through `_get_filtered_outline()` when importing outlines, and `PageObject._layout_mode_fonts()` walks `/Parent` when collecting fonts. On a cyclic chain the traversal never terminates, which hangs the process [patch_id=6167626]. Preconditions: the attacker must supply a malicious PDF, and the victim must either call `writer.merge()` with outline import enabled on a PDF that has outlines, or call `PageObject._layout_mode_fonts()` on a page whose `/Parent` chain cycles. No authentication is required beyond making the application ingest the malformed file.
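To make the non-termination concrete, the sketch below models outline items as plain Python dictionaries rather than pypdf objects. The function name `walk_siblings_unguarded` and the `max_steps` parameter are hypothetical; the step cap exists only so that the demonstration halts, whereas the pre-fix loop had no such cap.

```python
def walk_siblings_unguarded(first, max_steps=1000):
    """Follow '/Next' pointers with a `while node is not None` loop.

    Returns (steps_taken, finished). `finished` is False when the
    artificial step cap was hit, i.e. the walk would have kept going.
    """
    node = first
    steps = 0
    while node is not None:
        steps += 1
        if steps >= max_steps:
            return steps, False  # cap reached: no end of chain in sight
        node = node.get("/Next")
    return steps, True


# A well-formed sibling chain: C -> D -> end.
d = {"/Title": "D"}
c = {"/Title": "C", "/Next": d}

# A crafted chain whose last item points back to the first: A -> B -> A -> ...
a = {"/Title": "A"}
b = {"/Title": "B", "/Next": a}
a["/Next"] = b

print(walk_siblings_unguarded(c))                 # (2, True)
print(walk_siblings_unguarded(a, max_steps=50))   # (50, False)
```

Because `/Next` never becomes absent on the cyclic chain, the `is not None` condition alone cannot end the loop.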
What the fix does
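The patch adds a `visited` set, keyed by `id(node)`, that is checked each time a node is reached while following `/First`, `/Next`, or `/Parent`. If a node is encountered a second time, the code logs `"Detected cycle in outlines."` or `"Detected cycle in /Parent hierarchy when retrieving fonts."` and returns or breaks instead of looping forever. In `_get_filtered_outline()` the same set is passed down through the recursive calls, so cycles that span several outline levels are also caught. The patch additionally restructures the loops: `while node is not None` (outline items) and `while objr is not None` (fonts) become `while True` with an explicit `break` when `/Next` or `/Parent` is absent, a `None` node now returns an empty list immediately, and the parameters of `_get_filtered_outline()` become keyword-only. This restructuring does not by itself stop a cycle; termination on cyclic input comes from the `visited` check.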
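The visited-set technique can be sketched independently of pypdf. The example below uses plain dictionaries as stand-ins for outline items and a hypothetical function `collect_titles`; it is an illustration of the idea under those assumptions, not the library's implementation. One set is shared across sibling iteration (`/Next`) and child recursion (`/First`).

```python
def collect_titles(node, visited=None):
    """Collect '/Title' values depth-first, stopping at any revisited node."""
    if visited is None:
        visited = set()
    titles = []
    while node is not None:
        node_id = id(node)
        if node_id in visited:
            break  # cycle detected: this node was already processed
        visited.add(node_id)

        titles.append(node["/Title"])
        child = node.get("/First")
        if child is not None:
            # Share the same set so cycles across levels are caught too.
            titles.extend(collect_titles(child, visited))
        node = node.get("/Next")
    return titles


# Sibling cycle: A -> B -> A
a = {"/Title": "A"}
b = {"/Title": "B", "/Next": a}
a["/Next"] = b

# Self-referencing child: C's first child is C itself
c = {"/Title": "C"}
c["/First"] = c

print(collect_titles(a))  # ['A', 'B']
print(collect_titles(c))  # ['C']
```

Keying on `id()` is only safe while the objects stay alive for the whole traversal, since identifiers are unique only among live objects; in this sketch the dictionaries are kept alive by the variables that reference them.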
Preconditions
- input: The application must call `PdfWriter.merge()` with a PDF that contains outlines, or call `PageObject._layout_mode_fonts()` on a page whose `/Parent` chain is cyclic.
- auth: No authentication or special privileges required beyond file ingestion.
Generated on Jun 16, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References: 4
News mentions: 0 (no linked articles in our index yet)