VYPR
Medium severity (6.9) · GHSA Advisory · Published Jun 16, 2026 · Updated Jun 16, 2026

pypdf: Possible infinite loop when retrieving fonts for layout-mode text extraction

CVE-2026-54530

Description

A crafted PDF can cause an infinite loop in pypdf when text is extracted in layout mode. The issue is fixed in version 6.13.0.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

Vulnerability

A vulnerability in pypdf before version 6.13.0 allows an attacker to craft a PDF that triggers an infinite loop during layout-mode text extraction. The loop occurs while the library collects font resources by walking up a page's `/Parent` chain: the walk only stops at a node that has no `/Parent` entry, so a chain that cycles back on itself never ends, leading to uncontrolled resource consumption. [1][3]
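The font walk described above can be modelled without pypdf. The sketch below is an illustration under stated assumptions, not pypdf code: plain Python dicts stand in for PDF dictionaries, and the hypothetical helpers `fonts_unpatched` and `fonts_patched` mirror the removed and added lines of `_layout_mode_fonts` in the fix commit. The `max_steps` cap in the unpatched helper does not exist in pypdf; it is there only so the demonstration terminates.

```python
import logging
from typing import Any

logger = logging.getLogger(__name__)

# Modelling assumption: plain dicts stand in for resolved PDF dictionaries.
# pypdf itself uses DictionaryObject and resolves /Parent with get_object().
PdfDict = dict[str, Any]


def _fonts_of(node: PdfDict) -> dict[str, Any]:
    """Return the /Font subdictionary of a node's /Resources, or {}."""
    return node.get("/Resources", {}).get("/Font", {})


def fonts_unpatched(page: PdfDict, max_steps: int = 1_000) -> dict[str, Any]:
    """Pre-6.13.0 pattern: walk up /Parent until a node has none.

    max_steps is NOT part of the original loop; it only lets this demo stop.
    """
    fonts: dict[str, Any] = {}
    obj: Any = page
    steps = 0
    while obj is not None:
        steps += 1
        if steps > max_steps:
            raise RuntimeError("walk did not terminate: /Parent chain is cyclic")
        fonts.update(_fonts_of(obj))
        obj = obj.get("/Parent")  # on a cycle this is never None
    return fonts


def fonts_patched(page: PdfDict) -> dict[str, Any]:
    """6.13.0 pattern: record id() of each visited node and stop on a repeat."""
    fonts: dict[str, Any] = {}
    visited: set[int] = set()
    obj: Any = page
    while True:
        if id(obj) in visited:
            logger.warning("Detected cycle in /Parent hierarchy when retrieving fonts.")
            break
        visited.add(id(obj))
        fonts.update(_fonts_of(obj))
        if "/Parent" not in obj:
            break
        obj = obj["/Parent"]
    return fonts


# Cycle shaped like the fix's regression test: page -> node3 -> node2 -> page,
# with the /F1 font defined on node2.
helvetica = {"/Type": "/Font", "/Subtype": "/Type1", "/BaseFont": "/Helvetica"}
node2: PdfDict = {"/Resources": {"/Font": {"/F1": helvetica}}}
node3: PdfDict = {"/Parent": node2}
page: PdfDict = {"/Parent": node3}
node2["/Parent"] = page

print(fonts_patched(page))  # only /F1 from node2; the cycle warning is logged
```

On the cyclic input, `fonts_unpatched(page)` hits the artificial cap and raises, which stands in for the hang seen in unpatched pypdf.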

Exploitation

An attacker can exploit this by providing a malicious PDF to any application or service that uses pypdf to extract text in layout mode. No authentication or special privileges are required; the victim only needs to process the crafted PDF. The infinite loop is triggered during the font retrieval step, as detailed in the fix commit. [1][2]

Impact

Successful exploitation results in a denial of service (DoS) due to an infinite loop, causing the application to hang or exhaust CPU resources. No data disclosure, privilege escalation, or remote code execution has been reported. [1]
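Because the failure mode is a hang rather than an exception, one common hardening pattern (our assumption, not a recommendation from this advisory) is to extract text from untrusted files in a child process with a wall-clock timeout. The sketch uses only the standard library; `run_isolated`, `extract_layout_text` and `LAYOUT_EXTRACT_SCRIPT` are hypothetical names, and the child script assumes pypdf is installed. `page.extract_text(extraction_mode="layout")` is pypdf's layout-mode call.

```python
import subprocess
import sys
from typing import Optional

# Child-process script: prints layout-mode text for the PDF path in argv[1].
# Assumes pypdf is installed for the interpreter that runs the child.
LAYOUT_EXTRACT_SCRIPT = """
import sys
from pypdf import PdfReader

reader = PdfReader(sys.argv[1])
for page in reader.pages:
    print(page.extract_text(extraction_mode="layout"))
"""


def run_isolated(script: str, *args: str, timeout: float = 10.0) -> Optional[str]:
    """Run `script` in a fresh interpreter; return stdout, or None on timeout/error."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", script, *args],
            capture_output=True,
            text=True,
            timeout=timeout,  # subprocess.run kills the child when this expires
        )
    except subprocess.TimeoutExpired:
        return None
    if result.returncode != 0:
        return None
    return result.stdout


def extract_layout_text(pdf_path: str, timeout: float = 10.0) -> Optional[str]:
    """Layout-mode extraction of an untrusted PDF, bounded by `timeout` seconds."""
    return run_isolated(LAYOUT_EXTRACT_SCRIPT, pdf_path, timeout=timeout)
```

The timeout value is arbitrary and must allow for large legitimate documents; this only contains the damage, and upgrading pypdf remains the actual fix.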

Mitigation

The issue is fixed in pypdf version 6.13.0, released on 2026-06-05. Users unable to upgrade immediately can apply the changes from pull request #3830 as a workaround. [2][3]
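A deployment check for the fixed release might look like the following sketch. The helper names are hypothetical; only `importlib.metadata.version` is a real standard-library call. The parser is deliberately simplified: it drops suffixes such as `rc1`, so a 6.13.0 pre-release would be reported as patched even though it may not contain the fix. For exact PEP 440 ordering, `packaging.version.Version` is the better tool.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

FIXED = (6, 13, 0)  # first release containing the fix, per this advisory


def parse_release(ver: str) -> tuple[int, ...]:
    """Return the leading numeric release segment: '6.13.0' -> (6, 13, 0).

    Simplification: suffixes such as 'rc1' or '.post1' are dropped.
    """
    parts: list[int] = []
    for piece in ver.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # numeric prefix followed by a suffix: stop here
            break
    return tuple(parts)


def is_patched(ver: str) -> bool:
    """True if `ver` is at or above the fixed release."""
    release = parse_release(ver)
    padded = release + (0,) * (3 - len(release))  # '7.0' -> (7, 0, 0)
    return padded >= FIXED


def installed_pypdf_is_patched() -> Optional[bool]:
    """True/False for the installed pypdf, or None if pypdf is not installed."""
    try:
        return is_patched(version("pypdf"))
    except PackageNotFoundError:
        return None
```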

AI Insight generated on Jun 16, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected products (2)

Patches (1)
68822ded066f

SEC: Avoid infinite loops for outlines and text extraction (#3830)

https://github.com/py-pdf/pypdf · Stefan · Jun 5, 2026
4 files changed · +158 -33
  • pypdf/_page.py  +14 -10  modified
    @@ -1864,20 +1864,24 @@ def _layout_mode_fonts(self) -> dict[str, Font]:
     
             """
             # Font retrieval logic adapted from pypdf.PageObject._extract_text()
    -        objr: Any = self
    +        obj: Any = self
             fonts: dict[str, Font] = {}
    -        while objr is not None:
    -            try:
    -                resources_dict: Any = objr[PG.RESOURCES]
    -            except KeyError:
    -                resources_dict = {}
    +        visited: set[int] = set()
    +        while True:
    +            obj_id = id(obj)
    +            if obj_id in visited:
    +                logger_warning("Detected cycle in /Parent hierarchy when retrieving fonts.", source=__name__)
    +                break
    +            visited.add(obj_id)
    +
    +            resources_dict: Any = obj.get(PG.RESOURCES, {})
                 if "/Font" in resources_dict and self.pdf is not None:
                     for font_name in resources_dict["/Font"]:
                         fonts[font_name] = Font.from_font_resource(resources_dict["/Font"][font_name])
    -            try:
    -                objr = objr["/Parent"].get_object()
    -            except KeyError:
    -                objr = None
    +
    +            if "/Parent" not in obj:
    +                break
    +            obj = obj["/Parent"].get_object()
     
             return fonts
     
    
  • pypdf/_writer.py  +38 -23  modified
    @@ -2788,7 +2788,7 @@ def merge(
             _ro = reader.root_object
             if import_outline and CO.OUTLINES in _ro:
                 outline = self._get_filtered_outline(
    -                _ro.get(CO.OUTLINES, None), srcpages, reader
    +                node=_ro.get(CO.OUTLINES, None), pages=srcpages, reader=reader
                 )
                 self._insert_filtered_outline(
                     outline, outline_item_typ, None
    @@ -3053,54 +3053,69 @@ def _insert_filtered_annotations(
     
         def _get_filtered_outline(
             self,
    +        *,
             node: Any,
             pages: dict[int, PageObject],
             reader: PdfReader,
    +        visited: Optional[set[int]] = None,
         ) -> list[Destination]:
             """
             Extract outline item entries that are part of the specified page set.
     
    -        Args:
    -            node:
    -            pages:
    -            reader:
    -
             Returns:
                 A list of destination objects.
     
             """
    -        new_outline = []
    +        if visited is None:
    +            visited = set()
    +        new_outline: list[Destination] = []
             if node is None:
    -            node = NullObject()
    +            return new_outline
             node = node.get_object()
             if is_null_or_none(node):
                 node = DictionaryObject()
    +
             if node.get("/Type", "") == "/Outlines" or "/Title" not in node:
    +            node_id = id(node)
    +            if node_id in visited:
    +                logger_warning("Detected cycle in outlines.", source=__name__)
    +                return []
    +            visited.add(node_id)
    +
                 node = node.get("/First", None)
                 if node is not None:
                     node = node.get_object()
    -                new_outline += self._get_filtered_outline(node, pages, reader)
    +                new_outline += self._get_filtered_outline(node=node, pages=pages, reader=reader, visited=visited)
             else:
    -            v: Union[None, IndirectObject, NullObject]
    -            while node is not None:
    +            cloned_page: Union[None, IndirectObject, NullObject]
    +            while True:
                     node = node.get_object()
    -                o = cast("Destination", reader._build_outline_item(node))
    -                v = self._get_cloned_page(cast("PageObject", o["/Page"]), pages, reader)
    -                if v is None:
    -                    v = NullObject()
    -                o[NameObject("/Page")] = v
    +                node_id = id(node)
    +                if node_id in visited:
    +                    logger_warning("Detected cycle in outlines.", source=__name__)
    +                    break
    +                visited.add(node_id)
    +
    +                destination = cast("Destination", reader._build_outline_item(node))
    +                cloned_page = self._get_cloned_page(cast("PageObject", destination["/Page"]), pages, reader)
    +                if cloned_page is None:
    +                    cloned_page = NullObject()
    +                destination[NameObject("/Page")] = cloned_page
                     if "/First" in node:
    -                    o._filtered_children = self._get_filtered_outline(
    -                        node["/First"], pages, reader
    +                    destination._filtered_children = self._get_filtered_outline(
    +                        node=node["/First"], pages=pages, reader=reader, visited=visited
                         )
                     else:
    -                    o._filtered_children = []
    +                    destination._filtered_children = []
                     if (
    -                    not isinstance(o["/Page"], NullObject)
    -                    or len(o._filtered_children) > 0
    +                    not isinstance(cloned_page, NullObject)
    +                    or len(destination._filtered_children) > 0
                     ):
    -                    new_outline.append(o)
    -                node = node.get("/Next", None)
    +                    new_outline.append(destination)
    +
    +                if "/Next" not in node:
    +                    break
    +                node = node["/Next"]
             return new_outline
     
         def _clone_outline(self, dest: Destination) -> TreeObject:
    
  • tests/test_text_extraction.py  +29 -0  modified
    @@ -649,3 +649,32 @@ def test_text_state_params__unicode_decode_error(encoding):
         # Assertions: 'replace' mode changes invalid UTF-8 bytes to '\xfffd'.
         assert parameters.text == "\ufffd"
         assert parameters._decoded_value == "\ufffd"
    +
    +
    +@pytest.mark.timeout(5)
    +def test_page_object__layout_mode_fonts__cyclic(caplog) -> None:
    +    writer = PdfWriter()
    +
    +    font = DictionaryObject({
    +        NameObject("/Type"): NameObject("/Font"),
    +        NameObject("/Subtype"): NameObject("/Type1"),
    +        NameObject("/BaseFont"): NameObject("/Helvetica"),
    +    })
    +    fonts = {"/F1": Font.from_font_resource(font)}
    +    page = writer.add_blank_page(width=10, height=10)
    +    dictionary2 = DictionaryObject(DictionaryObject({
    +        NameObject("/Resources"): DictionaryObject({
    +            NameObject("/Font"): DictionaryObject({
    +                NameObject("/F1"): font
    +            })
    +        })
    +    }))
    +    reference2 = writer._add_object(dictionary2)
    +    dictionary3 = DictionaryObject({NameObject("/Parent"): reference2})
    +    reference3 = writer._add_object(dictionary3)
    +    page[NameObject("/Parent")] = reference3
    +    dictionary2[NameObject("/Parent")] = page.indirect_reference
    +    page.pdf = writer
    +
    +    assert page._layout_mode_fonts() == fonts
    +    assert caplog.messages == ["Detected cycle in /Parent hierarchy when retrieving fonts."]
    
  • tests/test_writer.py  +77 -0  modified
    @@ -3218,3 +3218,80 @@ def test_encrypt__incremental():
     
         with pytest.raises(NotImplementedError):
             writer.encrypt(user_password="dummy")
    +
    +
    +@pytest.mark.timeout(5)
    +def test_get_filtered_outline__first__cyclic(caplog) -> None:
    +    writer = PdfWriter()
    +    reader = PdfReader(RESOURCE_ROOT / "crazyones.pdf")
    +
    +    dictionary1 = DictionaryObject({
    +        NameObject("/Type"): NameObject("/Outlines")
    +    })
    +    reference1 = writer._add_object(dictionary1)
    +    dictionary2 = DictionaryObject({
    +        NameObject("/Type"): NameObject("/Outlines")
    +    })
    +    reference2 = writer._add_object(dictionary2)
    +    dictionary3 = DictionaryObject({
    +        NameObject("/First"): reference2,
    +        NameObject("/Type"): NameObject("/Outlines")
    +    })
    +    reference3 = writer._add_object(dictionary3)
    +    dictionary1[NameObject("/First")] = reference3
    +    dictionary2[NameObject("/First")] = reference1
    +
    +    assert writer._get_filtered_outline(node=dictionary1, pages={}, reader=reader) == []
    +    assert caplog.messages == ["Detected cycle in outlines."]
    +
    +
    +@pytest.mark.timeout(5)
    +def test_get_filtered_outline__next_first__cyclic(caplog) -> None:
    +    writer = PdfWriter()
    +    reader = PdfReader(RESOURCE_ROOT / "crazyones.pdf")
    +
    +    dictionary1 = DictionaryObject({
    +        NameObject("/Title"): TextStringObject("test")
    +    })
    +    _reference1 = writer._add_object(dictionary1)
    +    dictionary2 = DictionaryObject({
    +        NameObject("/Type"): NameObject("/Outlines")
    +    })
    +    reference2 = writer._add_object(dictionary2)
    +    dictionary1[NameObject("/Next")] = reference2
    +    dictionary2[NameObject("/First")] = reference2
    +
    +    assert writer._get_filtered_outline(node=dictionary1, pages={}, reader=reader) == []
    +    assert caplog.messages == ["Detected cycle in outlines."]
    +
    +
    +@pytest.mark.timeout(5)
    +def test_get_filtered_outline__next_next__cyclic(caplog) -> None:
    +    writer = PdfWriter()
    +    reader = PdfReader(RESOURCE_ROOT / "crazyones.pdf")
    +
    +    dictionary1 = DictionaryObject({
    +        NameObject("/Title"): TextStringObject("test")
    +    })
    +    reference1 = writer._add_object(dictionary1)
    +    dictionary2 = DictionaryObject({
    +        NameObject("/Title"): TextStringObject("test")
    +    })
    +    reference2 = writer._add_object(dictionary2)
    +    dictionary3 = DictionaryObject({
    +        NameObject("/Next"): reference2,
    +        NameObject("/Title"): TextStringObject("test")
    +    })
    +    reference3 = writer._add_object(dictionary3)
    +    dictionary1[NameObject("/Next")] = reference3
    +    dictionary2[NameObject("/Next")] = reference1
    +
    +    assert writer._get_filtered_outline(node=dictionary1, pages={}, reader=reader) == []
    +    assert caplog.messages == ["Detected cycle in outlines."]
    +
    +
    +def test_get_filtered_outline__node_is_none() -> None:
    +    writer = PdfWriter()
    +    reader = PdfReader(RESOURCE_ROOT / "crazyones.pdf")
    +
    +    assert writer._get_filtered_outline(node=None, pages={}, reader=reader) == []
    

Vulnerability mechanics

Root cause

"pypdf traversed PDF outline linked lists and page /Parent chains without tracking previously visited nodes, leading to an infinite loop on malformed cyclic input."

Attack vector

An attacker crafts a malicious PDF whose outline tree or page-object `/Parent` chain contains a cycle (e.g., an outline node's `/First` or `/Next` entry points back to an earlier outline node, or a page's `/Parent` chain passes through intermediate nodes and eventually points back to the page itself). When `pypdf` extracts text in layout mode, or merges a document while importing its outline, the library follows these links without tracking visited nodes, causing an infinite loop (or, for `/First` cycles between outline container nodes, unbounded recursion). The attacker does not need authentication; the payload can be delivered through any channel that feeds a PDF to the library (email, upload, etc.).

What the fix does

The patch adds a `visited: set[int]` parameter to `_get_filtered_outline` (whose arguments also become keyword-only) and a local `visited` set in `_layout_mode_fonts`. For each node, before following its `/First`, `/Next`, or `/Parent` link, the code checks whether `id(node)` is already in the set and, if not, adds it; on a repeat it logs a warning (`"Detected cycle in outlines."` or `"Detected cycle in /Parent hierarchy when retrieving fonts."`) and returns or breaks, ending the loop or recursion. Because one set is shared across the `/First` recursion and the `/Next` loop, cycles that mix both links (as in the new regression tests) are also caught. This closes both the outline-traversal and the page-font-traversal infinite loops without altering the library's public API, since both methods are private.
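The outline half of the fix can be sketched the same way. This is a simplified model, not pypdf's implementation: plain dicts replace pypdf objects, titles replace the `Destination` objects that `_get_filtered_outline` builds and filters by page, and `collect_titles` is a hypothetical name. What it preserves is the control flow from the diff: container nodes (`/Type /Outlines` or no `/Title`) descend into `/First`, titled items iterate `/Next`, and one `visited` set is threaded through both.

```python
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)

# Modelling assumptions: dicts replace pypdf objects; titles replace Destinations.
Node = dict[str, Any]


def collect_titles(node: Optional[Node], visited: Optional[set[int]] = None) -> list[str]:
    """Walk an outline tree like the patched traversal, sharing one visited set."""
    if visited is None:
        visited = set()
    if node is None:
        return []
    if node.get("/Type") == "/Outlines" or "/Title" not in node:
        # Container node (outline root or untitled): check it, then descend.
        if id(node) in visited:
            logger.warning("Detected cycle in outlines.")
            return []
        visited.add(id(node))
        return collect_titles(node.get("/First"), visited)
    titles: list[str] = []
    while True:  # iterate the /Next sibling chain
        if id(node) in visited:
            logger.warning("Detected cycle in outlines.")
            break
        visited.add(id(node))
        titles.append(node.get("/Title", ""))
        if "/First" in node:
            # Children are flattened here; pypdf nests them in _filtered_children.
            titles += collect_titles(node["/First"], visited)
        if "/Next" not in node:
            break
        node = node["/Next"]
    return titles


# /Next cycle shaped like one of the regression tests: a -> c -> b -> a.
a: Node = {"/Title": "a"}
b: Node = {"/Title": "b"}
c: Node = {"/Title": "c", "/Next": b}
a["/Next"] = c
b["/Next"] = a
print(collect_titles(a))  # ['a', 'c', 'b']; the cycle warning is logged during the call
```

A consequence of sharing one set is that a node reachable through two different paths is also visited only once; for a well-formed outline tree, where each node has a single parent, this changes nothing.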

Preconditions

  • Input: the application uses pypdf to open a user-supplied PDF and then either extracts text in layout mode or performs a merge that copies outlines.

Generated on Jun 16, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References (4)
