VYPR
Medium severity · NVD Advisory · Published May 28, 2026 · Updated May 28, 2026

CVE-2026-48735

Description

pypdf is a free and open-source pure-Python PDF library. Prior to 6.12.1, an attacker can craft a PDF that causes large memory usage when it is parsed. Exploitation requires parsing large XMP metadata, possibly containing many unnecessary elements. This vulnerability is fixed in 6.12.1.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

pypdf prior to 6.12.1 is vulnerable to excessive memory consumption when parsing crafted XMP metadata, leading to denial of service.

Vulnerability

pypdf is a free and open-source pure-Python PDF library. Prior to version 6.12.1, the library does not limit the input size or element count when parsing XMP metadata streams. An attacker can craft a PDF containing a large XMP metadata block with many unnecessary elements, causing pypdf to allocate excessive memory during parsing. All versions before 6.12.1 are affected. [1][2][3]

Exploitation

An attacker only needs to get a malicious PDF processed by pypdf. No authentication or special privileges are required when the application accepts PDFs from untrusted sources. The attacker constructs a PDF with an oversized XMP metadata stream, possibly including numerous redundant elements. When pypdf parses this metadata, it allocates memory in proportion to the size of the stream and the number of elements, which leads to high memory consumption. [2][3]
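
You can observe this memory growth using only the standard library. The sketch below is not pypdf code. It assumes that `xml.dom.minidom.parseString` is a reasonable proxy for the unpatched parse: it builds a DOM through `xml.dom.expatbuilder`, and pypdf's `_XmpBuilder` is a subclass of `ExpatBuilderNS` that also returns a minidom `Document`. The helper name `dom_peak_bytes` and the element counts are hypothetical choices made for this illustration.

```python
import tracemalloc
from xml.dom import minidom


def dom_peak_bytes(n_elements: int) -> int:
    """Peak traced memory while building a DOM for an XMP-like document.

    Hypothetical helper: minidom stands in for pypdf's _XmpBuilder.
    """
    # Build the input before tracing starts so that only parsing is measured.
    data = (
        b'<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
        b'xmlns:custom="urn:custom">'
        + b"<custom:a/>" * n_elements
        + b"</rdf:Description>"
    )
    tracemalloc.start()  # also resets the recorded peak
    try:
        doc = minidom.parseString(data)
        _, peak = tracemalloc.get_traced_memory()
        doc.unlink()  # break parent/child reference cycles
    finally:
        tracemalloc.stop()
    return peak


for n in (10_000, 20_000, 40_000):
    print(f"{n:>6} elements -> peak {dom_peak_bytes(n):,} bytes")
```

Peak memory should rise roughly in proportion to the element count. The exact figures depend on the Python version and the platform.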

Impact

Successful exploitation results in a denial-of-service condition via memory exhaustion. The application using pypdf may crash or become unresponsive. No code execution, data disclosure, or privilege escalation is described in the available references. [3]

Mitigation

The vulnerability is fixed in pypdf version 6.12.1, released on 2026-05-22. [1] Users should upgrade to this version or later. If upgrading is not immediately possible, applying the changes from pull request #3796 serves as a workaround. [2][3] No known exploitation in the wild or inclusion in CISA's Known Exploited Vulnerabilities catalog has been reported.
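
To decide whether the workaround is needed, an application can first check which pypdf release is installed. The helper below is a sketch built on assumptions and is not part of pypdf. `release_tuple`, `is_patched`, and `installed_pypdf_is_patched` are hypothetical names, and the version parsing is deliberately simplified. It compares only the numeric release segment, so it treats a pre-release such as `6.12.1rc1` as `6.12.1`. If that distinction matters, the `Version` class from the `packaging` project provides full PEP 440 ordering.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

FIXED_RELEASE = (6, 12, 1)  # first pypdf release with the XMP limits


def release_tuple(v: str) -> tuple[int, ...]:
    # Keep only the leading numeric release segment: "6.12.1" -> (6, 12, 1).
    # Simplification: pre-release/dev suffixes such as "rc1" are ignored.
    match = re.match(r"\d+(?:\.\d+)*", v.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def is_patched(v: str) -> bool:
    # Tuple comparison orders (6, 12) before (6, 12, 1), matching 6.12.0 < 6.12.1.
    return release_tuple(v) >= FIXED_RELEASE


def installed_pypdf_is_patched() -> Optional[bool]:
    try:
        return is_patched(version("pypdf"))
    except PackageNotFoundError:
        return None  # pypdf is not installed in this environment


print(is_patched("6.12.1"))  # True
print(is_patched("6.12.0"))  # False
print(installed_pypdf_is_patched())
```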

AI Insight generated on May 28, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected products (2)

  • Py Pdf/Pypdf: 2 versions
    • (no CPE)
    • (no CPE), range: <6.12.1

Patches (1)

62191d5a5c3e

SEC: Limit input size and element count for XMP metadata (#3796)

https://github.com/py-pdf/pypdf · Stefan · May 22, 2026 · via body-scan-shorthand
3 files changed · +57 -2
  • docs/user/security.md · +7 -0 · modified
    @@ -50,6 +50,13 @@ For *PdfWriter* instances, the following limits are employed for incremental rea
     * `incremental_clone_object_id_limit` limits the maximum object ID to read during cloning. It defaults to
       1 000 000. Setting it to `None` will fully disable this limit.
     
    +### XMP
    +
    +For reading the XML-based XMP metadata, the following limits apply:
    +
    +* `pypdf.xmp.XMP_MAX_INPUT_LENGTH` for the maximum stream length, defaulting to 5 MB.
    +* `pypdf.xmp.XMP_MAX_ELEMENT_COUNT` for the maximum number of elements, defaulting to 100 000.
    +
     ## Reporting possible vulnerabilities
     
     Please refer to our [security policy](https://github.com/py-pdf/pypdf/security/policy).
    
  • pypdf/xmp.py · +18 -1 · modified
    @@ -19,13 +19,17 @@
     from xml.dom.expatbuilder import ExpatBuilderNS
     from xml.dom.minidom import Document
     from xml.dom.minidom import Element as XmlElement
    +from xml.dom.xmlbuilder import Options
     from xml.parsers.expat import ExpatError, XMLParserType
     
     from ._protocols import XmpInformationProtocol
     from ._utils import StreamType, deprecate_with_replacement, deprecation_no_replacement
    -from .errors import PdfReadError, XmpDocumentError
    +from .errors import LimitReachedError, PdfReadError, XmpDocumentError
     from .generic import ContentStream, PdfObject
     
    +XMP_MAX_INPUT_LENGTH = 5_000_000
    +XMP_MAX_ELEMENT_COUNT = 100_000
    +
     RDF_NAMESPACE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"
     XMP_NAMESPACE = "http://ns.adobe.com/xap/1.0/"
    @@ -173,6 +177,10 @@ class _XmpBuilder(ExpatBuilderNS):
         but not cases like quadratic entity expansion which can still cause quite some memory usage.
         """
     
    +    def __init__(self, options: Optional[Options] = None) -> None:
    +        super().__init__(options=options)
    +        self._element_count = 0
    +
         def custom_entity_declaration_handler(
                 self,
                 entity_name: str,
    @@ -185,10 +193,17 @@ def custom_entity_declaration_handler(
         ) -> None:
             raise ExpatError(f"Forbidden entities: {entity_name!r}")
     
    +    def start_element_handler(self, name: str, attributes: list[str]) -> None:
    +        self._element_count += 1
    +        if self._element_count > XMP_MAX_ELEMENT_COUNT:
    +            raise LimitReachedError(f"XMP metadata exceeds limit of {XMP_MAX_ELEMENT_COUNT} elements.")
    +        super().start_element_handler(name=name, attributes=attributes)
    +
         def install(self, parser: XMLParserType) -> None:
             super().install(parser)
     
             parser.EntityDeclHandler = self.custom_entity_declaration_handler
    +        parser.StartElementHandler = self.start_element_handler
     
     
     class XmpInformation(XmpInformationProtocol, PdfObject):
    @@ -205,6 +220,8 @@ def __init__(self, stream: ContentStream) -> None:
             self.stream = stream
             try:
                 data = self.stream.get_data()
    +            if (length := len(data)) > XMP_MAX_INPUT_LENGTH:
    +                raise LimitReachedError(f"XMP stream size {length} exceeds limit of {XMP_MAX_INPUT_LENGTH}.")
                 doc_root: Document = _XmpBuilder().parseString(data)
             except (AttributeError, ExpatError) as e:
                 raise PdfReadError(f"XML in XmpInformation was invalid: {e}")
    
  • tests/test_xmp.py · +32 -1 · modified
    @@ -7,7 +7,7 @@
     import pypdf.generic
     import pypdf.xmp
     from pypdf import PdfReader, PdfWriter
    -from pypdf.errors import PdfReadError, XmpDocumentError
    +from pypdf.errors import LimitReachedError, PdfReadError, XmpDocumentError
     from pypdf.generic import ContentStream, NameObject, StreamObject
     from pypdf.xmp import XmpInformation
     
    @@ -963,3 +963,34 @@ def test_xmp_information__quadratic_entity_expansion():
                 match=r"^XML in XmpInformation was invalid: Forbidden entities: 'a'$"
         ):
             XmpInformation(stream)
    +
    +
    +@pytest.mark.timeout(10)
    +def test_xmp_information__input_limit():
    +    stream = ContentStream(pdf=None, stream=None)
    +    stream.set_data(b"A" * 10_000_000)
    +
    +    with pytest.raises(
    +            expected_exception=LimitReachedError,
    +            match=r"^XMP stream size 10000000 exceeds limit of 5000000\.$"
    +    ):
    +        XmpInformation(stream)
    +
    +
    +@pytest.mark.timeout(10)
    +def test_xmp_information__element_limit():
    +    stream = ContentStream(pdf=None, stream=None)
    +
    +    xmp = b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    +    xmp += b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    +    xmp += b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    +    xmp += b'<rdf:Description rdf:about="" xmlns:custom="urn:custom">'
    +    xmp += b"<custom:a/>" * 100_010
    +    xmp += b"</rdf:Description></rdf:RDF></x:xmpmeta>"
    +    stream.set_data(xmp)
    +
    +    with pytest.raises(
    +            expected_exception=LimitReachedError,
    +            match=r"^XMP metadata exceeds limit of 100000 elements\.$"
    +    ):
    +        XmpInformation(stream)
    

Vulnerability mechanics

Root cause

"Missing input size and element count limits on XMP metadata parsing allow an attacker to craft a PDF with oversized or overly complex XML that consumes excessive memory."

Attack vector

An attacker crafts a PDF whose XMP metadata stream is either very large or contains an extremely high number of XML elements. When pypdf parses this metadata via `XmpInformation`, the XML parser processes the entire payload and builds a DOM for it without any size or element-count guardrails, which leads to large memory allocations. The fix rejects streams longer than 5,000,000 bytes and documents with more than 100,000 elements. The attacker does not need authentication. The vector is a crafted PDF delivered to any application that uses pypdf to read PDF metadata.
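
The two limits are independent. As a worked check, the sketch below rebuilds the payload used by the patch's `test_xmp_information__element_limit` and measures it with the standard library's streaming expat parser. The payload is roughly 1.1 MB, well under the 5,000,000-byte input limit, yet it contains 100,013 elements: `x:xmpmeta`, `rdf:RDF`, `rdf:Description`, and 100,010 `custom:a`. Only the element-count limit catches it. The helper names `build_payload` and `count_elements` are illustrative and do not come from pypdf.

```python
import xml.parsers.expat

# Limit values copied from the patch (pypdf/xmp.py).
XMP_MAX_INPUT_LENGTH = 5_000_000
XMP_MAX_ELEMENT_COUNT = 100_000


def build_payload(repeat: int) -> bytes:
    """Rebuild the XMP payload from test_xmp_information__element_limit."""
    xmp = b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    xmp += b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    xmp += b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    xmp += b'<rdf:Description rdf:about="" xmlns:custom="urn:custom">'
    xmp += b"<custom:a/>" * repeat
    xmp += b"</rdf:Description></rdf:RDF></x:xmpmeta>"
    return xmp


def count_elements(data: bytes) -> int:
    """Count start tags with a streaming expat parser (no DOM is built)."""
    count = 0

    def on_start(name: str, attributes: dict) -> None:
        nonlocal count
        count += 1  # empty elements like <custom:a/> also fire this handler

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(data, True)
    return count


payload = build_payload(100_010)
size = len(payload)            # roughly 1.1 MB: under the input limit
elements = count_elements(payload)  # 100,013: over the element limit
print(f"size: {size:,} bytes, over input limit: {size > XMP_MAX_INPUT_LENGTH}")
print(f"elements: {elements:,}, over element limit: {elements > XMP_MAX_ELEMENT_COUNT}")
```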

Affected code

The vulnerable code is in `pypdf/xmp.py`, specifically the `XmpInformation.__init__` method which reads the XMP stream without size checks, and the `_XmpBuilder` class which lacked an element-count limit during XML parsing [patch_id=2955870].

What the fix does

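The patch introduces two module-level limits in `pypdf/xmp.py`: `XMP_MAX_INPUT_LENGTH` (5,000,000 bytes, documented as 5 MB) and `XMP_MAX_ELEMENT_COUNT` (100,000). Before parsing, `XmpInformation.__init__` checks the length of the data returned by `self.stream.get_data()` against the input limit. In `_XmpBuilder`, a custom `start_element_handler` is installed as the expat `StartElementHandler`. It increments a counter for each XML element and raises `LimitReachedError` as soon as the count exceeds the threshold, which aborts the parse. `XmpInformation.__init__` converts only `AttributeError` and `ExpatError` into `PdfReadError`. `LimitReachedError` therefore reaches callers unchanged, as the new tests expect. Together, these checks block both large-stream and high-element-count denial-of-service payloads [patch_id=2955870].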
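
The same two guards can be sketched outside pypdf with the standard library's `xml.parsers.expat`, for example as a pre-validation pass over untrusted XMP bytes. This is a minimal sketch and not the patch itself. `check_xmp` and `LimitExceeded` are hypothetical names, and `LimitExceeded` stands in for pypdf's `LimitReachedError`. Rejecting entity declarations mirrors `_XmpBuilder.custom_entity_declaration_handler`. Like the patch, the sketch checks the length before parsing and aborts inside the start-element handler once the count passes the limit. An oversized document is therefore never fully processed.

```python
import xml.parsers.expat

MAX_INPUT_LENGTH = 5_000_000  # mirrors pypdf.xmp.XMP_MAX_INPUT_LENGTH
MAX_ELEMENT_COUNT = 100_000   # mirrors pypdf.xmp.XMP_MAX_ELEMENT_COUNT


class LimitExceeded(Exception):
    """Hypothetical stand-in for pypdf.errors.LimitReachedError."""


def check_xmp(
    data: bytes,
    max_length: int = MAX_INPUT_LENGTH,
    max_elements: int = MAX_ELEMENT_COUNT,
) -> int:
    """Validate XMP bytes against size and element limits; return the element count."""
    # Guard 1: reject oversized input before the parser sees a single byte.
    if len(data) > max_length:
        raise LimitExceeded(f"XMP stream size {len(data)} exceeds limit of {max_length}.")

    count = 0

    def on_start(name: str, attributes: dict) -> None:
        # Guard 2: abort the parse as soon as the element budget is exceeded.
        nonlocal count
        count += 1
        if count > max_elements:
            raise LimitExceeded(f"XMP metadata exceeds limit of {max_elements} elements.")

    def on_entity_decl(entity_name, is_parameter_entity, value, base,
                       system_id, public_id, notation_name) -> None:
        # Refuse any DTD entity declaration, as pypdf's _XmpBuilder does.
        raise ValueError(f"Forbidden entities: {entity_name!r}")

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EntityDeclHandler = on_entity_decl
    # Exceptions raised in handlers propagate out of Parse() and stop parsing.
    parser.Parse(data, True)
    return count


print(check_xmp(b'<x:xmpmeta xmlns:x="adobe:ns:meta/"/>'))  # 1
```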

Preconditions

  • config: The target application must use pypdf to parse XMP metadata from a PDF.
  • input: The attacker must be able to supply a crafted PDF file to the application.

Generated on May 28, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References (3)
