CVE-2026-48735
Description
pypdf is a free and open-source pure-python PDF library. Prior to 6.12.1, an attacker who uses this vulnerability can craft a PDF which leads to large memory usage. This requires parsing large XMP metadata, possibly with lots of unnecessary elements. This vulnerability is fixed in 6.12.1.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
pypdf prior to 6.12.1 is vulnerable to excessive memory consumption when parsing crafted XMP metadata, leading to denial of service.
Vulnerability
Versions of pypdf before 6.12.1 do not limit the input size or element count when parsing XMP metadata streams. An attacker can craft a PDF whose XMP metadata block is very large or contains many unnecessary elements, causing pypdf to allocate excessive memory during parsing. [1][2][3]
Exploitation
An attacker only needs to get a malicious PDF file processed by pypdf; no authentication or special privileges are required when the application accepts PDFs from untrusted sources. The attacker constructs a PDF with an oversized XMP metadata stream, possibly including numerous redundant elements. When a vulnerable version of pypdf parses this metadata, it allocates memory proportional to the stream size and the number of elements, leading to high memory consumption. [2][3]
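The growth in memory with element count can be observed with the standard library alone. The sketch below is an illustration, not pypdf code: it assumes that `xml.dom.minidom` is a fair stand-in for the DOM builder pypdf derives from, and the helper names `build_xml` and `peak_parse_memory` are hypothetical.

```python
import tracemalloc
from xml.dom.minidom import parseString


def build_xml(element_count: int) -> bytes:
    """Return a small XML document with `element_count` empty child elements."""
    return b"<root>" + b"<a/>" * element_count + b"</root>"


def peak_parse_memory(data: bytes) -> int:
    """Parse `data` into a DOM tree and return the peak traced allocation in bytes."""
    tracemalloc.start()
    try:
        document = parseString(data)
        _, peak = tracemalloc.get_traced_memory()
        document.unlink()  # break reference cycles so the tree can be freed
    finally:
        tracemalloc.stop()
    return peak


if __name__ == "__main__":
    # Each element costs only 4 input bytes, but a full DOM node in memory.
    for count in (1_000, 10_000, 50_000):
        data = build_xml(count)
        print(f"{count} elements, {len(data)} input bytes, peak {peak_parse_memory(data)} bytes")
```

Exact figures depend on the Python version and platform, so none are quoted here; the point is only that the peak rises with the number of elements rather than staying constant.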
Impact
Successful exploitation results in a denial-of-service condition via memory exhaustion. The application using pypdf may crash or become unresponsive. No code execution, data disclosure, or privilege escalation is described in the available references. [3]
Mitigation
The vulnerability is fixed in pypdf version 6.12.1, released on 2026-05-22. [1] Users should upgrade to this version or later. If upgrading is not immediately possible, applying the changes from pull request #3796 serves as a workaround. [2][3] No known exploitation in the wild or inclusion in CISA's Known Exploited Vulnerabilities catalog has been reported.
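To confirm that a deployment already runs a fixed release, a check along the following lines may help. It is a minimal sketch: the helper names are hypothetical, the version parsing is deliberately simple (it stops at the first non-numeric component, so pre-release strings are treated conservatively), and only `importlib.metadata` from the standard library is used.

```python
from importlib import metadata
from typing import Optional

FIXED_VERSION = (6, 12, 1)  # first release containing the fix, per this advisory


def parse_release(version: str) -> tuple:
    """Turn '6.12.1' into (6, 12, 1); stop at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def is_patched(version: str) -> bool:
    """Return True if `version` is 6.12.1 or later."""
    return parse_release(version) >= FIXED_VERSION


def installed_pypdf_is_patched() -> Optional[bool]:
    """Check the installed pypdf distribution; None means pypdf is not installed."""
    try:
        return is_patched(metadata.version("pypdf"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("pypdf patched:", installed_pypdf_is_patched())
```

A check like this could run at application start-up or in CI; for anything beyond simple release numbers, a dedicated version-parsing library would be a more robust choice.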
AI Insight generated on May 28, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected products (2)
Patches (1)
62191d5a5c3e SEC: Limit input size and element count for XMP metadata (#3796)
3 files changed · +57 −2
docs/user/security.md · +7 −0 · modified
@@ -50,6 +50,13 @@ For *PdfWriter* instances, the following limits are employed for incremental rea
 * `incremental_clone_object_id_limit` limits the maximum object ID to read during cloning.
   It defaults to 1 000 000. Setting it to `None` will fully disable this limit.

+### XMP
+
+For reading the XML-based XMP metadata, the following limits apply:
+
+* `pypdf.xmp.XMP_MAX_INPUT_LENGTH` for the maximum stream length, defaulting to 5 MB.
+* `pypdf.xmp.XMP_MAX_ELEMENT_COUNT` for the maximum number of elements, defaulting to 100 000.
+
 ## Reporting possible vulnerabilities

 Please refer to our [security policy](https://github.com/py-pdf/pypdf/security/policy).
pypdf/xmp.py · +18 −1 · modified
@@ -19,13 +19,17 @@
 from xml.dom.expatbuilder import ExpatBuilderNS
 from xml.dom.minidom import Document
 from xml.dom.minidom import Element as XmlElement
+from xml.dom.xmlbuilder import Options
 from xml.parsers.expat import ExpatError, XMLParserType

 from ._protocols import XmpInformationProtocol
 from ._utils import StreamType, deprecate_with_replacement, deprecation_no_replacement
-from .errors import PdfReadError, XmpDocumentError
+from .errors import LimitReachedError, PdfReadError, XmpDocumentError
 from .generic import ContentStream, PdfObject

+XMP_MAX_INPUT_LENGTH = 5_000_000
+XMP_MAX_ELEMENT_COUNT = 100_000
+
 RDF_NAMESPACE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
 DC_NAMESPACE = "http://purl.org/dc/elements/1.1/"
 XMP_NAMESPACE = "http://ns.adobe.com/xap/1.0/"
@@ -173,6 +177,10 @@ class _XmpBuilder(ExpatBuilderNS):
     but not cases like quadratic entity expansion which can still cause quite some memory usage.
     """

+    def __init__(self, options: Optional[Options] = None) -> None:
+        super().__init__(options=options)
+        self._element_count = 0
+
     def custom_entity_declaration_handler(
         self,
         entity_name: str,
@@ -185,10 +193,17 @@ def custom_entity_declaration_handler(
     ) -> None:
         raise ExpatError(f"Forbidden entities: {entity_name!r}")

+    def start_element_handler(self, name: str, attributes: list[str]) -> None:
+        self._element_count += 1
+        if self._element_count > XMP_MAX_ELEMENT_COUNT:
+            raise LimitReachedError(f"XMP metadata exceeds limit of {XMP_MAX_ELEMENT_COUNT} elements.")
+        super().start_element_handler(name=name, attributes=attributes)
+
     def install(self, parser: XMLParserType) -> None:
         super().install(parser)
         parser.EntityDeclHandler = self.custom_entity_declaration_handler
+        parser.StartElementHandler = self.start_element_handler


 class XmpInformation(XmpInformationProtocol, PdfObject):
@@ -205,6 +220,8 @@ def __init__(self, stream: ContentStream) -> None:
         self.stream = stream
         try:
             data = self.stream.get_data()
+            if (length := len(data)) > XMP_MAX_INPUT_LENGTH:
+                raise LimitReachedError(f"XMP stream size {length} exceeds limit of {XMP_MAX_INPUT_LENGTH}.")
             doc_root: Document = _XmpBuilder().parseString(data)
         except (AttributeError, ExpatError) as e:
             raise PdfReadError(f"XML in XmpInformation was invalid: {e}")
tests/test_xmp.py · +32 −1 · modified
@@ -7,7 +7,7 @@
 import pypdf.generic
 import pypdf.xmp
 from pypdf import PdfReader, PdfWriter
-from pypdf.errors import PdfReadError, XmpDocumentError
+from pypdf.errors import LimitReachedError, PdfReadError, XmpDocumentError
 from pypdf.generic import ContentStream, NameObject, StreamObject
 from pypdf.xmp import XmpInformation

@@ -963,3 +963,34 @@ def test_xmp_information__quadratic_entity_expansion():
         match=r"^XML in XmpInformation was invalid: Forbidden entities: 'a'$"
     ):
         XmpInformation(stream)
+
+
+@pytest.mark.timeout(10)
+def test_xmp_information__input_limit():
+    stream = ContentStream(pdf=None, stream=None)
+    stream.set_data(b"A" * 10_000_000)
+
+    with pytest.raises(
+        expected_exception=LimitReachedError,
+        match=r"^XMP stream size 10000000 exceeds limit of 5000000\.$"
+    ):
+        XmpInformation(stream)
+
+
+@pytest.mark.timeout(10)
+def test_xmp_information__element_limit():
+    stream = ContentStream(pdf=None, stream=None)
+
+    xmp = b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
+    xmp += b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
+    xmp += b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
+    xmp += b'<rdf:Description rdf:about="" xmlns:custom="urn:custom">'
+    xmp += b"<custom:a/>" * 100_010
+    xmp += b"</rdf:Description></rdf:RDF></x:xmpmeta>"
+    stream.set_data(xmp)
+
+    with pytest.raises(
+        expected_exception=LimitReachedError,
+        match=r"^XMP metadata exceeds limit of 100000 elements\.$"
+    ):
+        XmpInformation(stream)
Vulnerability mechanics
Root cause
"Missing input size and element count limits on XMP metadata parsing allow an attacker to craft a PDF with oversized or overly complex XML that consumes excessive memory."
Attack vector
An attacker crafts a PDF containing XMP metadata whose XML stream is either very large (over 5 MB) or contains an extremely high number of XML elements (over 100,000). When a version of pypdf before 6.12.1 parses this metadata via `XmpInformation`, the XML parser processes the entire payload without any size or element-count guardrails, leading to large memory allocation. The attacker does not need authentication; the vector is a crafted PDF delivered to any application that uses pypdf to read PDF metadata.
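The two conditions are independent, which is why a size limit alone would not be enough. The sketch below rebuilds the payload shape used in the patch's own element-limit test and checks it against both thresholds; the function name `build_element_heavy_xmp` is hypothetical, and the constants are local copies of the values from the fix, not imports from pypdf.

```python
XMP_MAX_INPUT_LENGTH = 5_000_000   # byte limit introduced by the fix (local copy)
XMP_MAX_ELEMENT_COUNT = 100_000    # element limit introduced by the fix (local copy)

WRAPPER_ELEMENTS = 3  # x:xmpmeta, rdf:RDF and rdf:Description


def build_element_heavy_xmp(child_count: int) -> bytes:
    """Build XMP with `child_count` empty elements, mirroring the upstream test."""
    xmp = b'<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
    xmp += b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    xmp += b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    xmp += b'<rdf:Description rdf:about="" xmlns:custom="urn:custom">'
    xmp += b"<custom:a/>" * child_count  # 11 bytes per element
    xmp += b"</rdf:Description></rdf:RDF></x:xmpmeta>"
    return xmp


if __name__ == "__main__":
    payload = build_element_heavy_xmp(100_010)
    total_elements = payload.count(b"<custom:a/>") + WRAPPER_ELEMENTS
    print("within byte limit:", len(payload) <= XMP_MAX_INPUT_LENGTH)
    print("within element limit:", total_elements <= XMP_MAX_ELEMENT_COUNT)
```

At 11 bytes per element, 100,010 elements come to about 1.1 MB, well below the 5,000,000-byte limit, so this payload is only caught by the element counter.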
Affected code
The vulnerable code is in `pypdf/xmp.py`: the `XmpInformation.__init__` method, which read the XMP stream without a size check, and the `_XmpBuilder` class, which lacked an element-count limit during XML parsing [patch_id=2955870].
What the fix does
The patch introduces two module-level limits in `pypdf/xmp.py`: `XMP_MAX_INPUT_LENGTH` (5,000,000 bytes, documented as 5 MB) and `XMP_MAX_ELEMENT_COUNT` (100,000). `XmpInformation.__init__` now checks the stream length against the input limit before parsing and raises `LimitReachedError` if it is exceeded. A custom `start_element_handler` in `_XmpBuilder` increments a counter on each XML element and raises `LimitReachedError` once the element count exceeds the threshold. These changes prevent both large-stream and high-element-count denial-of-service attacks [patch_id=2955870].
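The same two-stage guard can be sketched with the standard library's `xml.parsers.expat` module. This is an illustration of the technique rather than the pypdf implementation: `LimitReachedError` here is a local stand-in for `pypdf.errors.LimitReachedError`, and `count_elements_with_limits` is a hypothetical helper that only counts elements instead of building a DOM.

```python
from xml.parsers import expat

MAX_INPUT_LENGTH = 5_000_000  # mirrors XMP_MAX_INPUT_LENGTH from the patch
MAX_ELEMENT_COUNT = 100_000   # mirrors XMP_MAX_ELEMENT_COUNT from the patch


class LimitReachedError(Exception):
    """Local stand-in for pypdf.errors.LimitReachedError."""


def count_elements_with_limits(
    data: bytes,
    max_length: int = MAX_INPUT_LENGTH,
    max_elements: int = MAX_ELEMENT_COUNT,
) -> int:
    """Parse `data`, enforcing a byte limit up front and an element limit while parsing."""
    # Stage 1: reject oversized input before the parser sees it.
    if len(data) > max_length:
        raise LimitReachedError(f"Input size {len(data)} exceeds limit of {max_length}.")

    count = 0

    def on_start_element(name, attributes):
        # Stage 2: count every opening tag and abort as soon as the limit is passed.
        nonlocal count
        count += 1
        if count > max_elements:
            raise LimitReachedError(f"Input exceeds limit of {max_elements} elements.")

    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start_element
    parser.Parse(data, True)  # an exception raised in the handler propagates to the caller
    return count


if __name__ == "__main__":
    print(count_elements_with_limits(b"<r><a/><a/></r>"))
```

Raising from inside the start-element callback stops parsing at the offending element, so the cost of a rejected document is bounded by the limit rather than by the document's full size.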
Preconditions
- config: The target application must use pypdf to parse XMP metadata from a PDF.
- input: The attacker must be able to supply a crafted PDF file to the application.
Generated on May 28, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References (3)