Docling: Unsafe XML Entity Expansion in USPTO Patent Backend
Description
Impact
The USPTO patent XML parser used the standard xml.sax.parseString() without protection against XML External Entity (XXE) attacks. An attacker could craft malicious USPTO patent XML files with external entity references that could:
- Read arbitrary files from the server filesystem
- Perform Server-Side Request Forgery (SSRF) attacks
- Cause denial of service through entity expansion (Billion Laughs attack)
The vulnerability affects three USPTO patent format parsers: ICE (v4.x), Grant v2.5, and Application v1.x.
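To make the entity-expansion risk concrete, the following sketch builds a deliberately tiny nested-entity document and measures how many characters the standard library SAX parser emits for it. The helper names (`nested_entity_doc`, `CharCounter`, `expanded_size`) and the depth and fan-out values are illustrative assumptions, not part of Docling.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def nested_entity_doc(depth: int, fanout: int, seed: str = "ab") -> str:
    """Build a small document whose entity e<depth> expands to fanout**depth seeds."""
    decls = [f'<!ENTITY e0 "{seed}">']
    for level in range(1, depth + 1):
        refs = f"&e{level - 1};" * fanout
        decls.append(f'<!ENTITY e{level} "{refs}">')
    return f"<!DOCTYPE r [{''.join(decls)}]><r>&e{depth};</r>"


class CharCounter(ContentHandler):
    """Count the characters delivered to the handler after entity expansion."""

    def __init__(self) -> None:
        super().__init__()
        self.count = 0

    def characters(self, content: str) -> None:
        self.count += len(content)


def expanded_size(depth: int, fanout: int, seed: str = "ab") -> int:
    """Parse the document and return how many characters the expansion produced."""
    counter = CharCounter()
    # Deliberately small inputs only: this is the unprotected code path.
    xml.sax.parseString(nested_entity_doc(depth, fanout, seed), counter)
    return counter.count


if __name__ == "__main__":
    # depth=3, fanout=3 keeps the demo harmless while showing the growth rate.
    print(expanded_size(3, 3), "characters from a short document")
```

The output grows as `fanout ** depth * len(seed)`, so the demo above yields 3**3 * 2 = 54 characters. The same construction with depth 9 and fan-out 10 would expand to 10^9 copies of the seed, which is the scale the Billion Laughs name refers to.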
Patches
Fixed in version 2.74.0. The parser is now created with defusedxml.sax.make_parser() and configured to block external entity resolution: the SAX features feature_external_ges and feature_external_pes are both set to False. DTD declarations, which USPTO files require, are still allowed. This prevents XXE attacks while maintaining compatibility with the USPTO XML format.
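The configuration described above can be sketched as a small helper. This is a hedged illustration rather than Docling's exact code: `harden`, `parse_text`, `demo`, and `TextCollector` are hypothetical names, and the demo uses the standard library's `xml.sax.make_parser()` only to stay dependency-free, whereas the patch obtains its parser from `defusedxml.sax.make_parser()`.

```python
import tempfile
from io import StringIO
from pathlib import Path
from xml.sax import make_parser  # the patch uses defusedxml.sax.make_parser instead
from xml.sax.handler import ContentHandler, feature_external_ges, feature_external_pes


class TextCollector(ContentHandler):
    """Collect character data so the demo can inspect what was parsed."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: list[str] = []

    def characters(self, content: str) -> None:
        self.chunks.append(content)


def harden(parser):
    """Apply the settings the advisory describes to a SAX parser (hypothetical helper)."""
    # Never fetch external general or parameter entities.
    parser.setFeature(feature_external_ges, False)
    parser.setFeature(feature_external_pes, False)
    # defusedxml parsers expose these switches; the patch relaxes them so that
    # DTD and entity declarations in USPTO files do not abort parsing.
    for name in ("forbid_dtd", "forbid_entities", "forbid_external"):
        if hasattr(parser, name):
            setattr(parser, name, False)
    return parser


def parse_text(content: str) -> str:
    """Parse content with a hardened parser and return its character data."""
    handler = TextCollector()
    parser = harden(make_parser())
    parser.setContentHandler(handler)
    parser.parse(StringIO(content))
    return "".join(handler.chunks)


def demo() -> str:
    """Show that a reference to an external file is not resolved."""
    with tempfile.TemporaryDirectory() as tmp:
        secret = Path(tmp) / "secret.txt"
        secret.write_text("SECRET")
        doc = f'<!DOCTYPE r [<!ENTITY x SYSTEM "{secret.as_uri()}">]><r>a&x;b</r>'
        return parse_text(doc)
```

In this sketch the DOCTYPE is accepted, but the reference to the external file is skipped instead of being read, so the file's contents never reach the content handler.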
Workarounds
Avoid processing USPTO patent XML files from untrusted sources. Implement resource limits (memory, CPU time) when processing patent documents.
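One way to apply such resource limits is to run the conversion in a child process with operating-system limits set before it starts. The sketch below is an assumption-laden illustration for Unix-like systems only: `run_limited` and `_limit_setter` are hypothetical helpers, the default limits are arbitrary, and the inline worker script stands in for a real conversion entry point.

```python
import resource  # Unix only
import subprocess
import sys


def _limit_setter(cpu_seconds: int, memory_bytes: int):
    """Return a function that applies CPU-time and address-space limits."""

    def set_limits() -> None:
        wanted = {resource.RLIMIT_CPU: cpu_seconds, resource.RLIMIT_AS: memory_bytes}
        for kind, value in wanted.items():
            _, hard = resource.getrlimit(kind)
            # Never request more than an existing hard limit allows.
            soft = value if hard == resource.RLIM_INFINITY else min(value, hard)
            try:
                resource.setrlimit(kind, (soft, hard))
            except (ValueError, OSError):
                pass  # this platform does not support the limit; leave it unset

    return set_limits


def run_limited(cmd: list[str], cpu_seconds: int = 30,
                memory_bytes: int = 2 * 1024**3, wall_seconds: float = 60.0):
    """Run cmd in a child process with resource limits applied before exec."""
    return subprocess.run(
        cmd,
        preexec_fn=_limit_setter(cpu_seconds, memory_bytes),
        capture_output=True,
        text=True,
        timeout=wall_seconds,
    )


if __name__ == "__main__":
    # Hypothetical worker: replace the inline script with the real conversion command.
    result = run_limited([sys.executable, "-c", "print('converted')"])
    print(result.returncode, result.stdout.strip())
```

Isolating the parse in a separate process means a document that exhausts its CPU or memory budget terminates only the worker, not the service that launched it.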
### References
- Fix release: v2.74.0
Affected products
- Range: <2.74.0
Patches
576bada7b7d5 fix: security vulnerabilities with XML External Entity and related attacks (#3009)
4 files changed · +177 −70
docling/backend/xml/jats_backend.py+48 −23 modified@@ -1,8 +1,27 @@ +"""Backend to parse articles in JATS (Journal Article Tag Suite) XML format. + +JATS is a standard XML format used by publishers and journal archives including +PubMed Central (PMC), bioRxiv, and medRxiv for representing journal articles. + +Security Note: + This module uses lxml.etree.XMLParser with secure configuration to protect + against XML External Entity (XXE) attacks and XML bombs. The parser is + configured with: + + - resolve_entities: False (prevents entity resolution attacks) + - no_network: True (blocks all network access) + - dtd_validation: False (disables DTD validation) + - load_dtd: False (prevents loading external DTDs) + + This configuration ensures safe parsing of JATS XML files while blocking + external entity fetching and preventing XXE attacks. +""" + import logging import traceback from io import BytesIO from pathlib import Path -from typing import Final, Optional, Union, cast +from typing import Final, cast from bs4 import BeautifulSoup, NavigableString, Tag from docling_core.types.doc import ( @@ -26,11 +45,11 @@ _log = logging.getLogger(__name__) -JATS_DTD_URL: Final = ["JATS-journalpublishing", "JATS-archive"] -DEFAULT_HEADER_ACKNOWLEDGMENTS: Final = "Acknowledgments" -DEFAULT_HEADER_ABSTRACT: Final = "Abstract" -DEFAULT_HEADER_REFERENCES: Final = "References" -DEFAULT_TEXT_ETAL: Final = "et al." +JATS_DTD_URL: Final[list[str]] = ["JATS-journalpublishing", "JATS-archive"] +DEFAULT_HEADER_ACKNOWLEDGMENTS: Final[str] = "Acknowledgments" +DEFAULT_HEADER_ABSTRACT: Final[str] = "Abstract" +DEFAULT_HEADER_REFERENCES: Final[str] = "References" +DEFAULT_TEXT_ETAL: Final[str] = "et al." 
class Abstract(TypedDict): @@ -87,20 +106,26 @@ class JatsDocumentBackend(DeclarativeDocumentBackend): """ @override - def __init__( - self, in_doc: "InputDocument", path_or_stream: Union[BytesIO, Path] - ) -> None: + def __init__(self, in_doc: "InputDocument", path_or_stream: BytesIO | Path) -> None: super().__init__(in_doc, path_or_stream) self.path_or_stream = path_or_stream # Initialize the root of the document hierarchy - self.root: Optional[NodeItem] = None + self.root: NodeItem | None = None self.hlevel: int = 0 self.valid: bool = False try: if isinstance(self.path_or_stream, BytesIO): self.path_or_stream.seek(0) - self.tree: etree._ElementTree = etree.parse(self.path_or_stream) + parser = etree.XMLParser( + resolve_entities=False, + load_dtd=False, + no_network=True, + dtd_validation=False, + ) + self.tree: etree._ElementTree = etree.parse( + self.path_or_stream, parser=parser + ) doc_info: etree.DocInfo = self.tree.docinfo if doc_info.system_url and any( @@ -172,7 +197,7 @@ def convert(self) -> DoclingDocument: return doc @staticmethod - def _get_text(node: etree._Element, sep: Optional[str] = None) -> str: + def _get_text(node: etree._Element, sep: str | None = None) -> str: skip_tags = ["term", "disp-formula", "inline-formula"] text: str = ( node.text.replace("\n", " ") @@ -189,9 +214,9 @@ def _get_text(node: etree._Element, sep: Optional[str] = None) -> str: return text - def _find_metadata(self) -> Optional[etree._Element]: + def _find_metadata(self) -> etree._Element | None: meta_names: list[str] = ["article-meta", "book-part-meta"] - meta: Optional[etree._Element] = None + meta: etree._Element | None = None for name in meta_names: node = self.tree.xpath(f".//{name}") if len(node) > 0: @@ -222,7 +247,7 @@ def _parse_abstract(self) -> list[Abstract]: def _parse_authors(self) -> list[Author]: # Get mapping between affiliation ids and names authors: list[Author] = [] - meta: Optional[etree._Element] = self._find_metadata() + meta: etree._Element | None = 
self._find_metadata() if meta is None: return authors @@ -390,7 +415,7 @@ def _parse_element_citation(self, node: etree._Element) -> str: "part-title", "trans-title", ] - title_node: Optional[etree._Element] = None + title_node: etree._Element | None = None for name in titles: name_node = node.xpath(name) if len(name_node) > 0: @@ -493,12 +518,12 @@ def _add_figure_captions( self, doc: DoclingDocument, parent: NodeItem, node: etree._Element ) -> None: label_node = node.xpath("label") - label: Optional[str] = ( + label: str | None = ( JatsDocumentBackend._get_text(label_node[0]).strip() if label_node else "" ) caption_node = node.xpath("caption") - caption: Optional[str] + caption: str | None if len(caption_node) > 0: caption = "" for caption_par in list(caption_node[0]): @@ -511,7 +536,7 @@ def _add_figure_captions( # TODO: format label vs caption once styling is supported fig_text: str = f"{label}{' ' if label and caption else ''}{caption}" - fig_caption: Optional[TextItem] = ( + fig_caption: TextItem | None = ( doc.add_text(label=DocItemLabel.CAPTION, text=fig_text) if fig_text else None @@ -538,7 +563,7 @@ def _add_metadata( return @staticmethod - def parse_table_data(element: Tag) -> Optional[TableData]: + def parse_table_data(element: Tag) -> TableData | None: # TODO, see how to implement proper support for rich tables from HTML backend nested_tables = element.find("table") if nested_tables is not None: @@ -654,7 +679,7 @@ def _add_table( label = table_xml_component["label"] caption = table_xml_component["caption"] table_text: str = f"{label}{' ' if label and caption else ''}{caption}" - table_caption: Optional[TextItem] = ( + table_caption: TextItem | None = ( doc.add_text(label=DocItemLabel.CAPTION, text=table_text) if table_text else None @@ -681,7 +706,7 @@ def _add_tables( # Caption caption_node = node.xpath("caption") - caption: Optional[str] + caption: str | None if caption_node: caption = "" for caption_par in list(caption_node[0]): @@ -738,7 +763,7 @@ 
def _walk_linear( # add elements and decide whether to stop walking if child.tag in ("sec", "ack"): header = child.xpath("title|label") - text: Optional[str] = None + text: str | None = None if len(header) > 0: text = JatsDocumentBackend._get_text(header[0]) elif child.tag == "ack":
docling/backend/xml/uspto_backend.py+114 −47 modified@@ -3,20 +3,41 @@ The parsers included in this module can handle patent grants published since 1976 and patent applications since 2001. The original files can be found in https://bulkdata.uspto.gov. + +Security Note: + This module uses defusedxml.sax.make_parser() with customized security settings + to protect against XML External Entity (XXE) attacks while allowing USPTO XML files + to be parsed. In addition, it includes safeguards against entity expansion attacks + and entity nesting depth. USPTO files contain DTD declarations that defusedxml + blocks by default, so we configure the parser with: + + - feature_external_ges: False (blocks external general entities) + - feature_external_pes: False (blocks external parameter entities) + - forbid_dtd: False (allows DTD declarations in the XML) + - forbid_entities: False (allows entity declarations) + - forbid_external: False (allows external references in declarations) + + This configuration permits DTD declarations (required for USPTO files) while the + disabled external entity features prevent actual fetching of external resources, + effectively blocking XXE attacks. The parser processes the XML structure without + accessing any external files or URLs. 
""" import html import logging import re -import xml.sax -import xml.sax.xmlreader from abc import ABC, abstractmethod from enum import Enum, unique -from io import BytesIO +from io import BytesIO, StringIO from pathlib import Path -from typing import Final, Optional, Union +from typing import Final +from xml.sax import SAXParseException +from xml.sax.handler import ContentHandler, feature_external_ges, feature_external_pes +from xml.sax.xmlreader import AttributesImpl from bs4 import BeautifulSoup, Tag +from defusedxml.common import DefusedXmlException +from defusedxml.sax import make_parser from docling_core.types.doc import ( DocItem, DocItemLabel, @@ -36,7 +57,7 @@ _log = logging.getLogger(__name__) -XML_DECLARATION: Final = '<?xml version="1.0" encoding="UTF-8"?>' +XML_DECLARATION: Final[str] = '<?xml version="1.0" encoding="UTF-8"?>' @unique @@ -59,13 +80,11 @@ def __init__(self, _, level: LevelNumber) -> None: class PatentUsptoDocumentBackend(DeclarativeDocumentBackend): @override - def __init__( - self, in_doc: InputDocument, path_or_stream: Union[BytesIO, Path] - ) -> None: + def __init__(self, in_doc: InputDocument, path_or_stream: BytesIO | Path) -> None: super().__init__(in_doc, path_or_stream) self.patent_content: str = "" - self.parser: Optional[PatentUspto] = None + self.parser: PatentUspto | None = None try: if isinstance(self.path_or_stream, BytesIO): @@ -153,7 +172,7 @@ class PatentUspto(ABC): """Parser of patent documents from the US Patent Office.""" @abstractmethod - def parse(self, patent_content: str) -> Optional[DoclingDocument]: + def parse(self, patent_content: str) -> DoclingDocument | None: """Parse a USPTO patent. 
Parameters: @@ -177,12 +196,26 @@ def __init__(self) -> None: self.handler = PatentUsptoIce.PatentHandler() self.pattern = re.compile(r"^(<table .*?</table>)", re.MULTILINE | re.DOTALL) - def parse(self, patent_content: str) -> Optional[DoclingDocument]: + def parse(self, patent_content: str) -> DoclingDocument | None: try: - xml.sax.parseString(patent_content, self.handler) - except xml.sax._exceptions.SAXParseException as exc_sax: - _log.error(f"Error in parsing USPTO document: {exc_sax}") - + parser = make_parser() + parser.setFeature(feature_external_ges, False) + parser.setFeature(feature_external_pes, False) + parser.forbid_dtd = False + parser.forbid_entities = False + parser.forbid_external = False + parser.setContentHandler(self.handler) + parser.parse(StringIO(patent_content)) + except SAXParseException as exc_sax: + _log.error(f"Error in parsing USPTO document (malformed XML): {exc_sax}") + return None + except DefusedXmlException as exc_defused: + _log.error( + f"Error in parsing USPTO document (security issue detected): {exc_defused}" + ) + return None + except Exception as exc: + _log.error(f"Unexpected error in parsing USPTO document: {exc}") return None doc = self.handler.doc @@ -209,11 +242,11 @@ def parse(self, patent_content: str) -> Optional[DoclingDocument]: return doc - class PatentHandler(xml.sax.handler.ContentHandler): + class PatentHandler(ContentHandler): """SAX ContentHandler for patent documents.""" - APP_DOC_ELEMENT: Final = "us-patent-application" - GRANT_DOC_ELEMENT: Final = "us-patent-grant" + APP_DOC_ELEMENT: Final[str] = "us-patent-application" + GRANT_DOC_ELEMENT: Final[str] = "us-patent-grant" @unique class Element(Enum): @@ -247,11 +280,11 @@ def __init__(self, _, is_text: bool) -> None: def __init__(self) -> None: """Build an instance of the patent handler.""" # Current patent being parsed - self.doc: Optional[DoclingDocument] = None + self.doc: DoclingDocument | None = None # Keep track of docling hierarchy level self.level: 
LevelNumber = 1 # Keep track of docling parents by level - self.parents: dict[LevelNumber, Optional[DocItem]] = {1: None} + self.parents: dict[LevelNumber, DocItem | None] = {1: None} # Content to retain for the current patent self.property: list[str] self.claim: str @@ -352,7 +385,7 @@ def characters(self, content): self.text += content def _start_registered_elements( - self, tag: str, attributes: xml.sax.xmlreader.AttributesImpl + self, tag: str, attributes: AttributesImpl ) -> None: if tag in [member.value for member in self.Element]: # special case for claims: claim lines may start before the @@ -514,12 +547,26 @@ def __init__(self) -> None: self.pattern = re.compile(r"^(<table .*?</table>)", re.MULTILINE | re.DOTALL) @override - def parse(self, patent_content: str) -> Optional[DoclingDocument]: + def parse(self, patent_content: str) -> DoclingDocument | None: try: - xml.sax.parseString(patent_content, self.handler) - except xml.sax._exceptions.SAXParseException as exc_sax: - _log.error(f"Error in parsing USPTO document: {exc_sax}") - + parser = make_parser() + parser.setFeature(feature_external_ges, False) + parser.setFeature(feature_external_pes, False) + parser.forbid_dtd = False + parser.forbid_entities = False + parser.forbid_external = False + parser.setContentHandler(self.handler) + parser.parse(StringIO(patent_content)) + except SAXParseException as exc_sax: + _log.error(f"Error in parsing USPTO document (malformed XML): {exc_sax}") + return None + except DefusedXmlException as exc_defused: + _log.error( + f"Error in parsing USPTO document (security issue detected): {exc_defused}" + ) + return None + except Exception as exc: + _log.error(f"Unexpected error in parsing USPTO document: {exc}") return None doc = self.handler.doc @@ -546,11 +593,11 @@ def parse(self, patent_content: str) -> Optional[DoclingDocument]: return doc - class PatentHandler(xml.sax.handler.ContentHandler): + class PatentHandler(ContentHandler): """SAX ContentHandler for patent 
documents.""" - GRANT_DOC_ELEMENT: Final = "PATDOC" - CLAIM_STATEMENT: Final = "What is claimed is:" + GRANT_DOC_ELEMENT: Final[str] = "PATDOC" + CLAIM_STATEMENT: Final[str] = "What is claimed is:" @unique class Element(Enum): @@ -585,11 +632,11 @@ def __init__(self, _, is_text: bool) -> None: def __init__(self) -> None: """Build an instance of the patent handler.""" # Current patent being parsed - self.doc: Optional[DoclingDocument] = None + self.doc: DoclingDocument | None = None # Keep track of docling hierarchy level self.level: LevelNumber = 1 # Keep track of docling parents by level - self.parents: dict[LevelNumber, Optional[DocItem]] = {1: None} + self.parents: dict[LevelNumber, DocItem | None] = {1: None} # Content to retain for the current patent self.property: list[str] self.claim: str @@ -684,7 +731,7 @@ def characters(self, content): self.text += content def _start_registered_elements( - self, tag: str, attributes: xml.sax.xmlreader.AttributesImpl + self, tag: str, attributes: AttributesImpl ) -> None: if tag in [member.value for member in self.Element]: if ( @@ -887,13 +934,13 @@ class Field(Enum): @override def __init__(self) -> None: """Build an instance of PatentUsptoGrantAps class.""" - self.doc: Optional[DoclingDocument] = None + self.doc: DoclingDocument | None = None # Keep track of docling hierarchy level self.level: LevelNumber = 1 # Keep track of docling parents by level - self.parents: dict[LevelNumber, Optional[DocItem]] = {1: None} + self.parents: dict[LevelNumber, DocItem | None] = {1: None} - def get_last_text_item(self) -> Optional[TextItem]: + def get_last_text_item(self) -> TextItem | None: """Get the last text item at the current document level. 
Returns: @@ -1030,7 +1077,7 @@ def store_content(self, section: str, field: str, value: str) -> None: parent=self.parents[self.level], ) - def parse(self, patent_content: str) -> Optional[DoclingDocument]: + def parse(self, patent_content: str) -> DoclingDocument | None: self.doc = self.doc = DoclingDocument(name="file") section: str = "" key: str = "" @@ -1075,12 +1122,26 @@ def __init__(self) -> None: self.pattern = re.compile(r"^(<table .*?</table>)", re.MULTILINE | re.DOTALL) @override - def parse(self, patent_content: str) -> Optional[DoclingDocument]: + def parse(self, patent_content: str) -> DoclingDocument | None: try: - xml.sax.parseString(patent_content, self.handler) - except xml.sax._exceptions.SAXParseException as exc_sax: - _log.error(f"Error in parsing USPTO document: {exc_sax}") - + parser = make_parser() + parser.setFeature(feature_external_ges, False) + parser.setFeature(feature_external_pes, False) + parser.forbid_dtd = False + parser.forbid_entities = False + parser.forbid_external = False + parser.setContentHandler(self.handler) + parser.parse(StringIO(patent_content)) + except SAXParseException as exc_sax: + _log.error(f"Error in parsing USPTO document (malformed XML): {exc_sax}") + return None + except DefusedXmlException as exc_defused: + _log.error( + f"Error in parsing USPTO document (security issue detected): {exc_defused}" + ) + return None + except Exception as exc: + _log.error(f"Unexpected error in parsing USPTO document: {exc}") return None doc = self.handler.doc @@ -1107,10 +1168,10 @@ def parse(self, patent_content: str) -> Optional[DoclingDocument]: return doc - class PatentHandler(xml.sax.handler.ContentHandler): + class PatentHandler(ContentHandler): """SAX ContentHandler for patent documents.""" - APP_DOC_ELEMENT: Final = "patent-application-publication" + APP_DOC_ELEMENT: Final[str] = "patent-application-publication" @unique class Element(Enum): @@ -1146,11 +1207,11 @@ def __init__(self, _, is_text: bool) -> None: def 
__init__(self) -> None: """Build an instance of the patent handler.""" # Current patent being parsed - self.doc: Optional[DoclingDocument] = None + self.doc: DoclingDocument | None = None # Keep track of docling hierarchy level self.level: LevelNumber = 1 # Keep track of docling parents by level - self.parents: dict[LevelNumber, Optional[DocItem]] = {1: None} + self.parents: dict[LevelNumber, DocItem | None] = {1: None} # Content to retain for the current patent self.property: list[str] self.claim: str @@ -1245,7 +1306,7 @@ def characters(self, content): self.text += content def _start_registered_elements( - self, tag: str, attributes: xml.sax.xmlreader.AttributesImpl + self, tag: str, attributes: AttributesImpl ) -> None: if tag in [member.value for member in self.Element]: # special case for claims: claim lines may start before the @@ -1421,6 +1482,12 @@ def __init__(self, input: str) -> None: Args: input: The xml content. + + Security Note: + This parser uses BeautifulSoup with lxml, which can be vulnerable to XXE. + However, the input here comes from table strings extracted AFTER the main + document has been safely parsed by defusedxml, so the content is already + sanitized and safe to parse. """ self.max_nbr_messages = 2 self.nbr_messages = 0 @@ -1678,7 +1745,7 @@ def _parse_table(self, table: Tag) -> TableData: return dl_table - def parse(self) -> Optional[TableData]: + def parse(self) -> TableData | None: """Parse the first table from an xml content. Returns:
pyproject.toml +2 −0 modified
@@ -71,6 +71,7 @@ dependencies = [
   'scipy (>=1.6.0,<2.0.0)',
   "accelerate>=1.0.0,<2",
   "polyfactory>=2.22.2",
+  "defusedxml (>=0.7.1, <0.8.0)",
 ]

 [project.urls]
@@ -132,6 +133,7 @@ dev = [
   "ipywidgets~=8.1",
   "nbqa~=1.9",
   "python-semantic-release~=7.32",
+  "types-defusedxml (>=0.7.0.20250822, <0.8.0)",
 ]
 docs = [
   "mkdocs-material~=9.5",
uv.lock+13 −0 modified@@ -1030,6 +1030,7 @@ dependencies = [ { name = "accelerate" }, { name = "beautifulsoup4" }, { name = "certifi" }, + { name = "defusedxml" }, { name = "docling-core", extra = ["chunking"] }, { name = "docling-ibm-models" }, { name = "docling-parse" }, @@ -1108,6 +1109,7 @@ dev = [ { name = "pytest-durations" }, { name = "pytest-xdist" }, { name = "python-semantic-release" }, + { name = "types-defusedxml" }, { name = "types-openpyxl" }, { name = "types-requests" }, { name = "types-setuptools" }, @@ -1138,6 +1140,7 @@ requires-dist = [ { name = "accelerate", marker = "extra == 'vlm'", specifier = ">=1.2.1,<2.0.0" }, { name = "beautifulsoup4", specifier = ">=4.12.3,<5.0.0" }, { name = "certifi", specifier = ">=2024.7.4" }, + { name = "defusedxml", specifier = ">=0.7.1,<0.8.0" }, { name = "docling-core", extras = ["chunking"], specifier = ">=2.62.0,<3.0.0" }, { name = "docling-ibm-models", specifier = ">=3.9.1,<4" }, { name = "docling-parse", specifier = ">=5.3.2,<6.0.0" }, @@ -1199,6 +1202,7 @@ dev = [ { name = "pytest-durations", specifier = "~=1.6.1" }, { name = "pytest-xdist", specifier = "~=3.3" }, { name = "python-semantic-release", specifier = "~=7.32" }, + { name = "types-defusedxml", specifier = ">=0.7.0.20250822,<0.8.0" }, { name = "types-openpyxl", specifier = "~=3.1" }, { name = "types-requests", specifier = "~=2.31" }, { name = "types-setuptools", specifier = "~=70.3" }, @@ -6854,6 +6858,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/ab/3d/21a2212b5fcef9e8e9f368403885dc567b7d31e50b2ce393efad3cd83572/types_awscrt-0.31.2-py3-none-any.whl", hash = "sha256:3d6a29c1cca894b191be408f4d985a8e3a14d919785652dd3fa4ee558143e4bf", size = 43340, upload-time = "2026-02-16T02:33:52.109Z" }, ] +[[package]] +name = "types-defusedxml" +version = "0.7.0.20250822" +source = { registry = "https://pypi.org/simple" } +sdist = { url = 
"https://files.pythonhosted.org/packages/7d/4a/5b997ae87bf301d1796f72637baa4e0e10d7db17704a8a71878a9f77f0c0/types_defusedxml-0.7.0.20250822.tar.gz", hash = "sha256:ba6c395105f800c973bba8a25e41b215483e55ec79c8ca82b6fe90ba0bc3f8b2", size = 10590, upload-time = "2025-08-22T03:02:59.547Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/13/73/8a36998cee9d7c9702ed64a31f0866c7f192ecffc22771d44dbcc7878f18/types_defusedxml-0.7.0.20250822-py3-none-any.whl", hash = "sha256:5ee219f8a9a79c184773599ad216123aedc62a969533ec36737ec98601f20dcf", size = 13430, upload-time = "2025-08-22T03:02:58.466Z" }, +] + [[package]] name = "types-openpyxl" version = "3.1.5.20250919"
Vulnerability mechanics
Root cause
"The USPTO patent XML parser used the standard xml.sax.parseString() without protection against XML External Entity (XXE) attacks."
Attack vector
An attacker can craft malicious USPTO patent XML files containing external entity references. These references can be used to read arbitrary files from the server's filesystem, perform Server-Side Request Forgery (SSRF) attacks, or cause a denial of service through entity expansion (Billion Laughs attack) [CWE-776]. This vulnerability affects three USPTO patent format parsers: ICE (v4.x), Grant v2.5, and Application v1.x.
Affected code
The vulnerability is in `docling/backend/xml/uspto_backend.py`, in the `parse` methods of the three SAX-based parsers: `PatentUsptoIce.parse`, `PatentUsptoGrantV2.parse`, and `PatentUsptoAppV1.parse`. Each of these called `xml.sax.parseString()` directly, with no XXE protection. The fix replaces that call with a parser obtained from `defusedxml.sax.make_parser()` and configured before use.
What the fix does
The patch creates the XML parser with `defusedxml.sax.make_parser()` and blocks external entity resolution by calling `setFeature` with `feature_external_ges` and `feature_external_pes` set to `False`. DTD declarations, which USPTO files require, remain allowed, so XXE attacks are prevented without breaking compatibility [patch_id=4714025]. The fix is included in version 2.74.0.
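Besides the parser configuration, the patched `parse` methods in the diff above log parsing failures and return `None` instead of continuing. A minimal sketch of that pattern follows, under stated assumptions: `parse_or_none` and its `parser_factory` parameter are hypothetical, and the standard library factory is the default only so the example runs without extra dependencies.

```python
import logging
from io import StringIO
from xml.sax import SAXParseException, make_parser
from xml.sax.handler import ContentHandler, feature_external_ges, feature_external_pes

_log = logging.getLogger(__name__)


def parse_or_none(content: str, handler: ContentHandler, parser_factory=make_parser):
    """Return the handler after a successful parse, or None if parsing was rejected.

    parser_factory is a hypothetical hook: pass defusedxml.sax.make_parser to get
    the hardened parser the patch uses; the stdlib default is for illustration.
    """
    try:
        parser = parser_factory()
        parser.setFeature(feature_external_ges, False)
        parser.setFeature(feature_external_pes, False)
        parser.setContentHandler(handler)
        parser.parse(StringIO(content))
    except SAXParseException as exc:
        _log.error("Malformed XML: %s", exc)
        return None
    except ValueError as exc:
        # defusedxml's DefusedXmlException derives from ValueError, so this
        # branch covers documents rejected for security reasons.
        _log.error("Rejected XML: %s", exc)
        return None
    return handler
```

Returning `None` on any rejected document keeps a partially populated handler from being mistaken for a successfully parsed patent.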
Preconditions
- Input: The attacker must be able to provide a malicious USPTO patent XML file to the parser.
Generated on Jun 3, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
News mentions
- Docling Project: Eight High-Severity Vulnerabilities Disclosed Together · Vypr Intelligence · Jun 3, 2026