Docling Core: Unsafe remote filename resolution
Description
Impact
In versions >= 1.5.0, < 2.74.1, docling-core did not sufficiently restrict remote request destinations and could resolve a server-provided Content-Disposition to a local path in an unsafe manner.
In applications that accept untrusted URLs, this could allow SSRF attacks targeting local files outside the user-defined cache directory.
Patches
Patched in docling-core 2.74.1. The fix adds stricter validation for remote destinations and normalizes server-provided filenames before use.
Users should upgrade to docling-core >= 2.74.1.
Workarounds
If upgrading is not immediately possible, avoid passing untrusted URLs into remote fetch functionality.
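One way to apply this workaround is to screen URLs against an application-level allowlist before they reach the library. The sketch below is illustrative only: `ALLOWED_HOSTS` and `is_allowed_source` are hypothetical names that are not part of docling-core, and the host list is an assumption to be replaced with hosts the application actually trusts.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; replace with hosts your application actually trusts.
ALLOWED_HOSTS = {"example.com", "docs.example.com"}


def is_allowed_source(url: str) -> bool:
    """Return True only for https URLs whose host is on the allowlist."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    # urlparse lower-cases the hostname and strips port and credentials,
    # so "https://example.com@evil.test/" is judged by "evil.test".
    return parsed.hostname in ALLOWED_HOSTS
```

A check like this only constrains the initial URL. On an unpatched version it does not constrain redirects or the filename a trusted host returns, so upgrading remains the more complete remedy.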
References
- Fix release: `v2.74.1`
Affected products
- Range: >= 1.5.0, < 2.74.1
Patches
473fbacfb938 fix: refine remote filename handling (#591)
2 files changed · +271 −29
docling_core/utils/file.py+105 −29 modified@@ -1,11 +1,12 @@ """File-related utilities.""" -import importlib +import ipaddress import re import tempfile from io import BytesIO from pathlib import Path from typing import Optional, Union +from urllib.parse import urlparse import requests from pydantic import AnyHttpUrl, TypeAdapter, ValidationError @@ -14,6 +15,51 @@ from docling_core.types.doc.utils import relative_path from docling_core.types.io import DocumentStream +_MAX_REDIRECTS = 5 + + +def _is_safe_url(url: str) -> bool: + """Return whether a URL resolves to a globally routable address.""" + try: + parsed = urlparse(url) + hostname = parsed.hostname + + if not hostname: + return False + + try: + ip = ipaddress.ip_address(hostname) + except ValueError: + import socket + + try: + ip_str = socket.gethostbyname(hostname) + ip = ipaddress.ip_address(ip_str) + except (socket.gaierror, socket.herror): + return False + + return ip.is_global and not ( + ip.is_private + or ip.is_loopback + or ip.is_link_local + or ip.is_reserved + or ip.is_multicast + or ip.is_unspecified + ) + except Exception: + return False + + +def _sanitize_filename(filename: str) -> Optional[str]: + """Return a basename-safe filename, or None if no usable basename remains.""" + normalized = filename.replace("\\", "/") + basename = Path(normalized).name + + if not basename or basename in (".", "..") or "/" in basename: + return None + + return basename + def resolve_remote_filename( http_url: AnyHttpUrl, @@ -30,19 +76,23 @@ def resolve_remote_filename( Returns: str: The actual filename of the remote url. 
""" - fname = None - # try to get filename from response header + raw_fname = None if cont_disp := response_headers.get("Content-Disposition"): for par in cont_disp.strip().split(";"): - # currently only handling directive "filename" (not "*filename") if (split := par.split("=")) and split[0].strip() == "filename": - fname = "=".join(split[1:]).strip().strip("'\"") or None + raw_fname = "=".join(split[1:]).strip().strip("'\"") or None break - # otherwise, use name from URL: - if fname is None: - fname = Path(http_url.path or "").name or fallback_filename - return fname + if raw_fname is None: + raw_fname = Path(http_url.path or "").name or fallback_filename + + if fname := _sanitize_filename(raw_fname): + return fname + + if fname := _sanitize_filename(fallback_filename): + return fname + + raise ValueError("Could not derive a safe filename") def resolve_source_to_stream( @@ -63,45 +113,71 @@ def resolve_source_to_stream( """ try: http_url: AnyHttpUrl = TypeAdapter(AnyHttpUrl).validate_python(source) + url_str = str(http_url) + + if not _is_safe_url(url_str): + raise ValueError(f"URL is not allowed: {url_str}") - # make all header keys lower case _headers = headers or {} req_headers = {k.lower(): v for k, v in _headers.items()} - # add user-agent is not set if "user-agent" not in req_headers: - agent_name = f"docling-core/{importlib.metadata.version('docling-core')}" + try: + import importlib.metadata + + agent_name = f"docling-core/{importlib.metadata.version('docling-core')}" + except Exception: + agent_name = "docling-core" req_headers["user-agent"] = agent_name - # Google Docs, Files, PDF URLs, Spreadsheets, Presentations: convert to export URL google_doc_id = re.search( r"google\.com\/(file|document|spreadsheets|presentation)\/d\/([\w-]+)", - str(http_url), + url_str, ) if google_doc_id: doc_type = google_doc_id.group(1) doc_id = google_doc_id.group(2) if doc_type == "file": - http_url = TypeAdapter(AnyHttpUrl).validate_python( - 
f"https://drive.google.com/uc?export=download&id={doc_id}" - ) + url_str = f"https://drive.google.com/uc?export=download&id={doc_id}" elif doc_type == "document": - http_url = TypeAdapter(AnyHttpUrl).validate_python( - f"https://docs.google.com/document/d/{doc_id}/export?format=docx" - ) + url_str = f"https://docs.google.com/document/d/{doc_id}/export?format=docx" elif doc_type == "spreadsheets": - http_url = TypeAdapter(AnyHttpUrl).validate_python( - f"https://docs.google.com/spreadsheets/d/{doc_id}/export?format=xlsx" - ) + url_str = f"https://docs.google.com/spreadsheets/d/{doc_id}/export?format=xlsx" elif doc_type == "presentation": - http_url = TypeAdapter(AnyHttpUrl).validate_python( - f"https://docs.google.com/presentation/d/{doc_id}/export?format=pptx" - ) + url_str = f"https://docs.google.com/presentation/d/{doc_id}/export?format=pptx" + else: + raise ValueError(f"Unexpected Google doc type: {doc_type}") + + http_url = TypeAdapter(AnyHttpUrl).validate_python(url_str) + + session = requests.Session() + session.max_redirects = _MAX_REDIRECTS - # fetch the page - res = requests.get(http_url, stream=True, headers=req_headers) + def _check_redirect_safety(response, *args, **kwargs): + """Validate each redirect target before following it.""" + if response.is_redirect or response.is_permanent_redirect: + redirect_url = response.headers.get("location") + if redirect_url: + if not redirect_url.startswith(("http://", "https://")): + from urllib.parse import urljoin + + redirect_url = urljoin(response.url, redirect_url) + + if not _is_safe_url(redirect_url): + raise ValueError(f"Redirect target is not allowed: {redirect_url}") + + session.hooks["response"].append(_check_redirect_safety) + + res = session.get( + url_str, + stream=True, + headers=req_headers, + allow_redirects=True, + ) res.raise_for_status() - fname = resolve_remote_filename(http_url=http_url, response_headers=res.headers) + + response_headers = dict(res.headers) + fname = 
resolve_remote_filename(http_url=http_url, response_headers=response_headers) stream = BytesIO(res.content) doc_stream = DocumentStream(name=fname, stream=stream)
test/test_utils.py+166 −0 modified@@ -150,3 +150,169 @@ def get_dummy_response(*args, **kwargs): text = doc_stream.stream.read().decode("utf8") assert text == expected_str + + +def test_sanitize_filename_paths(): + """Test filename sanitization for path-like inputs.""" + from docling_core.utils.file import _sanitize_filename + + assert _sanitize_filename("../../etc/config.txt") == "config.txt" + + assert _sanitize_filename("/etc/config.txt") == "config.txt" + + assert _sanitize_filename("..\\..\\windows\\system32\\config") == "config" + assert _sanitize_filename("C:\\Windows\\System32\\config") == "config" + + assert _sanitize_filename("../../../etc\\config.txt") == "config.txt" + + assert _sanitize_filename("document.pdf") == "document.pdf" + assert _sanitize_filename("my-file_123.txt") == "my-file_123.txt" + + assert _sanitize_filename("") is None + assert _sanitize_filename(".") is None + assert _sanitize_filename("..") is None + + +def test_is_safe_url_rejects_private_networks(): + """Test URL filtering for non-public network ranges.""" + from docling_core.utils.file import _is_safe_url + + assert not _is_safe_url("http://10.0.0.1/file") + assert not _is_safe_url("http://172.16.0.1/file") + assert not _is_safe_url("http://192.168.1.1/file") + + assert not _is_safe_url("http://127.0.0.1/file") + assert not _is_safe_url("http://localhost/file") + + assert not _is_safe_url("http://169.254.169.254/latest/meta-data/") + + assert not _is_safe_url("http://[::1]/file") + assert not _is_safe_url("http://[fe80::1]/file") + + assert _is_safe_url("http://8.8.8.8/file") + assert _is_safe_url("https://example.com/file") + assert _is_safe_url("https://github.com/github/file") + + +def test_resolve_remote_filename_sanitizes_content_disposition(monkeypatch): + """Test filename normalization from Content-Disposition.""" + from docling_core.utils.file import resolve_source_to_stream + from requests import Response + + def get_response(*args, **kwargs): + r = Response() + 
r.status_code = 200 + r._content = b"test content" + r.headers["Content-Disposition"] = 'attachment; filename="../../etc/config.txt"' + return r + + monkeypatch.setattr("requests.Session.get", get_response) + + doc_stream = resolve_source_to_stream("https://example.com/file") + assert doc_stream.name == "config.txt" + + +def test_resolve_source_rejects_non_public_urls(monkeypatch): + """Test that non-public URLs are rejected.""" + from docling_core.utils.file import resolve_source_to_stream + import pytest + + with pytest.raises(ValueError, match="URL is not allowed"): + resolve_source_to_stream("http://127.0.0.1/file") + + with pytest.raises(ValueError, match="URL is not allowed"): + resolve_source_to_stream("http://10.0.0.1/file") + + with pytest.raises(ValueError, match="URL is not allowed"): + resolve_source_to_stream("http://192.168.1.1/file") + + with pytest.raises(ValueError, match="URL is not allowed"): + resolve_source_to_stream("http://169.254.169.254/latest/meta-data/") + + +def test_resolve_source_to_path_sanitizes_filename(monkeypatch, tmp_path): + """Test that saved filenames stay within the target directory.""" + from docling_core.utils.file import resolve_source_to_path + from requests import Response + + def get_response(*args, **kwargs): + r = Response() + r.status_code = 200 + r._content = b"test content" + r.headers["Content-Disposition"] = 'attachment; filename="../../../../tmp/output.txt"' + return r + + monkeypatch.setattr("requests.Session.get", get_response) + + cache_dir = tmp_path / "cache" + cache_dir.mkdir() + + result_path = resolve_source_to_path("https://example.com/file", workdir=cache_dir) + + assert result_path.parent == cache_dir + assert result_path.name == "output.txt" + assert result_path.exists() + + assert not (tmp_path.parent.parent.parent / "tmp" / "output.txt").exists() + + +def test_redirect_limit_enforced(monkeypatch): + """Test that redirect limits are configured on the session.""" + from docling_core.utils.file import 
_MAX_REDIRECTS + from requests import Session, Response + + session_created = [] + + original_init = Session.__init__ + + def track_session_init(self, *args, **kwargs): + original_init(self, *args, **kwargs) + session_created.append(self) + + monkeypatch.setattr(Session, "__init__", track_session_init) + + def mock_get(*args, **kwargs): + r = Response() + r.status_code = 200 + r._content = b"test" + return r + + monkeypatch.setattr(Session, "get", mock_get) + + from docling_core.utils.file import resolve_source_to_stream + + try: + resolve_source_to_stream("https://example.com/file") + except Exception: + pass + + assert len(session_created) > 0 + session = session_created[0] + assert session.max_redirects == _MAX_REDIRECTS + + + +def test_redirect_to_non_public_ip_rejected(monkeypatch): + """Test that redirects to non-public addresses are rejected.""" + from docling_core.utils.file import resolve_source_to_stream + from requests import Response, Session + import pytest + + original_get = Session.get + + def mock_get_with_redirect(self, *args, **kwargs): + r = Response() + r.status_code = 302 + r.headers['location'] = 'http://192.168.1.1/private-file' + r.url = args[0] if args else kwargs.get('url', 'http://example.com') + + if hasattr(self, 'hooks') and 'response' in self.hooks: + for hook in self.hooks['response']: + hook(r) + + return r + + monkeypatch.setattr(Session, "get", mock_get_with_redirect) + + with pytest.raises(ValueError, match="Redirect target is not allowed"): + resolve_source_to_stream("https://example.com/redirect")
2087d0f36261 fix: refine ImageRef URI handling (#595)
5 files changed · +202 −5
docling_core/types/doc/document.py+10 −3 modified@@ -66,6 +66,7 @@ ) from docling_core.types.doc.tokens import DocumentToken, TableToken from docling_core.types.doc.utils import parse_otsl_table_content, relative_path +from docling_core.utils.settings import settings _logger = logging.getLogger(__name__) @@ -1086,12 +1087,18 @@ def pil_image(self) -> Optional[PILImage.Image]: return self._pil if isinstance(self.uri, AnyUrl): - if self.uri.scheme == "data": + if self.uri.scheme == "file": + if not settings.allow_image_file_uri: + raise ValueError("file:// URI scheme is not enabled.") + self._pil = PILImage.open(unquote(str(self.uri.path))) + elif self.uri.scheme == "data": encoded_img = str(self.uri).split(",")[1] decoded_img = base64.b64decode(encoded_img) + + if len(decoded_img) > settings.max_image_decoded_size: + raise ValueError(f"Decoded image exceeds size limit of {settings.max_image_decoded_size} bytes.") + self._pil = PILImage.open(BytesIO(decoded_img)) - elif self.uri.scheme == "file": - self._pil = PILImage.open(unquote(str(self.uri.path))) # else: Handle http request or other protocols... elif isinstance(self.uri, Path): self._pil = PILImage.open(self.uri)
docling_core/utils/settings.py +11 −0 added
@@ -0,0 +1,11 @@
+from pydantic_settings import BaseSettings, SettingsConfigDict
+
+
+class CoreSettings(BaseSettings):
+    model_config = SettingsConfigDict(env_prefix="DOCLINGCORE_")
+
+    allow_image_file_uri: bool = False
+    max_image_decoded_size: int = 20 * 1024 * 1024  # 20MB
+
+
+settings = CoreSettings()
pyproject.toml +1 −0 modified
@@ -50,6 +50,7 @@ dependencies = [
     'typer (>=0.12.5,<0.25.0)',
     'latex2mathml (>=3.77.0,<4.0.0)',
     "defusedxml (>=0.7.1, <0.8.0)",
+    "pydantic-settings>=2.14.0",
 ]
 
 [project.urls]
test/test_docling_doc.py+155 −2 modified@@ -6,7 +6,8 @@ from pathlib import Path from typing import Optional, Union from unittest.mock import Mock - +from io import BytesIO +import base64 import pytest import yaml from PIL import Image as PILImage @@ -51,6 +52,7 @@ from docling_core.types.doc.document import FieldHeadingItem, FieldItem, FieldRegionItem, FieldValueItem from docling_core.types.doc.document import CURRENT_VERSION, PageItem from docling_core.types.doc.webvtt import WebVTTFile +from docling_core.utils.settings import settings from .test_data_gen_flag import GEN_TEST_DATA @@ -795,7 +797,158 @@ def test_image_ref(): } image = ImageRef.model_validate(data_path) assert isinstance(image.uri, Path) - assert image.uri.name == "image.png" + + +def test_image_ref_blocks_file_scheme(): + """Test that file:// URI scheme is blocked.""" + fig_image = PILImage.new(mode="RGB", size=(2, 2), color=(0, 0, 0)) + image_ref = ImageRef.from_pil(image=fig_image, dpi=72) + + image_ref.uri = AnyUrl("file:///tmp/test.png") + + with pytest.raises(ValueError, match="file:// URI scheme is not enabled"): + _ = image_ref.pil_image + + +def test_image_ref_blocks_oversized_base64(): + """Test that oversized base64 data URIs are blocked.""" + import base64 + + large_bytes = b"X" * (28 * 1024 * 1024) + large_data = base64.b64encode(large_bytes).decode('ascii') + data_uri = f"data:image/png;base64,{large_data}" + + image_ref = ImageRef( + dpi=72, + mimetype="image/png", + size=Size(width=100, height=100), + uri=AnyUrl(data_uri) + ) + + with pytest.raises(ValueError, match="exceeds size limit"): + _ = image_ref.pil_image + + + +def test_image_ref_accepts_valid_base64(): + """Test that valid base64 data URIs within size limit work correctly.""" + import base64 + from io import BytesIO + + fig_image = PILImage.new(mode="RGB", size=(1, 1), color=(255, 0, 0)) + + # Convert to base64 data URI + buffer = BytesIO() + fig_image.save(buffer, format="PNG") + img_bytes = buffer.getvalue() + 
img_base64 = base64.b64encode(img_bytes).decode('ascii') + data_uri = f"data:image/png;base64,{img_base64}" + + # Create ImageRef with data URI + image_ref = ImageRef( + dpi=72, + mimetype="image/png", + size=Size(width=1, height=1), + uri=AnyUrl(data_uri) + ) + + # Should successfully decode the image + decoded_image = image_ref.pil_image + assert isinstance(decoded_image, PILImage.Image) + assert decoded_image.size == (1, 1) + assert decoded_image.mode == "RGB" + + +def test_file_uri_allowed_with_env_var(): + """Test that file:// URIs work when enabled via settings.""" + test_img_path = Path("/tmp/test_docling_env.png") + img = PILImage.new("RGB", (100, 100), color="red") + img.save(test_img_path) + + orig_allow_image_file_uri = settings.allow_image_file_uri + try: + settings.allow_image_file_uri = True + + image_ref = ImageRef( + dpi=72, + mimetype="image/png", + size=Size(width=100, height=100), + uri=AnyUrl(f"file://{test_img_path}"), + ) + + pil_img = image_ref.pil_image + assert pil_img is not None + assert pil_img.size == (100, 100) + assert pil_img.mode == "RGB" + finally: + test_img_path.unlink(missing_ok=True) + settings.allow_image_file_uri = orig_allow_image_file_uri + + +def test_file_uri_blocked_by_default(): + """Test that file:// URIs are blocked by default.""" + image_ref = ImageRef( + dpi=72, + mimetype="image/png", + size=Size(width=100, height=100), + uri=AnyUrl("file:///tmp/test.png"), + ) + + with pytest.raises(ValueError, match="file:// URI scheme is not enabled"): + _ = image_ref.pil_image + + +def test_max_decoded_size_custom(): + """Test that oversized images are rejected based on custom limit.""" + orig_max_image_decoded_size = settings.max_image_decoded_size + try: + settings.max_image_decoded_size = 100 # 100 bytes limit + + # Create image that will exceed 100 bytes when base64 decoded + # A 50x50 RGB image is 50*50*3 = 7500 bytes uncompressed + img = PILImage.new("RGB", (50, 50), color="green") + buffer = BytesIO() + img.save(buffer, 
format="PNG") + img_bytes = buffer.getvalue() + + # Verify the decoded size will exceed our limit + assert len(img_bytes) > 100, f"Test image is only {len(img_bytes)} bytes, need > 100" + + encoded = base64.b64encode(img_bytes).decode("utf-8") + data_uri = f"data:image/png;base64,{encoded}" + + image_ref = ImageRef( + dpi=72, + mimetype="image/png", + size=Size(width=50, height=50), + uri=AnyUrl(data_uri), + ) + + with pytest.raises(ValueError, match="Decoded image exceeds size limit"): + _ = image_ref.pil_image + finally: + settings.max_image_decoded_size = orig_max_image_decoded_size + +def test_max_decoded_size_default(): + """Test that small images work with default 20MB limit.""" + img = PILImage.new("RGB", (100, 100), color="blue") + buffer = BytesIO() + img.save(buffer, format="PNG") + img_bytes = buffer.getvalue() + + encoded = base64.b64encode(img_bytes).decode("utf-8") + data_uri = f"data:image/png;base64,{encoded}" + + image_ref = ImageRef( + dpi=72, + mimetype="image/png", + size=Size(width=100, height=100), + uri=AnyUrl(data_uri), + ) + + pil_img = image_ref.pil_image + assert pil_img is not None + assert pil_img.size == (100, 100) def test_upgrade_content_layer_from_1_0_0() -> None:
uv.lock+25 −0 modified@@ -965,6 +965,7 @@ dependencies = [ { name = "pandas", version = "3.0.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" }, { name = "pillow" }, { name = "pydantic" }, + { name = "pydantic-settings" }, { name = "pyyaml" }, { name = "tabulate" }, { name = "typer" }, @@ -1036,6 +1037,7 @@ requires-dist = [ { name = "pandas", specifier = ">=2.1.4,<4.0.0" }, { name = "pillow", specifier = ">=10.0.0,<13.0.0" }, { name = "pydantic", specifier = ">=2.6.0,!=2.10.0,!=2.10.1,!=2.10.2,<3.0.0" }, + { name = "pydantic-settings", specifier = ">=2.14.0" }, { name = "pyyaml", specifier = ">=5.1,<7.0.0" }, { name = "semchunk", marker = "extra == 'chunking'", specifier = ">=2.2.0,<4.0.0" }, { name = "semchunk", marker = "extra == 'chunking-openai'", specifier = ">=2.2.0,<4.0.0" }, @@ -3321,6 +3323,20 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/36/c7/cfc8e811f061c841d7990b0201912c3556bfeb99cdcb7ed24adc8d6f8704/pydantic_core-2.41.5-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:56121965f7a4dc965bff783d70b907ddf3d57f6eba29b6d2e5dabfaf07799c51", size = 2145302, upload-time = "2025-11-04T13:43:46.64Z" }, ] +[[package]] +name = "pydantic-settings" +version = "2.14.0" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "pydantic" }, + { name = "python-dotenv" }, + { name = "typing-inspection" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/42/98/c8345dccdc31de4228c039a98f6467a941e39558da41c1744fbe29fa5666/pydantic_settings-2.14.0.tar.gz", hash = "sha256:24285fd4b0e0c06507dd9fdfd331ee23794305352aaec8fc4eb92d4047aeb67d", size = 235709, upload-time = "2026-04-20T13:37:40.293Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/01/dd/bebff3040138f00ae8a102d426b27349b9a49acc310fcae7f92112d867e3/pydantic_settings-2.14.0-py3-none-any.whl", hash = "sha256:fc8d5d692eb7092e43c8647c1c35a3ecd00e040fcf02ed86f4cb5458ca62182e", size = 60940, upload-time = 
"2026-04-20T13:37:38.586Z" }, +] + [[package]] name = "pydocstyle" version = "6.3.0" @@ -3417,6 +3433,15 @@ wheels = [ { url = "https://files.pythonhosted.org/packages/e7/80/73211fc5bfbfc562369b4aa61dc1e4bf07dc7b34df7b317e4539316b809c/python_discovery-1.1.3-py3-none-any.whl", hash = "sha256:90e795f0121bc84572e737c9aa9966311b9fde44ffb88a5953b3ec9b31c6945e", size = 31485, upload-time = "2026-03-10T15:08:13.06Z" }, ] +[[package]] +name = "python-dotenv" +version = "1.2.2" +source = { registry = "https://pypi.org/simple" } +sdist = { url = "https://files.pythonhosted.org/packages/82/ed/0301aeeac3e5353ef3d94b6ec08bbcabd04a72018415dcb29e588514bba8/python_dotenv-1.2.2.tar.gz", hash = "sha256:2c371a91fbd7ba082c2c1dc1f8bf89ca22564a087c2c287cd9b662adde799cf3", size = 50135, upload-time = "2026-03-01T16:00:26.196Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/0b/d7/1959b9648791274998a9c3526f6d0ec8fd2233e4d4acce81bbae76b44b2a/python_dotenv-1.2.2-py3-none-any.whl", hash = "sha256:1d8214789a24de455a8b8bd8ae6fe3c6b69a5e3d64aa8a8e5d68e694bbcb285a", size = 22101, upload-time = "2026-03-01T16:00:25.09Z" }, +] + [[package]] name = "python-gitlab" version = "3.15.0"
Vulnerability mechanics
Root cause
"The docling-core library did not sufficiently validate remote request destinations, allowing a server-provided `Content-Disposition` header to specify a local path, leading to SSRF and path traversal."
Attack vector
An attacker hosts a resource whose response, when fetched by an application using `docling-core` versions `>= 1.5.0, < 2.74.1`, carries a `Content-Disposition` header with a path-like `filename` value (for example `../../etc/config.txt`). The `resolve_remote_filename` function returned that value without sanitizing it, so the resulting local path could fall outside the intended directory. Separately, the fetch itself did not restrict request destinations or redirect targets, so an untrusted URL could reach loopback, private, or link-local addresses; this is the Server-Side Request Forgery (SSRF) component [CWE-918].
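The traversal step can be illustrated in isolation. The following is a minimal sketch, not the library's code: `CACHE_DIR`, `naive_target`, and `basename_target` are hypothetical names, and POSIX path semantics are assumed. It contrasts joining a raw server-provided filename with joining only its final path component.

```python
import posixpath
from pathlib import PurePosixPath

CACHE_DIR = "/srv/app/cache"  # hypothetical cache directory


def naive_target(filename: str) -> str:
    """Join the raw filename, as unsanitized code effectively would."""
    return posixpath.normpath(posixpath.join(CACHE_DIR, filename))


def basename_target(filename: str) -> str:
    """Keep only the final path component before joining."""
    name = PurePosixPath(filename.replace("\\", "/")).name
    return posixpath.join(CACHE_DIR, name)


header_value = "../../etc/config.txt"  # attacker-controlled filename
print(naive_target(header_value))     # /srv/etc/config.txt
print(basename_target(header_value))  # /srv/app/cache/config.txt
```

This sketch omits a step that the patch's `_sanitize_filename` performs: rejecting names that reduce to an empty string, `.`, or `..`.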
Affected code
The vulnerability resides in the `resolve_remote_filename` function within `docling_core/utils/file.py`, which processes the `Content-Disposition` header. The `resolve_source_to_stream` function in the same file is also affected as it uses `requests.get` without sufficient validation of the URL and its redirects. The `ImageRef` class in `docling_core/types/doc/document.py` was also updated to address related security concerns with URI handling [patch_id=4714020, patch_id=4714021].
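To make the redirect concern concrete, the sketch below shows how a `Location` value resolves against the responding URL and how an IP-literal host can be classified with the standard library. `resolve_redirect` and `is_literal_non_public` are hypothetical helper names, not functions from docling-core, and the sketch deliberately does not resolve DNS names, which the patched code does.

```python
import ipaddress
from urllib.parse import urljoin, urlparse


def resolve_redirect(current_url: str, location: str) -> str:
    """Resolve a Location header value against the URL that returned it."""
    return urljoin(current_url, location)


def is_literal_non_public(url: str) -> bool:
    """Return True when the URL host is an IP literal that is not globally routable."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a DNS name; this sketch does not resolve names
    return not ip.is_global


target = resolve_redirect("https://example.com/a/b", "http://192.168.1.1/private-file")
print(target, is_literal_non_public(target))
```

Checking only the first URL is not enough: each hop in a redirect chain can point somewhere new, which is why the patch validates every redirect target before following it.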
What the fix does
The patch introduces stricter validation for remote destinations and normalizes server-provided filenames before use in `docling_core/utils/file.py`. Specifically, the `_sanitize_filename` function now correctly handles path traversal attempts by extracting only the base filename, and `_is_safe_url` prevents requests to non-public IP addresses. Additionally, redirect targets are now checked for safety, mitigating SSRF risks [patch_id=4714020]. The `ImageRef` class also received updates to prevent `file://` URIs by default and to enforce a maximum decoded image size, further hardening against malicious inputs [patch_id=4714021].
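The decoded-size limit for `data:` URIs can be sketched outside the library as follows. This is an illustration under stated assumptions: `decode_data_uri` and `MAX_DECODED_BYTES` are hypothetical names, and the default simply mirrors the 20 MB value that appears in the patch's settings.

```python
import base64

MAX_DECODED_BYTES = 20 * 1024 * 1024  # mirrors the 20 MB default in the patch


def decode_data_uri(data_uri: str, limit: int = MAX_DECODED_BYTES) -> bytes:
    """Decode the base64 payload of a data URI, rejecting oversized results."""
    _, sep, payload = data_uri.partition(",")
    if not sep:
        raise ValueError("Not a data URI with a payload.")
    decoded = base64.b64decode(payload)
    # Check the size before handing the bytes to an image decoder.
    if len(decoded) > limit:
        raise ValueError(f"Decoded image exceeds size limit of {limit} bytes.")
    return decoded


uri = "data:text/plain;base64," + base64.b64encode(b"hello").decode("ascii")
print(decode_data_uri(uri))
```

As in the patch, the check runs after base64 decoding, so it bounds what reaches the image library rather than the memory used for the decode itself.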
Preconditions
- Input: The application must accept untrusted URLs for remote fetching.
Generated on Jun 3, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
News mentions
- Docling Project: Eight High-Severity Vulnerabilities Disclosed Together (Vypr Intelligence · Jun 3, 2026)