VYPR
High severity · 8.6 · NVD Advisory · Published Jun 3, 2026 · Updated Jun 3, 2026

Docling Core: Unsafe remote filename resolution

CVE-2026-44023

Description

Impact

In versions >= 1.5.0, < 2.74.1, docling-core did not sufficiently restrict remote request destinations and could resolve a server-provided Content-Disposition to a local path in an unsafe manner.

In applications that accept untrusted URLs, this could allow SSRF attacks targeting local files outside the user-defined cache directory.

Patches

Patched in docling-core 2.74.1. The fix adds stricter validation for remote destinations and normalizes server-provided filenames before use.

Users should upgrade to docling-core >= 2.74.1.
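To check whether an installed version falls in the affected range, a comparison along the following lines may help. This is a minimal sketch: `parse_version` and `is_affected` are hypothetical helper names, and only the leading `X.Y.Z` part of the version string is considered, so pre-release suffixes are ignored.

```python
import re


def parse_version(version: str) -> tuple:
    """Extract the leading numeric release components, e.g. '2.74.1' -> (2, 74, 1)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"Unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version: str) -> bool:
    """Return True for versions in the advisory's range: >= 1.5.0 and < 2.74.1."""
    return (1, 5, 0) <= parse_version(version) < (2, 74, 1)


if __name__ == "__main__":
    for candidate in ("1.4.9", "1.5.0", "2.74.0", "2.74.1"):
        print(candidate, "affected" if is_affected(candidate) else "not affected")
```

In a real environment the version string could be read with `importlib.metadata.version("docling-core")`, the same call the patched code uses to build its user-agent.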

Workarounds

If upgrading is not immediately possible, avoid passing untrusted URLs into remote fetch functionality.
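One way to apply this workaround is to gate every URL through an allowlist before it reaches the fetch functionality. The sketch below is an assumption about how an application might do that, not part of docling-core: `ALLOWED_HOSTS` and `is_allowed_source` are hypothetical names, and the hostnames are placeholders.

```python
from urllib.parse import urlparse

# Hypothetical allowlist: replace with hosts the application actually trusts.
ALLOWED_HOSTS = frozenset({"docs.example.com", "files.example.com"})


def is_allowed_source(url: str) -> bool:
    """Accept only HTTPS URLs whose hostname is on the allowlist."""
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS


if __name__ == "__main__":
    print(is_allowed_source("https://docs.example.com/report.pdf"))
    print(is_allowed_source("http://169.254.169.254/latest/meta-data/"))
```

An allowlist of this kind only narrows which hosts are contacted first. It does not by itself constrain redirects or the filename a permitted host returns, so it is a stopgap rather than a substitute for upgrading.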

References

Fix release: `v2.74.1`

Patches

2
473fbacfb938

fix: refine remote filename handling (#591)

https://github.com/docling-project/docling-core · Panos Vagenas · Apr 21, 2026 · Fixed in 2.74.1 · via llm-release-walk
2 files changed · +271 -29
  • docling_core/utils/file.py · +105 -29 · modified
    @@ -1,11 +1,12 @@
     """File-related utilities."""
     
    -import importlib
    +import ipaddress
     import re
     import tempfile
     from io import BytesIO
     from pathlib import Path
     from typing import Optional, Union
    +from urllib.parse import urlparse
     
     import requests
     from pydantic import AnyHttpUrl, TypeAdapter, ValidationError
    @@ -14,6 +15,51 @@
     from docling_core.types.doc.utils import relative_path
     from docling_core.types.io import DocumentStream
     
    +_MAX_REDIRECTS = 5
    +
    +
    +def _is_safe_url(url: str) -> bool:
    +    """Return whether a URL resolves to a globally routable address."""
    +    try:
    +        parsed = urlparse(url)
    +        hostname = parsed.hostname
    +
    +        if not hostname:
    +            return False
    +
    +        try:
    +            ip = ipaddress.ip_address(hostname)
    +        except ValueError:
    +            import socket
    +
    +            try:
    +                ip_str = socket.gethostbyname(hostname)
    +                ip = ipaddress.ip_address(ip_str)
    +            except (socket.gaierror, socket.herror):
    +                return False
    +
    +        return ip.is_global and not (
    +            ip.is_private
    +            or ip.is_loopback
    +            or ip.is_link_local
    +            or ip.is_reserved
    +            or ip.is_multicast
    +            or ip.is_unspecified
    +        )
    +    except Exception:
    +        return False
    +
    +
    +def _sanitize_filename(filename: str) -> Optional[str]:
    +    """Return a basename-safe filename, or None if no usable basename remains."""
    +    normalized = filename.replace("\\", "/")
    +    basename = Path(normalized).name
    +
    +    if not basename or basename in (".", "..") or "/" in basename:
    +        return None
    +
    +    return basename
    +
     
     def resolve_remote_filename(
         http_url: AnyHttpUrl,
    @@ -30,19 +76,23 @@ def resolve_remote_filename(
         Returns:
             str: The actual filename of the remote url.
         """
    -    fname = None
    -    # try to get filename from response header
    +    raw_fname = None
         if cont_disp := response_headers.get("Content-Disposition"):
             for par in cont_disp.strip().split(";"):
    -            # currently only handling directive "filename" (not "*filename")
                 if (split := par.split("=")) and split[0].strip() == "filename":
    -                fname = "=".join(split[1:]).strip().strip("'\"") or None
    +                raw_fname = "=".join(split[1:]).strip().strip("'\"") or None
                     break
    -    # otherwise, use name from URL:
    -    if fname is None:
    -        fname = Path(http_url.path or "").name or fallback_filename
     
    -    return fname
    +    if raw_fname is None:
    +        raw_fname = Path(http_url.path or "").name or fallback_filename
    +
    +    if fname := _sanitize_filename(raw_fname):
    +        return fname
    +
    +    if fname := _sanitize_filename(fallback_filename):
    +        return fname
    +
    +    raise ValueError("Could not derive a safe filename")
     
     
     def resolve_source_to_stream(
    @@ -63,45 +113,71 @@ def resolve_source_to_stream(
         """
         try:
             http_url: AnyHttpUrl = TypeAdapter(AnyHttpUrl).validate_python(source)
    +        url_str = str(http_url)
    +
    +        if not _is_safe_url(url_str):
    +            raise ValueError(f"URL is not allowed: {url_str}")
     
    -        # make all header keys lower case
             _headers = headers or {}
             req_headers = {k.lower(): v for k, v in _headers.items()}
    -        # add user-agent is not set
             if "user-agent" not in req_headers:
    -            agent_name = f"docling-core/{importlib.metadata.version('docling-core')}"
    +            try:
    +                import importlib.metadata
    +
    +                agent_name = f"docling-core/{importlib.metadata.version('docling-core')}"
    +            except Exception:
    +                agent_name = "docling-core"
                 req_headers["user-agent"] = agent_name
     
    -        # Google Docs, Files, PDF URLs, Spreadsheets, Presentations: convert to export URL
             google_doc_id = re.search(
                 r"google\.com\/(file|document|spreadsheets|presentation)\/d\/([\w-]+)",
    -            str(http_url),
    +            url_str,
             )
             if google_doc_id:
                 doc_type = google_doc_id.group(1)
                 doc_id = google_doc_id.group(2)
     
                 if doc_type == "file":
    -                http_url = TypeAdapter(AnyHttpUrl).validate_python(
    -                    f"https://drive.google.com/uc?export=download&id={doc_id}"
    -                )
    +                url_str = f"https://drive.google.com/uc?export=download&id={doc_id}"
                 elif doc_type == "document":
    -                http_url = TypeAdapter(AnyHttpUrl).validate_python(
    -                    f"https://docs.google.com/document/d/{doc_id}/export?format=docx"
    -                )
    +                url_str = f"https://docs.google.com/document/d/{doc_id}/export?format=docx"
                 elif doc_type == "spreadsheets":
    -                http_url = TypeAdapter(AnyHttpUrl).validate_python(
    -                    f"https://docs.google.com/spreadsheets/d/{doc_id}/export?format=xlsx"
    -                )
    +                url_str = f"https://docs.google.com/spreadsheets/d/{doc_id}/export?format=xlsx"
                 elif doc_type == "presentation":
    -                http_url = TypeAdapter(AnyHttpUrl).validate_python(
    -                    f"https://docs.google.com/presentation/d/{doc_id}/export?format=pptx"
    -                )
    +                url_str = f"https://docs.google.com/presentation/d/{doc_id}/export?format=pptx"
    +            else:
    +                raise ValueError(f"Unexpected Google doc type: {doc_type}")
    +
    +            http_url = TypeAdapter(AnyHttpUrl).validate_python(url_str)
    +
    +        session = requests.Session()
    +        session.max_redirects = _MAX_REDIRECTS
     
    -        # fetch the page
    -        res = requests.get(http_url, stream=True, headers=req_headers)
    +        def _check_redirect_safety(response, *args, **kwargs):
    +            """Validate each redirect target before following it."""
    +            if response.is_redirect or response.is_permanent_redirect:
    +                redirect_url = response.headers.get("location")
    +                if redirect_url:
    +                    if not redirect_url.startswith(("http://", "https://")):
    +                        from urllib.parse import urljoin
    +
    +                        redirect_url = urljoin(response.url, redirect_url)
    +
    +                    if not _is_safe_url(redirect_url):
    +                        raise ValueError(f"Redirect target is not allowed: {redirect_url}")
    +
    +        session.hooks["response"].append(_check_redirect_safety)
    +
    +        res = session.get(
    +            url_str,
    +            stream=True,
    +            headers=req_headers,
    +            allow_redirects=True,
    +        )
             res.raise_for_status()
    -        fname = resolve_remote_filename(http_url=http_url, response_headers=res.headers)
    +
    +        response_headers = dict(res.headers)
    +        fname = resolve_remote_filename(http_url=http_url, response_headers=response_headers)
     
             stream = BytesIO(res.content)
             doc_stream = DocumentStream(name=fname, stream=stream)
    
  • test/test_utils.py · +166 -0 · modified
    @@ -150,3 +150,169 @@ def get_dummy_response(*args, **kwargs):
     
         text = doc_stream.stream.read().decode("utf8")
         assert text == expected_str
    +
    +
    +def test_sanitize_filename_paths():
    +    """Test filename sanitization for path-like inputs."""
    +    from docling_core.utils.file import _sanitize_filename
    +
    +    assert _sanitize_filename("../../etc/config.txt") == "config.txt"
    +
    +    assert _sanitize_filename("/etc/config.txt") == "config.txt"
    +
    +    assert _sanitize_filename("..\\..\\windows\\system32\\config") == "config"
    +    assert _sanitize_filename("C:\\Windows\\System32\\config") == "config"
    +
    +    assert _sanitize_filename("../../../etc\\config.txt") == "config.txt"
    +
    +    assert _sanitize_filename("document.pdf") == "document.pdf"
    +    assert _sanitize_filename("my-file_123.txt") == "my-file_123.txt"
    +
    +    assert _sanitize_filename("") is None
    +    assert _sanitize_filename(".") is None
    +    assert _sanitize_filename("..") is None
    +
    +
    +def test_is_safe_url_rejects_private_networks():
    +    """Test URL filtering for non-public network ranges."""
    +    from docling_core.utils.file import _is_safe_url
    +
    +    assert not _is_safe_url("http://10.0.0.1/file")
    +    assert not _is_safe_url("http://172.16.0.1/file")
    +    assert not _is_safe_url("http://192.168.1.1/file")
    +
    +    assert not _is_safe_url("http://127.0.0.1/file")
    +    assert not _is_safe_url("http://localhost/file")
    +
    +    assert not _is_safe_url("http://169.254.169.254/latest/meta-data/")
    +
    +    assert not _is_safe_url("http://[::1]/file")
    +    assert not _is_safe_url("http://[fe80::1]/file")
    +
    +    assert _is_safe_url("http://8.8.8.8/file")
    +    assert _is_safe_url("https://example.com/file")
    +    assert _is_safe_url("https://github.com/github/file")
    +
    +
    +def test_resolve_remote_filename_sanitizes_content_disposition(monkeypatch):
    +    """Test filename normalization from Content-Disposition."""
    +    from docling_core.utils.file import resolve_source_to_stream
    +    from requests import Response
    +
    +    def get_response(*args, **kwargs):
    +        r = Response()
    +        r.status_code = 200
    +        r._content = b"test content"
    +        r.headers["Content-Disposition"] = 'attachment; filename="../../etc/config.txt"'
    +        return r
    +
    +    monkeypatch.setattr("requests.Session.get", get_response)
    +
    +    doc_stream = resolve_source_to_stream("https://example.com/file")
    +    assert doc_stream.name == "config.txt"
    +
    +
    +def test_resolve_source_rejects_non_public_urls(monkeypatch):
    +    """Test that non-public URLs are rejected."""
    +    from docling_core.utils.file import resolve_source_to_stream
    +    import pytest
    +
    +    with pytest.raises(ValueError, match="URL is not allowed"):
    +        resolve_source_to_stream("http://127.0.0.1/file")
    +
    +    with pytest.raises(ValueError, match="URL is not allowed"):
    +        resolve_source_to_stream("http://10.0.0.1/file")
    +
    +    with pytest.raises(ValueError, match="URL is not allowed"):
    +        resolve_source_to_stream("http://192.168.1.1/file")
    +
    +    with pytest.raises(ValueError, match="URL is not allowed"):
    +        resolve_source_to_stream("http://169.254.169.254/latest/meta-data/")
    +
    +
    +def test_resolve_source_to_path_sanitizes_filename(monkeypatch, tmp_path):
    +    """Test that saved filenames stay within the target directory."""
    +    from docling_core.utils.file import resolve_source_to_path
    +    from requests import Response
    +
    +    def get_response(*args, **kwargs):
    +        r = Response()
    +        r.status_code = 200
    +        r._content = b"test content"
    +        r.headers["Content-Disposition"] = 'attachment; filename="../../../../tmp/output.txt"'
    +        return r
    +
    +    monkeypatch.setattr("requests.Session.get", get_response)
    +
    +    cache_dir = tmp_path / "cache"
    +    cache_dir.mkdir()
    +
    +    result_path = resolve_source_to_path("https://example.com/file", workdir=cache_dir)
    +
    +    assert result_path.parent == cache_dir
    +    assert result_path.name == "output.txt"
    +    assert result_path.exists()
    +
    +    assert not (tmp_path.parent.parent.parent / "tmp" / "output.txt").exists()
    +
    +
    +def test_redirect_limit_enforced(monkeypatch):
    +    """Test that redirect limits are configured on the session."""
    +    from docling_core.utils.file import _MAX_REDIRECTS
    +    from requests import Session, Response
    +
    +    session_created = []
    +
    +    original_init = Session.__init__
    +
    +    def track_session_init(self, *args, **kwargs):
    +        original_init(self, *args, **kwargs)
    +        session_created.append(self)
    +
    +    monkeypatch.setattr(Session, "__init__", track_session_init)
    +
    +    def mock_get(*args, **kwargs):
    +        r = Response()
    +        r.status_code = 200
    +        r._content = b"test"
    +        return r
    +
    +    monkeypatch.setattr(Session, "get", mock_get)
    +
    +    from docling_core.utils.file import resolve_source_to_stream
    +
    +    try:
    +        resolve_source_to_stream("https://example.com/file")
    +    except Exception:
    +        pass
    +
    +    assert len(session_created) > 0
    +    session = session_created[0]
    +    assert session.max_redirects == _MAX_REDIRECTS
    +
    +
    +
    +def test_redirect_to_non_public_ip_rejected(monkeypatch):
    +    """Test that redirects to non-public addresses are rejected."""
    +    from docling_core.utils.file import resolve_source_to_stream
    +    from requests import Response, Session
    +    import pytest
    +
    +    original_get = Session.get
    +
    +    def mock_get_with_redirect(self, *args, **kwargs):
    +        r = Response()
    +        r.status_code = 302
    +        r.headers['location'] = 'http://192.168.1.1/private-file'
    +        r.url = args[0] if args else kwargs.get('url', 'http://example.com')
    +
    +        if hasattr(self, 'hooks') and 'response' in self.hooks:
    +            for hook in self.hooks['response']:
    +                hook(r)
    +
    +        return r
    +
    +    monkeypatch.setattr(Session, "get", mock_get_with_redirect)
    +
    +    with pytest.raises(ValueError, match="Redirect target is not allowed"):
    +        resolve_source_to_stream("https://example.com/redirect")
    
2087d0f36261

fix: refine ImageRef URI handling (#595)

https://github.com/docling-project/docling-core · Panos Vagenas · Apr 22, 2026 · Fixed in 2.74.1 · via llm-release-walk
5 files changed · +202 -5
  • docling_core/types/doc/document.py · +10 -3 · modified
    @@ -66,6 +66,7 @@
     )
     from docling_core.types.doc.tokens import DocumentToken, TableToken
     from docling_core.types.doc.utils import parse_otsl_table_content, relative_path
    +from docling_core.utils.settings import settings
     
     _logger = logging.getLogger(__name__)
     
    @@ -1086,12 +1087,18 @@ def pil_image(self) -> Optional[PILImage.Image]:
                 return self._pil
     
             if isinstance(self.uri, AnyUrl):
    -            if self.uri.scheme == "data":
    +            if self.uri.scheme == "file":
    +                if not settings.allow_image_file_uri:
    +                    raise ValueError("file:// URI scheme is not enabled.")
    +                self._pil = PILImage.open(unquote(str(self.uri.path)))
    +            elif self.uri.scheme == "data":
                     encoded_img = str(self.uri).split(",")[1]
                     decoded_img = base64.b64decode(encoded_img)
    +
    +                if len(decoded_img) > settings.max_image_decoded_size:
    +                    raise ValueError(f"Decoded image exceeds size limit of {settings.max_image_decoded_size} bytes.")
    +
                     self._pil = PILImage.open(BytesIO(decoded_img))
    -            elif self.uri.scheme == "file":
    -                self._pil = PILImage.open(unquote(str(self.uri.path)))
                 # else: Handle http request or other protocols...
             elif isinstance(self.uri, Path):
                 self._pil = PILImage.open(self.uri)
    
  • docling_core/utils/settings.py · +11 -0 · added
    @@ -0,0 +1,11 @@
    +from pydantic_settings import BaseSettings, SettingsConfigDict
    +
    +
    +class CoreSettings(BaseSettings):
    +    model_config = SettingsConfigDict(env_prefix="DOCLINGCORE_")
    +
    +    allow_image_file_uri: bool = False
    +    max_image_decoded_size: int = 20 * 1024 * 1024  # 20MB
    +
    +
    +settings = CoreSettings()
    
  • pyproject.toml · +1 -0 · modified
    @@ -50,6 +50,7 @@ dependencies = [
         'typer (>=0.12.5,<0.25.0)',
         'latex2mathml (>=3.77.0,<4.0.0)',
         "defusedxml (>=0.7.1, <0.8.0)",
    +    "pydantic-settings>=2.14.0",
     ]
     
     [project.urls]
    
  • test/test_docling_doc.py · +155 -2 · modified
    @@ -6,7 +6,8 @@
     from pathlib import Path
     from typing import Optional, Union
     from unittest.mock import Mock
    -
    +from io import BytesIO
    +import base64
     import pytest
     import yaml
     from PIL import Image as PILImage
    @@ -51,6 +52,7 @@
     from docling_core.types.doc.document import FieldHeadingItem, FieldItem, FieldRegionItem, FieldValueItem
     from docling_core.types.doc.document import CURRENT_VERSION, PageItem
     from docling_core.types.doc.webvtt import WebVTTFile
    +from docling_core.utils.settings import settings
     
     from .test_data_gen_flag import GEN_TEST_DATA
     
    @@ -795,7 +797,158 @@ def test_image_ref():
         }
         image = ImageRef.model_validate(data_path)
         assert isinstance(image.uri, Path)
    -    assert image.uri.name == "image.png"
    +
    +
    +def test_image_ref_blocks_file_scheme():
    +    """Test that file:// URI scheme is blocked."""
    +    fig_image = PILImage.new(mode="RGB", size=(2, 2), color=(0, 0, 0))
    +    image_ref = ImageRef.from_pil(image=fig_image, dpi=72)
    +
    +    image_ref.uri = AnyUrl("file:///tmp/test.png")
    +
    +    with pytest.raises(ValueError, match="file:// URI scheme is not enabled"):
    +        _ = image_ref.pil_image
    +
    +
    +def test_image_ref_blocks_oversized_base64():
    +    """Test that oversized base64 data URIs are blocked."""
    +    import base64
    +
    +    large_bytes = b"X" * (28 * 1024 * 1024)
    +    large_data = base64.b64encode(large_bytes).decode('ascii')
    +    data_uri = f"data:image/png;base64,{large_data}"
    +
    +    image_ref = ImageRef(
    +        dpi=72,
    +        mimetype="image/png",
    +        size=Size(width=100, height=100),
    +        uri=AnyUrl(data_uri)
    +    )
    +
    +    with pytest.raises(ValueError, match="exceeds size limit"):
    +        _ = image_ref.pil_image
    +
    +
    +
    +def test_image_ref_accepts_valid_base64():
    +    """Test that valid base64 data URIs within size limit work correctly."""
    +    import base64
    +    from io import BytesIO
    +
    +    fig_image = PILImage.new(mode="RGB", size=(1, 1), color=(255, 0, 0))
    +
    +    # Convert to base64 data URI
    +    buffer = BytesIO()
    +    fig_image.save(buffer, format="PNG")
    +    img_bytes = buffer.getvalue()
    +    img_base64 = base64.b64encode(img_bytes).decode('ascii')
    +    data_uri = f"data:image/png;base64,{img_base64}"
    +
    +    # Create ImageRef with data URI
    +    image_ref = ImageRef(
    +        dpi=72,
    +        mimetype="image/png",
    +        size=Size(width=1, height=1),
    +        uri=AnyUrl(data_uri)
    +    )
    +
    +    # Should successfully decode the image
    +    decoded_image = image_ref.pil_image
    +    assert isinstance(decoded_image, PILImage.Image)
    +    assert decoded_image.size == (1, 1)
    +    assert decoded_image.mode == "RGB"
    +
    +
    +def test_file_uri_allowed_with_env_var():
    +    """Test that file:// URIs work when enabled via settings."""
    +    test_img_path = Path("/tmp/test_docling_env.png")
    +    img = PILImage.new("RGB", (100, 100), color="red")
    +    img.save(test_img_path)
    +
    +    orig_allow_image_file_uri = settings.allow_image_file_uri
    +    try:
    +        settings.allow_image_file_uri = True
    +
    +        image_ref = ImageRef(
    +            dpi=72,
    +            mimetype="image/png",
    +            size=Size(width=100, height=100),
    +            uri=AnyUrl(f"file://{test_img_path}"),
    +        )
    +
    +        pil_img = image_ref.pil_image
    +        assert pil_img is not None
    +        assert pil_img.size == (100, 100)
    +        assert pil_img.mode == "RGB"
    +    finally:
    +        test_img_path.unlink(missing_ok=True)
    +        settings.allow_image_file_uri = orig_allow_image_file_uri
    +
    +
    +def test_file_uri_blocked_by_default():
    +    """Test that file:// URIs are blocked by default."""
    +    image_ref = ImageRef(
    +        dpi=72,
    +        mimetype="image/png",
    +        size=Size(width=100, height=100),
    +        uri=AnyUrl("file:///tmp/test.png"),
    +    )
    +
    +    with pytest.raises(ValueError, match="file:// URI scheme is not enabled"):
    +        _ = image_ref.pil_image
    +
    +
    +def test_max_decoded_size_custom():
    +    """Test that oversized images are rejected based on custom limit."""
    +    orig_max_image_decoded_size = settings.max_image_decoded_size
    +    try:
    +        settings.max_image_decoded_size = 100  # 100 bytes limit
    +
    +        # Create image that will exceed 100 bytes when base64 decoded
    +        # A 50x50 RGB image is 50*50*3 = 7500 bytes uncompressed
    +        img = PILImage.new("RGB", (50, 50), color="green")
    +        buffer = BytesIO()
    +        img.save(buffer, format="PNG")
    +        img_bytes = buffer.getvalue()
    +
    +        # Verify the decoded size will exceed our limit
    +        assert len(img_bytes) > 100, f"Test image is only {len(img_bytes)} bytes, need > 100"
    +
    +        encoded = base64.b64encode(img_bytes).decode("utf-8")
    +        data_uri = f"data:image/png;base64,{encoded}"
    +
    +        image_ref = ImageRef(
    +            dpi=72,
    +            mimetype="image/png",
    +            size=Size(width=50, height=50),
    +            uri=AnyUrl(data_uri),
    +        )
    +
    +        with pytest.raises(ValueError, match="Decoded image exceeds size limit"):
    +            _ = image_ref.pil_image
    +    finally:
    +        settings.max_image_decoded_size = orig_max_image_decoded_size
    +
    +def test_max_decoded_size_default():
    +    """Test that small images work with default 20MB limit."""
    +    img = PILImage.new("RGB", (100, 100), color="blue")
    +    buffer = BytesIO()
    +    img.save(buffer, format="PNG")
    +    img_bytes = buffer.getvalue()
    +
    +    encoded = base64.b64encode(img_bytes).decode("utf-8")
    +    data_uri = f"data:image/png;base64,{encoded}"
    +
    +    image_ref = ImageRef(
    +        dpi=72,
    +        mimetype="image/png",
    +        size=Size(width=100, height=100),
    +        uri=AnyUrl(data_uri),
    +    )
    +
    +    pil_img = image_ref.pil_image
    +    assert pil_img is not None
    +    assert pil_img.size == (100, 100)
     
     
     def test_upgrade_content_layer_from_1_0_0() -> None:
    
  • uv.lock · +25 -0 · modified
    @@ -965,6 +965,7 @@ dependencies = [
         { name = "pandas", version = "3.0.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
         { name = "pillow" },
         { name = "pydantic" },
    +    { name = "pydantic-settings" },
         { name = "pyyaml" },
         { name = "tabulate" },
         { name = "typer" },
    @@ -1036,6 +1037,7 @@ requires-dist = [
         { name = "pandas", specifier = ">=2.1.4,<4.0.0" },
         { name = "pillow", specifier = ">=10.0.0,<13.0.0" },
         { name = "pydantic", specifier = ">=2.6.0,!=2.10.0,!=2.10.1,!=2.10.2,<3.0.0" },
    +    { name = "pydantic-settings", specifier = ">=2.14.0" },
         { name = "pyyaml", specifier = ">=5.1,<7.0.0" },
         { name = "semchunk", marker = "extra == 'chunking'", specifier = ">=2.2.0,<4.0.0" },
         { name = "semchunk", marker = "extra == 'chunking-openai'", specifier = ">=2.2.0,<4.0.0" },
    @@ -3321,6 +3323,20 @@ wheels = [
         { url = "https://files.pythonhosted.org/packages/36/c7/cfc8e811f061c841d7990b0201912c3556bfeb99cdcb7ed24adc8d6f8704/pydantic_core-2.41.5-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:56121965f7a4dc965bff783d70b907ddf3d57f6eba29b6d2e5dabfaf07799c51", size = 2145302, upload-time = "2025-11-04T13:43:46.64Z" },
     ]
     
    +[[package]]
    +name = "pydantic-settings"
    +version = "2.14.0"
    +source = { registry = "https://pypi.org/simple" }
    +dependencies = [
    +    { name = "pydantic" },
    +    { name = "python-dotenv" },
    +    { name = "typing-inspection" },
    +]
    +sdist = { url = "https://files.pythonhosted.org/packages/42/98/c8345dccdc31de4228c039a98f6467a941e39558da41c1744fbe29fa5666/pydantic_settings-2.14.0.tar.gz", hash = "sha256:24285fd4b0e0c06507dd9fdfd331ee23794305352aaec8fc4eb92d4047aeb67d", size = 235709, upload-time = "2026-04-20T13:37:40.293Z" }
    +wheels = [
    +    { url = "https://files.pythonhosted.org/packages/01/dd/bebff3040138f00ae8a102d426b27349b9a49acc310fcae7f92112d867e3/pydantic_settings-2.14.0-py3-none-any.whl", hash = "sha256:fc8d5d692eb7092e43c8647c1c35a3ecd00e040fcf02ed86f4cb5458ca62182e", size = 60940, upload-time = "2026-04-20T13:37:38.586Z" },
    +]
    +
     [[package]]
     name = "pydocstyle"
     version = "6.3.0"
    @@ -3417,6 +3433,15 @@ wheels = [
         { url = "https://files.pythonhosted.org/packages/e7/80/73211fc5bfbfc562369b4aa61dc1e4bf07dc7b34df7b317e4539316b809c/python_discovery-1.1.3-py3-none-any.whl", hash = "sha256:90e795f0121bc84572e737c9aa9966311b9fde44ffb88a5953b3ec9b31c6945e", size = 31485, upload-time = "2026-03-10T15:08:13.06Z" },
     ]
     
    +[[package]]
    +name = "python-dotenv"
    +version = "1.2.2"
    +source = { registry = "https://pypi.org/simple" }
    +sdist = { url = "https://files.pythonhosted.org/packages/82/ed/0301aeeac3e5353ef3d94b6ec08bbcabd04a72018415dcb29e588514bba8/python_dotenv-1.2.2.tar.gz", hash = "sha256:2c371a91fbd7ba082c2c1dc1f8bf89ca22564a087c2c287cd9b662adde799cf3", size = 50135, upload-time = "2026-03-01T16:00:26.196Z" }
    +wheels = [
    +    { url = "https://files.pythonhosted.org/packages/0b/d7/1959b9648791274998a9c3526f6d0ec8fd2233e4d4acce81bbae76b44b2a/python_dotenv-1.2.2-py3-none-any.whl", hash = "sha256:1d8214789a24de455a8b8bd8ae6fe3c6b69a5e3d64aa8a8e5d68e694bbcb285a", size = 22101, upload-time = "2026-03-01T16:00:25.09Z" },
    +]
    +
     [[package]]
     name = "python-gitlab"
     version = "3.15.0"
    

Vulnerability mechanics

Root cause

"The docling-core library did not sufficiently validate remote request destinations and did not sanitize the filename in a server-provided `Content-Disposition` header, so that filename could resolve to a local path outside the intended directory. Together these lead to SSRF and path traversal."

Attack vector

An attacker can host a malicious document or resource that, when fetched by an application using `docling-core` versions `>= 1.5.0, < 2.74.1`, returns a `Content-Disposition` header whose `filename` contains path components (for example `../../etc/config.txt`, the value used in the patch's tests). The `resolve_remote_filename` function returned this value without sanitizing it, so the resulting path could resolve outside the intended directory. The advisory classifies the issue as Server-Side Request Forgery (SSRF) targeting local files [CWE-918].
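The effect of an unsanitized header filename can be illustrated with the standard library alone. The sketch below mirrors the header-parsing loop shown in the diff in simplified form; `raw_filename` and `resolve_in_cache` are hypothetical names, the cache directory is a placeholder, and the join is only an assumption about how a caller might build the on-disk path.

```python
import posixpath
from typing import Optional


def raw_filename(content_disposition: str) -> Optional[str]:
    """Return the `filename` directive value as-is, with no sanitization."""
    for part in content_disposition.strip().split(";"):
        pieces = part.split("=")
        if pieces[0].strip() == "filename":
            return "=".join(pieces[1:]).strip().strip("'\"") or None
    return None


def resolve_in_cache(cache_dir: str, filename: str) -> str:
    """Join a filename onto the cache directory and collapse any '..' segments."""
    return posixpath.normpath(posixpath.join(cache_dir, filename))


cache_dir = "/var/cache/docling"  # hypothetical cache location
header = 'attachment; filename="../../etc/config.txt"'

name = raw_filename(header)
target = resolve_in_cache(cache_dir, name)
escaped = not target.startswith(cache_dir + "/")

if __name__ == "__main__":
    print(name, target, escaped)
```

An absolute filename such as `/etc/config.txt` has the same effect, because `posixpath.join` discards the directory prefix when the second argument is absolute.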

Affected code

The vulnerability resides in the `resolve_remote_filename` function within `docling_core/utils/file.py`, which processes the `Content-Disposition` header. The `resolve_source_to_stream` function in the same file is also affected, as it calls `requests.get` without validating the URL or its redirect targets. The `ImageRef` class in `docling_core/types/doc/document.py` was also updated to address related security concerns with URI handling [patch_id=4714020, patch_id=4714021].
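For literal IP addresses, a destination check of the kind that was missing can be approximated with the standard `ipaddress` module. This is an illustration rather than the library's code: `is_public_ip` is a hypothetical name, and the sketch deliberately skips the DNS lookup that the patched `_is_safe_url` performs for hostnames.

```python
import ipaddress


def is_public_ip(literal: str) -> bool:
    """Return True only for IP literals that are globally routable."""
    try:
        ip = ipaddress.ip_address(literal)
    except ValueError:
        # Not an IP literal (for example a hostname); this sketch rejects it
        # instead of resolving it.
        return False
    return ip.is_global


if __name__ == "__main__":
    for candidate in ("8.8.8.8", "10.0.0.1", "127.0.0.1", "169.254.169.254", "::1"):
        print(candidate, is_public_ip(candidate))
```

The address `169.254.169.254` is included because it is the link-local metadata endpoint exercised by the patch's tests.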

What the fix does

The patch introduces stricter validation for remote destinations and normalizes server-provided filenames before use in `docling_core/utils/file.py`. Specifically, the new `_sanitize_filename` function neutralizes path traversal attempts by keeping only the base filename, and the new `_is_safe_url` function rejects URLs that resolve to addresses that are not globally routable (private, loopback, link-local, reserved, multicast, or unspecified). Each redirect target is checked with the same function before it is followed, and the number of redirects is capped at 5, mitigating SSRF risks [patch_id=4714020]. The `ImageRef` class now rejects `file://` URIs unless the `allow_image_file_uri` setting is enabled and enforces a maximum decoded size for `data:` URIs (20 MB by default), further hardening against malicious inputs [patch_id=4714021].
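The filename normalization can be sketched in a platform-independent way as follows. This is an approximation of the approach in the diff, not a copy of it: `basename_only` is a hypothetical name, and it uses `PurePosixPath` where the patch uses `Path`, so that the result does not depend on the operating system running the example.

```python
from pathlib import PurePosixPath
from typing import Optional


def basename_only(filename: str) -> Optional[str]:
    """Keep only the final path component, or return None if nothing usable remains."""
    # Treat Windows-style separators like POSIX ones before splitting.
    normalized = filename.replace("\\", "/")
    name = PurePosixPath(normalized).name
    if not name or name in (".", ".."):
        return None
    return name


if __name__ == "__main__":
    for candidate in ("../../etc/config.txt", "C:\\Windows\\System32\\config", "..", "document.pdf"):
        print(repr(candidate), "->", repr(basename_only(candidate)))
```

Because only the last component survives, joining the result onto a cache directory cannot move the path to a different directory. When no usable name remains, the patched `resolve_remote_filename` falls back to `fallback_filename` and raises `ValueError` if that is unusable too.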

Preconditions

  • Input: The application must accept untrusted URLs for remote fetching.

Generated on Jun 3, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
