VYPR
High severity · 8.1 · NVD Advisory · Published Jun 3, 2026 · Updated Jun 3, 2026

Docling Core: Insufficient validation of image reference URIs

CVE-2026-44019

Description

Impact

In versions >= 2.5.0, < 2.74.1, docling-core allowed local file:// image references and accepted inline data: content without a decoded-size limit.

In applications that accept untrusted image references, this may allow access to local files readable by the process or excessive memory use from large inline payloads.

Patches

Patched in docling-core 2.74.1. The fix blocks local file URIs by default and adds a size limit for decoded inline image data.

Users should upgrade to docling-core >= 2.74.1.

Workarounds

If upgrading is not immediately possible:

  • reject file: and data: image references from untrusted input
  • allow only approved local or remote image sources
  • apply input size and memory limits to processing workers
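The first two workarounds can be sketched as an allowlist check that runs before any image reference reaches docling-core. This is a minimal illustration using only the standard library. The names `ALLOWED_SCHEMES`, `ALLOWED_HOSTS`, and `is_allowed_image_ref`, and the host `images.example.com`, are hypothetical and not part of docling-core.

```python
from urllib.parse import urlsplit

# Hypothetical policy values; adjust them for your deployment.
ALLOWED_SCHEMES = {"https"}
ALLOWED_HOSTS = {"images.example.com"}


def is_allowed_image_ref(ref: str) -> bool:
    """Return True only for remote image references from approved hosts."""
    parts = urlsplit(ref.strip())
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        # Rejects file:, data:, and every other scheme that is not approved,
        # as well as scheme-less local paths.
        return False
    return (parts.hostname or "").lower() in ALLOWED_HOSTS


for candidate in (
    "https://images.example.com/figure.png",
    "file:///etc/passwd",
    "data:image/png;base64,AAAA",
):
    print(candidate, is_allowed_image_ref(candidate))
```

An allowlist is used here rather than a blocklist of `file:` and `data:` so that schemes nobody thought to list are rejected by default.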

References

  • Fix release: `v2.74.1`

Affected products

2

Patches

1
2087d0f36261

fix: refine ImageRef URI handling (#595)

https://github.com/docling-project/docling-core · Panos Vagenas · Apr 22, 2026 · Fixed in 2.74.1 · via llm-release-walk
5 files changed · +202 -5
  • docling_core/types/doc/document.py +10 -3 modified
    @@ -66,6 +66,7 @@
     )
     from docling_core.types.doc.tokens import DocumentToken, TableToken
     from docling_core.types.doc.utils import parse_otsl_table_content, relative_path
    +from docling_core.utils.settings import settings
     
     _logger = logging.getLogger(__name__)
     
    @@ -1086,12 +1087,18 @@ def pil_image(self) -> Optional[PILImage.Image]:
                 return self._pil
     
             if isinstance(self.uri, AnyUrl):
    -            if self.uri.scheme == "data":
    +            if self.uri.scheme == "file":
    +                if not settings.allow_image_file_uri:
    +                    raise ValueError("file:// URI scheme is not enabled.")
    +                self._pil = PILImage.open(unquote(str(self.uri.path)))
    +            elif self.uri.scheme == "data":
                     encoded_img = str(self.uri).split(",")[1]
                     decoded_img = base64.b64decode(encoded_img)
    +
    +                if len(decoded_img) > settings.max_image_decoded_size:
    +                    raise ValueError(f"Decoded image exceeds size limit of {settings.max_image_decoded_size} bytes.")
    +
                     self._pil = PILImage.open(BytesIO(decoded_img))
    -            elif self.uri.scheme == "file":
    -                self._pil = PILImage.open(unquote(str(self.uri.path)))
                 # else: Handle http request or other protocols...
             elif isinstance(self.uri, Path):
                 self._pil = PILImage.open(self.uri)
    
  • docling_core/utils/settings.py +11 -0 added
    @@ -0,0 +1,11 @@
    +from pydantic_settings import BaseSettings, SettingsConfigDict
    +
    +
    +class CoreSettings(BaseSettings):
    +    model_config = SettingsConfigDict(env_prefix="DOCLINGCORE_")
    +
    +    allow_image_file_uri: bool = False
    +    max_image_decoded_size: int = 20 * 1024 * 1024  # 20MB
    +
    +
    +settings = CoreSettings()
    
  • pyproject.toml +1 -0 modified
    @@ -50,6 +50,7 @@ dependencies = [
         'typer (>=0.12.5,<0.25.0)',
         'latex2mathml (>=3.77.0,<4.0.0)',
         "defusedxml (>=0.7.1, <0.8.0)",
    +    "pydantic-settings>=2.14.0",
     ]
     
     [project.urls]
    
  • test/test_docling_doc.py +155 -2 modified
    @@ -6,7 +6,8 @@
     from pathlib import Path
     from typing import Optional, Union
     from unittest.mock import Mock
    -
    +from io import BytesIO
    +import base64
     import pytest
     import yaml
     from PIL import Image as PILImage
    @@ -51,6 +52,7 @@
     from docling_core.types.doc.document import FieldHeadingItem, FieldItem, FieldRegionItem, FieldValueItem
     from docling_core.types.doc.document import CURRENT_VERSION, PageItem
     from docling_core.types.doc.webvtt import WebVTTFile
    +from docling_core.utils.settings import settings
     
     from .test_data_gen_flag import GEN_TEST_DATA
     
    @@ -795,7 +797,158 @@ def test_image_ref():
         }
         image = ImageRef.model_validate(data_path)
         assert isinstance(image.uri, Path)
    -    assert image.uri.name == "image.png"
    +
    +
    +def test_image_ref_blocks_file_scheme():
    +    """Test that file:// URI scheme is blocked."""
    +    fig_image = PILImage.new(mode="RGB", size=(2, 2), color=(0, 0, 0))
    +    image_ref = ImageRef.from_pil(image=fig_image, dpi=72)
    +
    +    image_ref.uri = AnyUrl("file:///tmp/test.png")
    +
    +    with pytest.raises(ValueError, match="file:// URI scheme is not enabled"):
    +        _ = image_ref.pil_image
    +
    +
    +def test_image_ref_blocks_oversized_base64():
    +    """Test that oversized base64 data URIs are blocked."""
    +    import base64
    +
    +    large_bytes = b"X" * (28 * 1024 * 1024)
    +    large_data = base64.b64encode(large_bytes).decode('ascii')
    +    data_uri = f"data:image/png;base64,{large_data}"
    +
    +    image_ref = ImageRef(
    +        dpi=72,
    +        mimetype="image/png",
    +        size=Size(width=100, height=100),
    +        uri=AnyUrl(data_uri)
    +    )
    +
    +    with pytest.raises(ValueError, match="exceeds size limit"):
    +        _ = image_ref.pil_image
    +
    +
    +
    +def test_image_ref_accepts_valid_base64():
    +    """Test that valid base64 data URIs within size limit work correctly."""
    +    import base64
    +    from io import BytesIO
    +
    +    fig_image = PILImage.new(mode="RGB", size=(1, 1), color=(255, 0, 0))
    +
    +    # Convert to base64 data URI
    +    buffer = BytesIO()
    +    fig_image.save(buffer, format="PNG")
    +    img_bytes = buffer.getvalue()
    +    img_base64 = base64.b64encode(img_bytes).decode('ascii')
    +    data_uri = f"data:image/png;base64,{img_base64}"
    +
    +    # Create ImageRef with data URI
    +    image_ref = ImageRef(
    +        dpi=72,
    +        mimetype="image/png",
    +        size=Size(width=1, height=1),
    +        uri=AnyUrl(data_uri)
    +    )
    +
    +    # Should successfully decode the image
    +    decoded_image = image_ref.pil_image
    +    assert isinstance(decoded_image, PILImage.Image)
    +    assert decoded_image.size == (1, 1)
    +    assert decoded_image.mode == "RGB"
    +
    +
    +def test_file_uri_allowed_with_env_var():
    +    """Test that file:// URIs work when enabled via settings."""
    +    test_img_path = Path("/tmp/test_docling_env.png")
    +    img = PILImage.new("RGB", (100, 100), color="red")
    +    img.save(test_img_path)
    +
    +    orig_allow_image_file_uri = settings.allow_image_file_uri
    +    try:
    +        settings.allow_image_file_uri = True
    +
    +        image_ref = ImageRef(
    +            dpi=72,
    +            mimetype="image/png",
    +            size=Size(width=100, height=100),
    +            uri=AnyUrl(f"file://{test_img_path}"),
    +        )
    +
    +        pil_img = image_ref.pil_image
    +        assert pil_img is not None
    +        assert pil_img.size == (100, 100)
    +        assert pil_img.mode == "RGB"
    +    finally:
    +        test_img_path.unlink(missing_ok=True)
    +        settings.allow_image_file_uri = orig_allow_image_file_uri
    +
    +
    +def test_file_uri_blocked_by_default():
    +    """Test that file:// URIs are blocked by default."""
    +    image_ref = ImageRef(
    +        dpi=72,
    +        mimetype="image/png",
    +        size=Size(width=100, height=100),
    +        uri=AnyUrl("file:///tmp/test.png"),
    +    )
    +
    +    with pytest.raises(ValueError, match="file:// URI scheme is not enabled"):
    +        _ = image_ref.pil_image
    +
    +
    +def test_max_decoded_size_custom():
    +    """Test that oversized images are rejected based on custom limit."""
    +    orig_max_image_decoded_size = settings.max_image_decoded_size
    +    try:
    +        settings.max_image_decoded_size = 100  # 100 bytes limit
    +
    +        # Create image that will exceed 100 bytes when base64 decoded
    +        # A 50x50 RGB image is 50*50*3 = 7500 bytes uncompressed
    +        img = PILImage.new("RGB", (50, 50), color="green")
    +        buffer = BytesIO()
    +        img.save(buffer, format="PNG")
    +        img_bytes = buffer.getvalue()
    +
    +        # Verify the decoded size will exceed our limit
    +        assert len(img_bytes) > 100, f"Test image is only {len(img_bytes)} bytes, need > 100"
    +
    +        encoded = base64.b64encode(img_bytes).decode("utf-8")
    +        data_uri = f"data:image/png;base64,{encoded}"
    +
    +        image_ref = ImageRef(
    +            dpi=72,
    +            mimetype="image/png",
    +            size=Size(width=50, height=50),
    +            uri=AnyUrl(data_uri),
    +        )
    +
    +        with pytest.raises(ValueError, match="Decoded image exceeds size limit"):
    +            _ = image_ref.pil_image
    +    finally:
    +        settings.max_image_decoded_size = orig_max_image_decoded_size
    +
    +def test_max_decoded_size_default():
    +    """Test that small images work with default 20MB limit."""
    +    img = PILImage.new("RGB", (100, 100), color="blue")
    +    buffer = BytesIO()
    +    img.save(buffer, format="PNG")
    +    img_bytes = buffer.getvalue()
    +
    +    encoded = base64.b64encode(img_bytes).decode("utf-8")
    +    data_uri = f"data:image/png;base64,{encoded}"
    +
    +    image_ref = ImageRef(
    +        dpi=72,
    +        mimetype="image/png",
    +        size=Size(width=100, height=100),
    +        uri=AnyUrl(data_uri),
    +    )
    +
    +    pil_img = image_ref.pil_image
    +    assert pil_img is not None
    +    assert pil_img.size == (100, 100)
     
     
     def test_upgrade_content_layer_from_1_0_0() -> None:
    
  • uv.lock +25 -0 modified
    @@ -965,6 +965,7 @@ dependencies = [
         { name = "pandas", version = "3.0.1", source = { registry = "https://pypi.org/simple" }, marker = "python_full_version >= '3.11'" },
         { name = "pillow" },
         { name = "pydantic" },
    +    { name = "pydantic-settings" },
         { name = "pyyaml" },
         { name = "tabulate" },
         { name = "typer" },
    @@ -1036,6 +1037,7 @@ requires-dist = [
         { name = "pandas", specifier = ">=2.1.4,<4.0.0" },
         { name = "pillow", specifier = ">=10.0.0,<13.0.0" },
         { name = "pydantic", specifier = ">=2.6.0,!=2.10.0,!=2.10.1,!=2.10.2,<3.0.0" },
    +    { name = "pydantic-settings", specifier = ">=2.14.0" },
         { name = "pyyaml", specifier = ">=5.1,<7.0.0" },
         { name = "semchunk", marker = "extra == 'chunking'", specifier = ">=2.2.0,<4.0.0" },
         { name = "semchunk", marker = "extra == 'chunking-openai'", specifier = ">=2.2.0,<4.0.0" },
    @@ -3321,6 +3323,20 @@ wheels = [
         { url = "https://files.pythonhosted.org/packages/36/c7/cfc8e811f061c841d7990b0201912c3556bfeb99cdcb7ed24adc8d6f8704/pydantic_core-2.41.5-pp311-pypy311_pp73-win_amd64.whl", hash = "sha256:56121965f7a4dc965bff783d70b907ddf3d57f6eba29b6d2e5dabfaf07799c51", size = 2145302, upload-time = "2025-11-04T13:43:46.64Z" },
     ]
     
    +[[package]]
    +name = "pydantic-settings"
    +version = "2.14.0"
    +source = { registry = "https://pypi.org/simple" }
    +dependencies = [
    +    { name = "pydantic" },
    +    { name = "python-dotenv" },
    +    { name = "typing-inspection" },
    +]
    +sdist = { url = "https://files.pythonhosted.org/packages/42/98/c8345dccdc31de4228c039a98f6467a941e39558da41c1744fbe29fa5666/pydantic_settings-2.14.0.tar.gz", hash = "sha256:24285fd4b0e0c06507dd9fdfd331ee23794305352aaec8fc4eb92d4047aeb67d", size = 235709, upload-time = "2026-04-20T13:37:40.293Z" }
    +wheels = [
    +    { url = "https://files.pythonhosted.org/packages/01/dd/bebff3040138f00ae8a102d426b27349b9a49acc310fcae7f92112d867e3/pydantic_settings-2.14.0-py3-none-any.whl", hash = "sha256:fc8d5d692eb7092e43c8647c1c35a3ecd00e040fcf02ed86f4cb5458ca62182e", size = 60940, upload-time = "2026-04-20T13:37:38.586Z" },
    +]
    +
     [[package]]
     name = "pydocstyle"
     version = "6.3.0"
    @@ -3417,6 +3433,15 @@ wheels = [
         { url = "https://files.pythonhosted.org/packages/e7/80/73211fc5bfbfc562369b4aa61dc1e4bf07dc7b34df7b317e4539316b809c/python_discovery-1.1.3-py3-none-any.whl", hash = "sha256:90e795f0121bc84572e737c9aa9966311b9fde44ffb88a5953b3ec9b31c6945e", size = 31485, upload-time = "2026-03-10T15:08:13.06Z" },
     ]
     
    +[[package]]
    +name = "python-dotenv"
    +version = "1.2.2"
    +source = { registry = "https://pypi.org/simple" }
    +sdist = { url = "https://files.pythonhosted.org/packages/82/ed/0301aeeac3e5353ef3d94b6ec08bbcabd04a72018415dcb29e588514bba8/python_dotenv-1.2.2.tar.gz", hash = "sha256:2c371a91fbd7ba082c2c1dc1f8bf89ca22564a087c2c287cd9b662adde799cf3", size = 50135, upload-time = "2026-03-01T16:00:26.196Z" }
    +wheels = [
    +    { url = "https://files.pythonhosted.org/packages/0b/d7/1959b9648791274998a9c3526f6d0ec8fd2233e4d4acce81bbae76b44b2a/python_dotenv-1.2.2-py3-none-any.whl", hash = "sha256:1d8214789a24de455a8b8bd8ae6fe3c6b69a5e3d64aa8a8e5d68e694bbcb285a", size = 22101, upload-time = "2026-03-01T16:00:25.09Z" },
    +]
    +
     [[package]]
     name = "python-gitlab"
     version = "3.15.0"
    

Vulnerability mechanics

Root cause

"The image processing component did not properly validate URIs and lacked a size limit for inline data."

Attack vector

In versions `>= 2.5.0, < 2.74.1`, `docling-core` accepted `file://` URI schemes for images and allowed inline `data:` content without a decoded-size limit. Applications that process untrusted image references are vulnerable. An attacker could provide a `file://` URI pointing to a local file readable by the application, or a large `data:` URI payload to consume excessive memory.
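To illustrate the memory side of this attack vector, the sketch below bounds the decoded size of a base64 `data:` URI from the length of its encoded payload, so an oversized reference can be rejected before any decode buffer is allocated. This is an assumption-laden pre-check for callers, not what docling-core does: the patch compares the length of the already decoded bytes. The function names are hypothetical, and `MAX_DECODED` simply reuses the 20 MB default from the patch.

```python
import base64

MAX_DECODED = 20 * 1024 * 1024  # same default value as the patch


def decoded_size_upper_bound(data_uri: str) -> int:
    """Upper bound on the decoded byte count of a base64 data: URI."""
    _, sep, payload = data_uri.partition(",")
    if not sep:
        raise ValueError("not a data: URI with a payload")
    # Every group of 4 base64 characters decodes to at most 3 bytes.
    return 3 * ((len(payload) + 3) // 4)


def exceeds_limit(data_uri: str, limit: int = MAX_DECODED) -> bool:
    """True when the payload could decode to more than `limit` bytes."""
    return decoded_size_upper_bound(data_uri) > limit


small = "data:image/png;base64," + base64.b64encode(b"x" * 30).decode("ascii")
print(decoded_size_upper_bound(small), exceeds_limit(small))
```

Because the bound ignores padding, it can overestimate by up to two bytes for padded payloads, which errs on the side of rejecting.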

Affected code

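The vulnerability resides in `pil_image` in `docling_core/types/doc/document.py`, which the patch's tests exercise through `ImageRef`. It loads images from `data:` and `file:` URIs. Before the fix, the `file:` branch opened the referenced path unconditionally, and the `data:` branch base64-decoded the payload and passed it to the image loader with no size check.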

What the fix does

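The fix, released in `docling-core` `2.74.1`, adds a `CoreSettings` class (environment prefix `DOCLINGCORE_`) with two options: `allow_image_file_uri`, which defaults to `False`, and `max_image_decoded_size`, which defaults to 20 MB (20 * 1024 * 1024 bytes). With these defaults, `pil_image` raises a `ValueError` for `file://` URIs and for inline data whose decoded length exceeds the limit. Note that the size check runs after base64 decoding and before the bytes are handed to the image loader. This prevents unauthorized access to local files through `file://` references and mitigates excessive memory consumption from large payloads [patch_id=4714022].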
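The diff declares `env_prefix="DOCLINGCORE_"` and creates the settings object at module import, so the two options can presumably be set through environment variables. The variable names below are inferred from that prefix plus the field names, assuming pydantic-settings' default case-insensitive matching; the advisory itself does not list them. The helper `docling_core_env` is hypothetical.

```python
import os

ENV_PREFIX = "DOCLINGCORE_"  # env_prefix declared by CoreSettings in the patch


def docling_core_env(
    allow_file_uri: bool = False,
    max_decoded_bytes: int = 20 * 1024 * 1024,
) -> dict[str, str]:
    """Build environment variables for the two settings added by the patch."""
    return {
        # Variable names are an inference from prefix + field name.
        f"{ENV_PREFIX}ALLOW_IMAGE_FILE_URI": "true" if allow_file_uri else "false",
        f"{ENV_PREFIX}MAX_IMAGE_DECODED_SIZE": str(max_decoded_bytes),
    }


# Apply before docling_core is imported: the patch instantiates the
# settings object once, at import time.
os.environ.update(docling_core_env(max_decoded_bytes=5 * 1024 * 1024))
```

Keeping `allow_file_uri` at `False` preserves the patched default. Enable it only in deployments where every image reference comes from a trusted source.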

Preconditions

  • input: The application must accept untrusted image references.

Generated on Jun 3, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
