Arbitrary File Read through Path Traversal in run-llama/llama_index
Description
A path traversal vulnerability exists in run-llama/llama_index versions 0.12.27 through 0.12.40, specifically within the encode_image function in generic_utils.py. This vulnerability allows an attacker to manipulate the image_path input to read arbitrary files on the server, including sensitive system files. The issue arises due to improper validation or sanitization of the file path, enabling path traversal sequences to access files outside the intended directory. The vulnerability is fixed in version 0.12.41.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
Path traversal in llama_index's encode_image allows arbitrary file read; fixed in 0.12.41.
Vulnerability
Overview
A path traversal vulnerability exists in the encode_image function within generic_utils.py of run-llama/llama_index versions 0.12.27 through 0.12.40. The function fails to properly validate or sanitize the image_path input, allowing an attacker to supply path traversal sequences (e.g., ../) to access files outside the intended directory [1]. This flaw enables reading arbitrary files on the server, including sensitive system files.
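A minimal sketch of the unchecked pattern described above, not the library's actual code: `encode_image_unchecked` is a hypothetical stand-in that opens whatever path it is given and returns the Base64 encoding of the bytes. The directory layout and file names are invented for the demo, which runs entirely inside a temporary directory.

```python
import base64
import tempfile
from pathlib import Path


def encode_image_unchecked(image_path: str) -> str:
    """Hypothetical stand-in: read ANY path and return its bytes as Base64."""
    with open(image_path, "rb") as fh:
        return base64.b64encode(fh.read()).decode("utf-8")


# Throwaway layout: an "images" folder next to a file that is not an image.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "images").mkdir()
    (root / "secret.txt").write_bytes(b"Hello World")

    # The caller-supplied path starts inside images/ but climbs out with "..".
    traversal = str(root / "images" / ".." / "secret.txt")
    leaked = encode_image_unchecked(traversal)

# The non-image file comes back to the caller, merely Base64-encoded.
print(base64.b64decode(leaked))
```

Nothing in the helper ties the path to a directory or to an image format, which is the gap the description refers to.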
Exploitation
An attacker can exploit this vulnerability by providing a crafted image_path argument to the encode_image function. No authentication is required if the function is exposed to user input, such as through an API endpoint that accepts image paths. The attack surface includes any application built with llama_index that passes untrusted file paths to this function. The traversal sequences bypass directory restrictions, allowing the attacker to read files like /etc/passwd or application configuration files [1][4].
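To make "outside the intended directory" concrete, the sketch below shows one way an application could compare a resolved path against a base directory before passing it to any file-reading helper. `resolve_inside` and the directory names are hypothetical; this is neither the upstream fix nor a documented workaround, only an illustration of the containment idea.

```python
import tempfile
from pathlib import Path


def resolve_inside(base_dir: Path, user_path: str) -> Path:
    """Return the resolved path, or raise ValueError if it leaves base_dir."""
    base = base_dir.resolve()
    # Joining an absolute user_path discards base, so it is rejected below too.
    candidate = (base / user_path).resolve()
    try:
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes {base}: {user_path!r}") from None
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    images = Path(tmp) / "images"
    images.mkdir()
    (images / "cat.png").write_bytes(b"placeholder")

    print(resolve_inside(images, "cat.png"))  # stays inside images/
    for attempt in ("../secret.txt", "/etc/passwd"):
        try:
            resolve_inside(images, attempt)
        except ValueError as exc:
            print("rejected:", exc)
```

Resolving both sides before comparing matters here: comparing raw strings would miss `..` segments and symlinks.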
Impact
Successful exploitation leads to unauthorized disclosure of sensitive information. An attacker could read system files, application secrets, or other data stored on the server, potentially leading to further compromise. The vulnerability is classified as path traversal (CWE-22). Its CVSS v4.0 score is still pending, but the impact is considered high because of the potential for information disclosure [1].
Mitigation
The vulnerability is fixed in llama_index version 0.12.41, and users are strongly advised to upgrade to that version or later. The fix, introduced in commit cdeaab91a204d1c3527f177dac37390327aef274, makes ImageDocument check that a supplied image path or URL actually points to a valid, accessible image and raise a ValueError otherwise [3]. No workaround is documented; upgrading is the recommended action [4].
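The patch relies on Pillow (`Image.open(...).verify()`) to confirm that a path really holds an image. As a rough, standard-library-only illustration of the same idea, the sketch below swaps in a much cruder check on PNG and JPEG signature bytes. `looks_like_image` is a hypothetical helper; a signature check is far weaker than the patch's Pillow verification and is not a substitute for upgrading.

```python
import tempfile
from pathlib import Path

PNG_MAGIC = b"\x89PNG\r\n\x1a\n"  # 8-byte PNG file signature
JPEG_MAGIC = b"\xff\xd8\xff"      # start-of-image marker plus next marker byte


def looks_like_image(path) -> bool:
    """Crude stand-in for a real image check: inspect only the leading bytes."""
    try:
        with open(path, "rb") as fh:
            header = fh.read(8)
    except OSError:
        # Missing or unreadable files are treated as "not an image".
        return False
    return header.startswith(PNG_MAGIC) or header.startswith(JPEG_MAGIC)


with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    (folder / "fake.png").write_bytes(PNG_MAGIC + b"not real pixel data")
    (folder / "notes.txt").write_bytes(b"Hello world!")

    print(looks_like_image(folder / "fake.png"))   # signature matches
    print(looks_like_image(folder / "notes.txt"))  # plain text is rejected
```

As the `fake.png` case shows, a header match does not prove the file decodes as an image, which is why the patch uses a full verification step instead.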
AI Insight generated on May 19, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
| llama-index-core (PyPI) | >= 0.11.23, < 0.12.41 | 0.12.41 |
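A quick way to compare an installed version against the patched release in the table might look like the sketch below. `parse_release`, `is_patched`, and `installed_is_patched` are hypothetical helpers that only understand a leading `X.Y.Z` (ignoring pre-release suffixes is an assumption of this sketch); `importlib.metadata.version` is the standard-library lookup for an installed distribution.

```python
import re
from importlib import metadata

PATCHED = (0, 12, 41)  # first fixed release per the table above


def parse_release(version: str) -> tuple:
    """Extract the leading X.Y.Z as integers; anything after it is ignored."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_patched(version: str) -> bool:
    return parse_release(version) >= PATCHED


def installed_is_patched(dist: str = "llama-index-core"):
    """Return True/False for an installed distribution, None if not installed."""
    try:
        return is_patched(metadata.version(dist))
    except metadata.PackageNotFoundError:
        return None


for candidate in ("0.12.40", "0.12.41"):
    print(candidate, "patched" if is_patched(candidate) else "affected or older")
```

For anything beyond a quick check, a dependency audit tool that consumes the advisory data directly is likely a better fit than hand-rolled version parsing.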
Affected products
- Range: >=0.12.27, <=0.12.40
- run-llama/run-llama/llama_index · v5 · Range: unspecified
Patches
cdeaab91a204 · ImageDocument path and url checking to ensure that the input is really an image (#18947)
4 files changed · +113 −23
llama-index-core/llama_index/core/schema.py · +26 −0 · modified

    @@ -32,6 +32,7 @@
     from dataclasses_json import DataClassJsonMixin
     from deprecated import deprecated
     from typing_extensions import Self
    +from PIL import Image

     from llama_index.core.bridge.pydantic import (
         AnyUrl,
    @@ -1220,6 +1221,27 @@ def from_cloud_document(
             )


    +def is_image_pil(file_path: str) -> bool:
    +    try:
    +        with Image.open(file_path) as img:
    +            img.verify()  # Verify it's a valid image
    +        return True
    +    except (IOError, SyntaxError):
    +        return False
    +
    +
    +def is_image_url_pil(url: str) -> bool:
    +    try:
    +        response = requests.get(url, stream=True)
    +        response.raise_for_status()  # Raise an exception for bad status codes
    +        # Open image from the response content
    +        img = Image.open(BytesIO(response.content))
    +        img.verify()
    +        return True
    +    except (requests.RequestException, IOError, SyntaxError):
    +        return False
    +
    +
     class ImageDocument(Document):
         """Backward compatible wrapper around Document containing an image."""

    @@ -1235,10 +1257,14 @@ def __init__(self, **kwargs: Any) -> None:
                     data=image, mimetype=image_mimetype
                 )
             elif image_path:
    +            if not is_image_pil(image_path):
    +                raise ValueError("The specified file path is not an accessible image")
                 kwargs["image_resource"] = MediaResource(
                     path=image_path, mimetype=image_mimetype
                 )
             elif image_url:
    +            if not is_image_url_pil(image_url):
    +                raise ValueError("The specified URL is not an accessible image")
                 kwargs["image_resource"] = MediaResource(
                     url=image_url, mimetype=image_mimetype
                 )
llama-index-core/tests/multi_modal_llms/test_generic_utils.py · +23 −11 · modified

    @@ -2,6 +2,8 @@

     import pytest
     import base64
    +import httpx
    +from pathlib import Path
     from unittest.mock import mock_open, patch, MagicMock

     from llama_index.core.schema import ImageDocument
    @@ -16,7 +18,9 @@
     )

     # Expected values
    -EXP_IMAGE_URLS = ["http://example.com/image1.jpg"]
    +EXP_IMAGE_URLS = [
    +    "https://astrabert.github.io/hophop-science/images/whale_doing_science.png"
    +]
     EXP_BASE64 = "SGVsbG8gV29ybGQ="  # "Hello World" in base64
     EXP_BINARY = b"Hello World"

    @@ -51,22 +55,26 @@ def test_encode_image():
         assert result == EXP_BASE64


    -def test_image_documents_to_base64_multiple_sources():
    +def test_image_documents_to_base64_multiple_sources(tmp_path: Path):
         """Test converting multiple ImageDocuments with different source types."""
    +    content = httpx.get(EXP_IMAGE_URLS[0]).content
    +    expected_b64 = base64.b64encode(content).decode("utf-8")
    +    fl_path = tmp_path / "test_image.png"
    +    fl_path.write_bytes(content)
         documents = [
    -        ImageDocument(image=EXP_BASE64),
    -        ImageDocument(image_path="test.jpg"),
    +        ImageDocument(image=expected_b64),
    +        ImageDocument(image_path=fl_path),
             ImageDocument(metadata={"file_path": "test.jpg"}),
             ImageDocument(image_url=EXP_IMAGE_URLS[0]),
         ]
         with patch("requests.get") as mock_get:
    -        mock_get.return_value.content = EXP_BINARY
    +        mock_get.return_value.content = content
             with patch("os.path.isfile", return_value=True):
    -            with patch("builtins.open", mock_open(read_data=EXP_BINARY)):
    +            with patch("builtins.open", mock_open(read_data=content)):
                     result = image_documents_to_base64(documents)

         assert len(result) == 4
    -    assert all(encoding == EXP_BASE64 for encoding in result)
    +    assert all(encoding == expected_b64 for encoding in result)


     def test_image_documents_to_base64_failed_url():
    @@ -136,11 +144,15 @@ def test_infer_image_mimetype_from_file_path():
         assert infer_image_mimetype_from_file_path("") == "image/jpeg"


    -def test_set_base64_and_mimetype_for_image_docs():
    +def test_set_base64_and_mimetype_for_image_docs(tmp_path: Path):
         """Test setting base64 and mimetype fields for ImageDocument objects."""
    +    content = httpx.get(EXP_IMAGE_URLS[0]).content
    +    expected_b64 = base64.b64encode(content).decode("utf-8")
    +    fl_path = tmp_path / "test_image.png"
    +    fl_path.write_bytes(content)
         image_docs = [
    -        ImageDocument(image=EXP_BASE64),
    -        ImageDocument(image_path="test.asdf"),
    +        ImageDocument(image=expected_b64),
    +        ImageDocument(image_path=fl_path.__str__()),
         ]

         with patch("requests.get") as mock_get:
    @@ -151,6 +163,6 @@ def test_set_base64_and_mimetype_for_image_docs():
                     results = set_base64_and_mimetype_for_image_docs(image_docs)

         assert len(results) == 2
    -    assert results[0].image == EXP_BASE64
    +    assert results[0].image == expected_b64
         assert results[0].image_mimetype == "image/jpeg"
         assert results[1].image_mimetype == "image/jpeg"
llama-index-core/tests/schema/test_image_document.py · +36 −0 · added

    @@ -0,0 +1,36 @@
    +import httpx
    +import pytest
    +
    +from pathlib import Path
    +from llama_index.core.schema import ImageDocument
    +
    +
    +@pytest.fixture()
    +def image_url() -> str:
    +    return "https://astrabert.github.io/hophop-science/images/whale_doing_science.png"
    +
    +
    +def test_real_image_path(tmp_path: Path, image_url: str) -> None:
    +    content = httpx.get(image_url).content
    +    fl_path = tmp_path / "test_image.png"
    +    fl_path.write_bytes(content)
    +    doc = ImageDocument(image_path=fl_path.__str__())
    +    assert isinstance(doc, ImageDocument)
    +
    +
    +def test_real_image_url(image_url: str) -> None:
    +    doc = ImageDocument(image_url=image_url)
    +    assert isinstance(doc, ImageDocument)
    +
    +
    +def test_non_image_path(tmp_path: Path) -> None:
    +    fl_path = tmp_path / "test_file.txt"
    +    fl_path.write_text("Hello world!")
    +    with pytest.raises(expected_exception=ValueError):
    +        doc = ImageDocument(image_path=fl_path.__str__())
    +
    +
    +def test_non_image_url(image_url: str) -> None:
    +    image_url = image_url.replace("png", "txt")
    +    with pytest.raises(expected_exception=ValueError):
    +        doc = ImageDocument(image_url=image_url)
llama-index-core/tests/schema/test_schema.py · +28 −12 · modified

    @@ -1,4 +1,5 @@
     import base64
    +import httpx
     import logging
     from io import BytesIO
     from pathlib import Path
    @@ -250,17 +251,27 @@ def test_image_document_image():
         assert doc.image == "MTIzNDU2Nzg5"


    -def test_image_document_path():
    -    mock_path = Path(__file__)
    -    doc = ImageDocument(id_="test", image_path=mock_path)
    -    assert doc.image_path == str(mock_path)
    -    doc.image_path = str(mock_path.parent)
    -    assert doc.image_path == str(mock_path.parent)
    +def test_image_document_path(tmp_path: Path):
    +    content = httpx.get(
    +        "https://astrabert.github.io/hophop-science/images/whale_doing_science.png"
    +    ).content
    +    fl_path = tmp_path / "test_image.png"
    +    fl_path.write_bytes(content)
    +    doc = ImageDocument(id_="test", image_path=fl_path)
    +    assert doc.image_path == str(fl_path)
    +    doc.image_path = str(fl_path.parent)
    +    assert doc.image_path == str(fl_path.parent)


     def test_image_document_url():
    -    doc = ImageDocument(id_="test", image_url="https://example.com/")
    -    assert doc.image_url == "https://example.com/"
    +    doc = ImageDocument(
    +        id_="test",
    +        image_url="https://astrabert.github.io/hophop-science/images/whale_doing_science.png",
    +    )
    +    assert (
    +        doc.image_url
    +        == "https://astrabert.github.io/hophop-science/images/whale_doing_science.png"
    +    )
         doc.image_url = "https://foo.org"
         assert doc.image_url == "https://foo.org/"

    @@ -281,12 +292,17 @@ def test_image_document_embeddings():
         assert doc.text_resource.embeddings == {"dense": [1.0, 2.0, 3.0]}


    -def test_image_document_path_serialization():
    -    doc = ImageDocument(image_path=Path("test.png"))
    -    assert doc.model_dump()["image_resource"]["path"] == "test.png"
    +def test_image_document_path_serialization(tmp_path: Path):
    +    content = httpx.get(
    +        "https://astrabert.github.io/hophop-science/images/whale_doing_science.png"
    +    ).content
    +    fl_path = tmp_path / "test_image.png"
    +    fl_path.write_bytes(content)
    +    doc = ImageDocument(image_path=fl_path)
    +    assert doc.model_dump()["image_resource"]["path"] == fl_path.__str__()

         new_doc = ImageDocument(**doc.model_dump())
    -    assert new_doc.image_resource.path == Path("test.png")
    +    assert new_doc.image_resource.path == fl_path


     def test_image_block_resolve_image(png_1px: bytes, png_1px_b64: bytes):
Vulnerability mechanics
Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
- github.com/advisories/GHSA-2rhq-96q8-4vjq (ghsa, ADVISORY)
- nvd.nist.gov/vuln/detail/CVE-2025-6209 (ghsa, ADVISORY)
- github.com/pypa/advisory-database/tree/main/vulns/llama-index/PYSEC-2025-65.yaml (ghsa, WEB)
- github.com/run-llama/llama_index/commit/cdeaab91a204d1c3527f177dac37390327aef274 (ghsa, WEB)
- huntr.com/bounties/e89d14f8-bfe8-4c9a-bb2a-656c01cc9a68 (ghsa, WEB)