Denial of Service via Uncontrolled Recursive JSON Parsing in JSONReader in run-llama/llama_index
Description
The JSONReader in run-llama/llama_index version 0.12.28 is vulnerable to a stack overflow caused by uncontrolled recursive JSON parsing. An attacker can trigger a Denial of Service (DoS) by submitting a deeply nested JSON structure, which raises a RecursionError and crashes the application. The root cause is a recursive traversal with no depth validation, so the call stack is exhausted when deeply nested JSON is processed. This affects the availability of services that use the reader, making them unreliable and disrupting workflows. The issue is resolved in version 0.12.38.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
The JSONReader in llama_index 0.12.28 is vulnerable to stack overflow via deeply nested JSON, causing DoS; fixed in 0.12.38.
Vulnerability Overview
The JSONReader component in run-llama/llama_index version 0.12.28 contains a stack overflow vulnerability due to uncontrolled recursive JSON parsing. The root cause is the unsafe recursive traversal design and lack of depth validation, which makes the JSONReader susceptible to stack overflow when processing deeply nested JSON structures [1][3]. This flaw can lead to a RecursionError, crashing the application.
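The failure mode can be reproduced in isolation. The following is a minimal sketch, not the library's traversal code: `build_nested`, `count_leaves`, and `traversal_overflows` are hypothetical helpers introduced only for illustration. It shows that a traversal using one Python call frame per nesting level raises RecursionError once the input is deeper than the interpreter's recursion limit.

```python
import sys


def build_nested(depth):
    """Build a dict nested `depth` levels deep, iteratively (no recursion)."""
    root = {}
    current = root
    for _ in range(depth - 1):
        current["k"] = {}
        current = current["k"]
    current["k"] = "leaf"
    return root


def count_leaves(node):
    """Recursively count scalar leaves; uses one call frame per nesting level."""
    if isinstance(node, dict):
        total = 0
        for value in node.values():
            total += count_leaves(value)
        return total
    if isinstance(node, list):
        total = 0
        for value in node:
            total += count_leaves(value)
        return total
    return 1


def traversal_overflows(depth):
    """Return True if the recursive traversal fails on input of this depth."""
    try:
        count_leaves(build_nested(depth))
    except RecursionError:
        return True
    return False
```

With these definitions, `traversal_overflows(sys.getrecursionlimit() + 100)` returns `True`, while shallow input such as `traversal_overflows(10)` returns `False`.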
Exploitation
An attacker can exploit this vulnerability by submitting a specially crafted, deeply nested JSON payload to the JSONReader. No authentication is required if the reader is exposed to untrusted input. The recursive parsing exhausts the call stack, causing a denial of service (DoS) condition [1][4].
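A deeply nested document does not need to be large. As a rough illustration (the helper name `nested_json_text` is hypothetical, and the single-key shape is an assumption rather than a payload taken from the advisory), the text for thousands of nesting levels can be produced with plain string repetition:

```python
import json


def nested_json_text(depth: int) -> str:
    """Return JSON text containing `depth` nested objects, built without recursion."""
    return '{"a":' * depth + "0" + "}" * depth


# Small depths parse into ordinary nested dicts.
shallow = json.loads(nested_json_text(3))

# Each level costs six characters ('{"a":' plus the closing '}'),
# so 2000 levels take 12001 characters in total.
size_for_2000_levels = len(nested_json_text(2000))
```

This is why size limits on uploads alone may not stop such input: the nesting depth, not the byte count, is what drives the recursion.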
Impact
Successful exploitation results in a denial of service, making the application unreliable and disrupting workflows. The vulnerability impacts the availability of services that rely on the JSONReader for parsing JSON data [1].
Mitigation
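The issue is resolved in version 0.12.38 of llama_index, where `load_data` catches the RecursionError, emits a warning, and returns an empty list instead of crashing. Users are strongly advised to upgrade to this patched version or later to prevent potential DoS attacks [1][3].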
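Where an upgrade cannot happen immediately, one possible stopgap is to validate nesting depth before handing parsed data to any recursive code. The sketch below is an assumption on our part and is not part of the llama_index patch; `max_depth`, `within_depth_limit`, and the default limit of 100 are hypothetical. It measures depth with an explicit stack, so the check itself cannot overflow.

```python
from typing import Any


def max_depth(data: Any) -> int:
    """Return the container nesting depth of parsed JSON, computed iteratively."""
    deepest = 0
    stack = [(data, 1)]
    while stack:
        node, depth = stack.pop()
        if isinstance(node, dict):
            children = list(node.values())
        elif isinstance(node, list):
            children = node
        else:
            continue  # scalars add no nesting
        deepest = max(deepest, depth)
        for child in children:
            stack.append((child, depth + 1))
    return deepest


def within_depth_limit(data: Any, limit: int = 100) -> bool:
    """Return True if `data` is nested no deeper than `limit` (hypothetical default)."""
    return max_depth(data) <= limit
```

A caller could reject input for which `within_depth_limit(parsed)` is false before passing it on. Note that this only covers traversal after parsing; the parser itself may also have depth limits.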
AI Insight generated on May 19, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
| llama-index-core (PyPI) | < 0.12.38 | 0.12.38 |
Affected products
- Range: <=0.12.37
- run-llama/run-llama/llama_index (v5), Range: unspecified
Patches
c032843a02ce fix: prevent DoS attacks in JSONReader (#18877)
2 files changed · +85 −44
llama-index-core/llama_index/core/readers/json.py +53 −44 modified

    @@ -2,6 +2,7 @@
     import json
     import re
    +import warnings
     from typing import Any, Dict, Generator, List, Optional

     from llama_index.core.readers.base import BaseReader
    @@ -97,49 +98,57 @@ def load_data(
             self, input_file: str, extra_info: Optional[Dict] = {}
         ) -> List[Document]:
             """Load data from the input file."""
    -        with open(input_file, encoding="utf-8") as f:
    -            load_data = []
    -            if self.is_jsonl:
    -                for line in f:
    -                    load_data.append(json.loads(line.strip()))
    -            else:
    -                load_data = [json.load(f)]
    -
    -            documents = []
    -            for data in load_data:
    -                if self.levels_back is None and self.clean_json is True:
    -                    # If levels_back isn't set and clean json is set,
    -                    # remove lines containing only formatting, we just format and make each
    -                    # line an embedding
    -                    json_output = json.dumps(
    -                        data, indent=0, ensure_ascii=self.ensure_ascii
    -                    )
    -                    lines = json_output.split("\n")
    -                    useful_lines = [
    -                        line for line in lines if not re.match(r"^[{}\[\],]*$", line)
    -                    ]
    -                    documents.append(
    -                        Document(text="\n".join(useful_lines), metadata=extra_info)
    -                    )
    -
    -                elif self.levels_back is None and self.clean_json is False:
    -                    # If levels_back isn't set and clean json is False, create documents without cleaning
    -                    json_output = json.dumps(data, ensure_ascii=self.ensure_ascii)
    -                    documents.append(Document(text=json_output, metadata=extra_info))
    -
    -                elif self.levels_back is not None:
    -                    # If levels_back is set, we make the embeddings contain the labels
    -                    # from further up the JSON tree
    -                    lines = [
    -                        *_depth_first_yield(
    -                            data,
    -                            self.levels_back,
    -                            self.collapse_length,
    -                            [],
    -                            self.ensure_ascii,
    +        try:
    +            with open(input_file, encoding="utf-8") as f:
    +                load_data = []
    +                if self.is_jsonl:
    +                    for line in f:
    +                        load_data.append(json.loads(line.strip()))
    +                else:
    +                    load_data = [json.load(f)]
    +
    +                documents = []
    +                for data in load_data:
    +                    if self.levels_back is None and self.clean_json is True:
    +                        # If levels_back isn't set and clean json is set,
    +                        # remove lines containing only formatting, we just format and make each
    +                        # line an embedding
    +                        json_output = json.dumps(
    +                            data, indent=0, ensure_ascii=self.ensure_ascii
    +                        )
    +                        lines = json_output.split("\n")
    +                        useful_lines = [
    +                            line
    +                            for line in lines
    +                            if not re.match(r"^[{}\[\],]*$", line)
    +                        ]
    +                        documents.append(
    +                            Document(text="\n".join(useful_lines), metadata=extra_info)
    +                        )
    +
    +                    elif self.levels_back is None and self.clean_json is False:
    +                        # If levels_back isn't set and clean json is False, create documents without cleaning
    +                        json_output = json.dumps(data, ensure_ascii=self.ensure_ascii)
    +                        documents.append(
    +                            Document(text=json_output, metadata=extra_info)
    +                        )
    +
    +                    elif self.levels_back is not None:
    +                        # If levels_back is set, we make the embeddings contain the labels
    +                        # from further up the JSON tree
    +                        lines = [
    +                            *_depth_first_yield(
    +                                data,
    +                                self.levels_back,
    +                                self.collapse_length,
    +                                [],
    +                                self.ensure_ascii,
    +                            )
    +                        ]
    +                        documents.append(
    +                            Document(text="\n".join(lines), metadata=extra_info)
                             )
    -                    ]
    -                    documents.append(
    -                        Document(text="\n".join(lines), metadata=extra_info)
    -                    )
                 return documents
    +        except RecursionError:
    +            warnings.warn("Recursion error occurred while processing JSON data.")
    +            return []
llama-index-core/tests/readers/test_json.py +32 −0 modified

    @@ -1,7 +1,10 @@
     """Test file reader."""

    +import json
    +import sys
     from tempfile import TemporaryDirectory

    +import pytest
     from llama_index.core.readers.json import JSONReader


    @@ -93,3 +96,32 @@ def test_clean_json() -> None:
             reader1 = JSONReader(clean_json=True)
             data1 = reader1.load_data(file_name)
             assert data1[0].get_content() == '"a": {\n"b": "c"'
    +
    +
    +def test_max_recursion_attack(tmp_path):
    +    original_limit = sys.getrecursionlimit()
    +    try:
    +        nested_dict = {}
    +        current_level = nested_dict
    +        sys.setrecursionlimit(5000)
    +
    +        for i in range(1, 2001):  # Create 2000 levels of nesting
    +            if i == 2000:
    +                current_level[f"level{i}"] = "final_value"
    +            else:
    +                current_level[f"level{i}"] = {}
    +                current_level = current_level[f"level{i}"]
    +
    +        file_name = tmp_path / "test_nested.json"
    +        with open(file_name, "w") as f:
    +            f.write(json.dumps(nested_dict))
    +
    +        # Force a recursion error
    +        sys.setrecursionlimit(500)
    +        reader = JSONReader(levels_back=1)
    +        with pytest.warns(UserWarning):
    +            data = reader.load_data(file_name)
    +            assert data == []
    +
    +    finally:
    +        sys.setrecursionlimit(original_limit)
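The shape of the fix in the diff above (catch RecursionError, warn, return an empty list) can be sketched without llama_index installed. In the sketch below, `flatten`, `build_nested`, and `load_guarded` are hypothetical stand-ins, not the library's functions; only the warning text and the empty-list return mirror the patch.

```python
import json
import warnings
from typing import Any, Callable, List


def build_nested(depth: int) -> dict:
    """Build a dict nested `depth` levels deep, iteratively."""
    root: dict = {}
    current = root
    for _ in range(depth - 1):
        current["k"] = {}
        current = current["k"]
    current["k"] = "leaf"
    return root


def flatten(node: Any) -> List[str]:
    """Hypothetical recursive traversal: one call frame per nesting level."""
    if isinstance(node, dict):
        lines: List[str] = []
        for value in node.values():
            lines.extend(flatten(value))
        return lines
    return [json.dumps(node)]


def load_guarded(traverse: Callable[[Any], List[str]], data: Any) -> List[str]:
    """Run a recursive traversal; on RecursionError, warn and return []."""
    try:
        return traverse(data)
    except RecursionError:
        warnings.warn("Recursion error occurred while processing JSON data.")
        return []
```

One consequence of this design is that an overly deep document produces no output at all rather than an exception, so callers that need to tell "empty input" from "rejected input" would have to watch for the warning.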
Vulnerability mechanics
Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References