High severity · NVD Advisory · Published Jul 7, 2025 · Updated Jul 7, 2025

XML Entity Expansion vulnerability in run-llama/llama_index

CVE-2025-3225

Description

An XML Entity Expansion vulnerability, also known as a 'billion laughs' attack, exists in the sitemap parser of the run-llama/llama_index repository, specifically affecting version v0.12.21. This vulnerability allows an attacker to supply a malicious Sitemap XML, leading to a Denial of Service (DoS) by exhausting system memory and potentially causing a system crash. The issue is resolved in version v0.12.29.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

XML Entity Expansion vulnerability in llama_index sitemap parser allows DoS via malicious sitemap; fixed in v0.12.29.

CVE-2025-3225 is an XML Entity Expansion ("billion laughs") vulnerability in the sitemap parser of llama_index (run-llama/llama_index) version v0.12.21 [1]. It arises because the parser uses xml.etree.ElementTree without protection against entity expansion. An attacker can therefore craft a sitemap whose DTD declares nested entities, each referencing the previous one several times, so that a small document expands exponentially during parsing and exhausts memory [1].
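To make the exponential growth concrete, the sketch below builds a classic billion laughs document and computes its fully expanded size arithmetically, without ever parsing it. The entity names (`lol0` through `lol9`), the `urlset` root element, and the helper names are illustrative assumptions, not details taken from the original report.

```python
# Illustration only: build a "billion laughs" document and compute how large
# it would become if a parser expanded every entity. Nothing here parses the
# payload, so it is safe to run.


def billion_laughs(levels: int = 9, fanout: int = 10, seed: str = "lol") -> str:
    """Return an XML string whose root text expands to len(seed) * fanout**levels chars."""
    decls = [f'  <!ENTITY lol0 "{seed}">']
    for i in range(1, levels + 1):
        # Each entity references the previous one `fanout` times.
        refs = f"&lol{i - 1};" * fanout
        decls.append(f'  <!ENTITY lol{i} "{refs}">')
    dtd = "\n".join(decls)
    return (
        '<?xml version="1.0"?>\n'
        f"<!DOCTYPE urlset [\n{dtd}\n]>\n"
        f"<urlset>&lol{levels};</urlset>"
    )


def expanded_size(levels: int = 9, fanout: int = 10, seed: str = "lol") -> int:
    """Number of characters produced by fully expanding the top-level entity."""
    return len(seed) * fanout ** levels


if __name__ == "__main__":
    payload = billion_laughs()
    print(f"{len(payload)} bytes of input -> {expanded_size():,} characters once expanded")
```

With nine levels and a fanout of ten, an input of under 1 KB describes 3,000,000,000 characters, roughly 3 GB of text before any parser overhead.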

The attack can be executed remotely by supplying a malicious Sitemap XML file to the affected parser [1]. No authentication is required; the attacker only needs to provide the malicious XML to a service that parses sitemaps using llama_index's vulnerable parser [1].

The impact is a Denial of Service (DoS) attack that can exhaust system memory and potentially crash the system [1]. This can disrupt services relying on llama_index for document ingestion or indexing.

The issue is resolved in version v0.12.29 [1]. The fix replaces xml.etree.ElementTree with defusedxml.ElementTree, which guards against entity expansion attacks [3]. Users are advised to upgrade to v0.12.29 or later to mitigate this vulnerability [1].
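defusedxml guards against this class of attack by refusing entity declarations before expat can expand them. The following is a rough, standard-library-only sketch of that idea, not defusedxml's own code: it hooks the declaration callbacks of `xml.parsers.expat` and collects `<loc>` values from a sitemap. The sitemap namespace URI, the `EntitiesForbiddenError` class, and the `safe_sitemap_locs` function are assumptions made for illustration.

```python
import xml.parsers.expat
from typing import List

# Standard sitemap namespace (assumed; real sitemaps may differ).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


class EntitiesForbiddenError(ValueError):
    """Raised when a document declares an entity (hypothetical name, not defusedxml's)."""


def safe_sitemap_locs(raw: bytes) -> List[str]:
    """Return <loc> values from a sitemap, aborting on any entity declaration."""
    # With a namespace separator, element names arrive as "<uri> <local-name>".
    parser = xml.parsers.expat.ParserCreate(namespace_separator=" ")

    def forbid_entity(name, *args):
        # Called while the DTD is read, i.e. before any entity is expanded.
        raise EntitiesForbiddenError(f"entity declaration {name!r} is not allowed")

    parser.EntityDeclHandler = forbid_entity
    parser.UnparsedEntityDeclHandler = forbid_entity

    locs: List[str] = []
    buffer: List[str] = []
    in_loc = False

    def start(tag, attrs):
        nonlocal in_loc
        if tag == f"{SITEMAP_NS} loc":
            in_loc = True
            buffer.clear()

    def end(tag):
        nonlocal in_loc
        if tag == f"{SITEMAP_NS} loc":
            locs.append("".join(buffer).strip())
            in_loc = False

    def chars(data):
        # Character data may arrive in several chunks, so accumulate it.
        if in_loc:
            buffer.append(data)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(raw, True)  # exceptions raised in handlers propagate from here
    return locs
```

In practice the patch's approach, importing `fromstring` from `defusedxml.ElementTree`, is preferable to hand-rolled hooks: with its default `forbid_entities=True` it raises `defusedxml.EntitiesForbidden` for any document that declares entities.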

AI Insight generated on May 19, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected packages

Versions sourced from the GitHub Security Advisory.

Package	Affected versions	Patched versions
llama-index-readers-papers (PyPI)	< 0.3.2	0.3.2
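To check whether an environment still has one of the reader packages touched by the fix at a pre-fix version, a minimal sketch using `importlib.metadata` might look like this. Only the `llama-index-readers-papers` threshold comes from the table above; the `llama-index-readers-stripe-docs` and `llama-index-readers-web` thresholds are the version bumps made in fix commit 4f6ee062b192 and are assumed, not confirmed, to be the first fixed releases. The version parser is deliberately naive.

```python
import re
from importlib import metadata
from typing import Dict, Tuple

# First fixed versions. Only llama-index-readers-papers is from the advisory
# table; the other two are the version bumps in commit 4f6ee062b192
# (assumed to be the first fixed releases).
FIXED_IN: Dict[str, Tuple[int, int, int]] = {
    "llama-index-readers-papers": (0, 3, 2),
    "llama-index-readers-stripe-docs": (0, 3, 1),
    "llama-index-readers-web": (0, 3, 9),
}


def parse_version(text: str) -> Tuple[int, int, int]:
    """Naive X.Y.Z parser: keeps the leading digits of each part, so "0.3.2rc1" reads as 0.3.2."""
    nums = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        nums.append(int(match.group()) if match else 0)
    nums += [0] * (3 - len(nums))  # pad short versions such as "1.2"
    return (nums[0], nums[1], nums[2])


def is_affected(package: str, installed: str) -> bool:
    """True if the installed version predates the assumed fixed version."""
    return parse_version(installed) < FIXED_IN[package]


def check_installed() -> Dict[str, bool]:
    """Map each installed package listed in FIXED_IN to whether it predates the fix."""
    report: Dict[str, bool] = {}
    for name in FIXED_IN:
        try:
            report[name] = is_affected(name, metadata.version(name))
        except metadata.PackageNotFoundError:
            continue  # not installed, nothing to report
    return report


if __name__ == "__main__":
    print(check_installed())
```

For exact ordering of pre-release and post-release tags, a full PEP 440 comparison (for example `packaging.version.Version`) is the safer choice.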

Affected products

Run Llama/Llama Index · llm-fuzzy
Range: = v0.12.21
run-llama/run-llama/llama_index · v5
Range: unspecified

Patches

4f6ee062b192

fix: use defusexml instead of xml.etree (#18362)

https://github.com/run-llama/llama_index · Massimiliano Pippi · Apr 3, 2025 · via ghsa

commit

6 files changed · +26 −18

llama-index-integrations/readers/llama-index-readers-papers/llama_index/readers/papers/pubmed/base.py · +11 −8 · modified

@@ -2,12 +2,14 @@
 
 from typing import List, Optional
 
+from defusedxml import ElementTree as safe_xml
 from llama_index.core.readers.base import BaseReader
 from llama_index.core.schema import Document
 
 
 class PubmedReader(BaseReader):
-    """Pubmed Reader.
+    """
+    Pubmed Reader.
 
     Gets a search query, return a list of Documents of the top corresponding scientific papers on Pubmed.
     """
@@ -17,7 +19,8 @@ def load_data_bioc(
         search_query: str,
         max_results: Optional[int] = 10,
     ) -> List[Document]:
-        """Search for a topic on Pubmed, fetch the text of the most relevant full-length papers.
+        """
+        Search for a topic on Pubmed, fetch the text of the most relevant full-length papers.
         Uses the BoiC API, which has been down a lot.
 
         Args:
@@ -27,10 +30,10 @@ def load_data_bioc(
         Returns:
             List[Document]: A list of Document objects.
         """
-        import xml.etree.ElementTree as xml
         from datetime import datetime
 
         import requests
+        from defusedxml import ElementTree as safe_xml
 
         pubmed_search = []
         parameters = {"tool": "tool", "email": "email", "db": "pmc"}
@@ -40,7 +43,7 @@ def load_data_bioc(
             "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
             params=parameters,
         )
-        root = xml.fromstring(resp.content)
+        root = safe_xml.fromstring(resp.content)
 
         for elem in root.iter():
             if elem.tag == "Id":
@@ -99,7 +102,8 @@ def load_data(
         search_query: str,
         max_results: Optional[int] = 10,
     ) -> List[Document]:
-        """Search for a topic on Pubmed, fetch the text of the most relevant full-length papers.
+        """
+        Search for a topic on Pubmed, fetch the text of the most relevant full-length papers.
 
         Args:
             search_query (str): A topic to search for (e.g. "Alzheimers").
@@ -110,7 +114,6 @@ def load_data(
             List[Document]: A list of Document objects.
         """
         import time
-        import xml.etree.ElementTree as xml
 
         import requests
 
@@ -122,7 +125,7 @@ def load_data(
             "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
             params=parameters,
         )
-        root = xml.fromstring(resp.content)
+        root = safe_xml.fromstring(resp.content)
 
         for elem in root.iter():
             if elem.tag == "Id":
@@ -131,7 +134,7 @@ def load_data(
                 print(url)
                 try:
                     resp = requests.get(url)
-                    info = xml.fromstring(resp.content)
+                    info = safe_xml.fromstring(resp.content)
 
                     raw_text = ""
                     title = ""

llama-index-integrations/readers/llama-index-readers-papers/pyproject.toml · +2 −1 · modified

@@ -29,12 +29,13 @@ license = "MIT"
 maintainers = ["thejessezhang"]
 name = "llama-index-readers-papers"
 readme = "README.md"
-version = "0.3.1"
+version = "0.3.2"
 
 [tool.poetry.dependencies]
 python = ">=3.9,<4.0"
 arxiv = "^2.1.0"
 llama-index-core = "^0.12.0"
+defusedxml = "^0.7.1"
 
 [tool.poetry.group.dev.dependencies]
 ipython = "8.10.0"

llama-index-integrations/readers/llama-index-readers-stripe-docs/llama_index/readers/stripe_docs/base.py · +5 −4 · modified

@@ -1,7 +1,7 @@
 import urllib.request
-import xml.etree.ElementTree as ET
 from typing import List
 
+from defusedxml.ElementTree import fromstring
 from llama_index.core.readers.base import BaseReader
 from llama_index.core.schema import Document
 from llama_index.readers.web import AsyncWebPageReader
@@ -13,7 +13,8 @@
 
 
 class StripeDocsReader(BaseReader):
-    """Asynchronous Stripe documentation reader.
+    """
+    Asynchronous Stripe documentation reader.
 
     Reads pages from the Stripe documentation based on the sitemap.xml.
 
@@ -36,7 +37,7 @@ def _load_sitemap(self) -> str:
     def _parse_sitemap(
         self, raw_sitemap: str, filters: List[str] = DEFAULT_FILTERS
     ) -> List:
-        root_sitemap = ET.fromstring(raw_sitemap)
+        root_sitemap = fromstring(raw_sitemap)
         sitemap_partition_urls = []
         sitemap_urls = []
 
@@ -45,7 +46,7 @@ def _parse_sitemap(
             sitemap_partition_urls.append(loc)
 
         for sitemap_partition_url in sitemap_partition_urls:
-            sitemap_partition = ET.fromstring(self._load_url(sitemap_partition_url))
+            sitemap_partition = fromstring(self._load_url(sitemap_partition_url))
 
             # Find all <url /> and iterate through them
             for url in sitemap_partition.findall(f"{{{XML_SITEMAP_SCHEMA}}}url"):

llama-index-integrations/readers/llama-index-readers-stripe-docs/pyproject.toml · +2 −1 · modified

@@ -29,14 +29,15 @@ license = "GPL-3.0-or-later"
 maintainers = ["amorriscode"]
 name = "llama-index-readers-stripe-docs"
 readme = "README.md"
-version = "0.3.0"
+version = "0.3.1"
 
 [tool.poetry.dependencies]
 python = ">=3.9,<4.0"
 html2text = "^2024.2.26"
 urllib3 = "^2.1.0"
 llama-index-readers-web = "^0.3.0"
 llama-index-core = "^0.12.0"
+defusedxml = "^0.7.1"
 
 [tool.poetry.group.dev.dependencies]
 ipython = "8.10.0"

llama-index-integrations/readers/llama-index-readers-web/llama_index/readers/web/sitemap/base.py · +4 −3 · modified

@@ -1,14 +1,15 @@
 import urllib.request
-import xml.etree.ElementTree as ET
 from typing import List
 
+from defusedxml.ElementTree import fromstring
 from llama_index.core.readers.base import BaseReader
 from llama_index.core.schema import Document
 from llama_index.readers.web.async_web.base import AsyncWebPageReader
 
 
 class SitemapReader(BaseReader):
-    """Asynchronous sitemap reader for web.
+    """
+    Asynchronous sitemap reader for web.
 
     Reads pages from the web based on their sitemap.xml.
 
@@ -34,7 +35,7 @@ def _load_sitemap(self, sitemap_url: str) -> str:
         return sitemap_url_request.read()
 
     def _parse_sitemap(self, raw_sitemap: str, filter_locs: str = None) -> list:
-        sitemap = ET.fromstring(raw_sitemap)
+        sitemap = fromstring(raw_sitemap)
         sitemap_urls = []
 
         for url in sitemap.findall(f"{{{self.xml_schema_sitemap}}}url"):

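The patched `_parse_sitemap` still locates elements with the triple-brace f-string `f"{{{self.xml_schema_sitemap}}}url"`: the doubled braces are literal, so the result is ElementTree's `{namespace}tag` form. A small sketch of that lookup follows. The namespace value and the substring behavior given to `filter_locs` are assumptions, and the standard-library parser is used here only because the input is a trusted literal; defusedxml returns standard ElementTree elements, so the lookups are the same after the patch.

```python
import xml.etree.ElementTree as ET  # trusted literal only; parse untrusted sitemaps with defusedxml
from typing import List, Optional

# Assumed value of SitemapReader.xml_schema_sitemap (the standard sitemap namespace).
XML_SCHEMA_SITEMAP = "http://www.sitemaps.org/schemas/sitemap/0.9"

SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/docs/a</loc></url>
  <url><loc>https://example.com/blog/b</loc></url>
</urlset>"""


def sitemap_locs(raw: str, filter_locs: Optional[str] = None) -> List[str]:
    """Return <loc> values, keeping only those containing filter_locs (hypothetical semantics)."""
    root = ET.fromstring(raw)
    # "{{" and "}}" are literal braces: this yields "{http://...}url" (Clark notation).
    url_tag = f"{{{XML_SCHEMA_SITEMAP}}}url"
    loc_tag = f"{{{XML_SCHEMA_SITEMAP}}}loc"
    locs = []
    for url in root.findall(url_tag):
        loc = url.find(loc_tag)
        if loc is None or loc.text is None:
            continue  # skip malformed <url> entries
        if filter_locs is None or filter_locs in loc.text:
            locs.append(loc.text.strip())
    return locs


if __name__ == "__main__":
    print(sitemap_locs(SAMPLE))
    print(sitemap_locs(SAMPLE, filter_locs="/docs/"))
```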
llama-index-integrations/readers/llama-index-readers-web/pyproject.toml · +2 −1 · modified

@@ -47,7 +47,7 @@ license = "GPL-3.0-or-later"
 maintainers = ["HawkClaws", "Hironsan", "NA", "an-bluecat", "bborn", "jasonwcfan", "kravetsmic", "pandazki", "ruze00", "selamanse", "thejessezhang"]
 name = "llama-index-readers-web"
 readme = "README.md"
-version = "0.3.8"
+version = "0.3.9"
 
 [tool.poetry.dependencies]
 python = ">=3.9,<4.0"
@@ -62,6 +62,7 @@ playwright = ">=1.30,<2.0"
 newspaper3k = "^0.2.8"
 spider-client = "^0.0.27"
 llama-index-core = "^0.12.0"
+defusedxml = "^0.7.1"
 
 [tool.poetry.group.dev.dependencies]
 ipython = "8.10.0"

Vulnerability mechanics

Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References

github.com/advisories/GHSA-w42r-mrx7-c633 · ghsa · ADVISORY
nvd.nist.gov/vuln/detail/CVE-2025-3225 · ghsa · ADVISORY
github.com/run-llama/llama_index/commit/4f6ee062b19212106a2632af9c9521fc7f0a3584 · ghsa · WEB
huntr.com/bounties/e33c0699-e9a2-49aa-837b-5363205637a2 · ghsa · WEB

cvss	0.455
epss	0.000
exploit	0.000
kev	0.000
patch	-0.070
ransomware	0.000