Denial-of-Service in LangChain SitemapLoader in langchain-ai/langchain
Description
A Denial-of-Service (DoS) vulnerability exists in the SitemapLoader class of the langchain-ai/langchain repository. The CVE record lists all versions as affected; the GitHub Security Advisory narrows this to versions before 0.2.5. The parse_sitemap method, which parses sitemaps and extracts URLs, has no guard against unbounded recursion when a sitemap lists its own URL as a child sitemap. Each recursive call fetches and parses the same sitemap again until Python's maximum recursion depth is exceeded and the process crashes. An attacker can exploit this to tie up server socket/port resources and crash the Python process, affecting the availability of any service that relies on this functionality.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
A denial-of-service vulnerability in langchain's SitemapLoader allows infinite recursion when a sitemap references itself, crashing the Python process.
Vulnerability Overview
CVE-2024-2965 is a Denial-of-Service (DoS) vulnerability in the SitemapLoader class of the langchain-ai/langchain repository. The CVE description lists all versions as affected; the advisory data gives the affected range as < 0.2.5. The parse_sitemap method has no guard against unbounded recursion when a sitemap lists its own URL as a child sitemap, so parsing continues until Python's maximum recursion depth is exceeded and the process crashes [1][2].
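The failure mode can be illustrated with a minimal sketch. This is not the SitemapLoader code: `FAKE_SITEMAPS`, `parse_unbounded`, and `crashes_on_self_reference` are hypothetical names, and a dictionary stands in for the network so the example runs offline. It only assumes what the description states, namely that child sitemaps are followed recursively with no depth limit.

```python
from typing import Dict, List

# Hypothetical in-memory "web": sitemap URL -> child sitemap URLs it lists.
# The single entry points back at itself, as in the scenario described above.
FAKE_SITEMAPS: Dict[str, List[str]] = {
    "https://example.com/sitemap.xml": ["https://example.com/sitemap.xml"],
}


def parse_unbounded(url: str) -> List[str]:
    """Follow every child sitemap with no depth limit and no cycle check."""
    found: List[str] = [url]
    for child in FAKE_SITEMAPS.get(url, []):
        # On a self-reference this call never reaches a base case.
        found.extend(parse_unbounded(child))
    return found


def crashes_on_self_reference() -> bool:
    """Return True if the self-referencing sitemap triggers RecursionError."""
    try:
        parse_unbounded("https://example.com/sitemap.xml")
    except RecursionError:
        return True
    return False
```

In a real deployment each level of this recursion would also issue an HTTP request, which is where the socket/port pressure mentioned in the description comes from.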
Attack Vector and Prerequisites
The vulnerability is triggered when SitemapLoader ingests a malicious sitemap that lists its own URL as a child sitemap, which sends parsing into unbounded recursion. No authentication is required. Each recursion level issues a new request for the same sitemap, so an attacker can tie up server socket/port resources before the Python process crashes, impacting service availability [2].
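For illustration, a self-referencing sitemap index might look like the payload below. The URL `https://attacker.example/sitemap.xml` and the helper names are assumptions made for this sketch; the parsing uses only the standard library, not LangChain's own BeautifulSoup-based code.

```python
import xml.etree.ElementTree as ET
from typing import List

# XML namespace used by the sitemaps.org protocol.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical payload served at SELF_URL; it lists itself as a child sitemap.
SELF_URL = "https://attacker.example/sitemap.xml"
MALICIOUS_SITEMAP = f"""<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>{SELF_URL}</loc>
  </sitemap>
</sitemapindex>
"""


def child_sitemaps(xml_text: str) -> List[str]:
    """Return the <loc> of every <sitemap> entry in a sitemap index."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    locs: List[str] = []
    for sitemap in root.iter(f"{SITEMAP_NS}sitemap"):
        loc = sitemap.find(f"{SITEMAP_NS}loc")
        if loc is not None and loc.text:
            locs.append(loc.text.strip())
    return locs


def references_itself(own_url: str, xml_text: str) -> bool:
    """True if the sitemap served at own_url lists own_url as a child."""
    return own_url in child_sitemaps(xml_text)
```

A direct self-reference is the simplest case; a check like `references_itself` would not catch longer cycles (A lists B, B lists A), which is one reason a depth limit is a more general defence.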
Impact
Successful exploitation results in a denial of service, where the LangChain application becomes unresponsive or terminates entirely. This can affect any service relying on the SitemapLoader functionality, particularly those that automatically fetch and parse sitemaps from untrusted sources [2][3].
Mitigation Status
The vulnerability has been patched in pull request #22903, which added a max_depth parameter (default 10) to SitemapLoader and makes parse_sitemap return an empty list once that depth is reached [1][4]. Users should upgrade to a version containing this fix (0.2.5 or later, according to the GitHub Security Advisory) or apply the patch manually. There are no known workarounds, so upgrading is strongly recommended [2].
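The depth-limiting idea behind the fix can be sketched as follows. This mirrors the shape of the patch (a `depth` counter compared against `max_depth`, incremented on each recursive call) but is not the patched code itself: `FAKE_SITEMAPS` and `parse_bounded` are hypothetical, and a dictionary replaces the network fetch.

```python
from typing import Dict, List

# Hypothetical in-memory "web": sitemap URL -> child sitemap URLs it lists.
FAKE_SITEMAPS: Dict[str, List[str]] = {
    "https://example.com/sitemap.xml": ["https://example.com/sitemap.xml"],
}


def parse_bounded(url: str, *, max_depth: int = 10, depth: int = 0) -> List[str]:
    """Follow child sitemaps, but stop once max_depth levels have been visited."""
    if depth >= max_depth:
        return []  # same early exit as the patch: contribute nothing past the limit
    found: List[str] = [url]
    for child in FAKE_SITEMAPS.get(url, []):
        found.extend(parse_bounded(child, max_depth=max_depth, depth=depth + 1))
    return found
```

With the default limit, the self-referencing sitemap is visited ten times and parsing then returns normally. The same URL is still fetched once per level, so a lower `max_depth` may be worth considering when sitemaps come from untrusted sources.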
- [1] community[patch]: SitemapLoader restrict depth of parsing sitemap (CVE-2024-2965) by eyurtsev · Pull Request #22903 · langchain-ai/langchain
- [2] NVD - CVE-2024-2965
- [3] GitHub - langchain-ai/langchain: The agent engineering platform.
- [4] community[patch]: SitemapLoader restrict depth of parsing sitemap (CV… · langchain-ai/langchain@9a877c7
AI Insight generated on May 20, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
| langchain-community (PyPI) | < 0.2.5 | 0.2.5 |
| langchain (PyPI) | < 0.2.5 | 0.2.5 |
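A quick way to compare a version string against the patched release from the table is sketched below. `PATCHED`, `parse_version`, and `is_affected` are hypothetical helpers, and the parser assumes plain `X.Y.Z` release strings; pre-release or local version suffixes are not handled.

```python
from typing import Tuple

# Patched release taken from the table above.
PATCHED: Tuple[int, int, int] = (0, 2, 5)


def parse_version(version: str) -> Tuple[int, ...]:
    """Parse a plain 'X.Y.Z' release string into a tuple of integers."""
    return tuple(int(part) for part in version.split(".")[:3])


def is_affected(version: str) -> bool:
    """True if the version falls in the affected range (< 0.2.5)."""
    return parse_version(version) < PATCHED
```

The installed version can be read with the standard library, for example `importlib.metadata.version("langchain-community")`, and passed to `is_affected`.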
Affected products
- ghsa-coords, 2 versions: < 0.2.5, + 1 more
- (no CPE), range: < 0.2.5
- (no CPE), range: < 0.2.5
- langchain-ai/langchain-ai/langchain, v5, range: unspecified
Patches
73c42306745b community[patch]: SitemapLoader restrict depth of parsing sitemap (CVE-2024-2965) (#22903)
1 file changed · +23 −5
libs/community/langchain_community/document_loaders/sitemap.py · +23 −5 · modified

    @@ -1,6 +1,16 @@
     import itertools
     import re
    -from typing import Any, Callable, Generator, Iterable, Iterator, List, Optional, Tuple
    +from typing import (
    +    Any,
    +    Callable,
    +    Dict,
    +    Generator,
    +    Iterable,
    +    Iterator,
    +    List,
    +    Optional,
    +    Tuple,
    +)
     from urllib.parse import urlparse

     from langchain_core.documents import Document
    @@ -75,6 +85,7 @@ def __init__(
             is_local: bool = False,
             continue_on_failure: bool = False,
             restrict_to_same_domain: bool = True,
    +        max_depth: int = 10,
             **kwargs: Any,
         ):
             """Initialize with webpage path and optional filter URLs.
    @@ -105,6 +116,7 @@ def __init__(
                 restrict_to_same_domain: whether to restrict loading to URLs to the same
                     domain as the sitemap. Attention: This is only applied if the sitemap
                     is not a local file!
    +            max_depth: maximum depth to follow sitemap links. Default: 10
             """

             if blocksize is not None and blocksize < 1:
    @@ -134,17 +146,23 @@ def __init__(
             self.blocknum = blocknum
             self.is_local = is_local
             self.continue_on_failure = continue_on_failure
    +        self.max_depth = max_depth

    -    def parse_sitemap(self, soup: Any) -> List[dict]:
    +    def parse_sitemap(self, soup: Any, *, depth: int = 0) -> List[dict]:
             """Parse sitemap xml and load into a list of dicts.

             Args:
                 soup: BeautifulSoup object.
    +            depth: current depth of the sitemap. Default: 0

             Returns:
                 List of dicts.
             """
    -        els = []
    +        if depth >= self.max_depth:
    +            return []
    +
    +        els: List[Dict] = []
    +
             for url in soup.find_all("url"):
                 loc = url.find("loc")
                 if not loc:
    @@ -177,9 +195,9 @@ def parse_sitemap(self, soup: Any) -> List[dict]:
                 loc = sitemap.find("loc")
                 if not loc:
                     continue
    -            soup_child = self.scrape_all([loc.text], "xml")[0]

    -            els.extend(self.parse_sitemap(soup_child))
    +            soup_child = self.scrape_all([loc.text], "xml")[0]
    +            els.extend(self.parse_sitemap(soup_child, depth=depth + 1))
             return els

         def lazy_load(self) -> Iterator[Document]:
9a877c7adbd0 community[patch]: SitemapLoader restrict depth of parsing sitemap (CVE-2024-2965) (#22903)
1 file changed · +23 −5
libs/community/langchain_community/document_loaders/sitemap.py · +23 −5 · modified (diff identical to the one listed for commit 73c42306745b)
Vulnerability mechanics
Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
- github.com/advisories/GHSA-3hjh-jh2h-vrg6 (ghsa, ADVISORY)
- nvd.nist.gov/vuln/detail/CVE-2024-2965 (ghsa, ADVISORY)
- github.com/langchain-ai/langchain/commit/73c42306745b0831aa6fe7fe4eeb70d2c2d87a82 (ghsa, WEB)
- github.com/langchain-ai/langchain/commit/9a877c7adbd06f90a2518152f65b562bd90487cc (ghsa, WEB)
- github.com/langchain-ai/langchain/pull/22903 (ghsa, WEB)
- github.com/pypa/advisory-database/tree/main/vulns/langchain/PYSEC-2024-118.yaml (ghsa, WEB)
- huntr.com/bounties/90b0776d-9fa6-4841-aac4-09fde5918cae (ghsa, WEB)