VYPR
Moderate severity · NVD Advisory · Published Jun 6, 2024 · Updated Oct 15, 2025

Denial-of-Service in LangChain SitemapLoader in langchain-ai/langchain

CVE-2024-2965

Description

A Denial-of-Service (DoS) vulnerability exists in the SitemapLoader class of the langchain-ai/langchain repository, affecting all versions. The parse_sitemap method, responsible for parsing sitemaps and extracting URLs, lacks a mechanism to prevent infinite recursion when a sitemap URL refers to the current sitemap itself. This oversight allows for the possibility of an infinite loop, leading to a crash by exceeding the maximum recursion depth in Python. This vulnerability can be exploited to occupy server socket/port resources and crash the Python process, impacting the availability of services relying on this functionality.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

A denial-of-service vulnerability in langchain's SitemapLoader allows infinite recursion when a sitemap references itself, crashing the Python process.

Vulnerability Overview

CVE-2024-2965 is a Denial-of-Service (DoS) vulnerability in the SitemapLoader class of the langchain-ai/langchain repository, affecting versions before 0.2.5 according to the GitHub Security Advisory. The parse_sitemap method follows nested sitemap references recursively with no depth limit, so a sitemap that refers to its own URL causes unbounded recursion until the process crashes by exceeding Python's maximum recursion depth [1][2].

Attack Vector and Prerequisites

The vulnerability is triggered when SitemapLoader is given, or ingests, a malicious sitemap that contains a reference to its own URL: each recursive call fetches and parses the same sitemap again. No authentication is required. The repeated fetches occupy server socket/port resources, and the recursion ends only when the Python process crashes, impacting service availability [2].
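The recursion pattern can be illustrated with a minimal, self-contained sketch. It does not use LangChain: `fake_fetch`, `parse_unbounded`, and the example.com URL are hypothetical stand-ins, and the parser is a simplified toy that only mimics the pre-patch shape of parse_sitemap (recurse into every nested sitemap with no depth check). Nothing is fetched over the network here; in a real deployment each level would also issue an HTTP request.

```python
import xml.etree.ElementTree as ET
from typing import List

# Namespace used by the sitemap protocol.
NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

# Hypothetical URL: the sitemap index lists itself as a nested sitemap.
SELF_URL = "https://example.com/sitemap.xml"
SELF_REFERENCING_SITEMAP = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    f"<sitemap><loc>{SELF_URL}</loc></sitemap>"
    "</sitemapindex>"
)


def fake_fetch(url: str) -> ET.Element:
    """Stand-in for a network fetch: every URL returns the same index."""
    return ET.fromstring(SELF_REFERENCING_SITEMAP)


def parse_unbounded(root: ET.Element) -> List[str]:
    """Toy version of the pre-patch logic: no depth limit, no cycle check."""
    urls = [loc.text or "" for loc in root.iterfind(f"{NS}url/{NS}loc")]
    for loc in root.iterfind(f"{NS}sitemap/{NS}loc"):
        # Each nested sitemap is fetched and parsed recursively.
        urls.extend(parse_unbounded(fake_fetch(loc.text or "")))
    return urls


def crashes_with_recursion_error() -> bool:
    """Return True if parsing the self-referencing sitemap overflows the stack."""
    try:
        parse_unbounded(fake_fetch(SELF_URL))
    except RecursionError:
        return True
    return False
```

Calling `crashes_with_recursion_error()` is expected to return `True`, because the toy parser never reaches a base case for a sitemap that points at itself.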

Impact

Successful exploitation results in a denial of service, where the LangChain application becomes unresponsive or terminates entirely. This can affect any service relying on the SitemapLoader functionality, particularly those that automatically fetch and parse sitemaps from untrusted sources [2][3].

Mitigation Status

The vulnerability has been patched in pull request #22903, which introduced a max_depth parameter (default 10) that limits how many levels of nested sitemaps parse_sitemap will follow [1][4]. Users should upgrade to a release containing this fix (0.2.5 per the GitHub Security Advisory) or apply the patch manually. There are no known workarounds, so upgrading is strongly recommended [2].
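A minimal sketch of how a depth guard of this kind bounds the work follows. `ToyLoader`, its `fetch` method, and the example.com URL are hypothetical; only the `depth >= self.max_depth` check and the `depth + 1` argument follow the shape of the patch, and the rest is a simplified stand-in rather than the library's code.

```python
import xml.etree.ElementTree as ET
from typing import List

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
SELF_URL = "https://example.com/sitemap.xml"  # hypothetical URL
SELF_REFERENCING_SITEMAP = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    f"<sitemap><loc>{SELF_URL}</loc></sitemap>"
    "</sitemapindex>"
)


class ToyLoader:
    """Toy loader; only the depth guard mirrors the patched parse_sitemap."""

    def __init__(self, max_depth: int = 10) -> None:
        self.max_depth = max_depth
        self.fetch_count = 0  # counts simulated fetches, for illustration

    def fetch(self, url: str) -> ET.Element:
        """Stand-in for a network fetch: always returns the same index."""
        self.fetch_count += 1
        return ET.fromstring(SELF_REFERENCING_SITEMAP)

    def parse_sitemap(self, root: ET.Element, *, depth: int = 0) -> List[str]:
        # Stop descending once the configured depth is reached.
        if depth >= self.max_depth:
            return []
        urls = [loc.text or "" for loc in root.iterfind(f"{NS}url/{NS}loc")]
        for loc in root.iterfind(f"{NS}sitemap/{NS}loc"):
            child = self.fetch(loc.text or "")
            urls.extend(self.parse_sitemap(child, depth=depth + 1))
        return urls


loader = ToyLoader(max_depth=10)
result = loader.parse_sitemap(loader.fetch(SELF_URL))
```

With `max_depth=10`, the toy loader terminates after 11 simulated fetches (the initial one plus one per level) and returns an empty list, since the self-referencing index contains no page URLs. Note that a depth limit bounds the recursion but does not detect cycles; the same sitemap is still fetched once per level.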

AI Insight generated on May 20, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected packages

Versions sourced from the GitHub Security Advisory.

Package                       Affected versions   Patched versions
langchain-community (PyPI)    < 0.2.5             0.2.5
langchain (PyPI)              < 0.2.5             0.2.5

Affected products: 3

Patches: 2

73c42306745b

community[patch]: SitemapLoader restrict depth of parsing sitemap (CVE-2024-2965) (#22903)

https://github.com/langchain-ai/langchain · Eugene Yurtsev · Jun 14, 2024 · via ghsa
1 file changed · +23 -5
  • libs/community/langchain_community/document_loaders/sitemap.py · +23 -5 · modified
    @@ -1,6 +1,16 @@
     import itertools
     import re
    -from typing import Any, Callable, Generator, Iterable, Iterator, List, Optional, Tuple
    +from typing import (
    +    Any,
    +    Callable,
    +    Dict,
    +    Generator,
    +    Iterable,
    +    Iterator,
    +    List,
    +    Optional,
    +    Tuple,
    +)
     from urllib.parse import urlparse
     
     from langchain_core.documents import Document
    @@ -75,6 +85,7 @@ def __init__(
             is_local: bool = False,
             continue_on_failure: bool = False,
             restrict_to_same_domain: bool = True,
    +        max_depth: int = 10,
             **kwargs: Any,
         ):
             """Initialize with webpage path and optional filter URLs.
    @@ -105,6 +116,7 @@ def __init__(
                 restrict_to_same_domain: whether to restrict loading to URLs to the same
                     domain as the sitemap. Attention: This is only applied if the sitemap
                     is not a local file!
    +            max_depth: maximum depth to follow sitemap links. Default: 10
             """
     
             if blocksize is not None and blocksize < 1:
    @@ -134,17 +146,23 @@ def __init__(
             self.blocknum = blocknum
             self.is_local = is_local
             self.continue_on_failure = continue_on_failure
    +        self.max_depth = max_depth
     
    -    def parse_sitemap(self, soup: Any) -> List[dict]:
    +    def parse_sitemap(self, soup: Any, *, depth: int = 0) -> List[dict]:
             """Parse sitemap xml and load into a list of dicts.
     
             Args:
                 soup: BeautifulSoup object.
    +            depth: current depth of the sitemap. Default: 0
     
             Returns:
                 List of dicts.
             """
    -        els = []
    +        if depth >= self.max_depth:
    +            return []
    +
    +        els: List[Dict] = []
    +
             for url in soup.find_all("url"):
                 loc = url.find("loc")
                 if not loc:
    @@ -177,9 +195,9 @@ def parse_sitemap(self, soup: Any) -> List[dict]:
                 loc = sitemap.find("loc")
                 if not loc:
                     continue
    -            soup_child = self.scrape_all([loc.text], "xml")[0]
     
    -            els.extend(self.parse_sitemap(soup_child))
    +            soup_child = self.scrape_all([loc.text], "xml")[0]
    +            els.extend(self.parse_sitemap(soup_child, depth=depth + 1))
             return els
     
         def lazy_load(self) -> Iterator[Document]:
    
9a877c7adbd0

community[patch]: SitemapLoader restrict depth of parsing sitemap (CVE-2024-2965) (#22903)

https://github.com/langchain-ai/langchain · Eugene Yurtsev · Jun 14, 2024 · via ghsa

Vulnerability mechanics

Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References: 7
