VYPR
High severity · NVD Advisory · Published Oct 19, 2023 · Updated Sep 12, 2024

CVE-2023-46229

Description

LangChain before 0.0.317 allows SSRF via document_loaders/recursive_url_loader.py because crawling can proceed from an external server to an internal server.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

LangChain before 0.0.317 contains a server-side request forgery (SSRF) vulnerability in its RecursiveUrlLoader, allowing crawling from external to internal servers.

Vulnerability

Overview

CVE-2023-46229 is a server-side request forgery (SSRF) vulnerability in LangChain versions prior to 0.0.317. The flaw resides in the RecursiveUrlLoader class within document_loaders/recursive_url_loader.py. This loader is designed to recursively crawl all child links from a given URL, but it does not adequately restrict the URLs it can access. As a result, crawling can proceed from an external server to an internal server, enabling SSRF attacks [1][2][3].

Exploitation

An attacker exploits the flaw by getting a malicious link in front of the RecursiveUrlLoader, either as a submitted start URL or as a link on a page it crawls, so that the loader issues HTTP requests to internal network addresses. The prevent_outside parameter, enabled by default, restricts crawling to the same domain as the start URL, but this only reduces the risk. If the start URL is on a host that serves several sites (e.g., https://some_host/alice_site/ and https://some_host/bob_site/), a malicious link on Alice's site can make the crawler send a GET request to Bob's site, because both share a host. If prevent_outside is disabled, the crawler can be directed to any URL, including internal IP addresses [3].
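The shared-host case can be illustrated with a minimal sketch. The helper names `same_host` and `within_start_path` are hypothetical and are not LangChain's implementation; they rely only on the standard library's `urllib.parse` and assume the URLs from the example above.

```python
from urllib.parse import urlparse


def same_host(start_url: str, candidate: str) -> bool:
    """Hypothetical check: accept a link if it shares the start URL's host."""
    return urlparse(start_url).netloc == urlparse(candidate).netloc


def within_start_path(start_url: str, candidate: str) -> bool:
    """Hypothetical stricter check: same scheme and host, path under the start path."""
    start, cand = urlparse(start_url), urlparse(candidate)
    # Normalise the start path so "/alice_site" does not match "/alice_site_2".
    base = start.path if start.path.endswith("/") else start.path + "/"
    return (
        start.scheme == cand.scheme
        and start.netloc == cand.netloc
        and cand.path.startswith(base)
    )


start = "https://some_host/alice_site/"
link_on_alice_page = "https://some_host/bob_site/admin"

# A host-only comparison treats Bob's site as in scope.
crosses_sites = same_host(start, link_on_alice_page)
# A path-prefix comparison keeps the crawl under Alice's site.
stays_in_scope = within_start_path(start, link_on_alice_page)
```

Here `crosses_sites` is true while `stays_in_scope` is false. Even the stricter check is only a sketch: it does not normalise `..` path segments or follow redirects, so it should not be read as a complete defence.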

Impact

Successful exploitation allows an attacker to perform SSRF, potentially probing internal services, accessing sensitive data, or interacting with internal systems that are not intended to be exposed. This can lead to further compromise of the internal network [2][4].

Mitigation

The vulnerability is addressed in LangChain 0.0.317; users should upgrade to that version or later. The patch commit adds a security note to the RecursiveUrlLoader docstring that warns about SSRF, advises against deploying crawlers with network access to internal servers, and recommends controlling who can submit crawl requests [3][4].
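Alongside upgrading and network-level controls, an application can screen URLs before handing them to a crawler. The following is a minimal sketch under stated assumptions: `is_internal_ip` and `url_targets_internal_host` are hypothetical helper names, not part of LangChain, and the check uses only the standard library's `ipaddress`, `socket`, and `urllib.parse`.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_internal_ip(address: str) -> bool:
    """Return True for loopback, private, link-local, or other non-global IPs."""
    return not ipaddress.ip_address(address).is_global


def url_targets_internal_host(url: str) -> bool:
    """Hypothetical pre-fetch check: flag URLs whose host is an internal address."""
    host = urlparse(url).hostname
    if host is None:
        return True  # No host to inspect: treat as unsafe.
    try:
        # IP literals are checked directly, without a DNS lookup.
        return is_internal_ip(host)
    except ValueError:
        pass  # Not an IP literal, so resolve the name below.
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # Unresolvable: refuse rather than guess.
    # Strip any IPv6 scope suffix ("%eth0") before parsing each resolved address.
    return any(is_internal_ip(info[4][0].split("%")[0]) for info in infos)
```

A caller would skip any URL for which `url_targets_internal_host` returns true. This kind of check can be bypassed when a hostname resolves differently between the check and the actual request, or when a server redirects to an internal address, so it complements rather than replaces restricting the crawler's network access.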

AI Insight generated on May 20, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected packages

Versions sourced from the GitHub Security Advisory.

Package            Affected versions   Patched versions
langchain (PyPI)   < 0.0.317           0.0.317
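The version range above can be checked programmatically. This is a minimal sketch, assuming plain `X.Y.Z` release strings; the function names are hypothetical, and versions with pre-release or other suffixes are not handled (a dedicated parser such as `packaging.version` would be more robust).

```python
from importlib.metadata import PackageNotFoundError, version

# First patched release, taken from the table above.
PATCHED = (0, 0, 317)


def parse_release(version_string: str) -> tuple:
    """Parse a plain 'X.Y.Z' release string into a tuple of ints."""
    return tuple(int(part) for part in version_string.split("."))


def is_affected(version_string: str) -> bool:
    """True if the version is below the patched release."""
    return parse_release(version_string) < PATCHED


def installed_langchain_is_affected():
    """Return True/False for the installed langchain, or None if it is not installed."""
    try:
        return is_affected(version("langchain"))
    except PackageNotFoundError:
        return None
```

For example, `is_affected("0.0.316")` is true, while `is_affected("0.0.317")` and `is_affected("0.1.0")` are false.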

Patches

9ecb7240a480

Add security note to recursive url loader (#11934)

https://github.com/langchain-ai/langchain · Eugene Yurtsev · Oct 17, 2023 · via ghsa
1 file changed · +32 -2
  • libs/langchain/langchain/document_loaders/recursive_url_loader.py +32 -2 modified
    @@ -49,7 +49,36 @@ def _metadata_extractor(raw_html: str, url: str) -> dict:
     
     
     class RecursiveUrlLoader(BaseLoader):
    -    """Load all child links from a URL page."""
    +    """Load all child links from a URL page.
    +
    +    **Security Note**: This loader is a crawler that will start crawling
    +        at a given URL and then expand to crawl child links recursively.
    +
    +        Web crawlers should generally NOT be deployed with network access
    +        to any internal servers.
    +
    +        Control access to who can submit crawling requests and what network access
    +        the crawler has.
    +
    +        While crawling, the crawler may encounter malicious URLs that would lead to a
    +        server-side request forgery (SSRF) attack.
    +
    +        To mitigate risks, the crawler by default will only load URLs from the same
    +        domain as the start URL (controlled via prevent_outside named argument).
    +
    +        This will mitigate the risk of SSRF attacks, but will not eliminate it.
    +
    +        For example, if crawling a host which hosts several sites:
    +
    +        https://some_host/alice_site/
    +        https://some_host/bob_site/
    +
    +        A malicious URL on Alice's site could cause the crawler to make a malicious
    +        GET request to an endpoint on Bob's site. Both sites are hosted on the
    +        same host, so such a request would not be prevented by default.
    +
    +        See https://python.langchain.com/docs/security
    +    """
     
         def __init__(
             self,
    @@ -60,12 +89,13 @@ def __init__(
             metadata_extractor: Optional[Callable[[str, str], str]] = None,
             exclude_dirs: Optional[Sequence[str]] = (),
             timeout: Optional[int] = 10,
    -        prevent_outside: Optional[bool] = True,
    +        prevent_outside: bool = True,
             link_regex: Union[str, re.Pattern, None] = None,
             headers: Optional[dict] = None,
             check_response_status: bool = False,
         ) -> None:
             """Initialize with URL to crawl and any subdirectories to exclude.
    +
             Args:
                 url: The URL to crawl.
                 max_depth: The max depth of the recursive loading.
    

Vulnerability mechanics

Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References

5
