CVE-2023-46229
Description
LangChain before 0.0.317 allows SSRF via document_loaders/recursive_url_loader.py because crawling can proceed from an external server to an internal server.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
LangChain before 0.0.317 contains a server-side request forgery (SSRF) vulnerability in its RecursiveUrlLoader, allowing crawling from external to internal servers.
Vulnerability
Overview
CVE-2023-46229 is a server-side request forgery (SSRF) vulnerability in LangChain versions prior to 0.0.317. The flaw resides in the RecursiveUrlLoader class within document_loaders/recursive_url_loader.py. This loader is designed to recursively crawl all child links from a given URL, but it does not adequately restrict the URLs it can access. As a result, crawling can proceed from an external server to an internal server, enabling SSRF attacks [1][2][3].
Exploitation
An attacker can exploit this vulnerability by providing a malicious URL that, when processed by the RecursiveUrlLoader, causes it to make HTTP requests to internal network addresses. The loader includes a prevent_outside parameter that, by default, restricts crawling to the same domain as the start URL. However, this measure is insufficient: if the start URL is hosted on a server that also hosts other sites (e.g., https://some_host/alice_site/ and https://some_host/bob_site/), a malicious link on Alice's site can cause the crawler to make a GET request to Bob's site on the same host. Additionally, if prevent_outside is disabled, the crawler can be directed to any arbitrary URL, including internal IP addresses [3].
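The same-host limitation can be illustrated with a small sketch. The two helper functions below are hypothetical and are not LangChain's implementation; they only contrast a host-only comparison with a stricter host-plus-path-prefix comparison, using the example URLs from the security note. The `/bob_site/admin` path is an invented example link.

```python
from urllib.parse import urlparse


def same_host(start_url: str, candidate: str) -> bool:
    """Hypothetical host-only filter: accept any link on the start URL's host."""
    return urlparse(start_url).netloc == urlparse(candidate).netloc


def same_host_and_path_prefix(start_url: str, candidate: str) -> bool:
    """Hypothetical stricter filter: same host AND under the start URL's path."""
    start, cand = urlparse(start_url), urlparse(candidate)
    return start.netloc == cand.netloc and cand.path.startswith(start.path)


START = "https://some_host/alice_site/"
# Invented example of a link planted on Alice's site that targets Bob's site.
LINK = "https://some_host/bob_site/admin"

if __name__ == "__main__":
    print("host-only filter allows link:", same_host(START, LINK))
    print("path-prefix filter allows link:", same_host_and_path_prefix(START, LINK))
```

A host-only filter accepts the planted link because both sites share `some_host`, while the path-prefix variant rejects it. Neither check helps when the filter is disabled entirely.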
Impact
Successful exploitation allows an attacker to perform SSRF, potentially probing internal services, accessing sensitive data, or interacting with internal systems that are not intended to be exposed. This can lead to further compromise of the internal network [2][4].
Mitigation
The vulnerability is patched in LangChain version 0.0.317. Users are strongly advised to upgrade to this version or later. The fix also includes a security note in the RecursiveUrlLoader documentation warning about the SSRF risks and recommending network access controls [3][4].
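As one possible complement to network-level controls, an application could screen URLs before handing them to a crawler. The sketch below is an assumption-laden illustration, not part of LangChain: `resolves_to_internal` is a hypothetical helper that resolves a URL's host with the standard library and flags loopback, private, link-local, and other non-public addresses.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def resolves_to_internal(url: str) -> bool:
    """Hypothetical pre-flight check: True if the URL's host resolves to a
    non-public address, or if the URL cannot be parsed or resolved."""
    host = urlparse(url).hostname
    if host is None:
        return True  # no usable host: treat as unsafe
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return True  # unresolvable: treat as unsafe
    for info in infos:
        # info[4][0] is the address string; drop any IPv6 scope id ("%eth0").
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        if (
            addr.is_private
            or addr.is_loopback
            or addr.is_link_local
            or addr.is_reserved
            or addr.is_multicast
            or addr.is_unspecified
        ):
            return True
    return False


if __name__ == "__main__":
    for candidate in ("http://127.0.0.1/", "http://10.0.0.5/admin", "http://8.8.8.8/"):
        print(candidate, "internal:", resolves_to_internal(candidate))
```

A check like this is only a partial safeguard: the address a name resolves to can change between the check and the actual request, and redirects are not followed here, so it should supplement rather than replace restricting the crawler's network access.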
AI Insight generated on May 20, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
| langchain (PyPI) | < 0.0.317 | 0.0.317 |
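To check an environment against the table above, a minimal sketch using only the standard library might look like the following. The helper names are hypothetical, and the parser is deliberately simple: it reads the leading digits of each dot-separated component, so a pre-release such as `0.0.317rc1` would be treated as `0.0.317`.

```python
from importlib.metadata import PackageNotFoundError, version

PATCHED = (0, 0, 317)  # first patched release according to the advisory


def parse_release(version_str: str) -> tuple:
    """Parse leading numeric components, e.g. '0.0.316' -> (0, 0, 316)."""
    parts = []
    for piece in version_str.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_patched(version_str: str) -> bool:
    """True if the given version is at or above the patched release."""
    return parse_release(version_str) >= PATCHED


def installed_langchain_is_patched():
    """Return True/False for the installed langchain, or None if not installed."""
    try:
        return is_patched(version("langchain"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("installed langchain patched:", installed_langchain_is_patched())
```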
Affected products
- LangChain/LangChain
Patches
9ecb7240a480: Add security note to recursive url loader (#11934)
1 file changed · +32 −2
libs/langchain/langchain/document_loaders/recursive_url_loader.py (modified, +32 −2)
@@ -49,7 +49,36 @@ def _metadata_extractor(raw_html: str, url: str) -> dict:
 class RecursiveUrlLoader(BaseLoader):
-    """Load all child links from a URL page."""
+    """Load all child links from a URL page.
+
+    **Security Note**: This loader is a crawler that will start crawling
+    at a given URL and then expand to crawl child links recursively.
+
+    Web crawlers should generally NOT be deployed with network access
+    to any internal servers.
+
+    Control access to who can submit crawling requests and what network access
+    the crawler has.
+
+    While crawling, the crawler may encounter malicious URLs that would lead to a
+    server-side request forgery (SSRF) attack.
+
+    To mitigate risks, the crawler by default will only load URLs from the same
+    domain as the start URL (controlled via prevent_outside named argument).
+
+    This will mitigate the risk of SSRF attacks, but will not eliminate it.
+
+    For example, if crawling a host which hosts several sites:
+
+    https://some_host/alice_site/
+    https://some_host/bob_site/
+
+    A malicious URL on Alice's site could cause the crawler to make a malicious
+    GET request to an endpoint on Bob's site. Both sites are hosted on the
+    same host, so such a request would not be prevented by default.
+
+    See https://python.langchain.com/docs/security
+    """
     def __init__(
         self,
@@ -60,12 +89,13 @@ def __init__(
         metadata_extractor: Optional[Callable[[str, str], str]] = None,
         exclude_dirs: Optional[Sequence[str]] = (),
         timeout: Optional[int] = 10,
-        prevent_outside: Optional[bool] = True,
+        prevent_outside: bool = True,
         link_regex: Union[str, re.Pattern, None] = None,
         headers: Optional[dict] = None,
         check_response_status: bool = False,
     ) -> None:
         """Initialize with URL to crawl and any subdirectories to exclude.
+
         Args:
             url: The URL to crawl.
             max_depth: The max depth of the recursive loading.
Vulnerability mechanics
Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References
- github.com/advisories/GHSA-655w-fm8m-m478 (ghsa, ADVISORY)
- nvd.nist.gov/vuln/detail/CVE-2023-46229 (ghsa, ADVISORY)
- github.com/langchain-ai/langchain/commit/9ecb7240a480720ec9d739b3877a52f76098a2b8 (ghsa, WEB)
- github.com/langchain-ai/langchain/pull/11925 (ghsa, WEB)
- github.com/pypa/advisory-database/tree/main/vulns/langchain/PYSEC-2023-205.yaml (ghsa, WEB)