VYPR
High severity · NVD Advisory · Published Mar 20, 2025 · Updated Oct 15, 2025

Denial of Service (DoS) in run-llama/llama_index

CVE-2024-12704

Description

A vulnerability in the LangChainLLM class of the run-llama/llama_index repository, version v0.12.5, allows for a Denial of Service (DoS) attack. The stream_complete method executes the llm using a thread and retrieves the result via the get_response_gen method of the StreamingGeneratorCallbackHandler class. If the thread terminates abnormally before the _llm.predict is executed, there is no exception handling for this case, leading to an infinite loop in the get_response_gen function. This can be triggered by providing an input of an incorrect type, causing the thread to terminate and the process to continue running indefinitely.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

An unhandled thread termination in llama_index v0.12.5 leads to an infinite loop causing Denial of Service via malformed input.

Vulnerability Overview

In llama_index v0.12.5, the stream_complete method of the LangChainLLM class can be driven into a Denial of Service (DoS) [1]. The method runs the LLM call in a separate thread and reads the output through get_response_gen of StreamingGeneratorCallbackHandler. That generator exits only once the handler's completion event has been set. If the thread terminates abnormally before the underlying _llm.predict executes, the event is never set and nothing handles the failure, so get_response_gen loops forever [1].
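The mechanism can be modeled without llama_index. The sketch below is a simplified stand-in, not the library's code: SketchHandler, worker, and run are hypothetical names, and the max_polls cap is an addition that the pre-patch loop did not have, included only so the demonstration terminates.

```python
import threading
from queue import Queue
from threading import Event


class SketchHandler:
    """Hypothetical stand-in for the pre-patch callback handler."""

    def __init__(self) -> None:
        self._token_queue: Queue = Queue()
        self._done = Event()

    def get_response_gen(self, max_polls: int):
        # Same polling shape as the pre-patch loop. The poll cap is NOT in
        # the original; it exists so this sketch cannot hang.
        polls = 0
        while polls < max_polls:
            polls += 1
            if not self._token_queue.empty():
                yield self._token_queue.get_nowait()
            elif self._done.is_set():
                return
        raise RuntimeError("no tokens and no completion signal")


def worker(handler: SketchHandler, prompt: object) -> None:
    # Assumed failure mode: the type check fails before anything is queued
    # or signalled, so the thread dies with _done still unset.
    if not isinstance(prompt, str):
        raise TypeError("prompt must be a str")
    handler._token_queue.put(prompt.upper())
    handler._done.set()


def run(prompt: object, max_polls: int = 10_000) -> str:
    handler = SketchHandler()
    thread = threading.Thread(target=worker, args=(handler, prompt))
    thread.start()
    thread.join()  # the worker has finished or died by this point
    try:
        return "".join(handler.get_response_gen(max_polls))
    except RuntimeError:
        # Without the cap, the consumer would still be spinning here.
        return "stuck"
```

Calling run("hi") drains the queue and stops on the completion event, while run(123) exhausts the poll cap because the dead worker never signalled. The worker's traceback is printed to stderr, but the consumer never sees the exception.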

Attack Vector

An attacker can trigger this condition by passing an input of an incorrect type to the stream_complete method. The malformed input makes the worker thread fail prematurely, and the calling thread is left polling for a response that never arrives [1]. No authentication is required beyond normal access to whatever interface forwards input to stream_complete, so any deployment that accepts user-supplied inputs for LLM streaming is exposed.
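The reason the caller never notices the failure is a general property of Python threads: an exception raised in a worker is passed to threading.excepthook (Python 3.8+) and is not re-raised in the thread that started it. A minimal sketch, in which fragile_worker and start_and_join are hypothetical names and the type check stands in for whatever rejects the malformed input:

```python
import threading


def fragile_worker(prompt: object) -> None:
    # Hypothetical stand-in for the LLM call: rejects non-string input.
    if not isinstance(prompt, str):
        raise TypeError(f"expected str, got {type(prompt).__name__}")


def start_and_join(prompt: object) -> list:
    """Run fragile_worker in a thread and return the exception types
    that reached threading.excepthook."""
    seen = []
    previous_hook = threading.excepthook
    # Record failures instead of printing a traceback to stderr.
    threading.excepthook = lambda args: seen.append(args.exc_type)
    try:
        thread = threading.Thread(target=fragile_worker, args=(prompt,))
        thread.start()
        thread.join()  # returns normally even when the worker raised
    finally:
        threading.excepthook = previous_hook
    return seen
```

Here join() returns without error in both cases; only the hook learns that the worker died. A consumer that waits on a completion signal therefore needs its own way out, such as a timeout.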

Impact

A successful exploit results in a Denial of Service (DoS): the affected process spins in the polling loop indefinitely, consuming CPU and leaving the service unresponsive. This can impact availability for all users relying on the llama_index service [1].

Mitigation

The vulnerability is addressed by commit d1ecfb77578d, which adds a configurable timeout parameter to the get_response_gen method. The default is 120 seconds; once that much time has elapsed, the method raises TimeoutError instead of looping forever. The commit also adds a short sleep between polls to avoid CPU spinning [3]. Users are strongly advised to update to llama-index-core 0.12.6 or later, the patched version listed in the GitHub Security Advisory. As of the publication date, CVE-2024-12704 is not listed in CISA's Known Exploited Vulnerabilities catalog.
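The patched behavior can be exercised in isolation. In the sketch below, response_gen is a standalone copy of the patched polling loop, rewritten as a free function for illustration, and collect is a hypothetical caller-side helper; neither is part of the llama_index API.

```python
import time
from queue import Queue
from threading import Event


def response_gen(token_queue: Queue, done: Event, timeout: float = 120.0):
    """Standalone copy of the patched polling loop, for illustration only."""
    start_time = time.time()
    while True:
        if time.time() - start_time > timeout:
            raise TimeoutError(
                f"Response generation timed out after {timeout} seconds"
            )
        if not token_queue.empty():
            yield token_queue.get_nowait()
        elif done.is_set():
            break
        else:
            time.sleep(0.01)  # avoid a busy loop while waiting


def collect(token_queue: Queue, done: Event, timeout: float):
    """Hypothetical caller: return (tokens, timed_out)."""
    tokens = []
    try:
        for token in response_gen(token_queue, done, timeout):
            tokens.append(token)
    except TimeoutError:
        return tokens, True  # partial output plus a timeout flag
    return tokens, False
```

Because start_time is taken once, the timeout bounds the whole response rather than the gap between tokens, so a caller expecting a legitimately long stream would presumably need to pass a larger value.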

AI Insight generated on May 20, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected packages

Versions sourced from the GitHub Security Advisory.

Package                  Affected versions   Patched versions
llama-index-core (PyPI)  < 0.12.6            0.12.6

Affected products

3

Patches

1
d1ecfb77578d

fix: add a timeout to langchain callback handler (#17296)

https://github.com/run-llama/llama_index · Massimiliano Pippi · Dec 17, 2024 · via ghsa
1 file changed · +17 -1
  • llama-index-core/llama_index/core/langchain_helpers/streaming.py · +17 -1 · modified
    @@ -1,3 +1,4 @@
    +import time
     from queue import Queue
     from threading import Event
     from typing import Any, Generator, List, Optional
    @@ -35,10 +36,25 @@ def on_llm_error(
         ) -> None:
             self._done.set()
     
    -    def get_response_gen(self) -> Generator:
    +    def get_response_gen(self, timeout: float = 120.0) -> Generator:
    +        """Get response generator with timeout.
    +
    +        Args:
    +            timeout (float): Maximum time in seconds to wait for the complete response.
    +                            Defaults to 120 seconds.
    +        """
    +        start_time = time.time()
             while True:
    +            if time.time() - start_time > timeout:
    +                raise TimeoutError(
    +                    f"Response generation timed out after {timeout} seconds"
    +                )
    +
                 if not self._token_queue.empty():
                     token = self._token_queue.get_nowait()
                     yield token
                 elif self._done.is_set():
                     break
    +            else:
    +                # Small sleep to prevent CPU spinning
    +                time.sleep(0.01)
    

Vulnerability mechanics

Generated on May 9, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References

4
