PyPI package

vllm

pkg:pypi/vllm

Vulnerabilities (40)

CVE	Severity	CVSS	Affected versions	Fixed in	Published	Description
CVE-2025-48956		—	>= 0.1.0, < 0.10.1.1	0.10.1.1	Aug 21, 2025	vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 up to but excluding 0.10.1.1, a Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server memory…
CVE-2025-48944		—	>= 0.8.0, < 0.9.0	0.9.0	May 30, 2025	vLLM is an inference and serving engine for large language models (LLMs). In versions 0.8.0 up to but excluding 0.9.0, the vLLM backend used with the /v1/chat/completions OpenAPI endpoint fails to validate unexpected or malformed input in the "pattern" and "type" fields when the t…
CVE-2025-48943		—	>= 0.8.0, < 0.9.0	0.9.0	May 30, 2025	vLLM is an inference and serving engine for large language models (LLMs). Versions 0.8.0 up to but excluding 0.9.0 have a regular expression Denial of Service (ReDoS) vulnerability that causes the vLLM server to crash if an invalid regex is provided while using structured output. This vulnerability is similar to…
CVE-2025-48942		—	>= 0.8.0, < 0.9.0	0.9.0	May 30, 2025	vLLM is an inference and serving engine for large language models (LLMs). In versions 0.8.0 up to but excluding 0.9.0, sending an invalid json_schema as a guided decoding parameter to the /v1/completions API kills the vLLM server. This vulnerability is similar to GHSA-9hcf-v7m4-6m2j/CVE-2025-4…
CVE-2025-48887		—	>= 0.6.4, < 0.9.0	0.9.0	May 30, 2025	vLLM, an inference and serving engine for large language models (LLMs), has a Regular Expression Denial of Service (ReDoS) vulnerability in the file `vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py` of versions 0.6.4 up to but excluding 0.9.0. The root cause is the u…
CVE-2025-46722		—	>= 0.7.0, < 0.9.0	0.9.0	May 29, 2025	vLLM is an inference and serving engine for large language models (LLMs). In versions 0.7.0 up to but excluding 0.9.0, the MultiModalHasher class in vllm/multimodal/hasher.py has a security and data integrity issue in its image hashing method. Currently, it serializ…
CVE-2025-46570		—	< 0.9.0	0.9.0	May 29, 2025	vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed and the PagedAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token). The…
CVE-2025-47277		—	>= 0.6.5, < 0.8.5	0.8.5	May 20, 2025	vLLM, an inference and serving engine for large language models (LLMs), has an issue in versions 0.6.5 through 0.8.4 that ONLY impacts environments using the `PyNcclPipe` KV cache transfer integration with the V0 engine. No other configurations are affected. vLLM supports the use…
CVE-2025-30165		—	>= 0.5.2, < 0.10.0	0.10.0	May 6, 2025	vLLM is an inference and serving engine for large language models. In a multi-node vLLM deployment using the V0 engine, vLLM uses ZeroMQ for some multi-node communication purposes. The secondary vLLM hosts open a `SUB` ZeroMQ socket and connect to an `XPUB` socket on the primary…
CVE-2025-32444		—	>= 0.6.5, < 0.8.5	0.8.5	Apr 30, 2025	vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions 0.6.5 up to but excluding 0.8.5 that use the vLLM integration with Mooncake are vulnerable to remote code execution due to pickle-based serialization over unsecured ZeroMQ sock…
CVE-2025-46560		—	>= 0.8.0, < 0.8.5	0.8.5	Apr 30, 2025	vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions 0.8.0 up to but excluding 0.8.5 are affected by a critical performance vulnerability in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces p…
CVE-2025-30202		—	>= 0.5.2, < 0.8.5	0.8.5	Apr 30, 2025	vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions 0.5.2 up to but excluding 0.8.5 are vulnerable to denial of service and data exposure via ZeroMQ in multi-node vLLM deployments. In a multi-node vLLM deployment, vLLM uses ZeroMQ…
CVE-2024-11041		—	<= 0.6.2	—	Mar 20, 2025	vllm-project vllm version v0.6.2 contains a vulnerability in the MessageQueue.dequeue() API function. The function passes data received from sockets directly to pickle.loads, leading to a remote code execution vulnerability. An attacker can exploit this by sending a malicious payload t…
CVE-2024-9053		—	<= 0.6.0	—	Mar 20, 2025	vllm-project vllm version 0.6.0 contains a vulnerability in the AsyncEngineRPCServer() RPC server entrypoints. The core functionality run_server_loop() calls the function _make_handler_coro(), which directly uses cloudpickle.loads() on received messages without any sanitization.
CVE-2025-29783		—	>= 0.6.5, < 0.8.0	0.8.0	Mar 19, 2025	vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. When vLLM is configured to use Mooncake, unsafe deserialization exposed directly over ZMQ/TCP on all network interfaces allows attackers to execute remote code on distributed hosts. This is…
CVE-2025-29770		—	< 0.8.0	0.8.0	Mar 19, 2025	vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. The outlines library is one of the backends used by vLLM to support structured output (a.k.a. guided decoding). Outlines provides an optional cache for its compiled grammars on the local filesys…
CVE-2025-25183		—	< 0.7.2	0.7.2	Feb 7, 2025	vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Maliciously constructed statements can lead to hash collisions, resulting in cache reuse, which can interfere with subsequent responses and cause unintended behavior. Prefix caching makes use of…
CVE-2025-24357		—	< 0.7.0	0.7.0	Jan 27, 2025	vLLM is a library for LLM inference and serving. vllm/model_executor/weight_utils.py implements hf_model_weights_iterator to load the model checkpoint, which is downloaded from Hugging Face. It uses the torch.load function, whose weights_only parameter defaults to False. When tor…
CVE-2024-8939	Medium	6.2	<= 0.5.0.post1	—	Sep 17, 2024	A vulnerability was found in the ilab model serve component, where improper handling of the best_of parameter in the vllm JSON web API can lead to a Denial of Service (DoS). The API used for LLM-based sentence or chat completion accepts a best_of parameter to return the best comp…
CVE-2024-8768	High	7.5	< 0.5.5	0.5.5	Sep 17, 2024	A flaw was found in the vLLM library. A completions API request with an empty prompt will crash the vLLM API server, resulting in a denial of service.
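
The "Affected versions" column is a set of plain version comparisons, so checking a deployment against it can be scripted. Below is a minimal Python sketch; parse_version, in_range, ADVISORIES and affecting are hypothetical helper names, not part of vLLM. It assumes versions consist of numeric segments plus an optional .postN suffix (which covers every version string in this table) and drops any local label such as +cu121; it does not implement full PEP 440 (pre-releases, dev builds). For complete rules, packaging.specifiers.SpecifierSet from the packaging library is the usual tool.

```python
import re


def parse_version(v: str) -> tuple:
    """Turn '0.10.1.1' or '0.5.0.post1' into a comparable tuple.

    Simplified on purpose: only numeric release segments plus an optional
    '.postN' suffix are understood; a local label like '+cu121' is dropped.
    """
    v = v.strip().split("+", 1)[0]
    m = re.fullmatch(r"(\d+(?:\.\d+)*)(?:\.post(\d+))?", v)
    if m is None:
        raise ValueError(f"unsupported version string: {v!r}")
    release = [int(part) for part in m.group(1).split(".")]
    release += [0] * (6 - len(release))  # pad so that 0.9 == 0.9.0
    # No post suffix sorts before any .postN, so 0.5.0 < 0.5.0.post1.
    post = int(m.group(2)) if m.group(2) is not None else -1
    return (tuple(release), post)


# Comparison operators that appear in the "Affected versions" column.
_OPS = {
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
    "==": lambda a, b: a == b,
}


def in_range(version: str, spec: str) -> bool:
    """True if `version` satisfies every comma-separated clause of `spec`."""
    v = parse_version(version)
    for clause in spec.split(","):
        m = re.fullmatch(r"\s*(>=|<=|==|>|<)\s*(\S+)\s*", clause)
        if m is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = m.groups()
        if not _OPS[op](v, parse_version(bound)):
            return False
    return True


# A few rows copied from the table; extend with the remaining entries as needed.
ADVISORIES = {
    "CVE-2025-48956": ">= 0.1.0, < 0.10.1.1",
    "CVE-2025-48944": ">= 0.8.0, < 0.9.0",
    "CVE-2025-30165": ">= 0.5.2, < 0.10.0",
    "CVE-2025-24357": "< 0.7.0",
    "CVE-2024-8939": "<= 0.5.0.post1",
}


def affecting(version: str) -> list:
    """Sorted CVE IDs from ADVISORIES whose affected range contains `version`."""
    return sorted(cve for cve, spec in ADVISORIES.items() if in_range(version, spec))
```

For example, affecting("0.8.4") should list CVE-2025-30165, CVE-2025-48944 and CVE-2025-48956, while affecting("0.10.1.1") returns an empty list. To check an installed copy, affecting(importlib.metadata.version("vllm")) is one option, with the caveat that dev or pre-release builds will raise ValueError under this simplified parser.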

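Several entries in the table (CVE-2024-11041, CVE-2024-9053, CVE-2025-32444, CVE-2025-29783) come down to deserializing untrusted bytes with pickle, whose loader will call whatever callable the byte stream names. The Python sketch below demonstrates that with a harmless eval payload, then applies the "restricting globals" technique described in the Python pickle documentation: an Unpickler subclass whose find_class rejects anything outside an allow-list. This is a generic mitigation, not necessarily how vLLM addressed these CVEs; RestrictedUnpickler, restricted_loads, Payload and the allow-list contents are hypothetical. An allow-list only narrows the attack surface; not using pickle on network-facing sockets at all, or keeping those sockets off untrusted interfaces, is the more robust direction.

```python
import io
import pickle

# Hypothetical allow-list. Plain dicts, lists, strings, numbers and None are
# encoded with dedicated pickle opcodes and never reach find_class; a few
# built-in types (complex, or set/frozenset under older protocols) are rebuilt
# by calling the type, so they must be allowed explicitly.
_ALLOWED_GLOBALS = {
    ("builtins", "set"),
    ("builtins", "frozenset"),
    ("builtins", "complex"),
}


class RestrictedUnpickler(pickle.Unpickler):
    """Refuses to resolve any global that is not explicitly allow-listed."""

    def find_class(self, module, name):
        if (module, name) in _ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"forbidden global: {module}.{name}")


def restricted_loads(data: bytes):
    """Drop-in replacement for pickle.loads that uses the allow-list."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


class Payload:
    """A harmless stand-in for an attacker-crafted object."""

    def __reduce__(self):
        # Instructs the unpickler to call eval("6 * 7") while loading.
        return (eval, ("6 * 7",))


if __name__ == "__main__":
    payload = pickle.dumps(Payload())
    print(pickle.loads(payload))  # 42: plain pickle.loads executed eval()
    try:
        restricted_loads(payload)
    except pickle.UnpicklingError as exc:
        print("blocked:", exc)  # blocked: forbidden global: builtins.eval
    print(restricted_loads(pickle.dumps({"tokens": [1, 2, 3]})))
```

The key point is that plain pickle.loads returns 42, meaning the sender chose which function ran on the receiver; swapping eval for os.system is all it takes to turn that into remote code execution.
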

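CVE-2025-25183 concerns hash collisions in prefix caching, where a cached KV block is reused whenever a new prompt produces a matching block hash. One way to make such keys infeasible to collide on purpose is to chain a cryptographic hash across blocks, so each key commits to its own tokens and to the entire prefix before it. The Python sketch below assumes a block size of 4, token IDs encoded as 8-byte signed integers, and an all-zero root key; block_keys and ROOT_KEY are hypothetical names, and this is not a description of vLLM's actual scheme.

```python
import hashlib
import struct

ROOT_KEY = bytes(32)  # hypothetical parent key for the first block


def block_keys(token_ids, block_size=4):
    """Return one SHA-256 key per *full* block of `token_ids`.

    Each key hashes the parent block's key together with this block's
    tokens, so equal keys imply equal prefixes (barring a SHA-256
    collision). A trailing partial block gets no key.
    """
    keys = []
    parent = ROOT_KEY
    full_len = len(token_ids) - len(token_ids) % block_size
    for start in range(0, full_len, block_size):
        block = token_ids[start:start + block_size]
        h = hashlib.sha256(parent)
        # Fixed-width binary encoding, unlike naive string joining where
        # tokens (1, 23) and (12, 3) would both serialize as "123".
        h.update(struct.pack(f"<{len(block)}q", *block))
        parent = h.digest()
        keys.append(parent)
    return keys
```

With this scheme, block_keys([1, 2, 3, 4, 5, 6, 7, 8]) and block_keys([1, 2, 3, 4, 9, 9, 9, 9]) share their first key but not their second, and a second block with identical tokens but a different first block also gets a different key, so KV entries computed under one context are not reused for another.
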
Page 2 of 2