Vendor CVEs
Vllm
All CVEs
54 total · sorted by risk| CVE | Vendor / Product | Sev | Risk | CVSS | EPSS | KEV | Published | Description |
|---|---|---|---|---|---|---|---|---|
| CVE-2026-4944 | Hig | 0.57 | 8.8 | 0.01 | May 28, 2026 | vllm-project/vllm version 0.14.1 contains a vulnerability where the `trust_remote_code=True` parameter is hardcoded in two model implementation files (`vllm/model_executor/models/nemotron_vl.py` and `vllm/model_executor/models/kimi_k25.py`). This bypasses the user's explicit… | ||
| CVE-2026-48746 | cri | 0.52 | — | 0.01 | Jun 16, 2026 | ### Summary A vulnerability in ASGI web servers and starlette's trust on those web servers enables an authentication bypass of the OpenAI API `AuthenticationMiddleware`, which was discovered during @x41sec's source code audit. It allows to use the API without providing the… | ||
| CVE-2026-5497 | Hig | 0.42 | 7.5 | 0.01 | Jun 11, 2026 | vLLM versions 0.8.0 and later are vulnerable to an Out-of-Memory (OOM) Denial of Service (DoS) attack due to unbounded frame count processing in the `VideoMediaIO.load_base64()` method. When processing `video/jpeg` data URLs, the method splits the base64 data string on commas to… | ||
| CVE-2024-8768 | Hig | 0.42 | 7.5 | 0.01 | Sep 17, 2024 | A flaw was found in the vLLM library. A completions API request with an empty prompt will crash the vLLM API server, resulting in a denial of service. | ||
| CVE-2024-8939 | Med | 0.40 | 6.2 | 0.00 | Sep 17, 2024 | A vulnerability was found in the ilab model serve component, where improper handling of the best_of parameter in the vllm JSON web API can lead to a Denial of Service (DoS). The API used for LLM-based sentence or chat completion accepts a best_of parameter to return the best… | ||
| CVE-2026-41523 | hig | 0.39 | — | 0.00 | Jun 16, 2026 | ### Summary An `assert`-based security check in vLLM's activation function loading allows any unauthenticated attacker to achieve arbitrary code execution on the server by publishing a malicious HuggingFace model, when vLLM runs in Python optimized mode (`python -O` or… | ||
| CVE-2025-6242 | Hig | 0.39 | 7.1 | 0.00 | Oct 7, 2025 | A Server-Side Request Forgery (SSRF) vulnerability exists in the MediaConnector class within the vLLM project's multimodal feature set. The load_from_url and load_from_url_async methods fetch and process media from user-provided URLs without adequate restrictions on the target… | ||
| CVE-2025-9141 | hig | 0.39 | — | 0.04 | Aug 21, 2025 | ### Summary An unsafe deserialization vulnerability allows any authenticated user to execute arbitrary code on the server if they are able to get the model to pass the code as an argument to a tool call. ### Details vLLM's [Qwen3 Coder tool… | ||
| CVE-2026-44223 | Med | 0.35 | 6.5 | 0.00 | May 12, 2026 | vLLM is an inference and serving engine for large language models (LLMs). From to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the… | ||
| CVE-2026-44222 | Med | 0.35 | 6.5 | 0.00 | May 12, 2026 | vLLM is an inference and serving engine for large language models (LLMs). From 0.6.1 to before 0.20.0, there is a a Token Injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell special tokens are interpreted as control. Image and… | ||
| CVE-2026-34756 | Med | 0.35 | 6.5 | 0.00 | Apr 6, 2026 | vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.19.0, a Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the n parameter in the ChatCompletionRequest… | ||
| CVE-2026-34755 | Med | 0.35 | 6.5 | 0.00 | Apr 6, 2026 | vLLM is an inference and serving engine for large language models (LLMs). From 0.7.0 to before 0.19.0, the VideoMediaIO.load_base64() method at vllm/multimodal/media/video.py splits video/jpeg data URLs by comma to extract individual JPEG frames, but does not enforce a frame… | ||
| CVE-2026-9540 | Med | 0.34 | 5.3 | 0.00 | May 26, 2026 | A vulnerability was identified in vllm-project vllm 0.19.0. This issue affects some unknown processing of the component OpenAI-compatible Serving Path. Such manipulation leads to denial of service. It is possible to launch the attack remotely. The exploit is publicly available… | ||
| CVE-2026-34760 | Med | 0.31 | 5.9 | 0.00 | Apr 2, 2026 | vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before version 0.18.0, Librosa defaults to using numpy.mean for mono downmixing (to_mono), while the international standard ITU-R BS.775-4 specifies a weighted downmixing algorithm.… | ||
| CVE-2026-7141 | Med | 0.29 | 5.6 | 0.00 | Apr 27, 2026 | A vulnerability was found in vllm up to 0.19.0. The affected element is the function has_mamba_layers of the file vllm/v1/kv_cache_interface.py of the component KV Block Handler. Performing a manipulation results in uninitialized resource. It is possible to initiate the attack… | ||
| CVE-2026-34753 | Med | 0.28 | 5.4 | 0.00 | Apr 6, 2026 | vLLM is an inference and serving engine for large language models (LLMs). From 0.16.0 to before 0.19.0, a server-side request forgery (SSRF) vulnerability in download_bytes_from_url allows any actor who can control batch input JSON to make the vLLM batch runner issue arbitrary… | ||
| CVE-2026-12491 | mod | 0.24 | 4.8 | 0.00 | Jun 10, 2026 | vllm: vllm: image EXIF Rotation & PNG tRNS Transparency Not Normalized, Causing Mismatch Between Model Input and Expectations | ||
| CVE-2025-61620 | med | 0.19 | — | 0.00 | Oct 7, 2025 | ### Summary A resource-exhaustion (denial-of-service) vulnerability exists in multiple endpoints of the OpenAI-Compatible Server due to the ability to specify Jinja templates via the `chat_template` and `chat_template_kwargs` parameters. If an attacker can supply these… | ||
| CVE-2025-1953 | Low | 0.17 | 2.6 | 0.00 | Mar 4, 2025 | A vulnerability has been found in vLLM AIBrix 0.2.0 and classified as problematic. Affected by this vulnerability is an unknown functionality of the file pkg/plugins/gateway/prefixcacheindexer/hash.go of the component Prefix Caching. The manipulation leads to insufficiently… | ||
| CVE-2026-54232 | 0.00 | — | 0.00 | Jun 22, 2026 | vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.22.1, the vLLM Dockerfile is vulnerable to a dependency confusion attack through the flashinfer-jit-cache package. The package is installed from a custom index (flashinfer.ai/whl/) using… | |||
| CVE-2026-56340 | 0.00 | — | 0.00 | Jun 20, 2026 | vLLM versions >= 0.10.2 and < 0.13.0 are missing sparse tensor validation in multimodal embeddings processing. Because PyTorch disables sparse tensor invariant checks by default, an attacker can submit crafted embedding requests with malformed (negative or out-of-bounds) tensor… | |||
| CVE-2025-71379 | 0.00 | — | 0.00 | Jun 20, 2026 | vLLM versions >= 0.6.3 and < 0.9.0 contain multiple regular expression denial of service (ReDoS) vulnerabilities. Several regex patterns — in vllm/lora/utils.py, the phi4mini tool parser, and the OpenAI-compatible serving chat endpoint — are susceptible to catastrophic… | |||
| CVE-2026-54233 | 0.00 | — | 0.00 | Jun 17, 2026 | ### Summary vLLM's `/v1/audio/transcriptions` endpoint limits compressed upload size but not decoded PCM output. A 25MB OPUS file expands to ~14.9GB of float32 PCM at decode time. Tested on vLLM v0.19.0. ### Details `SpeechToTextProcessor` rejects uploads over… | |||
| CVE-2026-54236 | 0.00 | — | 0.01 | Jun 17, 2026 | # vLLM: incomplete CVE-2026-22778 fix leaks PIL repr addresses via the Anthropic API router **Researcher:** Kai Aizen — SnailSploit (@SnailSploit), Adversarial & Offensive Security Research **Severity:** CVSS 3.1 5.3 (Medium) `AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:N/A:N` **Target:**… | |||
| CVE-2026-53923 | 0.00 | — | 0.00 | Jun 17, 2026 | ## Summary Integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (`csrc/quantization/gguf/gguf_kernel.cu`) causes partial tensor processing. The output tensor is allocated at full size via `torch::empty` (uninitialized memory), but the dequantize CUDA kernel… | |||
| CVE-2026-54235 | 0.00 | — | 0.00 | Jun 17, 2026 | ## Summary All temperature validation gates use comparison operators (`<`, `>`), which silently evaluate to `False` for `NaN` and for positive `Infinity` in Python's IEEE 754 float semantics. Both values pass every guard and propagate to GPU sampling kernels, where they produce… | |||
| CVE-2026-47155 | 0.00 | — | 0.00 | Jun 10, 2026 | ### Summary vLLM's revision pinning controls do not consistently apply to all artifacts loaded for a model. A deployment that supplies `--revision` or `--code-revision` can still load dynamic code, GGUF files, image processors, retrieval side weights, or same-repository… | |||
| CVE-2026-27893 | 0.00 | — | 0.01 | Mar 26, 2026 | vLLM is an inference and serving engine for large language models (LLMs). Starting in version 0.10.1 and prior to version 0.18.0, two model implementation files hardcode `trust_remote_code=True` when loading sub-components, bypassing the user's explicit… | |||
| CVE-2026-25960 | 0.00 | — | 0.00 | Mar 9, 2026 | vLLM is an inference and serving engine for large language models (LLMs). The SSRF protection fix for CVE-2026-24779 add in 0.15.1 can be bypassed in the load_from_url_async method due to inconsistent URL parsing behavior between the validation layer and the actual HTTP client.… | |||
| CVE-2026-22778 | 0.00 | — | 0.04 | Feb 2, 2026 | vLLM is an inference and serving engine for large language models (LLMs). From 0.8.3 to before 0.14.1, when an invalid image is sent to vLLM's multimodal endpoint, PIL throws an error. vLLM returns this error to the client, leaking a heap address. With this leak, we reduce ASLR… | |||
| CVE-2026-24779 | 0.00 | — | 0.01 | Jan 27, 2026 | vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.14.1, a Server-Side Request Forgery (SSRF) vulnerability exists in the `MediaConnector` class within the vLLM project's multimodal feature set. The load_from_url and load_from_url_async… | |||
| CVE-2026-22807 | 0.00 | — | 0.01 | Jan 21, 2026 | vLLM is an inference and serving engine for large language models (LLMs). Starting in version 0.10.1 and prior to version 0.14.0, vLLM loads Hugging Face `auto_map` dynamic modules during model resolution without gating on `trust_remote_code`, allowing attacker-controlled Python… | |||
| CVE-2026-22773 | 0.00 | — | 0.00 | Jan 10, 2026 | vLLM is an inference and serving engine for large language models (LLMs). In versions from 0.6.4 to before 0.12.0, users can crash the vLLM engine serving multimodal models that use the Idefics3 vision model implementation by sending a specially crafted 1x1 pixel image. This… | |||
| CVE-2025-66448 | 0.00 | — | 0.01 | Dec 1, 2025 | vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.11.1, vllm has a critical remote code execution vector in a config class named Nemotron_Nano_VL_Config. When vllm loads a model config that contains an auto_map entry, the config class resolves… | |||
| CVE-2025-62372 | 0.00 | — | 0.00 | Nov 21, 2025 | vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, users can crash the vLLM engine serving multimodal models by passing multimodal embedding inputs with correct ndim but incorrect shape (e.g. hidden dimension is wrong),… | |||
| CVE-2025-62426 | 0.00 | — | 0.00 | Nov 21, 2025 | vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the… | |||
| CVE-2025-62164 | 0.00 | — | 0.01 | Nov 21, 2025 | vLLM is an inference and serving engine for large language models (LLMs). From versions 0.10.2 to before 0.11.1, a memory corruption vulnerability could lead to a crash (denial-of-service) and potentially remote code execution (RCE), exists in the Completions API endpoint. When… | |||
| CVE-2025-59425 | 0.00 | — | 0.01 | Oct 7, 2025 | vLLM is an inference and serving engine for large language models (LLMs). Before version 0.11.0rc2, the API key support in vLLM performs validation using a method that was vulnerable to a timing attack. API key validation uses a string comparison that takes longer the more… | |||
| CVE-2025-48956 | 0.00 | — | 0.01 | Aug 21, 2025 | vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.10.1.1, a Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server… | |||
| CVE-2025-48944 | 0.00 | — | 0.00 | May 30, 2025 | vLLM is an inference and serving engine for large language models (LLMs). In version 0.8.0 up to but excluding 0.9.0, the vLLM backend used with the /v1/chat/completions OpenAPI endpoint fails to validate unexpected or malformed input in the "pattern" and "type" fields when the… | |||
| CVE-2025-48943 | 0.00 | — | 0.00 | May 30, 2025 | vLLM is an inference and serving engine for large language models (LLMs). Version 0.8.0 up to but excluding 0.9.0 have a Denial of Service (ReDoS) that causes the vLLM server to crash if an invalid regex was provided while using structured output. This vulnerability is similar… | |||
| CVE-2025-48942 | 0.00 | — | 0.00 | May 30, 2025 | vLLM is an inference and serving engine for large language models (LLMs). In versions 0.8.0 up to but excluding 0.9.0, hitting the /v1/completions API with a invalid json_schema as a Guided Param kills the vllm server. This vulnerability is similar… | |||
| CVE-2025-48887 | 0.00 | — | 0.00 | May 30, 2025 | vLLM, an inference and serving engine for large language models (LLMs), has a Regular Expression Denial of Service (ReDoS) vulnerability in the file `vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py` of versions 0.6.4 up to but excluding 0.9.0. The root cause is the… | |||
| CVE-2025-46722 | 0.00 | — | 0.00 | May 29, 2025 | vLLM is an inference and serving engine for large language models (LLMs). In versions starting from 0.7.0 to before 0.9.0, in the file vllm/multimodal/hasher.py, the MultiModalHasher class has a security and data integrity issue in its image hashing method. Currently, it… | |||
| CVE-2025-46570 | 0.00 | — | 0.00 | May 29, 2025 | vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token).… | |||
| CVE-2025-47277 | 0.00 | — | 0.01 | May 20, 2025 | vLLM, an inference and serving engine for large language models (LLMs), has an issue in versions 0.6.5 through 0.8.4 that ONLY impacts environments using the `PyNcclPipe` KV cache transfer integration with the V0 engine. No other configurations are affected. vLLM supports the… | |||
| CVE-2025-30165 | 0.00 | — | 0.00 | May 6, 2025 | vLLM is an inference and serving engine for large language models. In a multi-node vLLM deployment using the V0 engine, vLLM uses ZeroMQ for some multi-node communication purposes. The secondary vLLM hosts open a `SUB` ZeroMQ socket and connect to an `XPUB` socket on the primary… | |||
| CVE-2025-32444 | 0.00 | — | 0.01 | Apr 30, 2025 | vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.6.5 and prior to 0.8.5, having vLLM integration with mooncake, are vulnerable to remote code execution due to using pickle based serialization over unsecured ZeroMQ… | |||
| CVE-2025-46560 | 0.00 | — | 0.00 | Apr 30, 2025 | vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.8.0 and prior to 0.8.5 are affected by a critical performance vulnerability in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces… | |||
| CVE-2025-30202 | 0.00 | — | 0.01 | Apr 30, 2025 | vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.5.2 and prior to 0.8.5 are vulnerable to denial of service and data exposure via ZeroMQ on multi-node vLLM deployment. In a multi-node vLLM deployment, vLLM uses ZeroMQ… |
- risk 0.57cvss 8.8epss 0.01
vllm-project/vllm version 0.14.1 contains a vulnerability where the `trust_remote_code=True` parameter is hardcoded in two model implementation files (`vllm/model_executor/models/nemotron_vl.py` and `vllm/model_executor/models/kimi_k25.py`). This bypasses the user's explicit…
- risk 0.52cvss —epss 0.01
### Summary A vulnerability in ASGI web servers and starlette's trust on those web servers enables an authentication bypass of the OpenAI API `AuthenticationMiddleware`, which was discovered during @x41sec's source code audit. It allows to use the API without providing the…
- risk 0.42cvss 7.5epss 0.01
vLLM versions 0.8.0 and later are vulnerable to an Out-of-Memory (OOM) Denial of Service (DoS) attack due to unbounded frame count processing in the `VideoMediaIO.load_base64()` method. When processing `video/jpeg` data URLs, the method splits the base64 data string on commas to…
- risk 0.42cvss 7.5epss 0.01
A flaw was found in the vLLM library. A completions API request with an empty prompt will crash the vLLM API server, resulting in a denial of service.
- risk 0.40cvss 6.2epss 0.00
A vulnerability was found in the ilab model serve component, where improper handling of the best_of parameter in the vllm JSON web API can lead to a Denial of Service (DoS). The API used for LLM-based sentence or chat completion accepts a best_of parameter to return the best…
- risk 0.39cvss —epss 0.00
### Summary An `assert`-based security check in vLLM's activation function loading allows any unauthenticated attacker to achieve arbitrary code execution on the server by publishing a malicious HuggingFace model, when vLLM runs in Python optimized mode (`python -O` or…
- risk 0.39cvss 7.1epss 0.00
A Server-Side Request Forgery (SSRF) vulnerability exists in the MediaConnector class within the vLLM project's multimodal feature set. The load_from_url and load_from_url_async methods fetch and process media from user-provided URLs without adequate restrictions on the target…
- risk 0.39cvss —epss 0.04
### Summary An unsafe deserialization vulnerability allows any authenticated user to execute arbitrary code on the server if they are able to get the model to pass the code as an argument to a tool call. ### Details vLLM's [Qwen3 Coder tool…
- risk 0.35cvss 6.5epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). From to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the…
- risk 0.35cvss 6.5epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). From 0.6.1 to before 0.20.0, there is a a Token Injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell special tokens are interpreted as control. Image and…
- risk 0.35cvss 6.5epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.19.0, a Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the n parameter in the ChatCompletionRequest…
- risk 0.35cvss 6.5epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). From 0.7.0 to before 0.19.0, the VideoMediaIO.load_base64() method at vllm/multimodal/media/video.py splits video/jpeg data URLs by comma to extract individual JPEG frames, but does not enforce a frame…
- risk 0.34cvss 5.3epss 0.00
A vulnerability was identified in vllm-project vllm 0.19.0. This issue affects some unknown processing of the component OpenAI-compatible Serving Path. Such manipulation leads to denial of service. It is possible to launch the attack remotely. The exploit is publicly available…
- risk 0.31cvss 5.9epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before version 0.18.0, Librosa defaults to using numpy.mean for mono downmixing (to_mono), while the international standard ITU-R BS.775-4 specifies a weighted downmixing algorithm.…
- risk 0.29cvss 5.6epss 0.00
A vulnerability was found in vllm up to 0.19.0. The affected element is the function has_mamba_layers of the file vllm/v1/kv_cache_interface.py of the component KV Block Handler. Performing a manipulation results in uninitialized resource. It is possible to initiate the attack…
- risk 0.28cvss 5.4epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). From 0.16.0 to before 0.19.0, a server-side request forgery (SSRF) vulnerability in download_bytes_from_url allows any actor who can control batch input JSON to make the vLLM batch runner issue arbitrary…
- risk 0.24cvss 4.8epss 0.00
vllm: vllm: image EXIF Rotation & PNG tRNS Transparency Not Normalized, Causing Mismatch Between Model Input and Expectations
- risk 0.19cvss —epss 0.00
### Summary A resource-exhaustion (denial-of-service) vulnerability exists in multiple endpoints of the OpenAI-Compatible Server due to the ability to specify Jinja templates via the `chat_template` and `chat_template_kwargs` parameters. If an attacker can supply these…
- risk 0.17cvss 2.6epss 0.00
A vulnerability has been found in vLLM AIBrix 0.2.0 and classified as problematic. Affected by this vulnerability is an unknown functionality of the file pkg/plugins/gateway/prefixcacheindexer/hash.go of the component Prefix Caching. The manipulation leads to insufficiently…
- CVE-2026-54232Jun 22, 2026risk 0.00cvss —epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.22.1, the vLLM Dockerfile is vulnerable to a dependency confusion attack through the flashinfer-jit-cache package. The package is installed from a custom index (flashinfer.ai/whl/) using…
- CVE-2026-56340Jun 20, 2026risk 0.00cvss —epss 0.00
vLLM versions >= 0.10.2 and < 0.13.0 are missing sparse tensor validation in multimodal embeddings processing. Because PyTorch disables sparse tensor invariant checks by default, an attacker can submit crafted embedding requests with malformed (negative or out-of-bounds) tensor…
- CVE-2025-71379Jun 20, 2026risk 0.00cvss —epss 0.00
vLLM versions >= 0.6.3 and < 0.9.0 contain multiple regular expression denial of service (ReDoS) vulnerabilities. Several regex patterns — in vllm/lora/utils.py, the phi4mini tool parser, and the OpenAI-compatible serving chat endpoint — are susceptible to catastrophic…
- CVE-2026-54233Jun 17, 2026risk 0.00cvss —epss 0.00
### Summary vLLM's `/v1/audio/transcriptions` endpoint limits compressed upload size but not decoded PCM output. A 25MB OPUS file expands to ~14.9GB of float32 PCM at decode time. Tested on vLLM v0.19.0. ### Details `SpeechToTextProcessor` rejects uploads over…
- CVE-2026-54236Jun 17, 2026risk 0.00cvss —epss 0.01
# vLLM: incomplete CVE-2026-22778 fix leaks PIL repr addresses via the Anthropic API router **Researcher:** Kai Aizen — SnailSploit (@SnailSploit), Adversarial & Offensive Security Research **Severity:** CVSS 3.1 5.3 (Medium) `AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:N/A:N` **Target:**…
- CVE-2026-53923Jun 17, 2026risk 0.00cvss —epss 0.00
## Summary Integer truncation of tensor dimensions in vLLM's GGUF dequantize kernels (`csrc/quantization/gguf/gguf_kernel.cu`) causes partial tensor processing. The output tensor is allocated at full size via `torch::empty` (uninitialized memory), but the dequantize CUDA kernel…
- CVE-2026-54235Jun 17, 2026risk 0.00cvss —epss 0.00
## Summary All temperature validation gates use comparison operators (`<`, `>`), which silently evaluate to `False` for `NaN` and for positive `Infinity` in Python's IEEE 754 float semantics. Both values pass every guard and propagate to GPU sampling kernels, where they produce…
- CVE-2026-47155Jun 10, 2026risk 0.00cvss —epss 0.00
### Summary vLLM's revision pinning controls do not consistently apply to all artifacts loaded for a model. A deployment that supplies `--revision` or `--code-revision` can still load dynamic code, GGUF files, image processors, retrieval side weights, or same-repository…
- CVE-2026-27893Mar 26, 2026risk 0.00cvss —epss 0.01
vLLM is an inference and serving engine for large language models (LLMs). Starting in version 0.10.1 and prior to version 0.18.0, two model implementation files hardcode `trust_remote_code=True` when loading sub-components, bypassing the user's explicit…
- CVE-2026-25960Mar 9, 2026risk 0.00cvss —epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). The SSRF protection fix for CVE-2026-24779 add in 0.15.1 can be bypassed in the load_from_url_async method due to inconsistent URL parsing behavior between the validation layer and the actual HTTP client.…
- CVE-2026-22778Feb 2, 2026risk 0.00cvss —epss 0.04
vLLM is an inference and serving engine for large language models (LLMs). From 0.8.3 to before 0.14.1, when an invalid image is sent to vLLM's multimodal endpoint, PIL throws an error. vLLM returns this error to the client, leaking a heap address. With this leak, we reduce ASLR…
- CVE-2026-24779Jan 27, 2026risk 0.00cvss —epss 0.01
vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.14.1, a Server-Side Request Forgery (SSRF) vulnerability exists in the `MediaConnector` class within the vLLM project's multimodal feature set. The load_from_url and load_from_url_async…
- CVE-2026-22807Jan 21, 2026risk 0.00cvss —epss 0.01
vLLM is an inference and serving engine for large language models (LLMs). Starting in version 0.10.1 and prior to version 0.14.0, vLLM loads Hugging Face `auto_map` dynamic modules during model resolution without gating on `trust_remote_code`, allowing attacker-controlled Python…
- CVE-2026-22773Jan 10, 2026risk 0.00cvss —epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). In versions from 0.6.4 to before 0.12.0, users can crash the vLLM engine serving multimodal models that use the Idefics3 vision model implementation by sending a specially crafted 1x1 pixel image. This…
- CVE-2025-66448Dec 1, 2025risk 0.00cvss —epss 0.01
vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.11.1, vllm has a critical remote code execution vector in a config class named Nemotron_Nano_VL_Config. When vllm loads a model config that contains an auto_map entry, the config class resolves…
- CVE-2025-62372Nov 21, 2025risk 0.00cvss —epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, users can crash the vLLM engine serving multimodal models by passing multimodal embedding inputs with correct ndim but incorrect shape (e.g. hidden dimension is wrong),…
- CVE-2025-62426Nov 21, 2025risk 0.00cvss —epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the…
- CVE-2025-62164Nov 21, 2025risk 0.00cvss —epss 0.01
vLLM is an inference and serving engine for large language models (LLMs). From versions 0.10.2 to before 0.11.1, a memory corruption vulnerability could lead to a crash (denial-of-service) and potentially remote code execution (RCE), exists in the Completions API endpoint. When…
- CVE-2025-59425Oct 7, 2025risk 0.00cvss —epss 0.01
vLLM is an inference and serving engine for large language models (LLMs). Before version 0.11.0rc2, the API key support in vLLM performs validation using a method that was vulnerable to a timing attack. API key validation uses a string comparison that takes longer the more…
- CVE-2025-48956Aug 21, 2025risk 0.00cvss —epss 0.01
vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.10.1.1, a Denial of Service (DoS) vulnerability can be triggered by sending a single HTTP GET request with an extremely large header to an HTTP endpoint. This results in server…
- CVE-2025-48944May 30, 2025risk 0.00cvss —epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). In version 0.8.0 up to but excluding 0.9.0, the vLLM backend used with the /v1/chat/completions OpenAPI endpoint fails to validate unexpected or malformed input in the "pattern" and "type" fields when the…
- CVE-2025-48943May 30, 2025risk 0.00cvss —epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). Version 0.8.0 up to but excluding 0.9.0 have a Denial of Service (ReDoS) that causes the vLLM server to crash if an invalid regex was provided while using structured output. This vulnerability is similar…
- CVE-2025-48942May 30, 2025risk 0.00cvss —epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). In versions 0.8.0 up to but excluding 0.9.0, hitting the /v1/completions API with a invalid json_schema as a Guided Param kills the vllm server. This vulnerability is similar…
- CVE-2025-48887May 30, 2025risk 0.00cvss —epss 0.00
vLLM, an inference and serving engine for large language models (LLMs), has a Regular Expression Denial of Service (ReDoS) vulnerability in the file `vllm/entrypoints/openai/tool_parsers/pythonic_tool_parser.py` of versions 0.6.4 up to but excluding 0.9.0. The root cause is the…
- CVE-2025-46722May 29, 2025risk 0.00cvss —epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). In versions starting from 0.7.0 to before 0.9.0, in the file vllm/multimodal/hasher.py, the MultiModalHasher class has a security and data integrity issue in its image hashing method. Currently, it…
- CVE-2025-46570May 29, 2025risk 0.00cvss —epss 0.00
vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed, if the PageAttention mechanism finds a matching prefix chunk, the prefill process speeds up, which is reflected in the TTFT (Time to First Token).…
- CVE-2025-47277May 20, 2025risk 0.00cvss —epss 0.01
vLLM, an inference and serving engine for large language models (LLMs), has an issue in versions 0.6.5 through 0.8.4 that ONLY impacts environments using the `PyNcclPipe` KV cache transfer integration with the V0 engine. No other configurations are affected. vLLM supports the…
- CVE-2025-30165May 6, 2025risk 0.00cvss —epss 0.00
vLLM is an inference and serving engine for large language models. In a multi-node vLLM deployment using the V0 engine, vLLM uses ZeroMQ for some multi-node communication purposes. The secondary vLLM hosts open a `SUB` ZeroMQ socket and connect to an `XPUB` socket on the primary…
- CVE-2025-32444Apr 30, 2025risk 0.00cvss —epss 0.01
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.6.5 and prior to 0.8.5, having vLLM integration with mooncake, are vulnerable to remote code execution due to using pickle based serialization over unsecured ZeroMQ…
- CVE-2025-46560Apr 30, 2025risk 0.00cvss —epss 0.00
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.8.0 and prior to 0.8.5 are affected by a critical performance vulnerability in the input preprocessing logic of the multimodal tokenizer. The code dynamically replaces…
- CVE-2025-30202Apr 30, 2025risk 0.00cvss —epss 0.01
vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. Versions starting from 0.5.2 and prior to 0.8.5 are vulnerable to denial of service and data exposure via ZeroMQ on multi-node vLLM deployment. In a multi-node vLLM deployment, vLLM uses ZeroMQ…
Page 1 of 2