PyPI package
vllm
pkg:pypi/vllm
Vulnerabilities (40)
| CVE | Sev | CVSS | KEV | Affected versions | Fixed in | Published | Description |
|---|---|---|---|---|---|---|---|
| CVE-2026-44223 | Med | 6.5 | | >= 0.18.0, < 0.20.0 | 0.20.0 | May 12, 2026 | vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCo… |
| CVE-2026-44222 | Med | 6.5 | | >= 0.6.1, < 0.20.0 | 0.20.0 | May 12, 2026 | vLLM is an inference and serving engine for large language models (LLMs). From 0.6.1 to before 0.20.0, there is a Token Injection vulnerability in vLLM’s multimodal processing. Unauthenticated, text-only prompts that spell special tokens are interpreted as control. Image and vi… |
| CVE-2026-7141 | Med | 5.6 | | < 0.19.1 | 0.19.1 | Apr 27, 2026 | A vulnerability was found in vllm up to 0.19.0. The affected element is the function has_mamba_layers of the file vllm/v1/kv_cache_interface.py of the component KV Block Handler. Performing a manipulation results in uninitialized resource. It is possible to initiate the attack re… |
| CVE-2026-34756 | Med | 6.5 | | >= 0.1.0, < 0.19.0 | 0.19.0 | Apr 6, 2026 | vLLM is an inference and serving engine for large language models (LLMs). From 0.1.0 to before 0.19.0, a Denial of Service vulnerability exists in the vLLM OpenAI-compatible API server. Due to the lack of an upper bound validation on the n parameter in the ChatCompletionRequest a… |
| CVE-2026-34755 | Med | 6.5 | | >= 0.7.0, < 0.19.0 | 0.19.0 | Apr 6, 2026 | vLLM is an inference and serving engine for large language models (LLMs). From 0.7.0 to before 0.19.0, the VideoMediaIO.load_base64() method at vllm/multimodal/media/video.py splits video/jpeg data URLs by comma to extract individual JPEG frames, but does not enforce a frame coun… |
| CVE-2026-34753 | Med | 5.4 | | >= 0.16.0, < 0.19.0 | 0.19.0 | Apr 6, 2026 | vLLM is an inference and serving engine for large language models (LLMs). From 0.16.0 to before 0.19.0, a server-side request forgery (SSRF) vulnerability in download_bytes_from_url allows any actor who can control batch input JSON to make the vLLM batch runner issue arbitrary HT… |
| CVE-2026-27893 | — | — | | >= 0.10.1, < 0.18.0 | 0.18.0 | Mar 26, 2026 | vLLM is an inference and serving engine for large language models (LLMs). Starting in version 0.10.1 and prior to version 0.18.0, two model implementation files hardcode `trust_remote_code=True` when loading sub-components, bypassing the user's explicit `--trust-remote-code=False… |
| CVE-2026-25960 | — | — | | >= 0.15.1, < 0.17.0 | 0.17.0 | Mar 9, 2026 | vLLM is an inference and serving engine for large language models (LLMs). The SSRF protection fix for CVE-2026-24779 added in 0.15.1 can be bypassed in the load_from_url_async method due to inconsistent URL parsing behavior between the validation layer and the actual HTTP client. T… |
| CVE-2026-22778 | — | — | | >= 0.8.3, < 0.14.1 | 0.14.1 | Feb 2, 2026 | vLLM is an inference and serving engine for large language models (LLMs). From 0.8.3 to before 0.14.1, when an invalid image is sent to vLLM's multimodal endpoint, PIL throws an error. vLLM returns this error to the client, leaking a heap address. With this leak, we reduce ASLR f… |
| CVE-2026-24779 | — | — | | < 0.14.1 | 0.14.1 | Jan 27, 2026 | vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.14.1, a Server-Side Request Forgery (SSRF) vulnerability exists in the `MediaConnector` class within the vLLM project's multimodal feature set. The load_from_url and load_from_url_async m… |
| CVE-2026-22807 | — | — | | >= 0.10.1, < 0.14.0 | 0.14.0 | Jan 21, 2026 | vLLM is an inference and serving engine for large language models (LLMs). Starting in version 0.10.1 and prior to version 0.14.0, vLLM loads Hugging Face `auto_map` dynamic modules during model resolution without gating on `trust_remote_code`, allowing attacker-controlled Python… |
| CVE-2026-22773 | — | — | | >= 0.6.4, < 0.12.0 | 0.12.0 | Jan 10, 2026 | vLLM is an inference and serving engine for large language models (LLMs). In versions from 0.6.4 to before 0.12.0, users can crash the vLLM engine serving multimodal models that use the Idefics3 vision model implementation by sending a specially crafted 1x1 pixel image. This caus… |
| CVE-2025-66448 | — | — | | < 0.11.1 | 0.11.1 | Dec 1, 2025 | vLLM is an inference and serving engine for large language models (LLMs). Prior to 0.11.1, vllm has a critical remote code execution vector in a config class named Nemotron_Nano_VL_Config. When vllm loads a model config that contains an auto_map entry, the config class resolves t… |
| CVE-2025-62372 | — | — | | >= 0.5.5, < 0.11.1 | 0.11.1 | Nov 21, 2025 | vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, users can crash the vLLM engine serving multimodal models by passing multimodal embedding inputs with correct ndim but incorrect shape (e.g. hidden dimension is wrong),… |
| CVE-2025-62426 | — | — | | >= 0.5.5, < 0.11.1 | 0.11.1 | Nov 21, 2025 | vLLM is an inference and serving engine for large language models (LLMs). From version 0.5.5 to before 0.11.1, the /v1/chat/completions and /tokenize endpoints allow a chat_template_kwargs request parameter that is used in the code before it is properly validated against the chat… |
| CVE-2025-62164 | — | — | | >= 0.10.2, < 0.11.1 | 0.11.1 | Nov 21, 2025 | vLLM is an inference and serving engine for large language models (LLMs). From version 0.10.2 to before 0.11.1, a memory corruption vulnerability that could lead to a crash (denial-of-service) and potentially remote code execution (RCE) exists in the Completions API endpoint. When p… |
| CVE-2025-61620 | Med | — | | >= 0.5.1, < 0.11.0 | 0.11.0 | Oct 7, 2025 | A resource-exhaustion (denial-of-service) vulnerability exists in multiple endpoints of the OpenAI-Compatible Server due to the ability to specify Jinja templates via the `chat_template` and `chat_template_kwargs` parameters. If an attacker can supply these parameter… |
| CVE-2025-6242 | High | 7.1 | | >= 0.5.0, < 0.11.0 | 0.11.0 | Oct 7, 2025 | A Server-Side Request Forgery (SSRF) vulnerability exists in the MediaConnector class within the vLLM project's multimodal feature set. The load_from_url and load_from_url_async methods fetch and process media from user-provided URLs without adequate restrictions on the target ho… |
| CVE-2025-59425 | — | — | | < 0.11.0 | 0.11.0 | Oct 7, 2025 | vLLM is an inference and serving engine for large language models (LLMs). Before version 0.11.0rc2, the API key support in vLLM performs validation using a method that was vulnerable to a timing attack. API key validation uses a string comparison that takes longer the more charac… |
| CVE-2025-9141 | High | — | | >= 0.10.0, < 0.10.1.1 | 0.10.1.1 | Aug 21, 2025 | An unsafe deserialization vulnerability allows any authenticated user to execute arbitrary code on the server if they are able to get the model to pass the code as an argument to a tool call. Details: vLLM's [Qwen3 Coder tool parser](https://github.com/vllm-proje… |
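
The "Affected versions" and "Fixed in" columns can be checked mechanically against an installed version. The following is a minimal sketch, not part of the advisory data: the helper names (`version_key`, `is_affected`, `matching_cves`) are hypothetical, only four rows of the table are copied in, and it assumes purely numeric dotted versions (pre-release tags such as `0.11.0rc2` are not handled). For real use, `packaging.specifiers.SpecifierSet` is likely a more robust choice.

```python
from typing import List, Optional, Tuple

# (CVE, lowest affected version or None, first fixed version).
# Copied from a few rows of the table; illustrative, not exhaustive.
RANGES: List[Tuple[str, Optional[str], str]] = [
    ("CVE-2026-44222", "0.6.1", "0.20.0"),
    ("CVE-2026-7141", None, "0.19.1"),
    ("CVE-2026-34753", "0.16.0", "0.19.0"),
    ("CVE-2025-9141", "0.10.0", "0.10.1.1"),
]


def version_key(text: str, width: int = 4) -> Tuple[int, ...]:
    """Turn '0.19' into (0, 19, 0, 0) so tuples compare component-wise.

    Assumption: versions are numeric and dot-separated, at most `width` parts.
    """
    parts = [int(p) for p in text.strip().split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def is_affected(version: str, lower: Optional[str], fixed: str) -> bool:
    """True when lower <= version < fixed (no lower bound if lower is None)."""
    v = version_key(version)
    if lower is not None and v < version_key(lower):
        return False
    return v < version_key(fixed)


def matching_cves(version: str) -> List[str]:
    """Return the CVE IDs from RANGES whose affected range contains version."""
    return [cve for cve, lower, fixed in RANGES if is_affected(version, lower, fixed)]


if __name__ == "__main__":
    for candidate in ("0.10.1", "0.19.0", "0.20.0"):
        print(candidate, matching_cves(candidate))
```

Padding every version to the same width matters because Python compares tuples of unequal length by prefix, so `(0, 19)` would otherwise sort before `(0, 19, 0)` even though both denote the same release.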