CVE-2026-44223
Description
vLLM is an inference and serving engine for large language models (LLMs). In versions from 0.18.0 up to, but not including, 0.20.0, the extract_hidden_states speculative decoding proposer returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses a sampling penalty parameter (repetition_penalty, frequency_penalty, or presence_penalty), so a single request such as one with "repetition_penalty": 1.1 is sufficient to crash the server. The vulnerability is fixed in 0.20.0.
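The trigger condition described above can be illustrated with a small check on an incoming request body. This is a minimal sketch and not part of vLLM or the advisory: the function name `penalty_params_in_use` is hypothetical, and it assumes that a penalty only counts as "used" when it differs from its neutral value (1.0 for repetition_penalty, 0.0 for frequency_penalty and presence_penalty). The description does not state whether a neutral value also triggers the crash.

```python
# Assumed neutral values: a penalty set to its neutral value has no effect.
NEUTRAL_PENALTIES = {
    "repetition_penalty": 1.0,
    "frequency_penalty": 0.0,
    "presence_penalty": 0.0,
}


def penalty_params_in_use(payload: dict) -> list[str]:
    """Return the names of penalty parameters set to a non-neutral value.

    `payload` is the decoded JSON body of a completion request.
    Hypothetical helper for illustration only.
    """
    used = []
    for name, neutral in NEUTRAL_PENALTIES.items():
        value = payload.get(name)
        # Missing or null parameters are treated as not used.
        if value is not None and float(value) != neutral:
            used.append(name)
    return used


if __name__ == "__main__":
    request_body = {"prompt": "Hello", "repetition_penalty": 1.1}
    print(penalty_params_in_use(request_body))
```

A gateway in front of an unpatched deployment could use a check like this to log or reject such requests, but upgrading to 0.20.0 is the fix named in the description.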
Affected packages
Versions sourced from the GitHub Security Advisory.
| Package | Affected versions | Patched versions |
|---|---|---|
| vllm (PyPI) | >= 0.18.0, < 0.20.0 | 0.20.0 |
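The affected range in the table can be checked against a version string with a few lines of Python. This is a rough sketch using only the standard library. The helper names `release_tuple` and `is_affected` are hypothetical, and the parser looks only at the leading numeric release segment, so pre-release, post-release, and distro suffixes (for example `rc1` or `-r2`) are ignored rather than ordered properly.

```python
import re

AFFECTED_FROM = (0, 18, 0)  # inclusive lower bound from the advisory table
PATCHED_IN = (0, 20, 0)     # first patched release (exclusive upper bound)


def release_tuple(version: str) -> tuple[int, ...]:
    """Parse the leading dotted-integer part of a version string."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions such as "0.19" to three components.
    return parts + (0,) * (3 - len(parts))


def is_affected(version: str) -> bool:
    """True if the version falls in the range >= 0.18.0, < 0.20.0."""
    return AFFECTED_FROM <= release_tuple(version) < PATCHED_IN


if __name__ == "__main__":
    for candidate in ("0.17.3", "0.18.0", "0.19.1", "0.20.0"):
        print(candidate, is_affected(candidate))
```

For an installed package, the string could come from `importlib.metadata.version("vllm")`. A full PEP 440 comparison would need a dedicated version parser, which this sketch deliberately avoids.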
Affected products
osv-coords, 3 versions:
- (no CPE) range: < 0.18.1-r2
- (no CPE) range: < 0.19.0-r0
- (no CPE) range: >= 0.18.0, < 0.20.0
References
- github.com/vllm-project/vllm/pull/38610 (nvd: Issue Tracking, Patch, WEB)
- github.com/advisories/GHSA-83vm-p52w-f9pw (ghsa: ADVISORY)
- github.com/vllm-project/vllm/security/advisories/GHSA-83vm-p52w-f9pw (nvd: Mitigation, Vendor Advisory, WEB)
- nvd.nist.gov/vuln/detail/CVE-2026-44223 (ghsa: ADVISORY)
- github.com/pypa/advisory-database/tree/main/vulns/vllm/PYSEC-2026-145.yaml (ghsa: WEB)