Medium severity · 6.5 · GHSA Advisory · Published May 12, 2026 · Updated May 15, 2026

CVE-2026-44223

Description

vLLM is an inference and serving engine for large language models (LLMs). From 0.18.0 to before 0.20.0, the extract_hidden_states speculative decoding proposer in vLLM returns a tensor with an incorrect shape after the first decode step, causing a RuntimeError that crashes the EngineCore process. The crash is triggered when any request in the batch uses sampling penalty parameters (repetition_penalty, frequency_penalty, or presence_penalty). A single request with a penalty parameter (e.g., "repetition_penalty": 1.1) is sufficient to crash the server. This vulnerability is fixed in 0.20.0.
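The failure mode can be illustrated with a small NumPy sketch. This is a simplified stand-in, not vLLM's actual code: the function names, shapes, and the size check below are illustrative assumptions modeling the behavior described above (proposer output growing from [batch, 1] to [batch, 2] after the first decode step, and penalty handling rejecting the wider tensor).

```python
import numpy as np

def proposer_output(step, batch_size):
    """Illustrative stand-in for the extract_hidden_states proposer.

    On the first decode step only the target-sampled token exists
    (shape [batch, 1]); on later steps the batch also carries the
    verified speculative token, so the unpatched proposer returned
    a [batch, 2] tensor.
    """
    if step == 0:
        return np.zeros((batch_size, 1), dtype=np.int64)
    return np.zeros((batch_size, 2), dtype=np.int64)

def apply_penalties(draft_tokens, expected_batch):
    # Penalty application expects exactly one draft token per request
    # (num_speculative_tokens=1); a [batch, 2] tensor trips the size
    # check and raises, mirroring the RuntimeError that killed the
    # EngineCore process.
    if draft_tokens.shape != (expected_batch, 1):
        raise RuntimeError(
            f"size mismatch: expected ({expected_batch}, 1), "
            f"got {tuple(draft_tokens.shape)}"
        )
    return draft_tokens

batch = 4
apply_penalties(proposer_output(0, batch), batch)      # first step: fine
try:
    apply_penalties(proposer_output(1, batch), batch)  # later steps: crash
except RuntimeError as e:
    print(e)  # size mismatch: expected (4, 1), got (4, 2)
```

The 0.20.0 fix (shown in the patch below) slices the proposer output back to one column before returning, so the size check is never tripped.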

Affected products

  • vllm/vllm (GHSA) — 2 version entries
    • (no CPE) — range: >= 0.18.0, < 0.20.0
    • cpe:2.3:a:vllm:vllm:*:*:*:*:*:*:*:* — range: >= 0.18.0, < 0.20.0

Patches

edee96519a1b

[Spec Decode] fix returning size mismatch on extract hidden states proposer (#38610)

https://github.com/vllm-project/vllm · zzaebok · Apr 9, 2026 · via nvd-ref
1 file changed · +4 −1
  • vllm/v1/spec_decode/extract_hidden_states.py (modified, +4 −1)
    @@ -145,7 +145,10 @@ def propose(
     
             # Return the sampled tokens as "draft" tokens
             # Shape: [batch_size, 1] to match num_speculative_tokens=1
    -        return sampled_token_ids
    +        # On decode steps with spec tokens, sampled_token_ids may have
    +        # shape [batch_size, 2] (target + spec verification); slice to
    +        # return only the target-sampled column.
    +        return sampled_token_ids[:, :1]
     
         def _get_slot_mapping(
             self,
    


