vLLM: temperature=NaN and temperature=Infinity bypass validation and propagate to GPU kernels
Description
Summary
The temperature validation gates rely solely on comparison operators (<, >). Under Python's IEEE 754 float semantics, every ordered comparison involving NaN evaluates to False, and the specific checks used here also evaluate to False for positive Infinity. Both values therefore pass every guard without raising and propagate to GPU sampling kernels, where they produce undefined behavior or CUDA errors that can crash the inference worker. Note: -Infinity is correctly caught by the temperature < 0.0 check.
Root Cause
sampling_params.py:384:
```python
if 0 < self.temperature < _MAX_TEMP:  # NaN → False; +Inf → False
```
sampling_params.py:462:
```python
if self.temperature < 0.0:  # NaN → False; +Inf → False
    raise VLLMValidationError(...)
```
No math.isnan() or math.isinf() check exists anywhere in sampling_params.py.
Python semantics (verified): float('nan') < 0.0 → False, float('inf') < 0.0 → False.
Impact
The inference worker can crash when a GPU sampling kernel executes with NaN or Inf input to softmax, degrading service for all concurrent users.
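To illustrate how a non-finite temperature can poison the softmax input, here is a pure-Python stand-in for temperature scaling. It is not the vLLM kernel, and the function name `scaled_softmax` is hypothetical; it only shows the floating-point arithmetic involved.

```python
import math

def scaled_softmax(logits, temperature):
    # Divide each logit by the temperature, then apply a plain softmax.
    scaled = [x / temperature for x in logits]
    exps = [math.exp(s) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# NaN temperature: every scaled logit is NaN, so every probability is NaN.
probs_nan = scaled_softmax([1.0, 2.0, 3.0], float("nan"))

# +Inf temperature with a masked (-inf) logit: -inf / inf is NaN,
# which then contaminates the normalizing sum.
probs_inf = scaled_softmax([1.0, 2.0, float("-inf")], float("inf"))
```

With finite logits only, an infinite temperature collapses every scaled logit to 0.0 and yields a uniform distribution, so in this sketch the masked-logit case is the one that turns `+Inf` into NaN.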
Remediation
Add a math.isfinite(self.temperature) check in _verify_args() and reject non-finite float values with a 400 error.
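A minimal sketch of the remediation, assuming a free-standing function rather than the real `_verify_args()` method. `TemperatureValidationError` is a hypothetical stand-in for vLLM's `VLLMValidationError`, and the error messages are illustrative.

```python
import math

class TemperatureValidationError(ValueError):
    """Hypothetical stand-in for vLLM's VLLMValidationError."""

def verify_temperature(temperature: float) -> float:
    # Check finiteness first so NaN and +/-Inf never reach the comparisons.
    if not math.isfinite(temperature):
        raise TemperatureValidationError(
            f"temperature must be a finite number, got {temperature!r}"
        )
    # The existing range check still handles ordinary negative values.
    if temperature < 0.0:
        raise TemperatureValidationError(
            f"temperature must be non-negative, got {temperature!r}"
        )
    return temperature
```

Placing the finiteness check before the comparison-based checks matters: once a NaN reaches `temperature < 0.0`, the comparison is False and nothing is raised.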
Fix
A fix for this vulnerability was merged here: https://github.com/vllm-project/vllm/pull/45116
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
Vulnerability mechanics
Root cause
"Missing `math.isfinite()` validation allows NaN and Infinity float values to bypass comparison-based temperature and repetition_penalty guards in `sampling_params.py`."
Attack vector
An attacker submits an inference request with `temperature=NaN` or `temperature=Inf` (or `repetition_penalty=NaN`/`Inf`) via the API. Because Python's comparison operators (`<`, `>`) always return `False` when either operand is `NaN`, and `float('inf') < 0.0` is also `False`, all validation gates are bypassed. The non-finite value propagates to GPU sampling kernels, where it causes undefined behavior or CUDA errors, crashing the inference worker and degrading service for all concurrent users [ref_id=1].
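As an illustration of how such a value could arrive over HTTP, note that Python's standard `json` module accepts the non-standard literals `NaN`, `Infinity`, and `-Infinity` by default. Whether vLLM's HTTP layer parses request bodies this way is not stated in the advisory; the request body and the `strict_loads` helper below are hypothetical.

```python
import json
import math

# Hypothetical request body; the field names are illustrative only.
body = '{"prompt": "hi", "temperature": NaN}'
parsed = json.loads(body)  # accepted by default, yields float('nan')

def reject_constant(name: str):
    # json.loads calls parse_constant for 'NaN', 'Infinity' and '-Infinity'.
    raise ValueError(f"non-finite JSON constant not allowed: {name}")

def strict_loads(text: str):
    return json.loads(text, parse_constant=reject_constant)

# Caveat: an overflowing literal still becomes +Inf through float parsing,
# so parser-level rejection does not replace a math.isfinite() check.
overflowed = json.loads('{"temperature": 1e999}')["temperature"]
```

This is why validating the parsed value itself is the more robust defense: it is independent of how the number was encoded on the wire.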
Affected code
The vulnerability resides in `vllm/sampling_params.py` within the `_verify_args()` method. The temperature validation at line 384 (`if 0 < self.temperature < _MAX_TEMP`) and line 462 (`if self.temperature < 0.0`) uses comparison operators that silently evaluate to `False` for `NaN` and positive `Infinity` due to Python's IEEE 754 float semantics. No `math.isnan()` or `math.isinf()` check existed anywhere in `sampling_params.py` before the patch [patch_id=6351925].
What the fix does
The patch adds `math.isfinite()` checks for both `temperature` and `repetition_penalty` in `_verify_args()` before any comparison-based validation [patch_id=6351925]. If the value is not finite (i.e., `NaN` or `Infinity`), a `VLLMValidationError` (for temperature) or `ValueError` (for repetition_penalty) is raised immediately, returning a 400 error to the client. This prevents non-finite floats from ever reaching GPU kernels. The accompanying test suite (`tests/samplers/test_non_finite_params.py`) verifies that `NaN`, `+Inf`, and `-Inf` are all rejected while finite values are accepted [ref_id=1].
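The behavior described above can be approximated with a small stand-in class. This sketch is an assumption-laden illustration, not the patched code: `SamplingParamsSketch` and `is_rejected` are hypothetical names, and a plain `ValueError` is used for both fields.

```python
import math
from dataclasses import dataclass

@dataclass
class SamplingParamsSketch:
    """Hypothetical stand-in with only the two fields named in the advisory."""
    temperature: float = 1.0
    repetition_penalty: float = 1.0

    def __post_init__(self):
        # Reject non-finite values before any comparison-based validation.
        for name in ("temperature", "repetition_penalty"):
            value = getattr(self, name)
            if not math.isfinite(value):
                raise ValueError(f"{name} must be finite, got {value!r}")

def is_rejected(**kwargs) -> bool:
    try:
        SamplingParamsSketch(**kwargs)
    except ValueError:
        return True
    return False

non_finite = [float("nan"), float("inf"), float("-inf")]
all_rejected = all(
    is_rejected(**{field: value})
    for field in ("temperature", "repetition_penalty")
    for value in non_finite
)
```

The loop mirrors the cases the advisory attributes to the test suite: `NaN`, `+Inf`, and `-Inf` for each parameter, with finite values still accepted.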
Preconditions
- network: The attacker must be able to send HTTP requests to the vLLM inference API endpoint that accepts SamplingParams.
- input: The attacker must set the `temperature` or `repetition_penalty` parameter to a non-finite float value (NaN, Inf).
Generated on Jun 17, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.