VYPR
Medium severity 6.5 · NVD Advisory · Published Jun 17, 2026 · Updated Jun 17, 2026

vLLM: OOM Denial of Service via Audio Decompression Bomb

CVE-2026-54233

Description

Summary

vLLM's /v1/audio/transcriptions endpoint limits the compressed upload size but not the size of the decoded PCM output. A 25MB OPUS file decodes to ~5.7GB of float32 PCM and drives peak RSS to ~14.9GB at decode time. Tested on vLLM v0.19.0.

Details

SpeechToTextProcessor rejects uploads over VLLM_MAX_AUDIO_CLIP_FILESIZE_MB (default 25MB) based on compressed byte length, but the audio decoder in audio.py accumulates all decoded frames into memory with no size limit before returning:

# speech_to_text.py L184-189
if len(audio_data) / 1024 ** 2 > self.max_audio_filesize_mb:
    raise VLLMValidationError(...)
y, sr = load_audio(buf, sr=self.asr_config.sample_rate)  # decoded size unchecked

# audio.py L77-107
chunks: list[npt.NDArray] = []
for frame in container.decode(stream):
    chunks.append(frame.to_ndarray())
audio = np.concatenate(chunks, axis=-1).astype(np.float32)  # single contiguous allocation

A 25MB OPUS file at 6kbps encodes ~8.7 hours of audio. Decoding produces ~5.7GB of float32 PCM (232x amplification), and np.concatenate then allocates a second contiguous array, bringing peak RSS to ~14.9GB from a single request. SpeechToTextConfig.max_audio_clip_s (default 30s) applies only after the full decode and does not prevent the allocation.
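The amplification figures above can be approximated with a short calculation. This is a minimal sketch, not code from vLLM: the helper names are hypothetical, and the 48 kHz mono output rate is an assumption chosen because it lands close to the advisory's ~5.7GB and ~232x figures. The advisory does not state the decoded sample rate or channel count.

```python
def decoded_pcm_bytes(duration_s: int, sample_rate: int = 48_000,
                      channels: int = 1, bytes_per_sample: int = 4) -> int:
    """Size in bytes of uncompressed PCM for a clip of the given duration.

    Defaults (48 kHz, mono, 4-byte float32 samples) are assumptions.
    """
    return duration_s * sample_rate * channels * bytes_per_sample


def amplification(compressed_bytes: int, decoded_bytes: int) -> float:
    """Ratio of decoded size to compressed upload size."""
    return decoded_bytes / compressed_bytes


compressed = 25 * 1024**2   # 25MB upload, the default size limit
duration_s = 31_320         # ~8.7 hours of audio, per the advisory

decoded = decoded_pcm_bytes(duration_s)
ratio = amplification(compressed, decoded)

print(f"decoded PCM: {decoded / 1024**3:.1f} GiB")
print(f"amplification: {ratio:.0f}x")
```

The decoded size depends only on duration, sample rate, channel count, and sample width, so lowering the bitrate of the upload increases the amplification without changing the compressed size check's outcome.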

Impact

An unauthenticated attacker can exhaust server memory with a small number of concurrent requests, each a valid upload within the documented size limit. Severity was assessed with reference to prior OOM vulnerability reports in vLLM.

Fix

A fix for this vulnerability was merged here: https://github.com/vllm-project/vllm/pull/44970

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

Vulnerability mechanics

Root cause

"Missing decoded-size limit in audio decoder allows decompression bomb amplification from compressed upload to PCM output."

Attack vector

An unauthenticated attacker sends a POST request to the `/v1/audio/transcriptions` endpoint with a small (≤25MB) OPUS audio file encoded at a very low bitrate (e.g., 6kbps). The compressed size check passes, but the decoder in `audio.py` accumulates all decoded frames into memory with no size limit, expanding the payload by a factor of ~232x. A single 25MB OPUS file decodes to ~5.7GB of float32 PCM and can drive peak RSS to ~14.9GB, so only a few concurrent requests are needed to exhaust server memory [ref_id=1].

Affected code

The vulnerability resides in `vllm/multimodal/media/audio.py` (the `load_audio_pyav` and `load_audio_soundfile` functions) and `vllm/entrypoints/speech_to_text/base/serving.py` (the `_preprocess_speech_to_text` method). The `SpeechToTextProcessor` in `speech_to_text.py` checks the compressed file size against `VLLM_MAX_AUDIO_CLIP_FILESIZE_MB` but does not limit the decoded PCM output, allowing a decompression bomb attack [patch_id=6351922].

What the fix does

The patch adds a `max_duration_s` parameter (default 600s via `VLLM_MAX_AUDIO_DECODE_DURATION_S`) to both `load_audio_pyav` and `load_audio_soundfile`. Before decoding, it checks container/stream metadata for duration and rejects files exceeding the limit. During decoding, it tracks accumulated sample count and raises `ValueError` once the limit is exceeded, preventing the large contiguous allocation. The `_preprocess_speech_to_text` method passes this limit from the environment variable to `load_audio` [patch_id=6351922].
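The two checks described above (a metadata pre-check, then a running sample count during decoding) can be sketched as follows. This is an illustration only, not the code from the patch: `check_declared_duration` and `decode_bounded` are hypothetical names, and frames are modeled as standard-library `array` objects of mono float samples rather than PyAV frames so that the example runs on its own.

```python
from array import array
from typing import Iterable, Optional


def check_declared_duration(declared_s: Optional[float],
                            max_duration_s: float) -> None:
    """Fast path: reject when metadata already reports a too-long clip.

    Metadata may be missing (None) or inaccurate, so this check alone
    is not sufficient; the decode loop must enforce the limit as well.
    """
    if declared_s is not None and declared_s > max_duration_s:
        raise ValueError(
            f"declared duration {declared_s:.0f}s exceeds "
            f"limit of {max_duration_s:.0f}s"
        )


def decode_bounded(frames: Iterable[array], sample_rate: int,
                   max_duration_s: float) -> array:
    """Accumulate decoded mono frames, stopping once the limit is passed.

    The sample count is checked before each frame is stored, so memory
    use stays bounded by roughly max_duration_s worth of samples.
    """
    max_samples = int(max_duration_s * sample_rate)
    out = array("f")
    total = 0
    for frame in frames:
        total += len(frame)
        if total > max_samples:
            raise ValueError(
                f"decoded audio exceeds limit of {max_duration_s:.0f}s"
            )
        out.extend(frame)
    return out
```

Because the limit is enforced inside the loop, a file whose metadata understates its duration is still rejected before the large contiguous allocation happens.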

Preconditions

  • network: The attacker must be able to reach the /v1/audio/transcriptions endpoint (no authentication required).
  • input: The attacker must upload a compressed audio file (e.g., OPUS) within the 25MB size limit that decodes to many hours of PCM.

Generated on Jun 17, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
