VYPR
High severity · 7.5 · NVD Advisory · Published Jun 11, 2026

CVE-2026-5497

Description

vLLM versions 0.8.0+ are vulnerable to DoS via unbounded frame count in video/jpeg data URLs, overwhelming memory.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

Vulnerability

vLLM versions 0.8.0 and later are vulnerable to an Out-of-Memory (OOM) Denial of Service (DoS) attack due to unbounded frame count processing in the VideoMediaIO.load_base64() method. When processing video/jpeg data URLs, the method splits the base64 data string on commas to extract individual JPEG frames without enforcing a frame count limit [1][2]. This allows an attacker to craft a single API request containing thousands of comma-separated base64-encoded JPEG frames.

Exploitation

An attacker can exploit this vulnerability by sending a crafted request to the OpenAI-compatible chat completions API without requiring authentication. The request includes a video/jpeg data URL with many comma-separated base64-encoded JPEG frames. The server then decodes all frames into memory, leading to excessive memory consumption and potential crash [1].

Impact

Successful exploitation results in an Out-of-Memory (OOM) condition, causing a denial of service. The server may crash due to memory exhaustion, leading to service unavailability for legitimate users [2]. No authentication is required to trigger the vulnerability.

Mitigation

The fix is included in commit 58ee614, which enforces a frame count limit in VideoMediaIO [1]. Users should upgrade to a version that contains this commit. The available references do not mention a workaround.

AI Insight generated on Jun 11, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected products

2
  • Vllm/Vllm (2 versions)
    • (no CPE)
    • (no CPE), range: >=0.8.0

Patches

1
58ee61422169

(security) Enforce frame limit in VideoMediaIO (#38636)

https://github.com/vllm-project/vllm · Juan Pérez de Algaba · Apr 1, 2026 · via nvd-ref
2 files changed · +69 -10
  • tests/multimodal/media/test_video.py · +61 -9 · modified
    @@ -239,6 +239,17 @@ def test_video_media_io_backend_env_var_fallback(monkeypatch: pytest.MonkeyPatch
             assert metadata_missing["video_backend"] == "test_video_backend_override_2"
     
     
    +def _make_jpeg_b64_frames(n: int, width: int = 8, height: int = 8) -> list[str]:
    +    """Return *n* tiny base64-encoded JPEG frames."""
    +    frames: list[str] = []
    +    for i in range(n):
    +        img = Image.new("RGB", (width, height), color=(i % 256, 0, 0))
    +        buf = io.BytesIO()
    +        img.save(buf, format="JPEG")
    +        frames.append(pybase64.b64encode(buf.getvalue()).decode("ascii"))
    +    return frames
    +
    +
     def test_load_base64_jpeg_returns_metadata():
         """Regression test: load_base64 with video/jpeg must return metadata.
     
    @@ -248,16 +259,8 @@ def test_load_base64_jpeg_returns_metadata():
         """
     
         num_test_frames = 3
    -    frame_width, frame_height = 8, 8
    -
    -    # Build a few tiny JPEG frames and base64-encode them
    -    b64_frames = []
    -    for i in range(num_test_frames):
    -        img = Image.new("RGB", (frame_width, frame_height), color=(i * 80, 0, 0))
    -        buf = io.BytesIO()
    -        img.save(buf, format="JPEG")
    -        b64_frames.append(pybase64.b64encode(buf.getvalue()).decode("ascii"))
     
    +    b64_frames = _make_jpeg_b64_frames(num_test_frames)
         data = ",".join(b64_frames)
     
         imageio = ImageMediaIO()
    @@ -287,3 +290,52 @@ def test_load_base64_jpeg_returns_metadata():
         # Default fps=1 → duration == num_frames
         assert metadata["fps"] == 1.0
         assert metadata["duration"] == float(num_test_frames)
    +
    +
    +def test_load_base64_jpeg_enforces_num_frames_limit():
    +    """Frames beyond num_frames must be truncated in the video/jpeg path.
    +
    +    Without the limit an attacker can send thousands of base64 JPEG frames
    +    in a single request and exhaust server memory (OOM).
    +    """
    +    num_frames_limit = 4
    +    sent_frames = 20
    +
    +    b64_frames = _make_jpeg_b64_frames(sent_frames)
    +    data = ",".join(b64_frames)
    +
    +    imageio = ImageMediaIO()
    +    videoio = VideoMediaIO(imageio, num_frames=num_frames_limit)
    +    frames, metadata = videoio.load_base64("video/jpeg", data)
    +
    +    assert frames.shape[0] == num_frames_limit
    +    assert metadata["total_num_frames"] == num_frames_limit
    +    assert metadata["frames_indices"] == list(range(num_frames_limit))
    +
    +
    +def test_load_base64_jpeg_no_limit_when_num_frames_negative():
    +    """When num_frames is -1, all frames should be loaded without truncation."""
    +    sent_frames = 10
    +
    +    b64_frames = _make_jpeg_b64_frames(sent_frames)
    +    data = ",".join(b64_frames)
    +
    +    imageio = ImageMediaIO()
    +    videoio = VideoMediaIO(imageio, num_frames=-1)
    +    frames, metadata = videoio.load_base64("video/jpeg", data)
    +
    +    assert frames.shape[0] == sent_frames
    +    assert metadata["total_num_frames"] == sent_frames
    +    assert metadata["frames_indices"] == list(range(sent_frames))
    +
    +
    +def test_load_base64_jpeg_raises_on_zero_num_frames():
    +    """num_frames=0 is invalid and should raise ValueError."""
    +    b64_frames = _make_jpeg_b64_frames(3)
    +    data = ",".join(b64_frames)
    +
    +    imageio = ImageMediaIO()
    +    videoio = VideoMediaIO(imageio, num_frames=0)
    +
    +    with pytest.raises(ValueError, match="num_frames must be greater than 0 or -1"):
    +        videoio.load_base64("video/jpeg", data)
    
  • vllm/multimodal/media/video.py · +8 -1 · modified
    @@ -80,8 +80,15 @@ def load_base64(
                     "image/jpeg",
                 )
     
    +            if self.num_frames > 0:
    +                frame_parts = data.split(",", self.num_frames)[: self.num_frames]
    +            elif self.num_frames == 0:
    +                raise ValueError("num_frames must be greater than 0 or -1")
    +            else:
    +                frame_parts = data.split(",")
    +
                 frames = np.stack(
    -                [np.asarray(load_frame(frame_data)) for frame_data in data.split(",")]
    +                [np.asarray(load_frame(frame_data)) for frame_data in frame_parts]
                 )
                 total = int(frames.shape[0])
                 fps = float(self.kwargs.get("fps", 1))
    

Vulnerability mechanics

Root cause

"Missing frame count limit in VideoMediaIO.load_base64() allows unbounded memory allocation from attacker-supplied base64 JPEG frames."

Attack vector

An unauthenticated attacker sends a single request to the OpenAI-compatible chat completions API with a `video/jpeg` data URL containing thousands of comma-separated base64-encoded JPEG frames. The server decodes every frame into memory without enforcing a frame count limit, causing excessive memory consumption and an Out-of-Memory (OOM) crash. This is a classic resource-exhaustion denial-of-service attack reachable over the network with no authentication required.
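
To give a sense of scale, the following sketch estimates the memory held by the decoded frames once they are stacked into a single array. It assumes each frame decodes to an 8-bit RGB array of shape (height, width, 3); the function name and the resolution and frame count used are illustrative assumptions, not values from the advisory.

```python
def decoded_frames_bytes(
    num_frames: int, width: int, height: int, channels: int = 3
) -> int:
    """Bytes held by a stacked uint8 array of shape
    (num_frames, height, width, channels).

    Assumption: one byte per channel (8-bit RGB), which is not stated
    in the advisory.
    """
    return num_frames * height * width * channels


# Illustrative numbers only: 5,000 frames at 1280x720.
total = decoded_frames_bytes(5_000, 1280, 720)
print(f"{total:,} bytes ~ {total / 2**30:.1f} GiB")
# prints "13,824,000,000 bytes ~ 12.9 GiB"
```

Because the vulnerable code builds a list of per-frame arrays and then stacks them into a new array, peak usage may briefly approach twice this figure. The compressed JPEG payload in the request is typically much smaller than the decoded result.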

Affected code

The vulnerability resides in `vllm/multimodal/media/video.py` in the `VideoMediaIO.load_base64()` method. When processing `video/jpeg` data URLs, the method previously called `data.split(",")` without any limit, allowing an attacker to supply an unbounded number of comma-separated base64-encoded JPEG frames. The patch introduces a frame count limit via `data.split(",", self.num_frames)[:self.num_frames]` when `num_frames > 0`, and raises `ValueError` for `num_frames == 0`.

What the fix does

The patch modifies `VideoMediaIO.load_base64()` in `vllm/multimodal/media/video.py` to limit the number of frames decoded. When `self.num_frames` is positive, the code now calls `data.split(",", self.num_frames)[:self.num_frames]`, which stops splitting after `num_frames` commas and discards the unsplit remainder, so at most `num_frames` frames are decoded. A `ValueError` is raised if `num_frames` is zero. The original unbounded behavior is preserved only when `num_frames` is negative (the error message and tests name `-1` as the intended value). With a positive limit configured, an attacker can no longer exhaust server memory by sending an arbitrarily large number of frames.
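
The three branches of the patch can be exercised on plain strings, without vLLM installed. This is a minimal sketch that mirrors the logic in the diff; `select_frame_parts` is a hypothetical standalone helper, not a function in vLLM, and the `frameN` strings stand in for base64 JPEG data.

```python
def select_frame_parts(data: str, num_frames: int) -> list[str]:
    """Mirror of the patched branch logic, operating on a plain string."""
    if num_frames > 0:
        # maxsplit=num_frames yields at most num_frames + 1 pieces; the
        # last piece is the unsplit remainder, which the slice drops.
        return data.split(",", num_frames)[:num_frames]
    if num_frames == 0:
        raise ValueError("num_frames must be greater than 0 or -1")
    # Negative values keep the original unbounded behavior.
    return data.split(",")


data = ",".join(f"frame{i}" for i in range(20))

print(select_frame_parts(data, 4))
# prints ['frame0', 'frame1', 'frame2', 'frame3']
print(len(select_frame_parts(data, -1)))
# prints 20
```

Passing `maxsplit` rather than slicing the result of an unbounded `split` means the interpreter never materializes a list with one entry per attacker-supplied frame; everything past the limit stays in a single string that is then discarded.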

Preconditions

  • config: The server must expose the OpenAI-compatible chat completions API endpoint that accepts video/jpeg data URLs.
  • auth: No authentication or prior access is required.
  • network: The attacker must be able to send HTTP requests to the vLLM server over the network.
  • input: The attacker crafts a data URL with thousands of comma-separated base64-encoded JPEG frames.
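
The input precondition above can be checked cheaply, for example when reviewing request logs, by counting the comma-separated frames in a data URL without decoding anything. The sketch below is an illustration only: the helper name, the exact `data:video/jpeg;base64,` prefix, and the threshold are assumptions, and nothing here comes from the advisory or from vLLM itself.

```python
# Assumed prefix for a video/jpeg data URL; the advisory does not spell it out.
VIDEO_JPEG_PREFIX = "data:video/jpeg;base64,"


def count_jpeg_frames(data_url: str) -> int:
    """Count comma-separated frames in a video/jpeg data URL.

    Returns 0 for any other input. No base64 or JPEG decoding is done,
    so the cost is a single scan of the string.
    """
    if not data_url.startswith(VIDEO_JPEG_PREFIX):
        return 0
    payload = data_url[len(VIDEO_JPEG_PREFIX):]
    return payload.count(",") + 1 if payload else 0


# Hypothetical threshold chosen for illustration.
MAX_FRAMES = 32

url = VIDEO_JPEG_PREFIX + ",".join(["AAAA"] * 1000)
if count_jpeg_frames(url) > MAX_FRAMES:
    print("suspicious: frame count exceeds threshold")
```

A check like this does not replace upgrading to a patched version; it only helps spot requests that match the shape described in the preconditions.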

Generated on Jun 11, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References

2
