VYPR
Low severity · NVD Advisory · Published May 29, 2025 · Updated May 29, 2025

vLLM’s Chunk-Based Prefix Caching Vulnerable to Potential Timing Side-Channel

CVE-2025-46570

Description

vLLM is an inference and serving engine for large language models (LLMs). Prior to version 0.9.0, when a new prompt is processed and the PagedAttention mechanism finds a matching prefix chunk, the cached key/value (KV) entries for that chunk are reused instead of recomputed, so the prefill phase finishes sooner. The speed-up shows up as a shorter time to first token (TTFT). These timing differences caused by matching chunks are large enough to be measured and exploited, for example to infer whether a given prompt prefix is already in the cache. This issue has been patched in version 0.9.0.
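
To make the mechanism concrete, here is a toy Python model of chunk-based prefix caching and the timing signal it produces. It is a sketch under stated assumptions, not vLLM's implementation: the block size, the chained SHA-256 block hashing, the linear TTFT cost model (BASE_MS, PER_TOKEN_MS), and the names ToyPrefixCache, prefill, block_hashes, and looks_cached are all hypothetical. It shows how, with one cache shared across users, a correct guess at a previously processed prefix returns its first token measurably sooner than a wrong guess.

```python
import hashlib
from typing import List, Set, Tuple

# Hypothetical parameters for a toy model; these are NOT vLLM's real values.
BLOCK_SIZE = 4        # tokens per cached chunk ("block")
BASE_MS = 5.0         # fixed per-request overhead in the simulated TTFT
PER_TOKEN_MS = 0.5    # simulated prefill cost per token that must be computed


def block_hashes(tokens: List[int], block_size: int = BLOCK_SIZE) -> List[str]:
    """Hash every full block of the prompt.

    Each hash folds in the previous block's hash, so block i can only match
    if blocks 0..i-1 also matched (the hashes identify whole prefixes).
    A trailing partial block is not hashed and therefore never cached.
    """
    hashes: List[str] = []
    parent = ""
    full_len = len(tokens) - len(tokens) % block_size
    for start in range(0, full_len, block_size):
        chunk = tokens[start:start + block_size]
        payload = parent + ":" + ",".join(map(str, chunk))
        parent = hashlib.sha256(payload.encode()).hexdigest()
        hashes.append(parent)
    return hashes


class ToyPrefixCache:
    """Shared set of block hashes, standing in for cached KV blocks."""

    def __init__(self, block_size: int = BLOCK_SIZE) -> None:
        self.block_size = block_size
        self.cached: Set[str] = set()

    def prefill(self, tokens: List[int]) -> Tuple[int, float]:
        """Return (reused_tokens, simulated_ttft_ms) and cache the prompt's blocks."""
        hashes = block_hashes(tokens, self.block_size)
        hit_blocks = 0
        for h in hashes:  # find the longest cached prefix, block by block
            if h not in self.cached:
                break
            hit_blocks += 1
        # Keep at least one prompt token to run through the model so that
        # logits for the first output token can still be produced.
        reused = min(hit_blocks * self.block_size, max(len(tokens) - 1, 0))
        computed = len(tokens) - reused
        self.cached.update(hashes)  # one cache shared by every user
        return reused, BASE_MS + PER_TOKEN_MS * computed


def looks_cached(ttft_ms: float, threshold_ms: float) -> bool:
    """Observer-side guess: a fast first token suggests a prefix hit."""
    return ttft_ms < threshold_ms


if __name__ == "__main__":
    cache = ToyPrefixCache()
    # Block 0: a common prefix; block 1: tokens an observer wants to learn.
    victim = [7, 7, 7, 7, 42, 43, 44, 45, 1, 2]
    cache.prefill(victim)  # the victim's request populates the shared cache

    right_guess = [7, 7, 7, 7, 42, 43, 44, 45, 9, 9]
    wrong_guess = [7, 7, 7, 7, 99, 98, 97, 96, 9, 9]
    for name, guess in [("right", right_guess), ("wrong", wrong_guess)]:
        reused, ttft = cache.prefill(guess)
        print(name, reused, ttft, looks_cached(ttft, threshold_ms=7.0))
    # prints:
    # right 8 6.0 True
    # wrong 4 8.0 False
```

The fixed 7.0 ms threshold only works because this cost model is deterministic; real TTFT measurements are noisy, so an observer would need repeated probes and a comparison of timing distributions rather than a single reading.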

Affected packages

Versions sourced from the GitHub Security Advisory.

Package        Affected versions    Patched versions
vllm (PyPI)    < 0.9.0              0.9.0

Affected products: 6

References: 6