Low severity · NVD Advisory · Published May 29, 2025 · Updated May 29, 2025

vLLM’s Chunk-Based Prefix Caching Vulnerable to Potential Timing Side-Channel

CVE-2025-46570

Description

vLLM is an inference and serving engine for large language models (LLMs). In versions prior to 0.9.0, when a new prompt is processed and the PagedAttention mechanism finds a matching prefix chunk, the prefill phase finishes faster, which shows up as a lower TTFT (time to first token). The timing difference caused by a matching chunk is large enough to be measured and exploited as a timing side channel. The issue is patched in version 0.9.0.
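
The mechanism described above can be illustrated with a toy model. The sketch below is not vLLM code: the class ToyPrefixCache, the chunk size of 4 tokens, and the use of "chunks computed" as a stand-in for TTFT are all assumptions made for illustration. It shows only why a prompt that shares a cached prefix chunk does less prefill work than one that does not, which is the difference an observer could measure.

```python
import hashlib

CHUNK = 4  # tokens per chunk; illustrative value, not vLLM's real block size


class ToyPrefixCache:
    """Hypothetical chunk-based prefix cache (not vLLM's implementation)."""

    def __init__(self):
        self.seen = set()  # hashes of chunks already computed

    def prefill(self, tokens):
        """Return how many full chunks had to be computed.

        The count is a deterministic stand-in for prefill time (TTFT).
        A trailing partial chunk is ignored to keep the sketch short.
        """
        computed = 0
        parent = ""  # each key chains the previous key, so it covers the whole prefix
        full = len(tokens) - len(tokens) % CHUNK
        for i in range(0, full, CHUNK):
            chunk = " ".join(tokens[i:i + CHUNK])
            key = hashlib.sha256((parent + "|" + chunk).encode()).hexdigest()
            if key not in self.seen:
                computed += 1  # cache miss: this chunk costs prefill work
                self.seen.add(key)
            parent = key
        return computed


cache = ToyPrefixCache()

# A first user's prompt populates the shared cache.
victim = "the secret code is alpha bravo charlie delta".split()
cold = cache.prefill(victim)

# A second user probes with two guesses for the first chunk.
hit = cache.prefill("the secret code is zulu zulu zulu zulu".split())
miss = cache.prefill("a wrong guess here zulu zulu zulu zulu".split())

print(cold, hit, miss)
```

In this toy model the probe that shares the first chunk with the earlier prompt computes fewer chunks than the probe that does not. On a real server the same effect would appear as a shorter time to first token rather than as a count.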

Affected packages

Versions sourced from the GitHub Security Advisory.

Package	Affected versions	Patched versions
vllm (PyPI)	< 0.9.0	0.9.0

Affected products

osv-coords (5 versions)
Package	CPE	Range
pkg:apk/chainguard/py3.10-vllm-cuda-11.8	(no CPE)	< 0.9.0-r0
pkg:apk/chainguard/py3.10-vllm-cuda-12.6	(no CPE)	< 0.9.0.1-r0
pkg:apk/chainguard/py3.10-wheels-vllm-cuda-11.8	(no CPE)	< 0.9.0-r0
pkg:apk/chainguard/tritonserver-backend-vllm-24.04	(no CPE)	< 24.04-r10
pkg:pypi/vllm	(no CPE)	< 0.9.0

Vllm/Vllm (v5)
Range: < 0.9.0

References

github.com/advisories/GHSA-4qjh-9fv9-r85r · ghsa · ADVISORY
nvd.nist.gov/vuln/detail/CVE-2025-46570 · ghsa · ADVISORY
github.com/pypa/advisory-database/tree/main/vulns/vllm/PYSEC-2025-53.yaml · ghsa · WEB
github.com/vllm-project/vllm/commit/77073c77bc2006eb80ea6d5128f076f5e6c6f54f · ghsa · x_refsource_MISC · WEB
github.com/vllm-project/vllm/pull/17045 · ghsa · x_refsource_MISC · WEB
github.com/vllm-project/vllm/security/advisories/GHSA-4qjh-9fv9-r85r · ghsa · x_refsource_CONFIRM · WEB

cvss	0.065
epss	0.000
exploit	0.000
kev	0.000
patch	-0.070
ransomware	0.000