vLLM's Artifact Pin Decay allows pinned deployments to load unpinned code, weights, and processors
Description
Summary
vLLM's revision pinning controls do not consistently apply to all artifacts loaded for a model. A deployment that supplies --revision or --code-revision can still load dynamic code, GGUF files, image processors, retrieval side weights, or same-repository subfolder weights/config from an unpinned/default revision.
This is a supply-chain integrity issue for pinned vLLM deployments. Operators can believe they are serving a reviewed model revision while vLLM resolves behavior-affecting nested or sibling artifacts outside that reviewed revision.
Details
The expected invariant is:
> When a vLLM operator supplies a model or code revision pin, every code, config, processor, weight file, side weight, and same-repository subfolder artifact loaded as part of that model should resolve under that pin unless vLLM exposes and enforces a separate explicit pin for that artifact.
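The invariant can be read as a predicate over every artifact fetch a model load performs. The following is a minimal sketch of that check, not vLLM code: `ArtifactRequest`, its fields, and `unpinned_requests` are hypothetical names, and the sketch deliberately ignores the "separate explicit pin" exemption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ArtifactRequest:
    """One artifact fetch; all field names here are hypothetical."""
    kind: str                 # e.g. "weights", "config", "code"
    repo_id: str
    path: str
    revision: Optional[str]   # None means "resolve the default ref"


def unpinned_requests(
    requests: List[ArtifactRequest], model_revision: Optional[str]
) -> List[ArtifactRequest]:
    """Return the requests that would resolve outside the operator's pin."""
    if model_revision is None:
        return []  # nothing was pinned, so nothing can decay
    return [r for r in requests if r.revision != model_revision]


requests = [
    ArtifactRequest("weights", "org/model", "model.safetensors", "abc123"),
    # A same-repository subfolder artifact that dropped the pin.
    ArtifactRequest("weights", "org/model", "whisper-large-v3/model.safetensors", None),
]
violations = unpinned_requests(requests, model_revision="abc123")
print([r.path for r in violations])
```

Under this reading, a pinned deployment satisfies the invariant only when the violation list is empty.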
Current main was verified affected at commit 3795d7acf431980e62e738493f437ae2a51549da.
Affected source boundaries:
- `vllm/model_executor/models/registry.py:1045-1051` and `:1058-1064`: `_try_resolve_transformers()` passes `revision=model_config.revision` and `trust_remote_code=model_config.trust_remote_code`, but omits `code_revision=model_config.code_revision` for external `auto_map` dynamic module imports.
- `vllm/model_executor/model_loader/gguf_loader.py:58-60`: The direct-file GGUF form `repo/file.gguf` calls `hf_hub_download(repo_id=repo_id, filename=filename)` without passing `revision`.
- `vllm/model_executor/models/roberta.py:203-209`: BGE-M3 secondary sparse and ColBERT side weights are declared with `revision=None`.
- `vllm/model_executor/models/kimi_k25.py:111-114`: Kimi-K2.5 calls `cached_get_image_processor()` without passing `model_config.revision`.
- `vllm/model_executor/models/kimi_audio.py:92-95`: Kimi-Audio loads Whisper config from the `whisper-large-v3` subfolder without a `revision` argument.
- `vllm/model_executor/models/kimi_audio.py:425-430`: Kimi-Audio declares same-repository `whisper-large-v3` secondary weights with `revision=None`.
- `vllm/model_executor/model_loader/default_loader.py:287-301`: The default loader preserves `model_config.revision` for the primary source, then consumes model-supplied secondary sources as declared.
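Most of these boundaries share one shape: a caller holds the pin but does not forward it, so the callee falls back to its default ref. A toy reproduction of that shape, assuming a hypothetical in-memory hub (`FAKE_HUB`) and downloader (`fake_download`) rather than the real `hf_hub_download`:

```python
# Toy stand-in for a hub: (repo_id, ref) -> {filename: content}.
FAKE_HUB = {
    ("org/model-GGUF", "main"): {"model.gguf": "weights-from-main"},
    ("org/model-GGUF", "abc123"): {"model.gguf": "weights-reviewed"},
}


def fake_download(repo_id, filename, revision=None):
    """Hypothetical downloader: a missing revision resolves to 'main'."""
    ref = revision if revision is not None else "main"
    return FAKE_HUB[(repo_id, ref)][filename]


def prepare_weights_unpinned(model, revision):
    repo_id, filename = model.rsplit("/", 1)
    # The pin reaches this function but is never forwarded.
    return fake_download(repo_id=repo_id, filename=filename)


def prepare_weights_pinned(model, revision):
    repo_id, filename = model.rsplit("/", 1)
    # Forwarding the pin keeps resolution inside the reviewed revision.
    return fake_download(repo_id=repo_id, filename=filename, revision=revision)


print(prepare_weights_unpinned("org/model-GGUF/model.gguf", "abc123"))
print(prepare_weights_pinned("org/model-GGUF/model.gguf", "abc123"))
```

Both functions accept the same pin; only the second one lets it reach the download call.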
The strongest example is Kimi-Audio: the primary moonshotai/Kimi-Audio-7B-Instruct weights preserve the configured model revision, but the same-repository whisper-large-v3 audio tower config/weights do not. A pinned Kimi-Audio deployment can therefore load the Whisper subfolder outside the audited revision.
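One way to picture the Kimi-Audio case is a loader that fills in the model pin for same-repository secondary sources that left it unset. This is a hedged sketch of that idea only: `Source` here is a hypothetical stand-in for a secondary source declaration, `inherit_pin` is not a vLLM function, and it is not how the report says the issue was addressed.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Source:
    """Hypothetical stand-in for a secondary weight source declaration."""
    model_or_path: str
    revision: Optional[str]
    subfolder: Optional[str] = None


def inherit_pin(
    sources: List[Source], model: str, revision: Optional[str]
) -> List[Source]:
    """Fill in the model pin for same-repository sources that left it unset."""
    return [
        replace(s, revision=revision)
        if s.revision is None and s.model_or_path == model
        else s
        for s in sources
    ]


declared = [
    Source("moonshotai/Kimi-Audio-7B-Instruct", None, "whisper-large-v3"),
    Source("other/repo", None),  # different repository: left untouched
]
resolved = inherit_pin(declared, "moonshotai/Kimi-Audio-7B-Instruct", "abc123")
print(resolved)
```

Sources from a different repository are left as declared, because the model pin says nothing about which ref of another repository was reviewed.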
This report does not claim a trust_remote_code=False bypass, unauthenticated RCE, or real artifact compromise. The issue is improper propagation of explicit artifact pins across supported loader paths.
Impact
Affected users are operators who pin vLLM model deployments to a reviewed Hugging Face revision for safety review, provenance, rollback, or reproducibility. The impact is that the pin does not reliably describe the full set of artifacts vLLM serves. Even when the operator selects an audited revision, vLLM can resolve behavior-affecting secondary artifacts from the repository default branch or another mutable ref.
Depending on the model path, the unpinned artifact can be dynamic model code, a GGUF file, an image processor, retrieval side weights, or the same-repository Kimi-Audio Whisper subfolder weights/config.
This breaks the operational guarantee of a pinned deployment: "serve the exact artifact set I reviewed." A later change to an unpinned secondary artifact can alter model behavior without changing the operator's configured revision, making review, rollback, incident response, and audit records unreliable.
Occurrences
- `vllm/model_executor/models/kimi_k25.py` L111-L114: Kimi-K2.5 loads its image processor with `cached_get_image_processor()` but does not pass `self.ctx.model_config.revision`. The processor can therefore resolve from the default repository revision even when the model deployment is pinned.
- `vllm/model_executor/models/kimi_audio.py` L425-L430: Kimi-Audio declares same-repository `whisper-large-v3` secondary weights with `revision=None`. A pinned Kimi-Audio deployment can therefore load the Whisper audio tower weights from an unpinned/default revision.
- `vllm/model_executor/models/kimi_audio.py` L92-L95: Kimi-Audio loads Whisper config from the same repository's `whisper-large-v3` subfolder without passing the top-level model revision. The config for this behavior-affecting subcomponent can be resolved outside the audited model revision.
- `vllm/model_executor/models/registry.py` L1045-L1051: `try_get_class_from_dynamic_module()` is called for external `auto_map` config/model classes with `revision=model_config.revision`, but without forwarding `model_config.code_revision`. When `--code-revision` is set, this dynamic module resolution can still fall back to the default code revision instead of the audited code revision.
- `vllm/model_executor/models/registry.py` L1058-L1064: The later dynamic model-class resolution repeats the same pin-decay pattern: it forwards `revision` and `trust_remote_code`, but omits `code_revision`. This means an operator-provided code pin is not enforced at the dynamic module loader boundary.
- `vllm/model_executor/model_loader/gguf_loader.py` L58-L60: The direct GGUF form `repo/file.gguf` calls `hf_hub_download(repo_id=repo_id, filename=filename)` without passing `model_config.revision`. A deployment that pins the model revision can therefore resolve this GGUF file from the repository default revision.
- `vllm/model_executor/models/roberta.py` L203-L209: `BgeM3EmbeddingModel` creates same-repository secondary sparse/ColBERT weight sources with `revision=None`. The primary model revision is not propagated to these side weights, so they can be downloaded outside the operator-selected model revision.
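The two `registry.py` occurrences concern the code pin rather than the model pin. The sketch below models why a dropped `code_revision` matters for externally hosted `auto_map` code. The resolution rule is a simplified assumption made for illustration (an explicit code pin wins, same-repository code inherits the model revision, anything else falls back to a default ref), and `resolve_code_revision` is a hypothetical helper, not a vLLM or Transformers function.

```python
from typing import Optional

DEFAULT_REF = "main"  # assumed default ref for illustration


def resolve_code_revision(
    model_repo: str,
    code_repo: str,
    revision: Optional[str],
    code_revision: Optional[str],
) -> str:
    """Pick the ref for dynamic code under the simplified rule above."""
    if code_revision is not None:
        return code_revision
    if code_repo == model_repo and revision is not None:
        return revision
    return DEFAULT_REF


# The operator pinned both the model and the external code repository.
operator_revision = "abc123"
operator_code_revision = "c0ffee"

# Pin decay: the caller forwards revision but drops code_revision.
decayed = resolve_code_revision(
    "org/model", "org/shared-code", operator_revision, None
)
# Intended behavior: both pins are forwarded.
enforced = resolve_code_revision(
    "org/model", "org/shared-code", operator_revision, operator_code_revision
)
print(decayed, enforced)
```

Under this assumed rule, the model pin cannot stand in for the code pin when the code lives in another repository, which is why omitting `code_revision` leaves the code unpinned.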
Fixes
This was fixed in: https://github.com/vllm-project/vllm/pull/42616
___
Originally filed via huntr: https://huntr.com/bounties/3f1e24c0-87d2-4f6c-a705-820f380879ac.
The vLLM maintainer (Russell Bryant) redirected the report to the private GHSA channel. Offline proof bundle (vllm_artifact_pin_decay_bundle_verify.py + bundle-verification-20260430T143506Z.json) is available upon request.
Affected products: 1
Patches: 1
d26a28ab0336 fix: propagate revision/code_revision pins to all artifact boundaries (#42616)
6 files changed · +17 −5
tests/models/test_gguf_download.py +5 −2 modified

    @@ -122,14 +122,17 @@ def test_prepare_weights_repo_filename(self, mock_isfile, mock_hf_download):
             mock_hf_download.return_value = "/downloaded/model.gguf"

    -        # Create a simple mock ModelConfig with only the model attribute
             model_config = MagicMock()
             model_config.model = "unsloth/Qwen3-0.6B-GGUF/model.gguf"
    +        model_config.revision = "abc123"
             result = loader._prepare_weights(model_config)
             assert result == "/downloaded/model.gguf"
             mock_hf_download.assert_called_once_with(
    -            repo_id="unsloth/Qwen3-0.6B-GGUF", filename="model.gguf"
    +            repo_id="unsloth/Qwen3-0.6B-GGUF",
    +            filename="model.gguf",
    +            revision="abc123",
    +            cache_dir=None,
             )

         @patch("vllm.config.model.get_hf_image_processor_config", return_value=None)
vllm/model_executor/model_loader/gguf_loader.py +6 −1 modified

    @@ -57,7 +57,12 @@ def _prepare_weights(self, model_config: ModelConfig):
             # repo id/filename.gguf
             if "/" in model_name_or_path and model_name_or_path.endswith(".gguf"):
                 repo_id, filename = model_name_or_path.rsplit("/", 1)
    -            return hf_hub_download(repo_id=repo_id, filename=filename)
    +            return hf_hub_download(
    +                repo_id=repo_id,
    +                filename=filename,
    +                revision=model_config.revision,
    +                cache_dir=self.load_config.download_dir,
    +            )
             # repo_id:quant_type
             elif "/" in model_name_or_path and ":" in model_name_or_path:
                 repo_id, quant_type = model_name_or_path.rsplit(":", 1)
vllm/model_executor/models/kimi_audio.py +2 −1 modified

    @@ -92,6 +92,7 @@ def __init__(
             whisper_config = HFWhisperConfig.from_pretrained(
                 model_path,
                 subfolder=KIMIA_WHISPER_SUBFOLDER,
    +            revision=vllm_config.model_config.revision,
             )

             super().__init__(
    @@ -426,7 +427,7 @@ def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
                 DefaultModelLoader.Source(
                     model_or_path=vllm_config.model_config.model,
                     subfolder="whisper-large-v3",
    -                revision=None,
    +                revision=vllm_config.model_config.revision,
                 )
             ]
vllm/model_executor/models/kimi_k25.py +1 −0 modified

    @@ -110,6 +110,7 @@ def __init__(self, ctx: InputProcessingContext) -> None:
             tokenizer = self.get_tokenizer()
             image_processor = cached_get_image_processor(
                 self.ctx.model_config.model,
    +            revision=self.ctx.model_config.revision,
                 trust_remote_code=self.ctx.model_config.trust_remote_code,
             )
vllm/model_executor/models/registry.py +2 −0 modified

    @@ -1064,6 +1064,7 @@ def _try_resolve_transformers(
                         module,
                         model_config.model,
                         revision=model_config.revision,
    +                    code_revision=model_config.code_revision,
                         trust_remote_code=model_config.trust_remote_code,
                         warn_on_fail=False,
                     )
    @@ -1077,6 +1078,7 @@ def _try_resolve_transformers(
                         module,
                         model_config.model,
                         revision=model_config.revision,
    +                    code_revision=model_config.code_revision,
                         trust_remote_code=model_config.trust_remote_code,
                         warn_on_fail=True,
                     )
vllm/model_executor/models/roberta.py +1 −1 modified

    @@ -203,7 +203,7 @@ def __init__(self, *, vllm_config: VllmConfig, prefix: str = ""):
             self.secondary_weights = [
                 DefaultModelLoader.Source(
                     model_or_path=vllm_config.model_config.model,
    -                revision=None,
    +                revision=vllm_config.model_config.revision,
                     prefix=prefix,
                     allow_patterns_overrides=[filename],
                 )
Vulnerability mechanics
Root cause
"vLLM fails to consistently propagate revision pins to all loaded model artifacts."
Attack vector
An attacker who controls or later modifies a model repository can change behavior-affecting artifacts such as dynamic code, GGUF files, image processors, or side weights on the default branch while leaving the reviewed revision untouched. When an operator deploys vLLM with a revision pin for the main model, vLLM may load these artifacts from the unpinned or default revision, so the operator's review of the pinned revision does not cover them [ref_id=1]. Behavior-altering components can thus be substituted without any change to the pinned revision.
Affected code
The vulnerability stems from several locations within vLLM's model loading logic. Specifically, issues are found in `vllm/model_executor/models/registry.py` concerning dynamic module imports, `vllm/model_executor/model_loader/gguf_loader.py` for GGUF file downloads, and within model-specific implementations like `vllm/model_executor/models/roberta.py`, `vllm/model_executor/models/kimi_k25.py`, and `vllm/model_executor/models/kimi_audio.py` where secondary weights, configs, or processors were loaded without explicit revision pinning [ref_id=1].
What the fix does
The patch propagates the `revision` and `code_revision` pins to the artifact loading boundaries identified in the report [patch_id=5503169]. It passes the revision to `hf_hub_download` for direct-file GGUF downloads, forwards `code_revision` during dynamic module resolution, passes the revision to the Kimi-K2.5 image processor and the Kimi-Audio Whisper config, and replaces `revision=None` with the model revision for the BGE-M3 and Kimi-Audio secondary weight sources. With these changes, a pinned deployment resolves the listed artifacts from the specified revision, closing the supply-chain integrity gap described above.
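The patch's test change asserts that the pin actually reaches the download call. The same style of regression check can be sketched without vLLM installed. `prepare_weights` and the injected `download` callable below are hypothetical stand-ins, not the real `_prepare_weights` or `hf_hub_download`.

```python
from unittest.mock import MagicMock


def prepare_weights(model_config, download):
    """Hypothetical loader step that forwards the pin to the downloader."""
    repo_id, filename = model_config.model.rsplit("/", 1)
    return download(
        repo_id=repo_id,
        filename=filename,
        revision=model_config.revision,
    )


download = MagicMock(return_value="/downloaded/model.gguf")
model_config = MagicMock()
model_config.model = "org/model-GGUF/model.gguf"
model_config.revision = "abc123"

result = prepare_weights(model_config, download)

# Fails if the revision keyword is dropped anywhere on the path.
download.assert_called_once_with(
    repo_id="org/model-GGUF",
    filename="model.gguf",
    revision="abc123",
)
```

Asserting on the exact keyword arguments, rather than only on the return value, is what makes a silently dropped `revision` visible to the test.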
Preconditions
- config: The vLLM deployment must be configured with a revision pin (`--revision` or `--code-revision`).
- input: The model repository must contain behavior-affecting artifacts (dynamic code, GGUF files, image processors, side weights, or subfolder configs/weights) whose content at the default or unpinned revision differs from the pinned revision.
Generated on Jun 10, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.