CVE-2026-10775
Description
SGLang's cache handler has hash collision vulnerabilities allowing local denial of service and data corruption.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
Vulnerability
A vulnerability exists in the data_hash function within the Cache Handler component of sgl-project SGLang up to and including version 0.5.11. It stems from hash stability and collision problems in the multimodal embedding cache key path. First, data_hash truncates SHA256 to 64 bits. Second, related hashing functions such as tensor_hash and hash_feature do not account for tensor boundaries, shape, or dtype. Third, combine_hashes relies on Python's built-in hash(), which is not stable across processes. Together these issues can lead to silent embedding corruption, cache invalidation across processes, or request crashes [2].
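To put the 64-bit truncation in perspective, the standard birthday approximation estimates the chance of at least one accidental collision among n uniformly random k-bit keys as roughly 1 - exp(-n(n-1)/2^(k+1)). The sketch below evaluates that estimate for 64-bit and 128-bit keys. The function name and the sample sizes are illustrative assumptions, not values taken from SGLang. It covers accidental collisions only; the boundary, shape, and dtype issues collide deterministically regardless of digest width.

```python
import math


def collision_probability(n_items: int, bits: int) -> float:
    """Birthday approximation: P(at least one collision) among n_items
    uniformly random keys of the given bit width."""
    exponent = -n_items * (n_items - 1) / 2 ** (bits + 1)
    # -expm1(x) equals 1 - exp(x) but stays accurate when x is tiny.
    return -math.expm1(exponent)


if __name__ == "__main__":
    for bits in (64, 128):
        for n in (10**6, 10**9):
            p = collision_probability(n, bits)
            print(f"{bits}-bit keys, n={n:.0e}: p ~ {p:.3e}")
```

Under this estimate a billion distinct keys give a collision chance of a few percent at 64 bits and a negligible one at 128 bits, which is consistent with the proposed move to a wider digest.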
Exploitation
An attacker with local execution privileges can exploit this vulnerability by triggering hash collisions within the multimodal embedding cache. This is done by crafting multimodal inputs that produce identical cache keys even though they represent different data. The attack is rated as high complexity and difficult to carry out, but an exploit has been publicly disclosed [2].
Impact
Successful exploitation can lead to a denial of service by causing request crashes or cache invalidation. Additionally, it can result in silent incorrect outputs due to cache poisoning, where a request receives an embedding computed for a different multimodal input. The scope of the compromise is limited to the local execution environment [2].
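The cache-poisoning outcome can be illustrated with a toy cache. This is a minimal sketch using only the standard library: `boundary_blind_key`, `get_embedding`, and `encode` are hypothetical stand-ins, not SGLang functions, and the key function only mimics the behavior described above (chunk bytes concatenated, SHA-256 truncated to 64 bits).

```python
import hashlib
import struct


def pack(values):
    """Serialize a flat list of floats as little-endian float32 bytes."""
    return struct.pack(f"<{len(values)}f", *values)


def boundary_blind_key(chunks) -> int:
    """Hypothetical key: concatenate all chunk bytes, so chunk boundaries
    are lost, then truncate SHA-256 to 64 bits."""
    payload = b"".join(pack(chunk) for chunk in chunks)
    return int.from_bytes(hashlib.sha256(payload).digest()[:8], "big")


embedding_cache = {}


def get_embedding(chunks, compute):
    """Return the cached value for chunks, computing it only on a miss."""
    key = boundary_blind_key(chunks)
    if key not in embedding_cache:
        embedding_cache[key] = compute(chunks)
    return embedding_cache[key]


def encode(chunks):
    """Stand-in encoder: one number per chunk, so partitions give different results."""
    return [sum(chunk) for chunk in chunks]


first = get_embedding([[1.0, 2.0], [3.0, 4.0, 5.0]], encode)
second = get_embedding([[1.0, 2.0, 3.0], [4.0, 5.0]], encode)

print("first request :", first)
print("second request:", second)
print("correct result:", encode([[1.0, 2.0, 3.0], [4.0, 5.0]]))
```

The second request hits the entry stored by the first, so it silently receives a value computed for a different input; no error is raised at any point.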
Mitigation
A pull request has been submitted to address these issues by extending data_hash to 128 bits, adding shape/dtype metadata to hashing, replacing unstable Python hash() with SHA256, and fixing type errors. The pull request is awaiting acceptance, and no patched version has been released as of the available references [3]. The vulnerability persists in the latest version according to one reference [2].
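The general idea behind two of the proposed changes (a 128-bit digest, plus shape and dtype metadata in the hashed bytes) can be sketched as follows. This is not the code from the pull request: `structured_hash`, its `(shape, values)` input format, and the length-prefix encoding are assumptions chosen to keep the example dependency-free.

```python
import hashlib
import struct


def structured_hash(items, dtype: str = "float32") -> int:
    """Hypothetical 128-bit key over a list of (shape, flat_values) pairs.

    Each item contributes its dtype, shape, and payload, and every field is
    length-prefixed, so two different partitions of the same bytes cannot
    serialize to the same input.
    """
    h = hashlib.sha256()
    h.update(struct.pack("<Q", len(items)))  # number of items
    for shape, values in items:
        meta = f"{dtype}:{','.join(map(str, shape))}".encode()
        payload = struct.pack(f"<{len(values)}f", *values)
        h.update(struct.pack("<Q", len(meta)) + meta)
        h.update(struct.pack("<Q", len(payload)) + payload)
    return int.from_bytes(h.digest()[:16], "big")  # keep 128 bits


# Same bytes, different item boundaries.
a = [((2,), [1.0, 2.0]), ((3,), [3.0, 4.0, 5.0])]
b = [((3,), [1.0, 2.0, 3.0]), ((2,), [4.0, 5.0])]

# Same bytes, different shapes.
flat = [float(i) for i in range(16)]
x = [((16,), flat)]
y = [((4, 4), flat)]

print(structured_hash(a) != structured_hash(b))
print(structured_hash(x) != structured_hash(y))
```

Length-prefixing every field is a deliberate choice here: simply appending metadata without delimiters could reintroduce ambiguity between where one field ends and the next begins.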
AI Insight generated on Jun 3, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected products
- (no CPE)
- (no CPE), range: <=0.5.11
Patches
1127b9e3283f7 [BugFix]: Fix DeepSeek V4 HiCache layer count logic (#25477)
3 files changed · +161 −144
python/sglang/srt/mem_cache/hybrid_cache/hybrid_pool_assembler.py +6 −3 (modified)
@@ -283,7 +283,8 @@ def build_deepseek_v4_hicache_stack(
     pp_size: int = 1,
     enable_storage_metrics: bool = False,
 ) -> tuple[HostPoolGroup, HybridCacheController]:
-    transfer_layer_num = len(kvcache.compression_ratios)
+    # TODO(hzh0425): Support PP for deepseek v4 with hicache
+    transfer_layer_num = kvcache.end_layer - kvcache.start_layer
     full_layer_mapping = {layer_id: layer_id for layer_id in range(transfer_layer_num)}
     swa_layer_mapping = {
         layer_id: layer_id for layer_id in range(len(kvcache.swa_kv_pool.kv_buffer))
@@ -293,7 +294,9 @@ def build_deepseek_v4_hicache_stack(
     c128_layer_mapping = {}
     c4_state_global_layers = []
     c128_state_global_layers = []
-    for layer_id, layer_item in enumerate(kvcache.layer_mapping):
+    for layer_id, layer_item in enumerate(
+        kvcache.layer_mapping[kvcache.start_layer : kvcache.end_layer]
+    ):
         if layer_item.compress_ratio == 4:
             c4_layer_mapping[layer_id] = layer_item.compress_layer_id
             c4_state_global_layers.append(layer_id)
@@ -730,7 +733,7 @@ def attach_hybrid_pool_to_unified_cache(
                 indices_from_pool=indices_from_pool,
             )
         )
-        transfer_layer_num = len(kvcache.compression_ratios)
+        transfer_layer_num = kvcache.end_layer - kvcache.start_layer
     elif mamba_stack:
         full_layer_mapping = dict(kvcache.full_attention_layer_id_mapping)
         mamba_layer_mapping = dict(params.req_to_token_pool.mamba_map)
test/registered/radix_cache/test_unified_radix_cache_kl_hicache_nightly.py +0 −141 (renamed)
@@ -13,162 +13,21 @@
 from urllib.parse import urlparse
 import requests
-from test_unified_radix_cache_kl import UnifiedRadixTreeTestMixin
 from sglang.srt.utils import kill_process_tree
 from sglang.test.ci.ci_register import register_cuda_ci
-from sglang.test.kl_multiturn_utils import (
-    get_input_ids,
-    make_mamba_decode_assert,
-    make_mamba_prefill_assert,
-)
 from sglang.test.test_utils import (
-    DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
     DEFAULT_URL_FOR_TEST,
     CustomTestCase,
     popen_launch_server,
 )
-MAMBA_MODEL = "Qwen/Qwen3-Next-80B-A3B-Instruct-FP8"
-MAMBA_CHUNK_SIZE = 64
-MAMBA_TRACK_INTERVAL = 128
-
-DSV4_FLASH_MODEL = "sgl-project/DeepSeek-V4-Flash-FP8"
-DSV4_FLASH_LAUNCH_TIMEOUT = 3600
-
-DSV32_MODEL = "deepseek-ai/DeepSeek-V3.2"
-DSV32_LAUNCH_TIMEOUT = 3600
-
 GLM5_MODEL = "zai-org/GLM-5.1-FP8"
 GLM5_LAUNCH_TIMEOUT = 3600
 register_cuda_ci(est_time=900, suite="nightly-8-gpu-h200", nightly=True)
-class TestUnifiedMambaHiCache(UnifiedRadixTreeTestMixin, CustomTestCase):
-    """Mamba hybrid + HiCache + UnifiedRadixCache."""
-
-    kl_threshold = 0.003
-    prefill_cache_assert = staticmethod(
-        make_mamba_prefill_assert(chunk_size=MAMBA_CHUNK_SIZE)
-    )
-    decode_cache_assert = staticmethod(
-        make_mamba_decode_assert(track_interval=MAMBA_TRACK_INTERVAL)
-    )
-
-    @classmethod
-    def setUpClass(cls):
-        cls.model = MAMBA_MODEL
-        cls.base_url = DEFAULT_URL_FOR_TEST
-        cls.process = popen_launch_server(
-            cls.model,
-            cls.base_url,
-            timeout=DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
-            other_args=[
-                "--tp-size",
-                "4",
-                "--chunked-prefill-size",
-                "2048",
-                "--mem-fraction-static",
-                "0.85",
-                "--mamba-scheduler-strategy",
-                "extra_buffer",
-                "--mamba-track-interval",
-                str(MAMBA_TRACK_INTERVAL),
-                "--enable-hierarchical-cache",
-                "--hicache-ratio",
-                "4",
-                "--hicache-write-policy",
-                "write_through",
-                "--hicache-io-backend",
-                "direct",
-                "--hicache-mem-layout",
-                "page_first_direct",
-                "--max-total-tokens",
-                "12000",
-                "--max-mamba-cache-size",
-                "500",
-                "--max-running-requests",
-                "4",
-            ],
-            env={"SGLANG_ENABLE_UNIFIED_RADIX_TREE": "1"},
-        )
-        cls.input_ids = get_input_ids(cls.model, num_samples=18)
-
-    @classmethod
-    def tearDownClass(cls):
-        kill_process_tree(cls.process.pid)
-
-
-def _assert_dsv4_decode_cached_tokens(result, history_len, output_len, label):
-    expected = history_len + output_len
-    actual = result["meta_info"]["cached_tokens"]
-    lower = max(0, expected - 256)
-    assert actual >= lower, f"{label}: expected cached_tokens>={lower}, got {actual}"
-
-
-class TestUnifiedDeepSeekV4FlashHiCache(UnifiedRadixTreeTestMixin, CustomTestCase):
-    """DeepSeek V4 Flash FP8 + HiCache + UnifiedRadixCache."""
-
-    kl_threshold = 0.0035
-    sampling_temperature = 0
-    decode_cache_assert = staticmethod(_assert_dsv4_decode_cached_tokens)
-    gsm8k_threshold = 0.90
-    num_gsm8k_questions = 100
-
-    @unittest.skip("no stable.")
-    def test_multiturn_logprobs_match(self):
-        pass
-
-    @classmethod
-    def setUpClass(cls):
-        cls.model = DSV4_FLASH_MODEL
-        cls.base_url = DEFAULT_URL_FOR_TEST
-        cls.process = popen_launch_server(
-            cls.model,
-            cls.base_url,
-            timeout=DSV4_FLASH_LAUNCH_TIMEOUT,
-            other_args=[
-                "--trust-remote-code",
-                "--tp-size",
-                "4",
-                "--attention-backend",
-                "compressed",
-                "--page-size",
-                "256",
-                "--chunked-prefill-size",
-                "8192",
-                "--mem-fraction-static",
-                "0.9",
-                "--disable-shared-experts-fusion",
-                "--enable-hierarchical-cache",
-                "--hicache-ratio",
-                "4",
-                "--hicache-write-policy",
-                "write_through",
-                "--hicache-io-backend",
-                "direct",
-                "--hicache-mem-layout",
-                "page_first_direct",
-                "--swa-full-tokens-ratio",
-                "0.25",
-                "--max-total-tokens",
-                "20000",
-                "--max-running-requests",
-                "2",
-            ],
-            env={
-                "SGLANG_DSV4_FP4_EXPERTS": "0",
-                "SGLANG_ENABLE_UNIFIED_RADIX_TREE": "1",
-            },
-        )
-        cls.input_ids = get_input_ids(cls.model, num_samples=18)
-
-    @classmethod
-    def tearDownClass(cls):
-        kill_process_tree(cls.process.pid)
-
-
 class GSM8KTwoPassMixin:
     """Mixin: run GSM8K twice with flush in between, verify accuracy diff.
test/registered/radix_cache/test_unified_radix_cache_kl_hicache.py +155 −0 (added)
@@ -0,0 +1,155 @@
+import unittest
+
+from test_unified_radix_cache_kl import UnifiedRadixTreeTestMixin
+
+from sglang.srt.utils import kill_process_tree
+from sglang.test.ci.ci_register import register_cuda_ci
+from sglang.test.kl_multiturn_utils import (
+    get_input_ids,
+    make_mamba_decode_assert,
+    make_mamba_prefill_assert,
+)
+from sglang.test.test_utils import (
+    DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
+    DEFAULT_URL_FOR_TEST,
+    CustomTestCase,
+    is_in_ci,
+    popen_launch_server,
+)
+
+MAMBA_MODEL = "Qwen/Qwen3-Next-80B-A3B-Instruct"
+MAMBA_CHUNK_SIZE = 64
+MAMBA_TRACK_INTERVAL = 128
+
+DSV4_FLASH_MODEL = "sgl-project/DeepSeek-V4-Flash-FP8"
+DSV4_FLASH_LAUNCH_TIMEOUT = 3600
+
+register_cuda_ci(est_time=768, stage="base-c", runner_config="8-gpu-h200")
+
+
+class TestUnifiedMambaHiCache(UnifiedRadixTreeTestMixin, CustomTestCase):
+    """Mamba hybrid + HiCache + UnifiedRadixCache."""
+
+    kl_threshold = 0.005
+    prefill_cache_assert = staticmethod(
+        make_mamba_prefill_assert(chunk_size=MAMBA_CHUNK_SIZE)
+    )
+    decode_cache_assert = staticmethod(
+        make_mamba_decode_assert(track_interval=MAMBA_TRACK_INTERVAL)
+    )
+
+    @classmethod
+    def setUpClass(cls):
+        cls.model = MAMBA_MODEL
+        cls.base_url = DEFAULT_URL_FOR_TEST
+        cls.process = popen_launch_server(
+            cls.model,
+            cls.base_url,
+            timeout=DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
+            other_args=[
+                "--tp-size",
+                "4",
+                "--chunked-prefill-size",
+                "2048",
+                "--mem-fraction-static",
+                "0.85",
+                "--mamba-scheduler-strategy",
+                "extra_buffer",
+                "--mamba-track-interval",
+                str(MAMBA_TRACK_INTERVAL),
+                "--enable-hierarchical-cache",
+                "--hicache-ratio",
+                "4",
+                "--hicache-write-policy",
+                "write_through",
+                "--hicache-io-backend",
+                "direct",
+                "--hicache-mem-layout",
+                "page_first_direct",
+                "--max-total-tokens",
+                "12000",
+                "--max-mamba-cache-size",
+                "500",
+                "--max-running-requests",
+                "4",
+            ],
+            env={"SGLANG_ENABLE_UNIFIED_RADIX_TREE": "1"},
+        )
+        cls.input_ids = get_input_ids(cls.model, num_samples=18)
+
+    @classmethod
+    def tearDownClass(cls):
+        kill_process_tree(cls.process.pid)
+
+
+def _assert_dsv4_decode_cached_tokens(result, history_len, output_len, label):
+    expected = history_len + output_len
+    actual = result["meta_info"]["cached_tokens"]
+    lower = max(0, expected - 256)
+    assert actual >= lower, f"{label}: expected cached_tokens>={lower}, got {actual}"
+
+
+class TestUnifiedDeepSeekV4FlashHiCache(UnifiedRadixTreeTestMixin, CustomTestCase):
+    """DeepSeek V4 Flash FP8 + HiCache + UnifiedRadixCache."""
+
+    kl_threshold = 0.005
+    sampling_temperature = 0
+    decode_cache_assert = staticmethod(_assert_dsv4_decode_cached_tokens)
+    gsm8k_threshold = 0.90
+    num_gsm8k_questions = 100
+
+    @unittest.skipIf(is_in_ci(), "To reduce the CI execution time.")
+    def test_multiturn_logprobs_match(self):
+        pass
+
+    @classmethod
+    def setUpClass(cls):
+        cls.model = DSV4_FLASH_MODEL
+        cls.base_url = DEFAULT_URL_FOR_TEST
+        cls.process = popen_launch_server(
+            cls.model,
+            cls.base_url,
+            timeout=DSV4_FLASH_LAUNCH_TIMEOUT,
+            other_args=[
+                "--trust-remote-code",
+                "--tp-size",
+                "4",
+                "--attention-backend",
+                "compressed",
+                "--page-size",
+                "256",
+                "--chunked-prefill-size",
+                "8192",
+                "--mem-fraction-static",
+                "0.9",
+                "--disable-shared-experts-fusion",
+                "--enable-hierarchical-cache",
+                "--hicache-ratio",
+                "4",
+                "--hicache-write-policy",
+                "write_through",
+                "--hicache-io-backend",
+                "direct",
+                "--hicache-mem-layout",
+                "page_first_direct",
+                "--swa-full-tokens-ratio",
+                "0.25",
+                "--max-total-tokens",
+                "20000",
+                "--max-running-requests",
+                "2",
+            ],
+            env={
+                "SGLANG_DSV4_FP4_EXPERTS": "0",
+                "SGLANG_ENABLE_UNIFIED_RADIX_TREE": "1",
+            },
+        )
+        cls.input_ids = get_input_ids(cls.model, num_samples=18)
+
+    @classmethod
+    def tearDownClass(cls):
+        kill_process_tree(cls.process.pid)
+
+
+if __name__ == "__main__":
+    unittest.main()
Vulnerability mechanics
Root cause
"Hash collisions in multimodal cache keys lead to incorrect data retrieval and potential crashes."
Attack vector
An attacker with local execution privileges can trigger this vulnerability by providing specially crafted multimodal inputs. These inputs exploit weaknesses in how the system hashes tensor data and combines hash values. Specifically, the `data_hash` function truncates SHA256 hashes, and `tensor_hash` does not encode tensor boundaries, leading to collisions. Additionally, the use of Python's built-in `hash()` for combining hashes is not stable across processes. This can result in cache poisoning or silent output corruption, and in some cases, a denial of service due to a `TypeError` [ref_id=1].
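The cross-process instability of Python's built-in `hash()` can be observed directly. In CPython, `hash()` of `str` and `bytes` objects is salted per process unless `PYTHONHASHSEED` is fixed; the exact values that `combine_hashes` passes to `hash()` are not shown in this entry, so the sketch below only demonstrates the general behavior. Both helper names are hypothetical, and `stable_combine` is one possible SHA-256 based alternative, not the project's implementation.

```python
import hashlib
import os
import subprocess
import sys


def builtin_hash_in_new_process(text: str, seed: str) -> int:
    """Evaluate hash(text) in a fresh interpreter started with the given
    PYTHONHASHSEED, simulating two independent worker processes."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout)


def stable_combine(hashes) -> int:
    """Hypothetical process-independent combiner: SHA-256 over the
    fixed-width encoding of each non-negative hash below 2**128."""
    h = hashlib.sha256()
    for value in hashes:
        h.update(value.to_bytes(16, "big"))
    return int.from_bytes(h.digest()[:16], "big")


if __name__ == "__main__":
    print("seed 1:", builtin_hash_in_new_process("image-0", "1"))
    print("seed 2:", builtin_hash_in_new_process("image-0", "2"))
    print("stable:", stable_combine([1, 2, 3]))
```

A key derived from a salted `hash()` cannot be shared between processes or persisted, because another process will compute a different key for the same content and miss the cache.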
Affected code
The vulnerability resides in the multimodal cache handling logic, specifically within the `data_hash`, `tensor_hash`, and `hash_feature` functions in `python/sglang/srt/managers/mm_utils.py`, and `combine_hashes` in `python/sglang/srt/mem_cache/multimodal_cache.py`. The `precomputed_embedding` handling in `python/sglang/srt/multimodal/processors/base_processor.py` is also affected. The commit listed under Patches modifies `python/sglang/srt/mem_cache/hybrid_cache/hybrid_pool_assembler.py`, which is not one of these hashing files [ref_id=1].
What the fix does
The linked commit changes how the number of transfer layers is calculated in `build_deepseek_v4_hicache_stack` and `attach_hybrid_pool_to_unified_cache`. Previously both used the length of `kvcache.compression_ratios`, which did not account for `start_layer` and `end_layer`. They now use `kvcache.end_layer - kvcache.start_layer`, and the layer-mapping loop iterates only over `kvcache.layer_mapping[kvcache.start_layer : kvcache.end_layer]`, so `transfer_layer_num` reflects the relevant layers for DeepSeek V4 models with HiCache. The diff does not modify `data_hash`, `tensor_hash`, `hash_feature`, or `combine_hashes`, so on its own it does not address the hash collision issues described in this entry [patch_id=4719042].
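The layer-count change in the diff can be illustrated with a toy object. `FakeKVCache` is a hypothetical stand-in; only the attribute names `compression_ratios`, `start_layer`, and `end_layer` come from the diff. The sketch also assumes that `compression_ratios` holds one entry per model layer, which the diff suggests but does not state.

```python
from dataclasses import dataclass


@dataclass
class FakeKVCache:
    """Hypothetical stand-in for the KV cache object used in the diff."""
    compression_ratios: list  # assumed: one entry per model layer
    start_layer: int          # first layer held by this cache
    end_layer: int            # one past the last layer held by this cache


def transfer_layer_num_before(kvcache) -> int:
    """Old logic: counts every layer, ignoring the start/end range."""
    return len(kvcache.compression_ratios)


def transfer_layer_num_after(kvcache) -> int:
    """New logic: counts only the layers in [start_layer, end_layer)."""
    return kvcache.end_layer - kvcache.start_layer


# Eight model layers in total, of which this cache holds layers 2..5.
kv = FakeKVCache(compression_ratios=[4, 128] * 4, start_layer=2, end_layer=6)

print("before:", transfer_layer_num_before(kv))
print("after: ", transfer_layer_num_after(kv))
```

When the cache covers every layer the two calculations agree; they diverge only when `start_layer` and `end_layer` select a sub-range.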
Preconditions
- input: Specially crafted multimodal inputs that exploit hash stability and collision issues.
- auth: Local execution privileges.
Reproduction
The following minimal script demonstrates the deterministic collision and crash cases without requiring a model server:

```python
import hashlib

import numpy as np
import torch


def data_hash(data) -> int:
    hash_bytes = hashlib.sha256(data).digest()[:8]
    return int.from_bytes(hash_bytes, byteorder="big", signed=False)


def flatten_nested_list(nested_list):
    ret = []
    for item in nested_list:
        if isinstance(item, list):
            ret.extend(flatten_nested_list(item))
        else:
            ret.append(item)
    return ret


def tensor_hash(tensor_list) -> int:
    if isinstance(tensor_list, list):
        tensor_list = flatten_nested_list(tensor_list)
        tensor_list = [x.flatten() for x in tensor_list]
        tensor = torch.concat(tensor_list)
    else:
        tensor = tensor_list
    return data_hash(tensor.numpy().tobytes())


def hash_feature(f):
    if isinstance(f, list):
        if isinstance(f[0], torch.Tensor):
            return tensor_hash(f)
        return data_hash(tuple(flatten_nested_list(f)))
    elif isinstance(f, np.ndarray):
        arr = np.ascontiguousarray(f)
        return data_hash(arr.tobytes())
    elif isinstance(f, torch.Tensor):
        return tensor_hash([f])
    return data_hash(f)


# 1. Tensor boundary loss: same bytes, different tensor partitions.
a = [torch.tensor([1.0, 2.0]), torch.tensor([3.0, 4.0, 5.0])]
b = [torch.tensor([1.0, 2.0, 3.0]), torch.tensor([4.0, 5.0])]
print("tensor_hash(a):", tensor_hash(a))
print("tensor_hash(b):", tensor_hash(b))
assert tensor_hash(a) == tensor_hash(b)

# 2. ndarray shape loss: same bytes, different shapes.
data = np.arange(16, dtype=np.float32)
x = data.reshape(16,)
y = data.reshape(4, 4)
print("hash_feature(x):", hash_feature(x))
print("hash_feature(y):", hash_feature(y))
assert hash_feature(x) == hash_feature(y)

# 3. Non-tensor list crash.
try:
    hash_feature([1, 2, 3])
except TypeError as e:
    print("TypeError:", e)
```
Generated on Jun 3, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References (6)
News mentions (0)
No linked articles in our index yet.