VYPR
Low severity (3.6) · NVD Advisory · Published Jun 3, 2026

CVE-2026-10775

Description

SGLang's cache handler has hash collision vulnerabilities allowing local denial of service and data corruption.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

Vulnerability

A vulnerability exists in the data_hash function of the Cache Handler component in sgl-project SGLang up to and including version 0.5.11. It stems from hash stability and collision problems in the multimodal embedding cache key path. data_hash truncates its SHA256 digest to 64 bits. The related tensor_hash and hash_feature functions do not account for tensor boundaries, shape, or dtype. combine_hashes relies on Python's built-in hash(), which is not stable across processes. Together these issues can lead to silent embedding corruption, cache invalidation across processes, or request crashes [2].
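The standard birthday approximation puts the 64-bit truncation in perspective. It gives the probability of at least one collision among n random b-bit keys as p ≈ 1 - exp(-n^2 / 2^(b+1)). The sketch below is a generic calculation that assumes keys behave like uniformly random values. It is not SGLang code, and it covers only accidental collisions. The boundary and shape issues described above produce collisions with no search at all.

```python
import math


def collision_probability(n_keys: int, bits: int) -> float:
    """Birthday approximation: P(at least one collision) among n_keys
    uniformly random keys drawn from a space of 2**bits values."""
    exponent = n_keys * n_keys / (2.0 * 2.0**bits)
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-exponent)


# Compare 64-bit keys (the pre-fix truncation) with 128-bit keys.
for bits in (64, 128):
    p = collision_probability(2**32, bits)
    print(f"{bits}-bit keys, 2**32 entries: p ~ {p:.3g}")
```

At about four billion cached entries, a 64-bit key space already has a sizable chance of an accidental collision. A 128-bit key space keeps that chance negligible.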

Exploitation

An attacker with local execution privileges can exploit this vulnerability by triggering hash collisions in the multimodal embedding cache. The attacker does this by crafting multimodal inputs that produce identical cache keys despite representing different data. Exploitation is rated as high complexity and difficult, but the exploit has been publicly disclosed [2].

Impact

Successful exploitation can lead to a denial of service by causing request crashes or cache invalidation. Additionally, it can result in silent incorrect outputs due to cache poisoning, where a request receives an embedding computed for a different multimodal input. The scope of the compromise is limited to the local execution environment [2].
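The toy example below illustrates the poisoning path. It assumes a plain dictionary keyed by a 64-bit hash over concatenated bytes, with no boundary, shape, or dtype information. The names weak_key and embedding_cache are hypothetical and do not come from SGLang's cache implementation.

```python
import hashlib

import numpy as np


def weak_key(parts: list[np.ndarray]) -> int:
    """Hypothetical 64-bit key over concatenated bytes only; tensor
    boundaries, shape, and dtype are not part of the hashed input."""
    blob = b"".join(np.ascontiguousarray(p).tobytes() for p in parts)
    return int.from_bytes(hashlib.sha256(blob).digest()[:8], "big")


embedding_cache: dict[int, str] = {}

# Request A: two feature tensors of lengths 2 and 3.
req_a = [np.array([1.0, 2.0], dtype=np.float32),
         np.array([3.0, 4.0, 5.0], dtype=np.float32)]
embedding_cache[weak_key(req_a)] = "embedding computed for request A"

# Request B: a different split (lengths 3 and 2) of the same underlying bytes.
req_b = [np.array([1.0, 2.0, 3.0], dtype=np.float32),
         np.array([4.0, 5.0], dtype=np.float32)]
hit = embedding_cache.get(weak_key(req_b))
print("request B served:", hit)
```

Request B gets a cache hit on request A's entry, so it silently receives the wrong embedding.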

Mitigation

A pull request has been submitted to address these issues by extending data_hash to 128 bits, adding shape/dtype metadata to hashing, replacing unstable Python hash() with SHA256, and fixing type errors. The pull request is awaiting acceptance, and no patched version has been released as of the available references [3]. The vulnerability persists in the latest version according to one reference [2].
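The pull request is described here only at a high level. The following hypothetical sketch illustrates three of its described changes with NumPy arrays: 128-bit keys, shape/dtype metadata, and SHA256 in place of hash(). The names hardened_feature_hash and combine_keys are invented for illustration and do not reflect the pull request's actual code.

```python
import hashlib

import numpy as np


def hardened_feature_hash(arrays: list[np.ndarray]) -> int:
    """Hypothetical 128-bit cache key over a list of arrays.

    Each array contributes its dtype, shape, and byte length before its raw
    bytes, so re-partitioning or reshaping the same data changes the key.
    """
    h = hashlib.sha256()
    h.update(len(arrays).to_bytes(8, "big"))
    for arr in arrays:
        arr = np.ascontiguousarray(arr)
        meta = f"{arr.dtype.str}|{arr.shape}".encode()
        h.update(len(meta).to_bytes(8, "big"))  # length-prefix the metadata
        h.update(meta)
        data = arr.tobytes()
        h.update(len(data).to_bytes(8, "big"))  # length-prefix the payload
        h.update(data)
    # Keep 128 bits (16 bytes) of the digest instead of 64.
    return int.from_bytes(h.digest()[:16], "big")


def combine_keys(keys: list[int]) -> int:
    """Process-stable, order-sensitive combination of non-negative keys
    below 2**128, using SHA256 instead of Python's hash()."""
    h = hashlib.sha256()
    for k in keys:
        h.update(k.to_bytes(16, "big"))
    return int.from_bytes(h.digest()[:16], "big")


x = np.arange(16, dtype=np.float32)
print(hardened_feature_hash([x]) != hardened_feature_hash([x.reshape(4, 4)]))
```

The length prefixes keep the byte stream unambiguous, so two different lists of arrays cannot serialize to the same hashed input.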

AI Insight generated on Jun 3, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected products (2)

  • Sgl Project/Sglang (2 versions)
    • (no CPE)
    • (no CPE), range: <=0.5.11

Patches (1)
127b9e3283f7

[BugFix]: Fix DeepSeek V4 HiCache layer count logic (#25477)

https://github.com/sgl-project/sglang · Zhangheng · May 16, 2026 · Fixed in 0.5.12 via release-tag
3 files changed · +161 -144
  • python/sglang/srt/mem_cache/hybrid_cache/hybrid_pool_assembler.py (+6 -3, modified)
    @@ -283,7 +283,8 @@ def build_deepseek_v4_hicache_stack(
         pp_size: int = 1,
         enable_storage_metrics: bool = False,
     ) -> tuple[HostPoolGroup, HybridCacheController]:
    -    transfer_layer_num = len(kvcache.compression_ratios)
    +    # TODO(hzh0425): Support PP for deepseek v4 with hicache
    +    transfer_layer_num = kvcache.end_layer - kvcache.start_layer
         full_layer_mapping = {layer_id: layer_id for layer_id in range(transfer_layer_num)}
         swa_layer_mapping = {
             layer_id: layer_id for layer_id in range(len(kvcache.swa_kv_pool.kv_buffer))
    @@ -293,7 +294,9 @@ def build_deepseek_v4_hicache_stack(
         c128_layer_mapping = {}
         c4_state_global_layers = []
         c128_state_global_layers = []
    -    for layer_id, layer_item in enumerate(kvcache.layer_mapping):
    +    for layer_id, layer_item in enumerate(
    +        kvcache.layer_mapping[kvcache.start_layer : kvcache.end_layer]
    +    ):
             if layer_item.compress_ratio == 4:
                 c4_layer_mapping[layer_id] = layer_item.compress_layer_id
                 c4_state_global_layers.append(layer_id)
    @@ -730,7 +733,7 @@ def attach_hybrid_pool_to_unified_cache(
                                 indices_from_pool=indices_from_pool,
                             )
                         )
    -            transfer_layer_num = len(kvcache.compression_ratios)
    +            transfer_layer_num = kvcache.end_layer - kvcache.start_layer
             elif mamba_stack:
                 full_layer_mapping = dict(kvcache.full_attention_layer_id_mapping)
                 mamba_layer_mapping = dict(params.req_to_token_pool.mamba_map)
    
  • test/registered/radix_cache/test_unified_radix_cache_kl_hicache_nightly.py (+0 -141, renamed)
    @@ -13,162 +13,21 @@
     from urllib.parse import urlparse
     
     import requests
    -from test_unified_radix_cache_kl import UnifiedRadixTreeTestMixin
     
     from sglang.srt.utils import kill_process_tree
     from sglang.test.ci.ci_register import register_cuda_ci
    -from sglang.test.kl_multiturn_utils import (
    -    get_input_ids,
    -    make_mamba_decode_assert,
    -    make_mamba_prefill_assert,
    -)
     from sglang.test.test_utils import (
    -    DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
         DEFAULT_URL_FOR_TEST,
         CustomTestCase,
         popen_launch_server,
     )
     
    -MAMBA_MODEL = "Qwen/Qwen3-Next-80B-A3B-Instruct-FP8"
    -MAMBA_CHUNK_SIZE = 64
    -MAMBA_TRACK_INTERVAL = 128
    -
    -DSV4_FLASH_MODEL = "sgl-project/DeepSeek-V4-Flash-FP8"
    -DSV4_FLASH_LAUNCH_TIMEOUT = 3600
    -
    -DSV32_MODEL = "deepseek-ai/DeepSeek-V3.2"
    -DSV32_LAUNCH_TIMEOUT = 3600
    -
     GLM5_MODEL = "zai-org/GLM-5.1-FP8"
     GLM5_LAUNCH_TIMEOUT = 3600
     
     register_cuda_ci(est_time=900, suite="nightly-8-gpu-h200", nightly=True)
     
     
    -class TestUnifiedMambaHiCache(UnifiedRadixTreeTestMixin, CustomTestCase):
    -    """Mamba hybrid + HiCache + UnifiedRadixCache."""
    -
    -    kl_threshold = 0.003
    -    prefill_cache_assert = staticmethod(
    -        make_mamba_prefill_assert(chunk_size=MAMBA_CHUNK_SIZE)
    -    )
    -    decode_cache_assert = staticmethod(
    -        make_mamba_decode_assert(track_interval=MAMBA_TRACK_INTERVAL)
    -    )
    -
    -    @classmethod
    -    def setUpClass(cls):
    -        cls.model = MAMBA_MODEL
    -        cls.base_url = DEFAULT_URL_FOR_TEST
    -        cls.process = popen_launch_server(
    -            cls.model,
    -            cls.base_url,
    -            timeout=DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
    -            other_args=[
    -                "--tp-size",
    -                "4",
    -                "--chunked-prefill-size",
    -                "2048",
    -                "--mem-fraction-static",
    -                "0.85",
    -                "--mamba-scheduler-strategy",
    -                "extra_buffer",
    -                "--mamba-track-interval",
    -                str(MAMBA_TRACK_INTERVAL),
    -                "--enable-hierarchical-cache",
    -                "--hicache-ratio",
    -                "4",
    -                "--hicache-write-policy",
    -                "write_through",
    -                "--hicache-io-backend",
    -                "direct",
    -                "--hicache-mem-layout",
    -                "page_first_direct",
    -                "--max-total-tokens",
    -                "12000",
    -                "--max-mamba-cache-size",
    -                "500",
    -                "--max-running-requests",
    -                "4",
    -            ],
    -            env={"SGLANG_ENABLE_UNIFIED_RADIX_TREE": "1"},
    -        )
    -        cls.input_ids = get_input_ids(cls.model, num_samples=18)
    -
    -    @classmethod
    -    def tearDownClass(cls):
    -        kill_process_tree(cls.process.pid)
    -
    -
    -def _assert_dsv4_decode_cached_tokens(result, history_len, output_len, label):
    -    expected = history_len + output_len
    -    actual = result["meta_info"]["cached_tokens"]
    -    lower = max(0, expected - 256)
    -    assert actual >= lower, f"{label}: expected cached_tokens>={lower}, got {actual}"
    -
    -
    -class TestUnifiedDeepSeekV4FlashHiCache(UnifiedRadixTreeTestMixin, CustomTestCase):
    -    """DeepSeek V4 Flash FP8 + HiCache + UnifiedRadixCache."""
    -
    -    kl_threshold = 0.0035
    -    sampling_temperature = 0
    -    decode_cache_assert = staticmethod(_assert_dsv4_decode_cached_tokens)
    -    gsm8k_threshold = 0.90
    -    num_gsm8k_questions = 100
    -
    -    @unittest.skip("no stable.")
    -    def test_multiturn_logprobs_match(self):
    -        pass
    -
    -    @classmethod
    -    def setUpClass(cls):
    -        cls.model = DSV4_FLASH_MODEL
    -        cls.base_url = DEFAULT_URL_FOR_TEST
    -        cls.process = popen_launch_server(
    -            cls.model,
    -            cls.base_url,
    -            timeout=DSV4_FLASH_LAUNCH_TIMEOUT,
    -            other_args=[
    -                "--trust-remote-code",
    -                "--tp-size",
    -                "4",
    -                "--attention-backend",
    -                "compressed",
    -                "--page-size",
    -                "256",
    -                "--chunked-prefill-size",
    -                "8192",
    -                "--mem-fraction-static",
    -                "0.9",
    -                "--disable-shared-experts-fusion",
    -                "--enable-hierarchical-cache",
    -                "--hicache-ratio",
    -                "4",
    -                "--hicache-write-policy",
    -                "write_through",
    -                "--hicache-io-backend",
    -                "direct",
    -                "--hicache-mem-layout",
    -                "page_first_direct",
    -                "--swa-full-tokens-ratio",
    -                "0.25",
    -                "--max-total-tokens",
    -                "20000",
    -                "--max-running-requests",
    -                "2",
    -            ],
    -            env={
    -                "SGLANG_DSV4_FP4_EXPERTS": "0",
    -                "SGLANG_ENABLE_UNIFIED_RADIX_TREE": "1",
    -            },
    -        )
    -        cls.input_ids = get_input_ids(cls.model, num_samples=18)
    -
    -    @classmethod
    -    def tearDownClass(cls):
    -        kill_process_tree(cls.process.pid)
    -
    -
     class GSM8KTwoPassMixin:
         """Mixin: run GSM8K twice with flush in between, verify accuracy diff.
     
    
  • test/registered/radix_cache/test_unified_radix_cache_kl_hicache.py (+155 -0, added)
    @@ -0,0 +1,155 @@
    +import unittest
    +
    +from test_unified_radix_cache_kl import UnifiedRadixTreeTestMixin
    +
    +from sglang.srt.utils import kill_process_tree
    +from sglang.test.ci.ci_register import register_cuda_ci
    +from sglang.test.kl_multiturn_utils import (
    +    get_input_ids,
    +    make_mamba_decode_assert,
    +    make_mamba_prefill_assert,
    +)
    +from sglang.test.test_utils import (
    +    DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
    +    DEFAULT_URL_FOR_TEST,
    +    CustomTestCase,
    +    is_in_ci,
    +    popen_launch_server,
    +)
    +
    +MAMBA_MODEL = "Qwen/Qwen3-Next-80B-A3B-Instruct"
    +MAMBA_CHUNK_SIZE = 64
    +MAMBA_TRACK_INTERVAL = 128
    +
    +DSV4_FLASH_MODEL = "sgl-project/DeepSeek-V4-Flash-FP8"
    +DSV4_FLASH_LAUNCH_TIMEOUT = 3600
    +
    +register_cuda_ci(est_time=768, stage="base-c", runner_config="8-gpu-h200")
    +
    +
    +class TestUnifiedMambaHiCache(UnifiedRadixTreeTestMixin, CustomTestCase):
    +    """Mamba hybrid + HiCache + UnifiedRadixCache."""
    +
    +    kl_threshold = 0.005
    +    prefill_cache_assert = staticmethod(
    +        make_mamba_prefill_assert(chunk_size=MAMBA_CHUNK_SIZE)
    +    )
    +    decode_cache_assert = staticmethod(
    +        make_mamba_decode_assert(track_interval=MAMBA_TRACK_INTERVAL)
    +    )
    +
    +    @classmethod
    +    def setUpClass(cls):
    +        cls.model = MAMBA_MODEL
    +        cls.base_url = DEFAULT_URL_FOR_TEST
    +        cls.process = popen_launch_server(
    +            cls.model,
    +            cls.base_url,
    +            timeout=DEFAULT_TIMEOUT_FOR_SERVER_LAUNCH,
    +            other_args=[
    +                "--tp-size",
    +                "4",
    +                "--chunked-prefill-size",
    +                "2048",
    +                "--mem-fraction-static",
    +                "0.85",
    +                "--mamba-scheduler-strategy",
    +                "extra_buffer",
    +                "--mamba-track-interval",
    +                str(MAMBA_TRACK_INTERVAL),
    +                "--enable-hierarchical-cache",
    +                "--hicache-ratio",
    +                "4",
    +                "--hicache-write-policy",
    +                "write_through",
    +                "--hicache-io-backend",
    +                "direct",
    +                "--hicache-mem-layout",
    +                "page_first_direct",
    +                "--max-total-tokens",
    +                "12000",
    +                "--max-mamba-cache-size",
    +                "500",
    +                "--max-running-requests",
    +                "4",
    +            ],
    +            env={"SGLANG_ENABLE_UNIFIED_RADIX_TREE": "1"},
    +        )
    +        cls.input_ids = get_input_ids(cls.model, num_samples=18)
    +
    +    @classmethod
    +    def tearDownClass(cls):
    +        kill_process_tree(cls.process.pid)
    +
    +
    +def _assert_dsv4_decode_cached_tokens(result, history_len, output_len, label):
    +    expected = history_len + output_len
    +    actual = result["meta_info"]["cached_tokens"]
    +    lower = max(0, expected - 256)
    +    assert actual >= lower, f"{label}: expected cached_tokens>={lower}, got {actual}"
    +
    +
    +class TestUnifiedDeepSeekV4FlashHiCache(UnifiedRadixTreeTestMixin, CustomTestCase):
    +    """DeepSeek V4 Flash FP8 + HiCache + UnifiedRadixCache."""
    +
    +    kl_threshold = 0.005
    +    sampling_temperature = 0
    +    decode_cache_assert = staticmethod(_assert_dsv4_decode_cached_tokens)
    +    gsm8k_threshold = 0.90
    +    num_gsm8k_questions = 100
    +
    +    @unittest.skipIf(is_in_ci(), "To reduce the CI execution time.")
    +    def test_multiturn_logprobs_match(self):
    +        pass
    +
    +    @classmethod
    +    def setUpClass(cls):
    +        cls.model = DSV4_FLASH_MODEL
    +        cls.base_url = DEFAULT_URL_FOR_TEST
    +        cls.process = popen_launch_server(
    +            cls.model,
    +            cls.base_url,
    +            timeout=DSV4_FLASH_LAUNCH_TIMEOUT,
    +            other_args=[
    +                "--trust-remote-code",
    +                "--tp-size",
    +                "4",
    +                "--attention-backend",
    +                "compressed",
    +                "--page-size",
    +                "256",
    +                "--chunked-prefill-size",
    +                "8192",
    +                "--mem-fraction-static",
    +                "0.9",
    +                "--disable-shared-experts-fusion",
    +                "--enable-hierarchical-cache",
    +                "--hicache-ratio",
    +                "4",
    +                "--hicache-write-policy",
    +                "write_through",
    +                "--hicache-io-backend",
    +                "direct",
    +                "--hicache-mem-layout",
    +                "page_first_direct",
    +                "--swa-full-tokens-ratio",
    +                "0.25",
    +                "--max-total-tokens",
    +                "20000",
    +                "--max-running-requests",
    +                "2",
    +            ],
    +            env={
    +                "SGLANG_DSV4_FP4_EXPERTS": "0",
    +                "SGLANG_ENABLE_UNIFIED_RADIX_TREE": "1",
    +            },
    +        )
    +        cls.input_ids = get_input_ids(cls.model, num_samples=18)
    +
    +    @classmethod
    +    def tearDownClass(cls):
    +        kill_process_tree(cls.process.pid)
    +
    +
    +if __name__ == "__main__":
    +    unittest.main()
    

Vulnerability mechanics

Root cause

"Hash collisions in multimodal cache keys lead to incorrect data retrieval and potential crashes."

Attack vector

An attacker with local execution privileges can trigger this vulnerability by providing specially crafted multimodal inputs. These inputs exploit weaknesses in how the system hashes tensor data and combines hash values. The `data_hash` function truncates SHA256 digests to 64 bits, and `tensor_hash` does not encode tensor boundaries, so distinct inputs can collide. In addition, Python's built-in `hash()`, used to combine hashes, is not stable across processes. The result can be cache poisoning or silent output corruption. In some cases, a `TypeError` causes a denial of service [ref_id=1].
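Python salts the hash of `str` and `bytes` objects per interpreter process, controlled by the `PYTHONHASHSEED` environment variable. Integer and tuple-of-integer hashes are not salted. The advisory does not say which values `combine_hashes` passes to `hash()`, so this sketch only illustrates the general behavior. The hashed expressions are arbitrary examples.

```python
import os
import subprocess
import sys


def hash_in_fresh_process(expr: str, seed: str) -> int:
    """Evaluate hash(<expr>) in a new interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    out = subprocess.run(
        [sys.executable, "-c", f"print(hash({expr}))"],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip())


# str hashing is salted per process, so the result depends on the seed.
str_a = hash_in_fresh_process("'image-123'", "1")
str_b = hash_in_fresh_process("'image-123'", "2")
# int and tuple-of-int hashing is not salted, so the result matches.
int_a = hash_in_fresh_process("(1, 2, 3)", "1")
int_b = hash_in_fresh_process("(1, 2, 3)", "2")
print("str hash differs across processes:", str_a != str_b)
print("int-tuple hash differs across processes:", int_a != int_b)
```

Any cache key derived from a salted `hash()` value will differ between worker processes, which matches the cross-process cache invalidation described above.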

Affected code

The vulnerability resides in the multimodal cache handling logic. Specifically, it affects the `data_hash`, `tensor_hash`, and `hash_feature` functions in `python/sglang/srt/managers/mm_utils.py`, and `combine_hashes` in `python/sglang/srt/mem_cache/multimodal_cache.py`. The `precomputed_embedding` handling in `python/sglang/srt/multimodal/processors/base_processor.py` is also affected [ref_id=1]. The patch linked to this advisory modifies `python/sglang/srt/mem_cache/hybrid_cache/hybrid_pool_assembler.py`, which is not one of these files.

What the fix does

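The linked patch changes how the number of transfer layers is computed in `build_deepseek_v4_hicache_stack` and in the equivalent branch of `attach_hybrid_pool_to_unified_cache`. The count was previously `len(kvcache.compression_ratios)`, which ignored the `start_layer` and `end_layer` attributes, and is now `kvcache.end_layer - kvcache.start_layer`. The per-layer loop also now iterates over `kvcache.layer_mapping[kvcache.start_layer : kvcache.end_layer]` instead of the full mapping. As a result, `transfer_layer_num` reflects only the layers in the pool's range for DeepSeek V4 models with HiCache [patch_id=4719042]. The diff does not touch `data_hash`, `tensor_hash`, `hash_feature`, or `combine_hashes`, so on its own it does not address the hash collision and stability issues described above.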
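The toy model below illustrates the layer-count change. It assumes, for illustration only, that `compression_ratios` and `layer_mapping` can hold more entries than the `[start_layer, end_layer)` range the pool serves; the diff does not say when that happens. `ToyKVCache`, `LayerItem`, and the `compress_layer_id` values are hypothetical placeholders, not SGLang types.

```python
from dataclasses import dataclass


@dataclass
class LayerItem:
    compress_ratio: int
    compress_layer_id: int


@dataclass
class ToyKVCache:
    # Hypothetical stand-in for the diff's kvcache object.
    start_layer: int
    end_layer: int
    compression_ratios: list[int]
    layer_mapping: list[LayerItem]


def c4_mapping(kv: ToyKVCache, patched: bool) -> tuple[int, dict[int, int]]:
    """Return (transfer_layer_num, ratio-4 layer mapping) before or after the patch."""
    if patched:
        transfer_layer_num = kv.end_layer - kv.start_layer
        items = kv.layer_mapping[kv.start_layer : kv.end_layer]
    else:
        transfer_layer_num = len(kv.compression_ratios)
        items = kv.layer_mapping
    mapping = {}
    for layer_id, item in enumerate(items):
        if item.compress_ratio == 4:
            mapping[layer_id] = item.compress_layer_id
    return transfer_layer_num, mapping


# Assumed layout: five entries, but this pool only owns layers [0, 4).
ratios = [4, 128, 4, 128, 4]
# compress_layer_id values (i // 2) are arbitrary placeholders.
kv = ToyKVCache(0, 4, ratios, [LayerItem(r, i // 2) for i, r in enumerate(ratios)])
print("before:", c4_mapping(kv, patched=False))
print("after: ", c4_mapping(kv, patched=True))
```

Under this assumed layout, the pre-patch logic counts and maps a layer the pool does not own. The patched logic stays within the pool's range.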

Preconditions

  • Input: specially crafted multimodal inputs that exploit hash stability and collision issues.
  • Auth: local execution privileges.

Reproduction

The following minimal script demonstrates the deterministic collision and crash cases without requiring a model server:

```python
import hashlib

import numpy as np
import torch


def data_hash(data) -> int:
    # SHA256 digest truncated to its first 8 bytes (64 bits).
    hash_bytes = hashlib.sha256(data).digest()[:8]
    return int.from_bytes(hash_bytes, byteorder="big", signed=False)


def flatten_nested_list(nested_list):
    ret = []
    for item in nested_list:
        if isinstance(item, list):
            ret.extend(flatten_nested_list(item))
        else:
            ret.append(item)
    return ret


def tensor_hash(tensor_list) -> int:
    if isinstance(tensor_list, list):
        tensor_list = flatten_nested_list(tensor_list)
        # Flattening and concatenating discards per-tensor boundaries.
        tensor_list = [x.flatten() for x in tensor_list]
        tensor = torch.concat(tensor_list)
    else:
        tensor = tensor_list
    return data_hash(tensor.numpy().tobytes())


def hash_feature(f):
    if isinstance(f, list):
        if isinstance(f[0], torch.Tensor):
            return tensor_hash(f)
        # A tuple is not bytes-like, so hashlib raises TypeError here.
        return data_hash(tuple(flatten_nested_list(f)))
    elif isinstance(f, np.ndarray):
        # Only the raw bytes are hashed; shape and dtype are ignored.
        arr = np.ascontiguousarray(f)
        return data_hash(arr.tobytes())
    elif isinstance(f, torch.Tensor):
        return tensor_hash([f])
    return data_hash(f)


# 1. Tensor boundary loss: same bytes, different tensor partitions.
a = [torch.tensor([1.0, 2.0]), torch.tensor([3.0, 4.0, 5.0])]
b = [torch.tensor([1.0, 2.0, 3.0]), torch.tensor([4.0, 5.0])]
print("tensor_hash(a):", tensor_hash(a))
print("tensor_hash(b):", tensor_hash(b))
assert tensor_hash(a) == tensor_hash(b)

# 2. ndarray shape loss: same bytes, different shapes.
data = np.arange(16, dtype=np.float32)
x = data.reshape(16,)
y = data.reshape(4, 4)
print("hash_feature(x):", hash_feature(x))
print("hash_feature(y):", hash_feature(y))
assert hash_feature(x) == hash_feature(y)

# 3. Non-tensor list crash.
try:
    hash_feature([1, 2, 3])
except TypeError as e:
    print("TypeError:", e)
```

Generated on Jun 3, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References (6)
