VYPR
Low severity · 3.6 · NVD Advisory · Published Jun 4, 2026 · Updated Jun 4, 2026

CVE-2026-10804

Description

Streamlit versions prior to 1.53.0 are vulnerable to cache collisions due to weak hashing, potentially returning stale data.

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

Vulnerability

Streamlit versions prior to 1.53.0 contain a weakness in `lib/streamlit/runtime/caching/hashing.py`, which the advisory attributes to the Palette Handler component. Two flaws are involved: large data structures (Pandas, Polars, NumPy) are sampled with a fixed seed before hashing, and PIL P-mode images are hashed without their palette. Either flaw lets structurally different inputs produce the same cache key [2].

Exploitation

An attacker with local access can exploit this vulnerability by crafting two different inputs that hash to the same value, either by modifying positions in a large data structure that the sample never reads, or by creating PIL P-mode images with identical pixel indices but different palettes. Because the sampling seed is fixed and the palette is never hashed, the colliding inputs are predictable, which enables cache poisoning [2].
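The large-object case can be modeled with the standard library alone. The sketch below is a simplified stand-in, not Streamlit's code: `sampled_indices`, `sampled_hash_fixed_seed`, `unsampled_position`, and `SAMPLE_SIZE` are hypothetical names, and `random.Random(0)` stands in for the fixed `random_state=0` / `seed=0` visible in the pre-fix lines of the diff. Because the seed never changes, the inspected positions are known in advance, so a value at any other position can be altered without changing the hash.

```python
import hashlib
import random

SAMPLE_SIZE = 1_000  # hypothetical stand-in for the real sample size


def sampled_indices(length: int) -> list[int]:
    """Positions the model inspects; the seed is fixed, so they never vary for a given length."""
    return random.Random(0).sample(range(length), SAMPLE_SIZE)


def sampled_hash_fixed_seed(data: list[int]) -> str:
    """Hash the length plus only the sampled values (models the pre-fix behavior)."""
    h = hashlib.sha256()
    h.update(len(data).to_bytes(8, "little"))
    for i in sampled_indices(len(data)):
        h.update(data[i].to_bytes(8, "little", signed=True))
    return h.hexdigest()


def unsampled_position(length: int) -> int:
    """Return the first position that the fixed-seed sample never reads."""
    sampled = set(sampled_indices(length))
    return next(i for i in range(length) if i not in sampled)


original = [0] * 100_000
tampered = list(original)
tampered[unsampled_position(len(original))] = 42  # change a value the hash ignores

# The inputs differ, yet the model assigns them the same key.
collides = sampled_hash_fixed_seed(original) == sampled_hash_fixed_seed(tampered)
print(collides)
```

The model samples positions without replacement and hashes with SHA-256 for brevity; the real code paths differ in detail, but the predictability of the index set is the same idea.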

Impact

Successful exploitation can cause a Streamlit application to return stale or incorrect cached data without raising any error. On a collision, the cache serves the value that was stored for a different input [2].
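A minimal sketch of that failure mode, using only the standard library. Everything here is hypothetical and exists for illustration: `PaletteImage` is a stand-in for a palette-indexed image, `make_cached` is a toy memoizer rather than Streamlit's cache, `weak_key` models a key built from pixel indices only, and `strong_key` models a key with the palette bytes prepended, as the patch in the diff does for P-mode images.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class PaletteImage:
    """Hypothetical stand-in: each pixel is an index into the palette."""
    indices: bytes  # one palette index per pixel
    palette: bytes  # RGB triples


def weak_key(img: PaletteImage) -> str:
    """Models the pre-fix key: pixel indices only, palette ignored."""
    return hashlib.sha256(img.indices).hexdigest()


def strong_key(img: PaletteImage) -> str:
    """Models the patched key: palette bytes prepended to the pixel indices."""
    return hashlib.sha256(img.palette + img.indices).hexdigest()


def make_cached(func: Callable[[PaletteImage], tuple],
                key: Callable[[PaletteImage], str]) -> Callable[[PaletteImage], tuple]:
    """Toy memoizer: returns whatever was stored first under a given key."""
    store = {}

    def wrapper(img: PaletteImage) -> tuple:
        k = key(img)
        if k not in store:
            store[k] = func(img)
        return store[k]

    return wrapper


def first_pixel_rgb(img: PaletteImage) -> tuple:
    """Resolve the first pixel through the palette."""
    start = img.indices[0] * 3
    return tuple(img.palette[start:start + 3])


black = PaletteImage(indices=bytes(16), palette=bytes([0, 0, 0]))
red = PaletteImage(indices=bytes(16), palette=bytes([255, 0, 0]))

weak = make_cached(first_pixel_rgb, weak_key)
strong = make_cached(first_pixel_rgb, strong_key)

stale = (weak(black), weak(red))      # second call silently reuses the first entry
fresh = (strong(black), strong(red))  # distinct keys, so each input is computed
```

With the weak key, the second lookup returns the black pixel for the red image and nothing signals the mismatch, which is the silent stale-data behavior described above.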

Mitigation

This vulnerability is addressed in Streamlit version 1.53.0. A pull request to fix the issue has been submitted and is awaiting acceptance [1]. Users are advised to update to a patched version once available. No specific workarounds are detailed in the provided references.
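To check whether an environment predates the fix, a small version comparison is enough. The sketch below assumes 1.53.0 is the first fixed release, as the description states; one patch entry on this page is tagged "Fixed in 1.53.1", so the threshold is left as a parameter. `parse_release`, `is_patched`, and `installed_streamlit_is_patched` are hypothetical helper names, and only `importlib.metadata` from the standard library is used.

```python
import re
from importlib import metadata
from typing import Optional, Tuple

# Assumption: first fixed release, taken from the advisory description.
FIXED_RELEASE = (1, 53, 0)


def parse_release(version: str) -> Tuple[int, ...]:
    """Return the leading numeric components, padded to three: '1.52' -> (1, 52, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    return parts + (0,) * max(0, 3 - len(parts))


def is_patched(version: str, fixed: Tuple[int, ...] = FIXED_RELEASE) -> bool:
    """True when `version` is at or above `fixed`; pre-release suffixes are ignored."""
    return parse_release(version) >= fixed


def installed_streamlit_is_patched() -> Optional[bool]:
    """Check the installed package; returns None when Streamlit is not installed."""
    try:
        return is_patched(metadata.version("streamlit"))
    except metadata.PackageNotFoundError:
        return None
```

For example, `is_patched("1.52.2")` evaluates to `False` and `is_patched("1.53.0")` to `True`. Passing `fixed=(1, 53, 1)` applies the stricter threshold instead.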

AI Insight generated on Jun 4, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.

Affected products

2

Patches

2
28c98f3648ce

Merge 915f6573028db1d0d6748c9f95421c01ca0dfa95 into 4d2a62a151466f7dc2fb9d076092526c6b2aeec6

https://github.com/streamlit/streamlit · Matt · Jun 4, 2026 · via nvd-ref
2 files changed · +136 -10
  • lib/streamlit/runtime/caching/hashing.py · +88 -10 · modified
    @@ -66,6 +66,66 @@
     )
     
     
    +def _pandas_sample_seed(obj: Any) -> int:
    +    """Return a data-dependent seed for pandas sampling, or 0 if unhashable.
    +
    +    Using 0 matches the legacy fixed seed when ``hash_pandas_object`` cannot run
    +    (e.g. unhashable cell values), so the pickle fallback path is unchanged.
    +    """
    +
    +    from pandas.util import hash_pandas_object
    +
    +    try:
    +        hashes = hash_pandas_object(obj)
    +        return int(hashes.sum()) & 0xFFFF_FFFF
    +    except (TypeError, ValueError):
    +        return 0
    +
    +
    +def _numpy_sample_seed(np_obj: Any) -> int:
    +    """Return a data-dependent seed for large-array sampling from a short prefix.
    +
    +    The goal is to avoid a globally fixed sample index set, not cryptographic
    +    unpredictability.
    +    """
    +
    +    import numpy as np
    +
    +    flat = np_obj.flat
    +    n = min(64, int(np_obj.size))
    +    if n == 0:
    +        return 0
    +    try:
    +        prefix = np.asarray(flat[:n], dtype=np_obj.dtype, order="C")
    +        digest = hashlib.md5(prefix.tobytes(), usedforsecurity=False).digest()
    +    except (TypeError, ValueError, BufferError):
    +        return 0
    +    return int.from_bytes(digest[:4], "little") & 0xFFFF_FFFF
    +
    +
    +def _polars_sample_seed(obj: Any) -> int:
    +    """Return a data-dependent seed for Polars sampling, or 0 on failure.
    +
    +    Uses Polars-native hashing (``hash_rows`` for DataFrames, ``hash`` for
    +    Series) to avoid non-determinism that can occur when converting exotic
    +    dtypes through NumPy's ``tobytes``.
    +    """
    +
    +    try:
    +        head = obj.head(1)
    +        # DataFrames expose hash_rows; Series expose hash.
    +        if hasattr(head, "hash_rows"):
    +            hash_series = head.hash_rows(seed=0)
    +        else:
    +            hash_series = head.hash(seed=0)
    +        digest = hashlib.md5(
    +            hash_series.to_numpy().tobytes(), usedforsecurity=False
    +        ).digest()
    +    except (TypeError, ValueError, BufferError):
    +        return 0
    +    return int.from_bytes(digest[:4], "little") & 0xFFFF_FFFF
    +
    +
     class UserHashError(StreamlitAPIException):
         def __init__(
             self,
    @@ -427,7 +487,9 @@ def _to_bytes(self, obj: Any) -> bytes:
                 self.update(h, series_obj.dtype.name)
     
                 if len(series_obj) >= _PANDAS_ROWS_LARGE:
    -                series_obj = series_obj.sample(n=_PANDAS_SAMPLE_SIZE, random_state=0)
    +                # Data-dependent seed so sample indices are not globally fixed.
    +                rs = _pandas_sample_seed(series_obj.iloc[:1])
    +                series_obj = series_obj.sample(n=_PANDAS_SAMPLE_SIZE, random_state=rs)
     
                 try:
                     self.update(h, hash_pandas_object(series_obj).to_numpy().tobytes())
    @@ -449,7 +511,8 @@ def _to_bytes(self, obj: Any) -> bytes:
                 self.update(h, df_obj.shape)
     
                 if len(df_obj) >= _PANDAS_ROWS_LARGE:
    -                df_obj = df_obj.sample(n=_PANDAS_SAMPLE_SIZE, random_state=0)
    +                sample_seed = _pandas_sample_seed(df_obj.iloc[:1])
    +                df_obj = df_obj.sample(n=_PANDAS_SAMPLE_SIZE, random_state=sample_seed)
     
                 try:
                     column_hash_bytes = self.to_bytes(hash_pandas_object(df_obj.dtypes))
    @@ -469,17 +532,19 @@ def _to_bytes(self, obj: Any) -> bytes:
                     return b"%s" % pickle.dumps(df_obj, pickle.HIGHEST_PROTOCOL)
     
             elif type_util.is_type(obj, "polars.series.series.Series"):
    -            import polars as pl
    -
                 obj = cast("pl.Series", obj)
                 self.update(h, str(obj.dtype).encode())
                 self.update(h, obj.shape)
     
    +            sample_seed = 0
                 if len(obj) >= _PANDAS_ROWS_LARGE:
    -                obj = obj.sample(n=_PANDAS_SAMPLE_SIZE, seed=0)
    +                sample_seed = _polars_sample_seed(obj)
    +                obj = obj.sample(n=_PANDAS_SAMPLE_SIZE, seed=sample_seed)
     
                 try:
    -                self.update(h, obj.hash(seed=0).to_arrow().to_string().encode())
    +                self.update(
    +                    h, obj.hash(seed=sample_seed).to_arrow().to_string().encode()
    +                )
                     return h.digest()
     
                 except TypeError:
    @@ -498,15 +563,21 @@ def _to_bytes(self, obj: Any) -> bytes:
                 obj = cast("pl.DataFrame", obj)
                 self.update(h, obj.shape)
     
    +            sample_seed = 0
                 if len(obj) >= _PANDAS_ROWS_LARGE:
    -                obj = obj.sample(n=_PANDAS_SAMPLE_SIZE, seed=0)
    +                sample_seed = _polars_sample_seed(obj)
    +                obj = obj.sample(n=_PANDAS_SAMPLE_SIZE, seed=sample_seed)
                 try:
                     for c, t in obj.schema.items():
                         self.update(h, c.encode())
                         self.update(h, str(t).encode())
     
                     values_hash_bytes = (
    -                    obj.hash_rows(seed=0).hash(seed=0).to_arrow().to_string().encode()
    +                    obj.hash_rows(seed=sample_seed)
    +                    .hash(seed=sample_seed)
    +                    .to_arrow()
    +                    .to_string()
    +                    .encode()
                     )
     
                     self.update(h, values_hash_bytes)
    @@ -530,7 +601,9 @@ def _to_bytes(self, obj: Any) -> bytes:
                 if np_obj.size >= _NP_SIZE_LARGE:
                     import numpy as np
     
    -                state = np.random.RandomState(0)
    +                # Data-dependent seed so sample indices are not globally fixed.
    +                rs = _numpy_sample_seed(np_obj)
    +                state = np.random.RandomState(rs)
                     np_obj = state.choice(np_obj.flat, size=_NP_SAMPLE_SIZE)
     
                 self.update(h, np_obj.tobytes())
    @@ -543,7 +616,12 @@ def _to_bytes(self, obj: Any) -> bytes:
     
                 # we don't just hash the results of obj.tobytes() because we want to use
                 # the sampling logic for numpy data
    -            np_array = np.frombuffer(pil_obj.tobytes(), dtype="uint8")
    +            pixel_bytes = pil_obj.tobytes()
    +            if pil_obj.mode == "P":
    +                palette_data = pil_obj.getpalette()
    +                if palette_data is not None:
    +                    pixel_bytes = bytes(palette_data) + pixel_bytes
    +            np_array = np.frombuffer(pixel_bytes, dtype="uint8")
                 return self.to_bytes(np_array)
     
             elif inspect.isbuiltin(obj):
    
  • lib/tests/streamlit/runtime/caching/hashing_test.py · +48 -0 · modified
    @@ -989,3 +989,51 @@ class Model(pydantic.BaseModel):
             with pytest.raises(UnhashableTypeError) as exc_info:
                 get_hash(instance)
         assert "unhashable members" in str(exc_info.value).lower()
    +
    +
    +def test_PIL_pmode_palette_collision_prevention() -> None:
    +    """P-mode images with identical index buffers but different palettes must not collide."""
    +
    +    img_a = Image.new("P", (100, 100), 0)
    +    img_b = Image.new("P", (100, 100), 0)
    +    palette_a = [0] * 768
    +    palette_b = [0] * 768
    +    palette_a[0:3] = [0, 0, 0]
    +    palette_b[0:3] = [255, 0, 0]
    +    img_a.putpalette(palette_a)
    +    img_b.putpalette(palette_b)
    +
    +    assert img_a.tobytes() == img_b.tobytes()
    +    assert get_hash(img_a) != get_hash(img_b)
    +
    +
    +def test_numpy_large_array_seed_prefix_change_differs() -> None:
    +    """Mutating the seed prefix (first element) produces a different hash for large arrays."""
    +
    +    total = _NP_SIZE_LARGE + 10_000
    +    arr_a = np.zeros(total, dtype=np.float64)
    +    arr_b = arr_a.copy()
    +    arr_b[0] = 255.0
    +
    +    assert get_hash(arr_a) != get_hash(arr_b)
    +
    +
    +def test_pandas_large_dataframe_seed_row_change_differs() -> None:
    +    """Mutating the seed row (row 0) produces a different hash for large DataFrames."""
    +
    +    n = _PANDAS_ROWS_LARGE + 5_000
    +    df_a = pd.DataFrame({"val": np.zeros(n)})
    +    df_b = df_a.copy()
    +    df_b.iloc[0, df_b.columns.get_loc("val")] = 99.0
    +
    +    assert get_hash(df_a) != get_hash(df_b)
    +
    +
    +def test_pandas_large_dataframe_unhashable_payload_uses_pickle_fallback() -> None:
    +    """Large frames with unhashable cells must still hash and match when identical."""
    +
    +    n = _PANDAS_ROWS_LARGE
    +    df1 = pd.DataFrame({"x": [[1]] * n})
    +    df2 = pd.DataFrame({"x": [[1]] * n})
    +
    +    assert get_hash(df1) == get_hash(df2)
    
0629d95e8e04

Fix linting issues

https://github.com/streamlit/streamlit · Ken McGrady · Jan 22, 2026 · Fixed in 1.53.1 · via release-tag
4 files changed · +7 -7
  • e2e_playwright/st_multiselect.py · +1 -1 · modified
    @@ -222,7 +222,7 @@ def on_change():
     # Test for issue #13646: Custom class objects without __eq__ should work with format_func
     # This tests that selections are preserved for custom class objects after script reruns
     # when the widget uses a format_func to display the options.
    -class CustomOption:  # noqa: B903
    +class CustomOption:
         """Custom class without __eq__ implementation.
     
         This simulates the common pattern where users have custom objects with a
    
  • lib/tests/streamlit/elements/lib/options_selector_utils_test.py · +4 -4 · modified
    @@ -518,7 +518,7 @@ def test_custom_objects_without_eq_using_format_func(self):
             from copy import deepcopy
     
             # Custom class without __eq__ implementation
    -        class MyOption:  # noqa: B903
    +        class MyOption:
                 def __init__(self, value: str):
                     self.value = value
     
    @@ -619,7 +619,7 @@ def test_custom_objects_without_eq_using_format_func(self):
             """Test that custom objects without __eq__ work with format_func validation."""
     
             # Custom class without __eq__ implementation
    -        class MyOption:  # noqa: B903
    +        class MyOption:
                 def __init__(self, value: str):
                     self.value = value
     
    @@ -649,7 +649,7 @@ def __init__(self, value: str):
         def test_custom_objects_partial_match_with_format_func(self):
             """Test that only matching custom objects are kept."""
     
    -        class MyOption:  # noqa: B903
    +        class MyOption:
                 def __init__(self, value: str):
                     self.value = value
     
    @@ -708,7 +708,7 @@ def test_format_func_failure_filters_out_value(self):
             but the format_func can't handle strings (e.g., lambda x: x.attribute).
             """
     
    -        class MyOption:  # noqa: B903
    +        class MyOption:
                 def __init__(self, value: str):
                     self.value = value
     
    
  • lib/tests/streamlit/elements/multiselect_test.py · +1 -1 · modified
    @@ -710,7 +710,7 @@ def test_serialize_deepcopied_custom_objects(self):
             from copy import deepcopy
     
             # Custom class without __eq__ implementation
    -        class MyOption:  # noqa: B903
    +        class MyOption:
                 def __init__(self, value: str):
                     self.value = value
     
    
  • lib/tests/streamlit/elements/selectbox_test.py · +1 -1 · modified
    @@ -825,7 +825,7 @@ def test_serialize_deepcopied_custom_objects(self):
             from copy import deepcopy
     
             # Custom class without __eq__ implementation
    -        class MyOption:  # noqa: B903
    +        class MyOption:
                 def __init__(self, value: str):
                     self.value = value
     
    

Vulnerability mechanics

Root cause

"Deterministic sampling seeds and omission of palette bytes in hashing lead to cache collisions."

Attack vector

An attacker with local access can craft two different inputs that produce the same hash, either by exploiting the fixed sampling seed used for large objects or by manipulating palette-indexed PIL images. For large objects, the attacker modifies positions that the sample never reads; these are predictable because the seed is fixed. For PIL images, the attacker supplies images with identical pixel indices but different palettes. In both cases the colliding cache keys cause the cache to return stale or incorrect data without errors [2]. The attack complexity is high, and exploitation is considered difficult.

Affected code

The vulnerability resides in `lib/streamlit/runtime/caching/hashing.py`, part of Streamlit's caching mechanism. Within `_to_bytes()`, the large-object sampling branches for Pandas, Polars, and NumPy objects are affected by the deterministic sampling seed, and the PIL `Image.Image` branch is affected by the omission of palette bytes [2].

What the fix does

The fix replaces the globally fixed sampling seeds with data-dependent seeds derived from a prefix of the input data, so the sampled indices vary with content and an attacker can no longer rely on one fixed index set. For PIL P-mode images, palette bytes are now prepended to the pixel data before hashing, so images with identical pixel indices but different palettes produce distinct cache keys [1]. The patch is available at [patch_id=4792408].
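The seed derivation can be sketched with the standard library. This is a simplified model loosely patterned on `_numpy_sample_seed` in the diff (hash a short prefix, keep 32 bits); `prefix_seed`, `sampled_indices`, and `SAMPLE_SIZE` are hypothetical names, plain Python integers stand in for array elements, and `PREFIX_LEN` mirrors the 64-element prefix used in the patch.

```python
import hashlib
import random

PREFIX_LEN = 64      # mirrors the 64-element prefix in _numpy_sample_seed
SAMPLE_SIZE = 1_000  # hypothetical stand-in for the real sample size


def prefix_seed(data: list[int]) -> int:
    """Derive a 32-bit seed from the first PREFIX_LEN values; 0 for empty input."""
    if not data:
        return 0
    prefix = b"".join(v.to_bytes(8, "little", signed=True) for v in data[:PREFIX_LEN])
    digest = hashlib.md5(prefix, usedforsecurity=False).digest()
    return int.from_bytes(digest[:4], "little") & 0xFFFF_FFFF


def sampled_indices(data: list[int]) -> list[int]:
    """Positions the model would inspect; they now depend on the data's prefix."""
    rng = random.Random(prefix_seed(data))
    return rng.sample(range(len(data)), SAMPLE_SIZE)


zeros = [0] * 100_000
changed = list(zeros)
changed[0] = 255  # alter one value inside the prefix

seed_a, seed_b = prefix_seed(zeros), prefix_seed(changed)
same_positions = sampled_indices(zeros) == sampled_indices(changed)
```

In this model, inputs with different prefixes are sampled at different positions, so there is no single index set to plan around. Values outside both the prefix and the sample still go unread, which is consistent with the patch's own docstring: the goal is to avoid a globally fixed sample index set, not cryptographic unpredictability.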

Preconditions

  • Input: local access is required to carry out this attack.

Generated on Jun 4, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References

6