CVE-2026-10804
Description
Streamlit versions prior to 1.53.0 are vulnerable to cache collisions due to weak hashing, potentially returning stale data.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
Vulnerability
A vulnerability exists in Streamlit versions prior to 1.53.0 in lib/streamlit/runtime/caching/hashing.py, the module that computes cache keys for Streamlit's caching layer. The advisory attributes it to the Palette Handler component, but the issue has two parts: a deterministic sampling seed used when hashing large data structures (Pandas, Polars, NumPy), and the absence of palette hashing for PIL P-mode (palette-indexed) images. Both allow hash collisions in which structurally different inputs produce the same cache key [2].
Exploitation
An attacker with local access can exploit this vulnerability by crafting two different inputs that produce the same hash. For large data structures, the attacker changes only positions that the fixed seed never samples; for PIL P-mode images, the attacker supplies images with identical pixel indices but different palettes. Because the sampling seed is fixed and the palette is not hashed, the resulting hashes are predictable, which enables cache poisoning [2].
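The large-object case can be illustrated with a small, dependency-free model. This is a sketch under stated assumptions, not Streamlit's code: it swaps NumPy's `RandomState` for Python's `random.Random`, and the names `SIZE_LARGE`, `SAMPLE_SIZE`, `FIXED_SEED`, `sampled_digest`, and `craft_collision` are hypothetical. Because the seed never changes, an attacker can replay the sampler to learn which positions are read and then change a value anywhere else.

```python
import hashlib
import random

# Hypothetical thresholds for illustration; Streamlit's real constants differ.
SIZE_LARGE = 10_000
SAMPLE_SIZE = 100
FIXED_SEED = 0  # models the pre-fix behaviour: the same seed for every input


def sampled_digest(values: list[int]) -> str:
    """Model of the vulnerable hash: length plus a fixed-seed sample of the values."""
    h = hashlib.md5()
    h.update(str(len(values)).encode())
    if len(values) >= SIZE_LARGE:
        rng = random.Random(FIXED_SEED)
        positions = [rng.randrange(len(values)) for _ in range(SAMPLE_SIZE)]
        values = [values[i] for i in positions]
    h.update(repr(values).encode())
    return h.hexdigest()


def craft_collision(values: list[int]) -> list[int]:
    """Return a different list that sampled_digest() maps to the same digest.

    Assumes len(values) >= SIZE_LARGE, so that sampling actually happens.
    """
    # Replay the sampler with the same seed and call sequence to learn which
    # positions are read; any other position can change without effect.
    rng = random.Random(FIXED_SEED)
    sampled = {rng.randrange(len(values)) for _ in range(SAMPLE_SIZE)}
    target = next(i for i in range(len(values)) if i not in sampled)
    forged = list(values)
    forged[target] += 1
    return forged


original = [0] * 20_000
forged = craft_collision(original)
print(forged != original)  # the inputs differ...
print(sampled_digest(forged) == sampled_digest(original))  # ...but the digests match
```

Inputs below the sampling threshold are hashed in full, so the model only collides for large inputs, matching the advisory's focus on large Pandas, Polars, and NumPy objects.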
Impact
Successful exploitation can cause a Streamlit application to return stale or incorrect cached data without raising any error. Because two different inputs map to the same cache key, the application retrieves and displays the result that was computed for the other input [2].
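The effect on an application can be sketched with a toy memoizing cache. The decorator `memoize_by` and the key function `weak_key` are hypothetical illustrations, not Streamlit APIs; `weak_key` is deliberately weak (it looks only at the length and the first element) to stand in for any hash that collides. Once two inputs share a key, the second call silently receives the first call's result.

```python
import hashlib
from collections.abc import Callable


def memoize_by(key_fn: Callable[[list[int]], str]):
    """Toy cache: memoize a one-argument function under key_fn(argument)."""

    def decorator(fn: Callable[[list[int]], int]) -> Callable[[list[int]], int]:
        cache: dict[str, int] = {}

        def wrapper(values: list[int]) -> int:
            key = key_fn(values)
            if key not in cache:
                cache[key] = fn(values)  # miss: compute and store
            return cache[key]  # hit: the stored value is trusted as-is

        return wrapper

    return decorator


def weak_key(values: list[int]) -> str:
    """Deliberately weak key: hashes only the length and the first element."""
    return hashlib.md5(f"{len(values)}:{values[:1]}".encode()).hexdigest()


@memoize_by(weak_key)
def total(values: list[int]) -> int:
    return sum(values)


print(total([1, 2, 3]))  # computed: 6
print(total([1, 9, 9]))  # same key as [1, 2, 3], so the stale 6 is returned, not 19
```

No exception is raised on the colliding call, which is why the advisory describes the failure as returning incorrect data "without any error".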
Mitigation
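The description states that the issue is fixed in Streamlit 1.53.0. However, reference [1] describes the fix as a pull request that is still awaiting acceptance, and the affected-products entry lists the range as <=1.53.0. Users are advised to upgrade to a release that includes the patch once it is available and to confirm that the installed release actually contains the hashing changes listed under Patches. No specific workarounds are detailed in the provided references.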
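A minimal check of the installed version can be sketched with the standard library's `importlib.metadata`. It assumes, as the description states, that 1.53.0 is the first fixed release; because the affected range lists <=1.53.0, adjust `FIXED_IN` if 1.53.0 turns out not to contain the patch. The helper names are hypothetical, and the comparison deliberately ignores pre-release and local suffixes (so `1.53.0rc1` counts as 1.53.0); `packaging.version` offers a full PEP 440 comparison.

```python
import re
from importlib import metadata

# Assumption: 1.53.0 is the first release containing the fix (per the CVE description).
FIXED_IN = (1, 53, 0)


def release_tuple(version: str) -> tuple[int, int, int]:
    """Return (major, minor, patch) from a version string, ignoring any suffix."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


def is_patched(version: str) -> bool:
    """True if `version` is at or above FIXED_IN (suffixes such as 'rc1' are ignored)."""
    return release_tuple(version) >= FIXED_IN


if __name__ == "__main__":
    try:
        installed = metadata.version("streamlit")
    except metadata.PackageNotFoundError:
        print("streamlit is not installed")
    else:
        label = "at or above" if is_patched(installed) else "below"
        print(f"streamlit {installed} is {label} {'.'.join(map(str, FIXED_IN))}")
```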
AI Insight generated on Jun 4, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected products
- (no CPE)
- (no CPE) range: <=1.53.0
Patches
228c98f3648ce · Merge 915f6573028db1d0d6748c9f95421c01ca0dfa95 into 4d2a62a151466f7dc2fb9d076092526c6b2aeec6
2 files changed · +136 −10
lib/streamlit/runtime/caching/hashing.py · +88 −10 · modified
@@ -66,6 +66,66 @@
 )


+def _pandas_sample_seed(obj: Any) -> int:
+    """Return a data-dependent seed for pandas sampling, or 0 if unhashable.
+
+    Using 0 matches the legacy fixed seed when ``hash_pandas_object`` cannot run
+    (e.g. unhashable cell values), so the pickle fallback path is unchanged.
+    """
+
+    from pandas.util import hash_pandas_object
+
+    try:
+        hashes = hash_pandas_object(obj)
+        return int(hashes.sum()) & 0xFFFF_FFFF
+    except (TypeError, ValueError):
+        return 0
+
+
+def _numpy_sample_seed(np_obj: Any) -> int:
+    """Return a data-dependent seed for large-array sampling from a short prefix.
+
+    The goal is to avoid a globally fixed sample index set, not cryptographic
+    unpredictability.
+    """
+
+    import numpy as np
+
+    flat = np_obj.flat
+    n = min(64, int(np_obj.size))
+    if n == 0:
+        return 0
+    try:
+        prefix = np.asarray(flat[:n], dtype=np_obj.dtype, order="C")
+        digest = hashlib.md5(prefix.tobytes(), usedforsecurity=False).digest()
+    except (TypeError, ValueError, BufferError):
+        return 0
+    return int.from_bytes(digest[:4], "little") & 0xFFFF_FFFF
+
+
+def _polars_sample_seed(obj: Any) -> int:
+    """Return a data-dependent seed for Polars sampling, or 0 on failure.
+
+    Uses Polars-native hashing (``hash_rows`` for DataFrames, ``hash`` for
+    Series) to avoid non-determinism that can occur when converting exotic
+    dtypes through NumPy's ``tobytes``.
+    """
+
+    try:
+        head = obj.head(1)
+        # DataFrames expose hash_rows; Series expose hash.
+        if hasattr(head, "hash_rows"):
+            hash_series = head.hash_rows(seed=0)
+        else:
+            hash_series = head.hash(seed=0)
+        digest = hashlib.md5(
+            hash_series.to_numpy().tobytes(), usedforsecurity=False
+        ).digest()
+    except (TypeError, ValueError, BufferError):
+        return 0
+    return int.from_bytes(digest[:4], "little") & 0xFFFF_FFFF
+
+
 class UserHashError(StreamlitAPIException):
     def __init__(
         self,
@@ -427,7 +487,9 @@ def _to_bytes(self, obj: Any) -> bytes:
             self.update(h, series_obj.dtype.name)

             if len(series_obj) >= _PANDAS_ROWS_LARGE:
-                series_obj = series_obj.sample(n=_PANDAS_SAMPLE_SIZE, random_state=0)
+                # Data-dependent seed so sample indices are not globally fixed.
+                rs = _pandas_sample_seed(series_obj.iloc[:1])
+                series_obj = series_obj.sample(n=_PANDAS_SAMPLE_SIZE, random_state=rs)

             try:
                 self.update(h, hash_pandas_object(series_obj).to_numpy().tobytes())
@@ -449,7 +511,8 @@ def _to_bytes(self, obj: Any) -> bytes:
             self.update(h, df_obj.shape)

             if len(df_obj) >= _PANDAS_ROWS_LARGE:
-                df_obj = df_obj.sample(n=_PANDAS_SAMPLE_SIZE, random_state=0)
+                sample_seed = _pandas_sample_seed(df_obj.iloc[:1])
+                df_obj = df_obj.sample(n=_PANDAS_SAMPLE_SIZE, random_state=sample_seed)

             try:
                 column_hash_bytes = self.to_bytes(hash_pandas_object(df_obj.dtypes))
@@ -469,17 +532,19 @@ def _to_bytes(self, obj: Any) -> bytes:
                 return b"%s" % pickle.dumps(df_obj, pickle.HIGHEST_PROTOCOL)

         elif type_util.is_type(obj, "polars.series.series.Series"):
-            import polars as pl
-
             obj = cast("pl.Series", obj)
             self.update(h, str(obj.dtype).encode())
             self.update(h, obj.shape)

+            sample_seed = 0
             if len(obj) >= _PANDAS_ROWS_LARGE:
-                obj = obj.sample(n=_PANDAS_SAMPLE_SIZE, seed=0)
+                sample_seed = _polars_sample_seed(obj)
+                obj = obj.sample(n=_PANDAS_SAMPLE_SIZE, seed=sample_seed)

             try:
-                self.update(h, obj.hash(seed=0).to_arrow().to_string().encode())
+                self.update(
+                    h, obj.hash(seed=sample_seed).to_arrow().to_string().encode()
+                )
                 return h.digest()

             except TypeError:
@@ -498,15 +563,21 @@ def _to_bytes(self, obj: Any) -> bytes:
             obj = cast("pl.DataFrame", obj)
             self.update(h, obj.shape)

+            sample_seed = 0
             if len(obj) >= _PANDAS_ROWS_LARGE:
-                obj = obj.sample(n=_PANDAS_SAMPLE_SIZE, seed=0)
+                sample_seed = _polars_sample_seed(obj)
+                obj = obj.sample(n=_PANDAS_SAMPLE_SIZE, seed=sample_seed)

             try:
                 for c, t in obj.schema.items():
                     self.update(h, c.encode())
                     self.update(h, str(t).encode())
                 values_hash_bytes = (
-                    obj.hash_rows(seed=0).hash(seed=0).to_arrow().to_string().encode()
+                    obj.hash_rows(seed=sample_seed)
+                    .hash(seed=sample_seed)
+                    .to_arrow()
+                    .to_string()
+                    .encode()
                 )
                 self.update(h, values_hash_bytes)

@@ -530,7 +601,9 @@ def _to_bytes(self, obj: Any) -> bytes:
             if np_obj.size >= _NP_SIZE_LARGE:
                 import numpy as np

-                state = np.random.RandomState(0)
+                # Data-dependent seed so sample indices are not globally fixed.
+                rs = _numpy_sample_seed(np_obj)
+                state = np.random.RandomState(rs)
                 np_obj = state.choice(np_obj.flat, size=_NP_SAMPLE_SIZE)

             self.update(h, np_obj.tobytes())
@@ -543,7 +616,12 @@ def _to_bytes(self, obj: Any) -> bytes:

             # we don't just hash the results of obj.tobytes() because we want to use
             # the sampling logic for numpy data
-            np_array = np.frombuffer(pil_obj.tobytes(), dtype="uint8")
+            pixel_bytes = pil_obj.tobytes()
+            if pil_obj.mode == "P":
+                palette_data = pil_obj.getpalette()
+                if palette_data is not None:
+                    pixel_bytes = bytes(palette_data) + pixel_bytes
+            np_array = np.frombuffer(pixel_bytes, dtype="uint8")
             return self.to_bytes(np_array)

         elif inspect.isbuiltin(obj):
lib/tests/streamlit/runtime/caching/hashing_test.py · +48 −0 · modified
@@ -989,3 +989,51 @@ class Model(pydantic.BaseModel):
     with pytest.raises(UnhashableTypeError) as exc_info:
         get_hash(instance)
     assert "unhashable members" in str(exc_info.value).lower()
+
+
+def test_PIL_pmode_palette_collision_prevention() -> None:
+    """P-mode images with identical index buffers but different palettes must not collide."""
+
+    img_a = Image.new("P", (100, 100), 0)
+    img_b = Image.new("P", (100, 100), 0)
+    palette_a = [0] * 768
+    palette_b = [0] * 768
+    palette_a[0:3] = [0, 0, 0]
+    palette_b[0:3] = [255, 0, 0]
+    img_a.putpalette(palette_a)
+    img_b.putpalette(palette_b)
+
+    assert img_a.tobytes() == img_b.tobytes()
+    assert get_hash(img_a) != get_hash(img_b)
+
+
+def test_numpy_large_array_seed_prefix_change_differs() -> None:
+    """Mutating the seed prefix (first element) produces a different hash for large arrays."""
+
+    total = _NP_SIZE_LARGE + 10_000
+    arr_a = np.zeros(total, dtype=np.float64)
+    arr_b = arr_a.copy()
+    arr_b[0] = 255.0
+
+    assert get_hash(arr_a) != get_hash(arr_b)
+
+
+def test_pandas_large_dataframe_seed_row_change_differs() -> None:
+    """Mutating the seed row (row 0) produces a different hash for large DataFrames."""
+
+    n = _PANDAS_ROWS_LARGE + 5_000
+    df_a = pd.DataFrame({"val": np.zeros(n)})
+    df_b = df_a.copy()
+    df_b.iloc[0, df_b.columns.get_loc("val")] = 99.0
+
+    assert get_hash(df_a) != get_hash(df_b)
+
+
+def test_pandas_large_dataframe_unhashable_payload_uses_pickle_fallback() -> None:
+    """Large frames with unhashable cells must still hash and match when identical."""
+
+    n = _PANDAS_ROWS_LARGE
+    df1 = pd.DataFrame({"x": [[1]] * n})
+    df2 = pd.DataFrame({"x": [[1]] * n})
+
+    assert get_hash(df1) == get_hash(df2)
0629d95e8e04 · Fix linting issues
4 files changed · +7 −7
e2e_playwright/st_multiselect.py · +1 −1 · modified
@@ -222,7 +222,7 @@ def on_change():
 # Test for issue #13646: Custom class objects without __eq__ should work with format_func
 # This tests that selections are preserved for custom class objects after script reruns
 # when the widget uses a format_func to display the options.
-class CustomOption:  # noqa: B903
+class CustomOption:
     """Custom class without __eq__ implementation.

     This simulates the common pattern where users have custom objects with a
lib/tests/streamlit/elements/lib/options_selector_utils_test.py · +4 −4 · modified
@@ -518,7 +518,7 @@ def test_custom_objects_without_eq_using_format_func(self):
         from copy import deepcopy

         # Custom class without __eq__ implementation
-        class MyOption:  # noqa: B903
+        class MyOption:
             def __init__(self, value: str):
                 self.value = value

@@ -619,7 +619,7 @@ def test_custom_objects_without_eq_using_format_func(self):
         """Test that custom objects without __eq__ work with format_func validation."""

         # Custom class without __eq__ implementation
-        class MyOption:  # noqa: B903
+        class MyOption:
             def __init__(self, value: str):
                 self.value = value

@@ -649,7 +649,7 @@ def __init__(self, value: str):
     def test_custom_objects_partial_match_with_format_func(self):
         """Test that only matching custom objects are kept."""

-        class MyOption:  # noqa: B903
+        class MyOption:
             def __init__(self, value: str):
                 self.value = value

@@ -708,7 +708,7 @@ def test_format_func_failure_filters_out_value(self):
         but the format_func can't handle strings (e.g., lambda x: x.attribute).
         """

-        class MyOption:  # noqa: B903
+        class MyOption:
             def __init__(self, value: str):
                 self.value = value
lib/tests/streamlit/elements/multiselect_test.py · +1 −1 · modified
@@ -710,7 +710,7 @@ def test_serialize_deepcopied_custom_objects(self):
         from copy import deepcopy

         # Custom class without __eq__ implementation
-        class MyOption:  # noqa: B903
+        class MyOption:
             def __init__(self, value: str):
                 self.value = value
lib/tests/streamlit/elements/selectbox_test.py · +1 −1 · modified
@@ -825,7 +825,7 @@ def test_serialize_deepcopied_custom_objects(self):
         from copy import deepcopy

         # Custom class without __eq__ implementation
-        class MyOption:  # noqa: B903
+        class MyOption:
             def __init__(self, value: str):
                 self.value = value
Vulnerability mechanics
Root cause
"Deterministic sampling seeds and omission of palette bytes in hashing lead to cache collisions."
Attack vector
An attacker with local access can craft two different inputs that produce the same hash, either by exploiting the fixed sampling seed used for large objects or by manipulating palette-indexed PIL images. For large objects, the attacker modifies only positions that are never sampled; because the seed is fixed, those positions are predictable. For PIL images, the attacker supplies images with identical pixel indices but different palettes. In both cases the weak hash maps the two inputs to the same cache key, so the cache returns stale or incorrect data without raising an error [2]. The attack has high complexity and is considered difficult to exploit.
Affected code
The vulnerability resides in `lib/streamlit/runtime/caching/hashing.py`, part of Streamlit's caching mechanism. In the `_to_bytes()` method, the large-object sampling branches for Pandas, Polars, and NumPy objects are affected by the deterministic sampling seed, and the PIL `Image.Image` branch is affected by the omission of palette bytes [2].
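The gap in the PIL branch can be modelled without Pillow. In the sketch below, the hypothetical `IndexedImage` class stands in for a P-mode image: `indices` plays the role of `Image.tobytes()` (one palette index per pixel) and `palette` the role of the flat RGB list returned by `Image.getpalette()`. The two digest functions are simplified models of the pre-fix branch and of the patched branch that prepends palette bytes, using the same inputs as the patch's own test (a 100×100 all-zero index buffer whose palette entry 0 is black in one image and red in the other).

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedImage:
    """Hypothetical stand-in for a PIL P-mode image."""

    indices: bytes  # per-pixel palette indices, like Image.tobytes() in P mode
    palette: bytes  # flat RGB triples, like bytes(Image.getpalette())


def digest_without_palette(img: IndexedImage) -> str:
    # Models the pre-fix branch: only the index buffer is hashed.
    return hashlib.md5(img.indices).hexdigest()


def digest_with_palette(img: IndexedImage) -> str:
    # Models the patched branch: palette bytes are prepended before hashing.
    return hashlib.md5(img.palette + img.indices).hexdigest()


black = IndexedImage(indices=bytes(100 * 100), palette=bytes([0, 0, 0]) + bytes(765))
red = IndexedImage(indices=bytes(100 * 100), palette=bytes([255, 0, 0]) + bytes(765))

print(digest_without_palette(black) == digest_without_palette(red))  # collision
print(digest_with_palette(black) == digest_with_palette(red))  # distinct keys
```

Both images render differently (every pixel is black in one and red in the other), yet the index-only digest cannot tell them apart.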
What the fix does
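The fix replaces the globally fixed sampling seed (previously 0) with data-dependent seeds derived from a prefix of the input: the first row (or element) for Pandas objects, `head(1)` for Polars objects, and up to the first 64 elements for NumPy arrays. The sampled indices therefore vary with the content instead of being identical for every input of a given size. As the patch's own docstring notes, the goal is to avoid a globally fixed sample index set, not cryptographic unpredictability: inputs that share the same prefix and size are still sampled at the same positions. For PIL P-mode images, palette bytes are now prepended to the pixel data before hashing, so images with identical pixel indices but different palettes produce distinct cache keys [1]. The patch is available at [patch_id=4792408].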
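The data-dependent seed can be sketched with a small, dependency-free model that seeds the sampler from a hash of the first 64 values, mirroring the 64-element prefix that `_numpy_sample_seed` reads. `prefix_seed`, `sample_positions`, `SAMPLE_SIZE`, and `PREFIX_LEN` are hypothetical names, `random.Random` stands in for NumPy's `RandomState`, and `repr` stands in for `tobytes()`. Changing a value inside the prefix moves the sampled positions; changing a value outside it does not, which is the limitation the docstring acknowledges.

```python
import hashlib
import random

SAMPLE_SIZE = 100  # hypothetical sample size
PREFIX_LEN = 64  # mirrors the 64-element prefix read by _numpy_sample_seed


def prefix_seed(values: list[int]) -> int:
    """Derive a 32-bit seed from the first PREFIX_LEN values (model of the patched approach)."""
    digest = hashlib.md5(repr(values[:PREFIX_LEN]).encode()).digest()
    return int.from_bytes(digest[:4], "little")


def sample_positions(values: list[int]) -> list[int]:
    """Positions a prefix-seeded sampler would read from `values`."""
    rng = random.Random(prefix_seed(values))
    return [rng.randrange(len(values)) for _ in range(SAMPLE_SIZE)]


base = [0] * 20_000

head_changed = list(base)
head_changed[0] = 255  # inside the prefix: new seed, so new sample positions

tail_changed = list(base)
tail_changed[-1] = 255  # outside the prefix: same seed, so same sample positions

print(sample_positions(head_changed) != sample_positions(base))
print(sample_positions(tail_changed) == sample_positions(base))
```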
Preconditions
- input: Local access is required to carry out this attack.
Generated on Jun 4, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References