CVE-2026-47117
Description
OpenMed versions prior to 1.5.2 are vulnerable to RCE due to improper validation of PII privacy-filter model names, allowing custom code execution.
AI Insight
LLM-synthesized narrative grounded in this CVE's description and references.
Vulnerability
OpenMed versions before 1.5.2 contain a remote code execution vulnerability in the PII privacy-filter model-loading path. The privacy-filter dispatcher matched the user-supplied model_name parameter by substring, so any repository name containing privacy-filter qualified. An attacker could therefore supply a value such as attacker/foo-privacy-filter-bar, which was routed through a path that loads Hugging Face models with trust_remote_code=True [4].
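The gap between substring routing and the prefix-based routing introduced in 1.5.2 can be sketched in a few lines. This is an illustrative reimplementation, not the project's code: `old_matcher`, `new_matcher`, and `normalize` are hypothetical names, and the normalization step (trim, lowercase, underscores to hyphens) is an assumption inferred from the patch's tests. The aliases and prefixes are the ones listed in the patch.

```python
# Illustrative sketch only; the names below are hypothetical, not OpenMed's API.
ALIASES = frozenset({"openai-privacy-filter", "privacy-filter"})
TRUSTED_PREFIXES = ("openai/privacy-filter", "openmed/privacy-filter-")


def normalize(value):
    # Assumed normalization: trim, lowercase, underscores to hyphens.
    return (value or "").strip().lower().replace("_", "-")


def old_matcher(value):
    # Pre-1.5.2 behavior: any name containing the substring matched.
    name = normalize(value)
    return bool(name) and (name in ALIASES or "privacy-filter" in name)


def new_matcher(value):
    # 1.5.2 behavior: an exact alias, or a first-party org/repo prefix.
    name = normalize(value)
    if not name:
        return False
    if name in ALIASES:
        return True
    for prefix in TRUSTED_PREFIXES:
        if name == prefix.rstrip("-") or name.startswith(prefix):
            return True
    return False


for candidate in ("attacker/foo-privacy-filter-bar", "OpenMed/privacy-filter-nemotron"):
    print(candidate, old_matcher(candidate), new_matcher(candidate))
```

Under these assumptions the attacker-style name matches only the old rule, while the first-party name matches both.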
Exploitation
The flaw combines two behaviors: the privacy-filter dispatcher identified privacy-filter models by substring match, and the PyTorch privacy-filter wrapper defaulted to trust_remote_code=True [2]. An unauthenticated attacker who controls model_name can point it at a malicious model repository that declares custom Transformers code via auto_map in config.json or tokenizer_config.json. That code is then imported and executed with the privileges of the OpenMed service process [4].
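As a defensive illustration of where such custom code is declared, the sketch below scans a locally downloaded model directory for `auto_map` entries in `config.json` and `tokenizer_config.json`. The function name `find_auto_map_entries` is hypothetical and not part of OpenMed or Transformers. It only parses JSON and never imports anything from the directory it inspects.

```python
import json
from pathlib import Path

# The two files named in the advisory as possible carriers of auto_map.
CONFIG_FILES = ("config.json", "tokenizer_config.json")


def find_auto_map_entries(model_dir):
    """Return {filename: auto_map} for config files that declare custom code."""
    found = {}
    for filename in CONFIG_FILES:
        path = Path(model_dir) / filename
        if not path.is_file():
            continue
        try:
            data = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed file: nothing to report
        if isinstance(data, dict) and data.get("auto_map"):
            found[filename] = data["auto_map"]
    return found
```

A non-empty result means the repository asks Transformers to import repo-supplied Python, which should happen only for sources the operator trusts.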
Impact
Successful exploitation allows an unauthenticated attacker to execute arbitrary code with the privileges of the OpenMed service process. This can lead to a full compromise of the service and any sensitive data it processes [4].
Mitigation
OpenMed version 1.5.2 addresses this vulnerability by hardening the privacy-filter loading path, tightening model-name routing, and defaulting trust_remote_code to False [3]. The fixed version was released on 2026-06-02 [3]. An explicit allowlist for first-party privacy filter repositories has been added, and custom/private fine-tunes can be allowlisted via the OPENMED_TRUSTED_REMOTE_CODE_MODELS environment variable [2, 3].
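The allowlist gate described above can be approximated as follows. This is a simplified sketch rather than the shipped implementation: `env_allowlist` and `check_remote_code_request` are hypothetical names, the `environ` parameter exists only to make the example testable, and the local-artifact path check that the patch also performs is omitted. The repo IDs and the environment variable name come from the 1.5.2 changelog.

```python
import os

# First-party repos named in the 1.5.2 changelog.
TRUSTED = frozenset({
    "openai/privacy-filter",
    "OpenMed/privacy-filter-multilingual",
    "OpenMed/privacy-filter-nemotron",
})
ENV_VAR = "OPENMED_TRUSTED_REMOTE_CODE_MODELS"


def env_allowlist(environ=os.environ):
    # Comma-separated repo IDs; blank entries and surrounding spaces are ignored.
    raw = environ.get(ENV_VAR, "")
    return frozenset(part.strip() for part in raw.split(",") if part.strip())


def check_remote_code_request(model_name, trust_remote_code=False, environ=os.environ):
    """Raise ValueError if custom code is requested for a non-allowlisted repo."""
    allowed = model_name in TRUSTED or model_name in env_allowlist(environ)
    if trust_remote_code and not allowed:
        raise ValueError(
            f"{model_name!r} is not allowlisted for trust_remote_code=True"
        )
    return trust_remote_code
```

An operator with a private fine-tune would, under this model, export `OPENMED_TRUSTED_REMOTE_CODE_MODELS=my-org/my-fork` before starting the service. Matching is exact, so a name that merely resembles an allowlisted repo is still refused.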
AI Insight generated on Jun 2, 2026. Synthesized from this CVE's description and the cited reference URLs; citations are validated against the source bundle.
Affected products
Patches (1)
198724f65df98 Merge pull request #59 from maziyarpanahi/security/model-allowlist
14 files changed · +486 −34
CHANGELOG.md+21 −0 modified@@ -7,6 +7,27 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 ## [Unreleased] +## [1.5.2] - 2026-05-27 + +### Security + +- Hardened the privacy-filter dispatcher to refuse `trust_remote_code=True` for model identifiers outside an explicit allowlist of first-party OpenAI/OpenMed privacy-filter family models (`openai/privacy-filter`, `OpenMed/privacy-filter-multilingual`, `OpenMed/privacy-filter-nemotron`). Previously, any HuggingFace repository whose name contained the substring `privacy-filter` would be loaded with custom-code execution enabled, allowing remote code execution by anyone able to control the `model_name` parameter on `/pii/extract` or `/pii/deidentify`. Operators with custom fine-tunes of the privacy-filter family can extend the allowlist via the `OPENMED_TRUSTED_REMOTE_CODE_MODELS` environment variable (comma-separated repo IDs). +- Changed `PrivacyFilterTorchPipeline`'s `trust_remote_code` default from `True` to `False`. The first-party dispatcher (`openmed.core.backends.create_privacy_filter_pipeline`) opts in explicitly only for allowlisted models. + +### Changed + +- README, docs, and website version surfaces now point at `1.5.2`. + +### Fixed + +- Fixed raw HuggingFace-to-MLX conversion for the OpenAI Privacy Filter family (`openai/privacy-filter`, `OpenMed/privacy-filter-nemotron`, and `OpenMed/privacy-filter-multilingual`) by casting BF16 tensors to float32 before NumPy conversion, remapping OPF/Nemotron checkpoints into the OpenMed MLX runtime layout, fusing Q/K/V projections, preserving classifier bias, and validating converted weight keys/shapes before artifact save. + +### Tests + +- Added `tests/unit/test_privacy_filter_security.py` covering the identifier matcher, allowlist gate, env-var override, local-artifact trust, and dispatcher opt-in. 
+- Added HTTP-level regression tests in `tests/unit/service/test_api.py` that POST the attacker-controlled `model_name` payload to `/pii/extract` and `/pii/deidentify` and verify the privacy-filter dispatcher is never reached. +- Added MLX converter regressions for BF16 NumPy conversion, OPF weight remapping, QKV fusion order, and partial-QKV rejection. + ## [1.5.1] - 2026-05-21 ### Changed
docs/examples.md+1 −1 modified@@ -24,7 +24,7 @@ Run them with VS Code, Jupyter, or Google Colab—each relies on the same `uv pi ## Apple Silicon & Swift recipes -OpenMed `1.5.1` includes release-critical Apple entry points: +OpenMed `1.5.2` includes release-critical Apple entry points: - [MLX Backend](./mlx-backend.md) for Python on Apple Silicon Macs, including Privacy Filter, OpenMed Multilingual Privacy Filter, and experimental GLiNER-family artifacts - [OpenMedKit (Swift Package)](./swift-openmedkit.md) for macOS, iOS, and iPadOS apps
docs/index.md+2 −2 modified@@ -4,7 +4,7 @@ OpenMed bundles curated biomedical models, advanced extraction utilities, and on clinical NLP workflows without wrangling infrastructure. This documentation keeps the most copied snippets and workflows close at hand—each section is Markdown-first, searchable, and optimized for quick scanning or copy/paste into notebooks. -OpenMed `1.5.1` expands multilingual PII and the Apple story: +OpenMed `1.5.2` expands multilingual PII and the Apple story: - **Python MLX** on Apple Silicon Macs through `openmed[mlx]` - **OpenMedKit** for native macOS, iOS, and iPadOS apps @@ -50,7 +50,7 @@ uv run python examples/pii_model_comparison.py The rest of the docs expand on this snippet—head to **Quick Start** for the end-to-end setup, then explore the guides for configuration, zero-shot GLiNER workflows, and advanced processing helpers. -## 1.5.1 release highlights +## 1.5.2 release highlights - [MLX Backend](./mlx-backend.md) – Python MLX on Apple Silicon, Privacy Filter family support, 28 new Arabic/Japanese/Turkish PII MLX artifacts, shared artifact packaging, and backend auto-detection. - [OpenMedKit (Swift Package)](./swift-openmedkit.md) – native macOS/iOS/iPadOS integration with MLX, CoreML, Privacy Filter, OpenMed Multilingual Privacy Filter, and experimental GLiNER-family APIs.
docs/mlx-backend.md+1 −1 modified@@ -1,6 +1,6 @@ # MLX Backend (Apple Silicon) -OpenMed v1.5.1 expands native Apple Silicon acceleration via [Apple MLX](https://github.com/ml-explore/mlx), including preconverted Arabic, Japanese, and Turkish PII token-classification artifacts. +OpenMed v1.5.2 expands native Apple Silicon acceleration via [Apple MLX](https://github.com/ml-explore/mlx), including preconverted Arabic, Japanese, and Turkish PII token-classification artifacts. That MLX story now has two surfaces:
docs/swift-openmedkit.md+2 −2 modified@@ -2,7 +2,7 @@ OpenMedKit is the Swift package for running OpenMed models in **macOS**, **iOS**, and **iPadOS** apps. -As of `1.5.1`, OpenMedKit supports two Apple backends: +As of `1.5.2`, OpenMedKit supports two Apple backends: - **MLX** for Apple Silicon Macs and real iPhone/iPad devices - **CoreML** for bundled Apple model packages @@ -55,7 +55,7 @@ iOS Simulator is **not** a Swift MLX validation target. ```swift dependencies: [ - .package(url: "https://github.com/maziyarpanahi/openmed.git", from: "1.5.1"), + .package(url: "https://github.com/maziyarpanahi/openmed.git", from: "1.5.2"), ] ```
docs/website/index.html+7 −7 modified@@ -6,7 +6,7 @@ <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>OpenMed — Clinical AI that never leaves the device</title> <meta name="description" - content="OpenMed 1.5.1 brings the OpenMed Multilingual Privacy Filter, Arabic/Japanese/Turkish PII MLX artifacts, unified PyTorch/MLX privacy routing, Faker-backed anonymization, 1,000+ open-source healthcare LLM models, and Swift-native OpenMedKit, Apache-2.0 end to end." /> + content="OpenMed 1.5.2 brings the OpenMed Multilingual Privacy Filter, Arabic/Japanese/Turkish PII MLX artifacts, unified PyTorch/MLX privacy routing, Faker-backed anonymization, 1,000+ open-source healthcare LLM models, and Swift-native OpenMedKit, Apache-2.0 end to end." /> <meta name="keywords" content="OpenMed, state-of-the-art LLMs for healthcare, healthcare AI, clinical AI models, biomedical NER, zero-shot medical AI, clinical LLMs, medical language models, healthcare machine learning, clinical LLM workflows, medical AI, biomedical AI, SOTA healthcare models, Maziyar Panahi, Hugging Face, Apache-2.0, SageMaker, clinical decision support, medical text analysis" /> <meta name="author" content="Maziyar Panahi" /> @@ -16,7 +16,7 @@ <meta property="og:type" content="website" /> <meta property="og:title" content="OpenMed — Clinical AI that never leaves the device" /> <meta property="og:description" - content="OpenMed 1.5.1 — OpenMed Multilingual Privacy Filter, Nemotron Privacy Filter MLX, Arabic/Japanese/Turkish PII, unified PyTorch/MLX routing, 1,000+ healthcare LLM models, and one local-first runtime across Python and Swift." /> + content="OpenMed 1.5.2 — OpenMed Multilingual Privacy Filter, Nemotron Privacy Filter MLX, Arabic/Japanese/Turkish PII, unified PyTorch/MLX routing, 1,000+ healthcare LLM models, and one local-first runtime across Python and Swift." 
/> <meta property="og:url" content="https://openmed.life/" /> <meta property="og:site_name" content="OpenMed" /> <meta property="og:image" content="https://openmed.life/og.png?v=20260425" /> @@ -30,7 +30,7 @@ <meta name="twitter:creator" content="@MaziyarPanahi" /> <meta name="twitter:title" content="OpenMed — Clinical AI that never leaves the device" /> <meta name="twitter:description" - content="OpenMed 1.5.1 · OpenMed Multilingual Privacy Filter · Arabic/Japanese/Turkish PII · Nemotron Privacy Filter MLX · unified privacy routing · Swift-native OpenMedKit." /> + content="OpenMed 1.5.2 · OpenMed Multilingual Privacy Filter · Arabic/Japanese/Turkish PII · Nemotron Privacy Filter MLX · unified privacy routing · Swift-native OpenMedKit." /> <meta name="twitter:image" content="https://openmed.life/og.png?v=20260425" /> <meta name="twitter:image:alt" content="OpenMed social preview showing local-first healthcare LLMs and an interactive clinical NER terminal." /> @@ -57,7 +57,7 @@ "operatingSystem": "macOS, iOS, iPadOS, Cloud, On-Premises", "applicationCategory": "AIApplication", "description": "OpenMed Clinical LLM Suite provides open-source healthcare language models, biomedical named-entity recognition, Python MLX acceleration, and Swift-native deployment toolkits for compliant medical workflows.", - "softwareVersion": "1.5.1", + "softwareVersion": "1.5.2", "url": "https://openmed.life/", "provider": { "@type": "Organization", "name": "OpenMed", "url": "https://openmed.life/" }, "offers": { "@type": "Offer", "price": "0", "priceCurrency": "USD", "availability": "https://schema.org/InStock" }, @@ -104,7 +104,7 @@ <circle cx="20" cy="20" r="2" fill="var(--accent)" /> </svg> <span class="wordmark">open<span class="med">med</span></span> - <span class="version-tag">v1.5.1</span> + <span class="version-tag">v1.5.2</span> </a> <nav aria-label="Primary"> @@ -146,7 +146,7 @@ <section id="home" class="hero bg-grid" data-screen> <div class="container hero-grid"> <div> - <div 
class="eyebrow">OpenMed 1.5.1 · OpenMed Multilingual Privacy Filter · 8-bit MLX</div> + <div class="eyebrow">OpenMed 1.5.2 · OpenMed Multilingual Privacy Filter · 8-bit MLX</div> <h1 class="display-xl hero-title"> Clinical AI that never leaves the <span class="serif-italic">device</span>. </h1> @@ -667,7 +667,7 @@ <h2 class="display-lg">Four lines to <span class="serif-italic">production</span <span class="code-dot yellow"></span> <span class="code-dot green"></span> </div> - <span class="code-title">openmed 1.5.1 · quickstart</span> + <span class="code-title">openmed 1.5.2 · quickstart</span> <button class="code-copy" type="button">copy</button> </div> <div class="code-tabs"></div>
openmed/__about__.py +1 −1 modified
@@ -1,3 +1,3 @@
 """Version information for OpenMed."""

-__version__ = "1.5.1"
+__version__ = "1.5.2"
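Since the fix ships as release 1.5.2, a deployment can be triaged by comparing its installed version against that number. The sketch below is one way to do so with the standard library only. The helper names are hypothetical, and it assumes the distribution is installed under the name `openmed`. Only the leading numeric components are compared, so a pre-release string such as `1.5.2rc1` would be treated as 1.5.2.

```python
import re
from importlib import metadata

FIXED_VERSION = (1, 5, 2)


def parse_release(version):
    # Keep only the leading numeric components, e.g. "1.5.1" -> (1, 5, 1).
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if not match:
        raise ValueError(f"unrecognized version string: {version!r}")
    parts = tuple(int(p) for p in match.group(0).split("."))
    # Pad short versions such as "1.5" to three components.
    return parts + (0,) * (3 - len(parts))


def is_affected(version):
    # Versions strictly below 1.5.2 are in the affected range.
    return parse_release(version) < FIXED_VERSION


def installed_openmed_version():
    # Assumes the distribution name is "openmed"; returns None if absent.
    try:
        return metadata.version("openmed")
    except metadata.PackageNotFoundError:
        return None
```

A caller could combine the two as `v = installed_openmed_version()` followed by `v is not None and is_affected(v)`.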
openmed/core/backends.py +12 −2 modified
@@ -257,5 +257,15 @@ def create_privacy_filter_pipeline(model_name: str) -> Callable:
         from openmed.mlx.inference import create_mlx_pipeline
         return create_mlx_pipeline(actual_model)

-    from openmed.torch.privacy_filter import PrivacyFilterTorchPipeline
-    return PrivacyFilterTorchPipeline(actual_model)
+    from openmed.torch.privacy_filter import (
+        PrivacyFilterTorchPipeline,
+        is_trusted_for_remote_code,
+    )
+    # ``trust_remote_code=True`` is required to import the custom modeling
+    # code shipped inside first-party privacy-filter repos. Only enable it
+    # when the resolved model is on the allowlist; the pipeline itself
+    # double-checks and raises ``ValueError`` if the gate is bypassed.
+    return PrivacyFilterTorchPipeline(
+        actual_model,
+        trust_remote_code=is_trusted_for_remote_code(actual_model),
+    )
openmed/core/pii.py +22 −3 modified
@@ -139,6 +139,19 @@ def to_dict(self) -> dict:
 _DAY_FIRST_LANGS = frozenset({"fr", "de", "it", "es", "nl", "hi", "te", "pt", "ar", "tr"})
 _PRIVACY_FILTER_FAMILY_ALIASES = frozenset({"openai-privacy-filter", "privacy-filter"})

+# Repository-prefix allowlist for org/model identifiers that route through the
+# privacy-filter dispatcher. The dispatcher loads these via Transformers'
+# custom-code path (trust_remote_code), so only first-party orgs are matched.
+# An identifier qualifies if it is exactly one of the prefixes (with any
+# trailing hyphen stripped) or starts with the prefix. Untrusted names whose
+# substring contains "privacy-filter" (e.g. attacker/foo-privacy-filter-bar)
+# are intentionally NOT matched and fall through to the standard PII loader,
+# which never enables trust_remote_code.
+_TRUSTED_PRIVACY_FILTER_PREFIXES = (
+    "openai/privacy-filter",
+    "openmed/privacy-filter-",
+)
+

 def _normalize_model_family(value: Optional[str]) -> str:
     if not value:
@@ -148,9 +161,15 @@ def _normalize_model_family(value: Optional[str]) -> str:

 def _looks_like_privacy_filter_identifier(value: Optional[str]) -> bool:
     normalized = _normalize_model_family(value)
-    return bool(normalized) and (
-        normalized in _PRIVACY_FILTER_FAMILY_ALIASES or "privacy-filter" in normalized
-    )
+    if not normalized:
+        return False
+    if normalized in _PRIVACY_FILTER_FAMILY_ALIASES:
+        return True
+    for prefix in _TRUSTED_PRIVACY_FILTER_PREFIXES:
+        bare = prefix.rstrip("-")
+        if normalized == bare or normalized.startswith(prefix):
+            return True
+    return False


 @lru_cache(maxsize=32)
openmed/torch/privacy_filter.py +68 −8 modified
@@ -14,13 +14,58 @@
 from __future__ import annotations

 import logging
+import os
 from typing import Any, Dict, List, Optional

 from openmed.core.decoding import refine_privacy_filter_span, trim_span_whitespace

 logger = logging.getLogger(__name__)


+# First-party privacy-filter repos that legitimately require
+# trust_remote_code=True (they ship modeling_openai_privacy_filter.py and
+# friends in the repo and rely on Transformers' auto_map import).
+TRUSTED_REMOTE_CODE_MODELS = frozenset({
+    "openai/privacy-filter",
+    "OpenMed/privacy-filter-multilingual",
+    "OpenMed/privacy-filter-nemotron",
+})
+
+# Operators with custom fine-tunes can extend the allowlist with a
+# comma-separated list of HuggingFace repo IDs. Empty entries are ignored.
+_ALLOWLIST_ENV_VAR = "OPENMED_TRUSTED_REMOTE_CODE_MODELS"
+
+
+def _env_allowlist() -> frozenset[str]:
+    raw = os.getenv(_ALLOWLIST_ENV_VAR, "")
+    return frozenset(part.strip() for part in raw.split(",") if part.strip())
+
+
+def is_trusted_for_remote_code(model_name: str) -> bool:
+    """Return True if *model_name* may be loaded with ``trust_remote_code=True``.
+
+    Trusted sources:
+
+    - ``TRUSTED_REMOTE_CODE_MODELS`` — first-party OpenAI/OpenMed
+      privacy-filter repos that ship custom modeling code.
+    - The ``OPENMED_TRUSTED_REMOTE_CODE_MODELS`` env var — operator-extensible
+      comma-separated list of repo IDs for custom fine-tunes.
+    - Local filesystem paths that identify as a privacy-filter artifact
+      via on-disk metadata (the ``_is_privacy_filter_artifact_path`` check
+      already used by the dispatcher).
+    """
+    if not model_name:
+        return False
+    if model_name in TRUSTED_REMOTE_CODE_MODELS:
+        return True
+    if model_name in _env_allowlist():
+        return True
+    # Local path check is deferred (it touches the filesystem) and imported
+    # lazily to avoid a circular import with openmed.core.pii.
+    from openmed.core.pii import _is_privacy_filter_artifact_path
+    return _is_privacy_filter_artifact_path(model_name)
+
+
 class PrivacyFilterTorchPipeline:
     """Run ``openai/privacy-filter`` (or compatible) via Transformers.

@@ -40,12 +85,14 @@ class PrivacyFilterTorchPipeline:
             output shape.
         local_files_only: When True, never download from the Hub — only
             use a cached copy. Mirrors the demo's offline-first default.
-        trust_remote_code: The OpenAI Privacy Filter family ships with
-            custom modeling code (``modeling_openai_privacy_filter.py``) in
-            the model repo, which transformers needs permission to import.
-            Defaults to ``True`` because this pipeline is *specifically*
-            for that family — set to ``False`` to opt out (and accept that
-            loading will fail without an upstream registration).
+        trust_remote_code: When True, the loader permits Transformers to
+            execute custom Python shipped inside the model repo via
+            ``auto_map``. This is required by the first-party Privacy
+            Filter models (which ship ``modeling_openai_privacy_filter.py``)
+            but is dangerous for arbitrary HuggingFace repositories.
+            Defaults to ``False``. When True, ``model_name`` must be in the
+            allowlist resolved by :func:`is_trusted_for_remote_code` —
+            otherwise a :class:`ValueError` is raised before any download.
     """

     DEFAULT_MODEL_ID = "openai/privacy-filter"
@@ -58,8 +105,17 @@ def __init__(
         dtype: Optional[str] = None,
         aggregation_strategy: str = "simple",
         local_files_only: bool = False,
-        trust_remote_code: bool = True,
+        trust_remote_code: bool = False,
     ) -> None:
+        if trust_remote_code and not is_trusted_for_remote_code(model_name):
+            raise ValueError(
+                f"Refusing to load {model_name!r} with trust_remote_code=True: "
+                "model is not in the OpenMed trusted-remote-code allowlist. "
+                "Trusted repos are listed in "
+                "openmed.torch.privacy_filter.TRUSTED_REMOTE_CODE_MODELS; "
+                f"to extend, set {_ALLOWLIST_ENV_VAR} to a comma-separated "
+                "list of repo IDs you control."
+            )
         try:
             import torch
             from transformers import (
@@ -151,4 +207,8 @@ def _normalize_entity(item: Dict[str, Any], text: str) -> Dict[str, Any]:
     }


-__all__ = ["PrivacyFilterTorchPipeline"]
+__all__ = [
+    "PrivacyFilterTorchPipeline",
+    "TRUSTED_REMOTE_CODE_MODELS",
+    "is_trusted_for_remote_code",
+]
README.md+5 −5 modified@@ -56,7 +56,7 @@ Apple Silicon acceleration in Python: uv pip install -e ".[mlx]" ``` -Swift apps on macOS and iOS use `OpenMedKit`. As of `1.5.1`, that means: +Swift apps on macOS and iOS use `OpenMedKit`. As of `1.5.2`, that means: - **MLX** on Apple Silicon macOS and real iPhone/iPad hardware for supported OpenMed PII, OpenAI Privacy Filter, OpenAI Nemotron Privacy Filter, OpenMed Multilingual Privacy Filter, and experimental GLiNER-family artifacts - **CoreML** when you already have a bundled Apple model package or want the fallback Apple path @@ -65,7 +65,7 @@ Add the Swift package like this: ```swift dependencies: [ - .package(url: "https://github.com/maziyarpanahi/openmed.git", from: "1.5.1"), + .package(url: "https://github.com/maziyarpanahi/openmed.git", from: "1.5.2"), ] ``` @@ -140,7 +140,7 @@ result = processor.process_texts([ - **Advanced NER Processing**: Confidence filtering, entity grouping, and span alignment - **Multiple Output Formats**: Dict, JSON, HTML, CSV for any downstream system -### Production Tools (v1.5.1) +### Production Tools (v1.5.2) - **Batch Processing**: Multi-text and multi-file workflows with progress tracking - **Configuration Profiles**: `dev`/`prod`/`test`/`fast` presets with flexible overrides @@ -195,8 +195,8 @@ uvicorn openmed.service.app:app --host 0.0.0.0 --port 8080 ### Run with Docker ```bash -docker build -t openmed:1.5.1 . -docker run --rm -p 8080:8080 -e OPENMED_PROFILE=prod openmed:1.5.1 +docker build -t openmed:1.5.2 . +docker run --rm -p 8080:8080 -e OPENMED_PROFILE=prod openmed:1.5.2 ``` ### Example request
tests/unit/service/test_api.py+58 −0 modified@@ -5,6 +5,7 @@ import time from datetime import datetime from typing import Any +from unittest.mock import patch import openmed import pytest @@ -478,6 +479,63 @@ def create_pipeline(self, model_name: str, **kwargs: Any): pass +def test_pii_extract_rejects_attacker_controlled_privacy_filter_model_name( + client, monkeypatch, fake_loader_cls, +): + """CVE-2026-47117 regression: a request with an attacker-controlled + repo name whose path contains "privacy-filter" must NOT route through + the privacy-filter dispatcher (which would otherwise load with + trust_remote_code=True). It should fall through to the standard PII + loader, which never enables custom-code execution. + """ + monkeypatch.setattr(openmed, "analyze_text", lambda *args, **kwargs: _sample_prediction_result()) + + with patch( + "openmed.torch.privacy_filter.PrivacyFilterTorchPipeline", + ) as MockPipeline, patch( + "openmed.core.backends.create_privacy_filter_pipeline", + ) as mock_factory: + response = client.post( + "/pii/extract", + json={ + "text": "John Doe called 555-1212", + "model_name": "attacker/foo-privacy-filter-bar", + "confidence_threshold": 0.0, + }, + ) + + assert response.status_code == 200 + MockPipeline.assert_not_called() + mock_factory.assert_not_called() + + +def test_pii_deidentify_rejects_attacker_controlled_privacy_filter_model_name( + client, monkeypatch, fake_loader_cls, +): + """CVE-2026-47117 regression: same as the /pii/extract case but for the + /pii/deidentify endpoint, which reaches the same vulnerable code path.""" + monkeypatch.setattr(openmed, "analyze_text", lambda *args, **kwargs: _sample_prediction_result()) + + with patch( + "openmed.torch.privacy_filter.PrivacyFilterTorchPipeline", + ) as MockPipeline, patch( + "openmed.core.backends.create_privacy_filter_pipeline", + ) as mock_factory: + response = client.post( + "/pii/deidentify", + json={ + "text": "John Doe called 555-1212", + "method": "mask", + "model_name": 
"attacker/foo-privacy-filter-bar", + "confidence_threshold": 0.0, + }, + ) + + assert response.status_code == 200 + MockPipeline.assert_not_called() + mock_factory.assert_not_called() + + def test_second_request_reuses_shared_warmed_pipeline(monkeypatch, fake_loader_cls): monkeypatch.setenv("OPENMED_PROFILE", "test") monkeypatch.setenv("OPENMED_SERVICE_PRELOAD_MODELS", "disease_detection_superclinical")
tests/unit/test_privacy_filter_routing.py+8 −2 modified@@ -199,7 +199,11 @@ def test_torch_route_constructs_torch_pipeline(self): patch("openmed.torch.privacy_filter.PrivacyFilterTorchPipeline") as MockPF: MockPF.return_value = sentinel pipeline = create_privacy_filter_pipeline("openai/privacy-filter") - MockPF.assert_called_once_with("openai/privacy-filter") + # Trusted first-party repo: dispatcher opts in to the custom-code + # path that openai/privacy-filter's modeling files require. + MockPF.assert_called_once_with( + "openai/privacy-filter", trust_remote_code=True, + ) assert pipeline("hi") == sentinel("hi") def test_mlx_request_on_linux_substitutes_torch_model(self): @@ -213,7 +217,9 @@ def test_mlx_request_on_linux_substitutes_torch_model(self): warnings.simplefilter("always") MockPF.return_value = _fake_pipeline([]) create_privacy_filter_pipeline("OpenMed/privacy-filter-mlx-8bit") - MockPF.assert_called_once_with("openai/privacy-filter") + MockPF.assert_called_once_with( + "openai/privacy-filter", trust_remote_code=True, + ) assert any(issubclass(w.category, UserWarning) for w in caught)
tests/unit/test_privacy_filter_security.py+278 −0 added@@ -0,0 +1,278 @@ +"""Security regression tests for the privacy-filter loader. + +Covers CVE-2026-47117 — an unauthenticated attacker who controls the +``model_name`` request parameter could supply ``attacker/foo-privacy-filter-bar`` +and have OpenMed load it with ``trust_remote_code=True``, executing arbitrary +Python from ``auto_map`` entries in the repo's ``config.json``. + +These tests exercise the three defense layers added in 1.5.2: + +1. The identifier matcher (``_looks_like_privacy_filter_identifier``) only + routes requests through the privacy-filter dispatcher for first-party + ``openai/privacy-filter`` and ``OpenMed/privacy-filter-*`` repos. +2. The ``PrivacyFilterTorchPipeline`` constructor refuses to pass + ``trust_remote_code=True`` for any model outside the allowlist. +3. ``create_privacy_filter_pipeline`` only opts in to ``trust_remote_code`` + for resolved names that pass the allowlist check. +""" + +from __future__ import annotations + +import json +import sys +import types +from unittest.mock import MagicMock, patch + +import pytest + + +# --------------------------------------------------------------------------- +# Layer 1: identifier matching +# --------------------------------------------------------------------------- + + +class TestPrivacyFilterIdentifierMatching: + """``_looks_like_privacy_filter_identifier`` must not be tricked by + arbitrary HuggingFace repository names that merely contain the substring + ``privacy-filter``.""" + + @pytest.mark.parametrize( + "attacker_name", + [ + "attacker/foo-privacy-filter-bar", + "attacker/privacy-filter-rce", + "evil-org/my-privacy-filter", + "some/privacy-filter-evil", + "username/privacy-filter", # third-party org, not openai/OpenMed + "OpenMed/not-privacy-filter-but-similar", # missing trailing-prefix + ], + ) + def test_attacker_controlled_names_are_not_recognized(self, attacker_name): + from openmed.core.pii import 
_looks_like_privacy_filter_identifier + assert _looks_like_privacy_filter_identifier(attacker_name) is False + + @pytest.mark.parametrize( + "trusted_name", + [ + "openai/privacy-filter", + "OpenAI/Privacy-Filter", # case-insensitive normalization + "OpenMed/privacy-filter-multilingual", + "OpenMed/privacy-filter-nemotron", + "OpenMed/privacy-filter-mlx", + "OpenMed/privacy-filter-mlx-8bit", + "OpenMed/privacy-filter-multilingual-mlx", + "OpenMed/privacy-filter-multilingual-mlx-8bit", + "OpenMed/privacy-filter-nemotron-mlx", + "OpenMed/privacy-filter-nemotron-mlx-8bit", + ], + ) + def test_first_party_names_are_still_recognized(self, trusted_name): + from openmed.core.pii import _looks_like_privacy_filter_identifier + assert _looks_like_privacy_filter_identifier(trusted_name) is True + + @pytest.mark.parametrize( + "alias", + ["privacy-filter", "privacy_filter", "openai-privacy-filter"], + ) + def test_family_aliases_are_recognized(self, alias): + from openmed.core.pii import _looks_like_privacy_filter_identifier + assert _looks_like_privacy_filter_identifier(alias) is True + + @pytest.mark.parametrize("falsy", ["", None, " "]) + def test_blank_inputs_are_not_recognized(self, falsy): + from openmed.core.pii import _looks_like_privacy_filter_identifier + assert _looks_like_privacy_filter_identifier(falsy) is False + + +# --------------------------------------------------------------------------- +# Layer 2: PrivacyFilterTorchPipeline gate +# --------------------------------------------------------------------------- + + +class TestPrivacyFilterTorchPipelineGate: + """Direct instantiation must refuse untrusted models even when callers + explicitly opt in to ``trust_remote_code=True``.""" + + def test_attacker_name_with_trust_remote_code_raises(self): + from openmed.torch.privacy_filter import PrivacyFilterTorchPipeline + with pytest.raises(ValueError, match="trusted-remote-code allowlist"): + PrivacyFilterTorchPipeline( + "attacker/foo-privacy-filter-bar", + 
trust_remote_code=True, + ) + + def test_default_trust_remote_code_is_false(self): + """If a caller does not opt in, the constructor must not load with + ``trust_remote_code=True`` even for a trusted name.""" + captured = {} + with _patched_transformers(captured): + from openmed.torch.privacy_filter import PrivacyFilterTorchPipeline + PrivacyFilterTorchPipeline("openai/privacy-filter") + assert captured["tokenizer_kwargs"]["trust_remote_code"] is False + assert captured["model_kwargs"]["trust_remote_code"] is False + + def test_trusted_model_with_explicit_opt_in_loads(self): + captured = {} + with _patched_transformers(captured): + from openmed.torch.privacy_filter import PrivacyFilterTorchPipeline + PrivacyFilterTorchPipeline( + "openai/privacy-filter", + trust_remote_code=True, + ) + assert captured["tokenizer_kwargs"]["trust_remote_code"] is True + assert captured["model_kwargs"]["trust_remote_code"] is True + + +class TestIsTrustedForRemoteCode: + """The allowlist function backs the gate.""" + + @pytest.mark.parametrize( + "trusted", + [ + "openai/privacy-filter", + "OpenMed/privacy-filter-multilingual", + "OpenMed/privacy-filter-nemotron", + ], + ) + def test_hardcoded_first_party_models_are_trusted(self, trusted): + from openmed.torch.privacy_filter import is_trusted_for_remote_code + assert is_trusted_for_remote_code(trusted) is True + + @pytest.mark.parametrize( + "attacker", + [ + "attacker/foo-privacy-filter-bar", + "some-org/privacy-filter", + "OpenMed/privacy-filter-mlx", # MLX-only repo, not in HF code path + "", + "openai/privacy-filter-evil", # not exact match, not in allowlist + ], + ) + def test_other_models_are_not_trusted(self, attacker, monkeypatch): + monkeypatch.delenv("OPENMED_TRUSTED_REMOTE_CODE_MODELS", raising=False) + from openmed.torch.privacy_filter import is_trusted_for_remote_code + assert is_trusted_for_remote_code(attacker) is False + + def test_env_var_extends_allowlist(self, monkeypatch): + from openmed.torch.privacy_filter import 
is_trusted_for_remote_code
+        monkeypatch.setenv(
+            "OPENMED_TRUSTED_REMOTE_CODE_MODELS",
+            "my-org/my-fork,other-org/another-fork , ,",
+        )
+        assert is_trusted_for_remote_code("my-org/my-fork") is True
+        assert is_trusted_for_remote_code("other-org/another-fork") is True
+        # Unrelated names are still rejected.
+        assert is_trusted_for_remote_code("attacker/foo-privacy-filter") is False
+
+    def test_env_var_unset_does_not_trust_extras(self, monkeypatch):
+        monkeypatch.delenv("OPENMED_TRUSTED_REMOTE_CODE_MODELS", raising=False)
+        from openmed.torch.privacy_filter import is_trusted_for_remote_code
+        assert is_trusted_for_remote_code("my-org/my-fork") is False
+
+    def test_local_privacy_filter_artifact_is_trusted(self, tmp_path):
+        """A local directory whose config.json declares the privacy-filter
+        family should be loadable with custom code, since the file system
+        is already under the operator's control."""
+        artifact = tmp_path / "local-privacy-filter"
+        artifact.mkdir()
+        (artifact / "config.json").write_text(
+            json.dumps({"model_type": "openai-privacy-filter"})
+        )
+
+        from openmed.torch.privacy_filter import is_trusted_for_remote_code
+        # Clear the lru_cache so the tmp_path is actually probed.
+        from openmed.core import pii as _pii
+        _pii._is_privacy_filter_artifact_path.cache_clear()
+
+        assert is_trusted_for_remote_code(str(artifact)) is True
+
+    def test_local_unrelated_artifact_is_not_trusted(self, tmp_path):
+        artifact = tmp_path / "not-a-privacy-filter"
+        artifact.mkdir()
+        (artifact / "config.json").write_text(
+            json.dumps({"model_type": "bert"})
+        )
+        from openmed.torch.privacy_filter import is_trusted_for_remote_code
+        from openmed.core import pii as _pii
+        _pii._is_privacy_filter_artifact_path.cache_clear()
+
+        assert is_trusted_for_remote_code(str(artifact)) is False
+
+
+# ---------------------------------------------------------------------------
+# Layer 3: create_privacy_filter_pipeline opt-in
+# ---------------------------------------------------------------------------
+
+
+class TestCreatePrivacyFilterPipelineRemoteCodeOptIn:
+    """The dispatcher should pass ``trust_remote_code=True`` only for
+    resolved names that pass the allowlist check."""
+
+    def test_trusted_model_passes_trust_remote_code_true(self):
+        from openmed.core.backends import create_privacy_filter_pipeline
+        with patch("openmed.core.backends.MLXBackend.is_available", return_value=False), \
+             patch("openmed.torch.privacy_filter.PrivacyFilterTorchPipeline") as MockPF:
+            MockPF.return_value = lambda _text: []
+            create_privacy_filter_pipeline("openai/privacy-filter")
+            MockPF.assert_called_once_with(
+                "openai/privacy-filter",
+                trust_remote_code=True,
+            )
+
+    def test_untrusted_model_passes_trust_remote_code_false(self):
+        """Defense in depth: even if Layer 1 ever routes an untrusted name
+        here, the dispatcher must not opt in to trust_remote_code."""
+        from openmed.core.backends import create_privacy_filter_pipeline
+        with patch("openmed.core.backends.MLXBackend.is_available", return_value=False), \
+             patch("openmed.torch.privacy_filter.PrivacyFilterTorchPipeline") as MockPF:
+            MockPF.return_value = lambda _text: []
+            create_privacy_filter_pipeline("attacker/foo-privacy-filter-bar")
+            MockPF.assert_called_once_with(
+                "attacker/foo-privacy-filter-bar",
+                trust_remote_code=False,
+            )
+
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+
+def _patched_transformers(captured):
+    """Patch the ``transformers``/``torch`` modules just deep enough for
+    ``PrivacyFilterTorchPipeline.__init__`` to run without hitting the Hub.
+
+    Captures the kwargs passed to ``AutoTokenizer.from_pretrained`` and
+    ``AutoModelForTokenClassification.from_pretrained`` into *captured*.
+    """
+    fake_torch = types.ModuleType("torch")
+    fake_torch.cuda = types.SimpleNamespace(is_available=lambda: False)
+
+    class _FakeTokenizer:
+        @staticmethod
+        def from_pretrained(model_name, **kwargs):
+            captured["tokenizer_model"] = model_name
+            captured["tokenizer_kwargs"] = kwargs
+            return MagicMock()
+
+    class _FakeModel:
+        @staticmethod
+        def from_pretrained(model_name, **kwargs):
+            captured["model_model"] = model_name
+            captured["model_kwargs"] = kwargs
+            m = MagicMock()
+            m.to.return_value = m
+            return m
+
+    def _fake_pipeline(*args, **kwargs):
+        return lambda text: []
+
+    fake_transformers = types.ModuleType("transformers")
+    fake_transformers.AutoTokenizer = _FakeTokenizer
+    fake_transformers.AutoModelForTokenClassification = _FakeModel
+    fake_transformers.pipeline = _fake_pipeline
+
+    return patch.dict(
+        sys.modules,
+        {"torch": fake_torch, "transformers": fake_transformers},
+    )
Vulnerability mechanics
Root cause
"The privacy-filter dispatcher used broad substring matching on the user-supplied model_name parameter, allowing arbitrary code execution when loading Hugging Face models with trust_remote_code=True."
Attack vector
An unauthenticated attacker can supply a malicious model repository whose custom Transformers code is referenced via `auto_map` in `config.json` or `tokenizer_config.json`. Because the dispatcher matched on substrings of the `model_name` parameter, a value such as `attacker/foo-privacy-filter-bar` was routed through a path that loads Hugging Face models with `trust_remote_code=True` [ref_id=1]. The custom code is then imported and executed with the privileges of the OpenMed service process.
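To make the routing flaw concrete, the following minimal sketch contrasts a substring check with an exact-match check. The helper names `routes_by_substring` and `routes_by_exact_match` and the `FIRST_PARTY_MODELS` set are hypothetical illustrations, not OpenMed's actual code; only the identifiers `openai/privacy-filter` and `attacker/foo-privacy-filter-bar` are taken from the patch's tests.

```python
# Hypothetical sketch; these helpers are NOT OpenMed's real functions.

# Assumption: a first-party identifier that appears as a trusted name in the patch tests.
FIRST_PARTY_MODELS = frozenset({"openai/privacy-filter"})


def routes_by_substring(model_name: str) -> bool:
    """Substring-style check: any name containing the marker is accepted."""
    return "privacy-filter" in model_name.lower()


def routes_by_exact_match(model_name: str) -> bool:
    """Stricter check: only exact, known repository identifiers are accepted."""
    return model_name in FIRST_PARTY_MODELS


if __name__ == "__main__":
    for name in ("openai/privacy-filter", "attacker/foo-privacy-filter-bar"):
        print(name, routes_by_substring(name), routes_by_exact_match(name))
```

Under the substring check, the attacker-controlled name is treated as a privacy-filter model; under the exact-match check it is not.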
Affected code
The vulnerability exists in the privacy-filter dispatcher's identifier matching logic, specifically within `openmed/core/pii.py`, and the `PrivacyFilterTorchPipeline` class in `openmed/torch/privacy_filter.py`. The `create_privacy_filter_pipeline` function in `openmed/core/backends.py` also plays a role in how the `trust_remote_code` parameter is handled [ref_id=2].
What the fix does
The patch hardens the privacy-filter loading path by implementing an explicit allowlist for first-party privacy-filter models and local artifacts. It also changes the default for `trust_remote_code` to `False` in `PrivacyFilterTorchPipeline` and ensures `create_privacy_filter_pipeline` only opts into `trust_remote_code=True` for allowlisted models [patch_id=4518417]. This prevents arbitrary repositories from being loaded with custom code execution enabled, mitigating the remote code execution vulnerability.
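The allowlist behaviour described above can be approximated with the sketch below. It is an assumption-laden illustration rather than the patched implementation: `is_trusted_sketch`, `loader_kwargs`, and the contents of `FIRST_PARTY_MODELS` are hypothetical, while the `OPENMED_TRUSTED_REMOTE_CODE_MODELS` variable, its comma-separated format, and the `openai-privacy-filter` model type are taken from the advisory and the patch's tests.

```python
# Hypothetical sketch of an allowlist check; NOT OpenMed's actual implementation.
import json
import os
from pathlib import Path

# Assumption: the built-in allowlist contains at least this first-party repository.
FIRST_PARTY_MODELS = frozenset({"openai/privacy-filter"})
ENV_VAR = "OPENMED_TRUSTED_REMOTE_CODE_MODELS"


def _env_allowlist() -> set:
    """Parse the comma-separated variable, trimming spaces and dropping empty entries."""
    raw = os.environ.get(ENV_VAR, "")
    return {entry.strip() for entry in raw.split(",") if entry.strip()}


def _is_local_privacy_filter(path_str: str) -> bool:
    """True if path_str is a directory whose config.json declares the privacy-filter family."""
    config = Path(path_str) / "config.json"
    if not config.is_file():
        return False
    try:
        data = json.loads(config.read_text())
    except (OSError, ValueError):
        return False
    return isinstance(data, dict) and data.get("model_type") == "openai-privacy-filter"


def is_trusted_sketch(model_name: str) -> bool:
    """Trust first-party names, operator-allowlisted names, and local artifacts."""
    return (
        model_name in FIRST_PARTY_MODELS
        or model_name in _env_allowlist()
        or _is_local_privacy_filter(model_name)
    )


def loader_kwargs(model_name: str) -> dict:
    """Opt in to custom code only for trusted names; everything else gets False."""
    return {"trust_remote_code": is_trusted_sketch(model_name)}
```

With this shape, a name such as `attacker/foo-privacy-filter-bar` yields `trust_remote_code=False` even if an earlier routing layer lets it through, which mirrors the defense-in-depth intent stated in the tests.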
Preconditions
- input: The attacker must control the `model_name` parameter passed to the `/pii/extract` or `/pii/deidentify` endpoints.
Generated on Jun 2, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.
References (4)
News mentions (0)
No linked articles in our index yet.