Markdown to HTML at 309 MiB/s. Faster than pulldown-cmark, md4c (C), and comrak. Passes all 652 CommonMark spec tests. Every GFM extension included.
let html = ferromark::to_html("# Hello\n\n**World**");

One function call, no setup. When allocation pressure matters:
let mut buffer = Vec::new();
ferromark::to_html_into("# Reuse me", &mut buffer);
// buffer survives across calls: zero repeated allocation

Numbers, not adjectives. Apple Silicon (M-series), February 2026. All parsers run with GFM tables, strikethrough, and task lists enabled. Output buffers reused where APIs allow. Non-PGO binaries for a fair comparison.
CommonMark 5 KB (wiki-style, mixed content with tables)
| Parser | Throughput | vs ferromark |
|---|---|---|
| ferromark | 289.9 MiB/s | baseline |
| pulldown-cmark | 247.7 MiB/s | 0.85x |
| md4c (C) | 242.3 MiB/s | 0.84x |
| comrak | 73.7 MiB/s | 0.25x |
CommonMark 50 KB (same style, scaled)
| Parser | Throughput | vs ferromark |
|---|---|---|
| ferromark | 309.3 MiB/s | baseline |
| pulldown-cmark | 271.7 MiB/s | 0.88x |
| md4c (C) | 247.4 MiB/s | 0.80x |
| comrak | 76.0 MiB/s | 0.25x |
Up to 17% faster than pulldown-cmark (5 KB fixture; 14% at 50 KB). Up to 25% faster than md4c (50 KB fixture; 20% at 5 KB). About 4x faster than comrak on both.
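The "vs ferromark" column and the "percent faster" figures come from the same throughput numbers, read in opposite directions. A minimal sketch of that arithmetic follows; the function names are illustrative and not part of ferromark.

```rust
/// Throughput of another parser relative to the baseline
/// (the "vs ferromark" column in the tables above).
fn relative_throughput(other_mib_s: f64, baseline_mib_s: f64) -> f64 {
    other_mib_s / baseline_mib_s
}

/// How much faster the baseline is than another parser, in percent.
fn percent_faster(baseline_mib_s: f64, other_mib_s: f64) -> f64 {
    (baseline_mib_s / other_mib_s - 1.0) * 100.0
}
```

For example, `relative_throughput(247.7, 289.9)` gives roughly 0.85, while `percent_faster(289.9, 247.7)` gives roughly 17. The two are not complements: 0.85x relative throughput does not mean "15% faster".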
The fixtures are synthetic wiki-style documents with paragraphs, lists, code blocks, and tables. Nothing cherry-picked. Run them yourself: cargo bench --bench comparison
Full CommonMark: 652/652 spec tests pass. No filtering, no exceptions.
All five GFM extensions: Tables, strikethrough, task lists, autolink literals, disallowed raw HTML.
Beyond GFM: Footnotes, front matter extraction (---/+++), heading IDs (GitHub-compatible slugs), math spans ($/$$), highlight/mark syntax (==text==), superscript (^text^), subscript (~text~), and callouts (> [!NOTE], > [!WARNING], ...).
MDX support (opt-in via mdx feature): Segment and render .mdx files without a JavaScript toolchain. Covers 90%+ of real-world MDX patterns in Next.js, Docusaurus, and Astro.
15 feature flags to turn on exactly what you need:
allow_html · allow_link_refs · tables · strikethrough · highlight · superscript · subscript · task_lists · autolink_literals · disallowed_raw_html · footnotes · front_matter · heading_ids · math · callouts
Syntax note: ferromark uses ~~text~~ for strikethrough, ~text~ for subscript, and ^text^ for superscript. Single-tilde strikethrough is intentionally not supported.
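To make the tilde rule concrete, here is a small sketch of how a run of tildes could be classified by length. It is not ferromark's code: the `TildeMark` enum and both functions are hypothetical, and treating runs of three or more tildes as literal text is an assumption borrowed from the GFM strikethrough rules.

```rust
/// What a run of `~` characters could mean under the syntax note above.
#[derive(Debug, PartialEq)]
enum TildeMark {
    Subscript,     // ~text~
    Strikethrough, // ~~text~~
    Literal,       // anything else (assumption: 3+ tildes stay plain text)
}

/// Counts consecutive `~` bytes starting at `pos` (0 if `pos` is out of range).
fn tilde_run_len(bytes: &[u8], pos: usize) -> usize {
    bytes
        .get(pos..)
        .map_or(0, |rest| rest.iter().take_while(|&&b| b == b'~').count())
}

/// Maps a run length to the mark it could open or close.
fn classify_tilde_run(run_len: usize) -> TildeMark {
    match run_len {
        1 => TildeMark::Subscript,
        2 => TildeMark::Strikethrough,
        _ => TildeMark::Literal,
    }
}
```

Because a single tilde always maps to subscript here, there is no ambiguity left for single-tilde strikethrough to resolve.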
ferromark is built for one job: turning Markdown into HTML as fast as possible. That focus means some things it deliberately skips:
- No AST access. You can't walk a syntax tree or write custom renderers against parsed nodes. If you need that, pulldown-cmark's iterator model or comrak's AST are better fits.
- No source maps. No byte-offset tracking for mapping HTML back to Markdown positions.
- HTML only. No XML, no CommonMark round-tripping, no alternative output formats.
These aren't planned. They'd compromise the streaming architecture that makes ferromark fast.
MDX is the standard for component-driven docs in Next.js, Docusaurus, and Astro. Processing it usually requires a full JavaScript toolchain — Node.js, acorn, babel, the works.
ferromark takes a different approach: segment .mdx files into typed blocks and render them at native speed. No JS runtime. No AST.
ferromark = { version = "0.1", features = ["mdx"] }

render() assembles the final output automatically: Markdown segments become HTML, JSX and expressions pass through unchanged, ESM and front matter are extracted separately.
use ferromark::mdx::render;
let input = r#"import { Card } from './card'
---
title: Hello
---
# Hello World
<Card title="Example">
Markdown **inside** a component.
</Card>
{new Date().getFullYear()}
"#;
let output = render(input);
// output.body: HTML with JSX/expressions passed through
// output.esm: vec!["import { Card } from './card'\n"]
// output.front_matter: Some("title: Hello\n")

Use render_with_options() for custom Markdown settings (heading IDs, math, footnotes, etc.).
to_component() wraps the output as a complete JSX/TSX module with a named export. Works with React 19, Preact, Solid, and any JSX framework.
let output = render(input);
let tsx = output.to_component("HelloWorld");

import { Card } from './card'
export function HelloWorld() {
  return (
    <>
      <h1 id="hello-world">Hello World</h1>
      <Card title="Example">
        <p>Markdown <strong>inside</strong> a component.</p>
      </Card>
      {new Date().getFullYear()}
    </>
  );
}

When you need full control over each block, use segment() directly:
use ferromark::mdx::{segment, Segment};
for seg in segment(input) {
    match seg {
        Segment::Esm(s) => { /* import/export: pass through */ }
        Segment::Markdown(s) => { /* parse with ferromark::to_html(s) */ }
        Segment::JsxBlockOpen(s) => { /* <Component> */ }
        Segment::JsxBlockClose(s) => { /* </Component> */ }
        Segment::JsxBlockSelfClose(s) => { /* <Component /> */ }
        Segment::Expression(s) => { /* {expression} */ }
    }
}

The segmenter handles JSX attribute parsing (strings, expressions, spreads), brace-depth tracking (with string/comment/template-literal awareness), fragment syntax, member expressions (<Foo.Bar>), and multiline tags. Invalid constructs fall back to Markdown: no panics, always valid output.
Full example: cargo run --features mdx --example mdx_segment
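Brace-depth tracking with string and comment awareness can be sketched in a few lines. The function below is an illustrative stand-in, not the code in src/mdx/expr.rs: the name `find_expression_end` is hypothetical, template literals are treated as plain strings (no `${...}` nesting), and regex literals are not recognized.

```rust
/// Returns the byte index just past the `}` that closes the `{` at `start`,
/// or `None` if `start` is not a `{` or the braces never balance.
fn find_expression_end(src: &str, start: usize) -> Option<usize> {
    let bytes = src.as_bytes();
    if bytes.get(start) != Some(&b'{') {
        return None;
    }
    let mut depth = 0usize;
    let mut i = start;
    while i < bytes.len() {
        match bytes[i] {
            b'{' => depth += 1,
            b'}' => {
                depth -= 1;
                if depth == 0 {
                    return Some(i + 1);
                }
            }
            // String or template literal: skip to the matching unescaped quote.
            q @ (b'"' | b'\'' | b'`') => {
                i += 1;
                while i < bytes.len() && bytes[i] != q {
                    if bytes[i] == b'\\' {
                        i += 1; // skip the escaped byte
                    }
                    i += 1;
                }
            }
            // Line comment: braces up to the end of the line do not count.
            b'/' if bytes.get(i + 1) == Some(&b'/') => {
                while i < bytes.len() && bytes[i] != b'\n' {
                    i += 1;
                }
            }
            // Block comment: skip to the closing "*/".
            b'/' if bytes.get(i + 1) == Some(&b'*') => {
                i += 2;
                while i + 1 < bytes.len() && !(bytes[i] == b'*' && bytes[i + 1] == b'/') {
                    i += 1;
                }
                i += 1; // land on the closing '/'
            }
            _ => {}
        }
        i += 1;
    }
    None
}
```

When the function returns `None`, a caller following the fallback policy described above would treat the text as Markdown instead of reporting an error.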
Scope and coverage
The segmenter covers the block-level MDX patterns that make up 90%+ of real-world .mdx files: imports at the top, components wrapping content, expressions between paragraphs. This is what a typical Docusaurus, Next.js, or Astro page looks like — and it works out of the box.
What the segmenter deliberately skips — and why that's fine for most use cases:
| What | Our approach | When it matters |
|---|---|---|
| Inline JSX (text <em>here</em>) | Stays inside Markdown segments | Only if you mix JSX and prose on the same line inside a paragraph; rare in practice |
| JS validation | Heuristic detection (keyword + brace counting) instead of acorn/swc | Only if you need to report syntax errors in user-authored MDX at parse time |
| Markdown grammar | Standard CommonMark/GFM rules | Official mdxjs disables indented code and HTML syntax — relevant if your content relies on <div> being JSX, not HTML |
| Container nesting | > <Component> stays Markdown | Only if you put JSX inside blockquotes or list items; uncommon |
| TypeScript generics | <Component<T>> not parsed | Only relevant for TSX-heavy content pages; very rare in docs |
| Error reporting | Silent fallback to Markdown | Means broken JSX renders as text instead of failing — arguably safer for content pipelines |
The full @mdx-js/mdx compiler exists to produce a React component tree from MDX. It needs a JavaScript parser because it compiles to JSX. ferromark's segmenter exists to answer a simpler question: where does the Markdown stop and the JSX start? That question doesn't need a JS runtime.
For the detailed technical spec, see src/mdx/mod.rs.
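The "where does the Markdown stop and the JSX start?" question can be pictured as a per-line classification. The sketch below is only a rough approximation under stated assumptions: `LineKind` and `classify_line` are hypothetical names, only unindented lines are inspected, a tag counts as JSX only when its name starts with an uppercase letter (so `<div>` stays Markdown/HTML, in line with the grammar row in the table), and fragments, self-closing tags, and multiline tags are ignored.

```rust
/// Coarse line categories, loosely mirroring the Segment variants shown earlier.
#[derive(Debug, PartialEq)]
enum LineKind {
    Esm,
    JsxOpen,
    JsxClose,
    Expression,
    Markdown,
}

/// True if `s` begins like a component name (uppercase ASCII first letter).
fn is_component_start(s: &str) -> bool {
    s.chars().next().map_or(false, |c| c.is_ascii_uppercase())
}

/// Keyword-and-prefix heuristic; no JavaScript parsing involved.
fn classify_line(line: &str) -> LineKind {
    if line.starts_with("import ") || line.starts_with("export ") {
        LineKind::Esm
    } else if let Some(rest) = line.strip_prefix("</") {
        // Closing tags must be checked before opening tags.
        if is_component_start(rest) { LineKind::JsxClose } else { LineKind::Markdown }
    } else if let Some(rest) = line.strip_prefix('<') {
        if is_component_start(rest) { LineKind::JsxOpen } else { LineKind::Markdown }
    } else if line.starts_with('{') {
        LineKind::Expression
    } else {
        LineKind::Markdown
    }
}
```

Anything the heuristic does not recognize falls through to `Markdown`, which matches the silent-fallback behavior listed in the table.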
No AST. Block events stream from the scanner to the HTML writer with nothing in between.
Input bytes (&[u8])
│
▼
Block parser (line-oriented, memchr-driven)
│ emits BlockEvent stream
▼
Inline parser (mark collection → resolution → emit)
│ emits InlineEvent stream
▼
HTML writer (direct buffer writes)
│
▼
Output (Vec<u8>)
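As a rough illustration of the pipeline above, the sketch below streams a few events straight into a reusable output buffer, with text carried as a range into the input instead of a copied string. The `Event` enum and `write_events` are hypothetical and much simpler than the real BlockEvent/InlineEvent types; the `Range` layout (two u32 fields) is an assumption based on the "compact u32 range type" noted for range.rs. HTML escaping is left out for brevity.

```rust
/// Half-open byte range into the input: two u32s, 8 bytes total.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Range {
    start: u32,
    end: u32,
}

impl Range {
    /// Borrows the referenced bytes from the input; nothing is copied.
    fn slice<'a>(self, input: &'a [u8]) -> &'a [u8] {
        &input[self.start as usize..self.end as usize]
    }
}

/// Hypothetical, heavily simplified event type.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Event {
    ParagraphStart,
    Text(Range),
    ParagraphEnd,
}

/// Writes events directly into a caller-owned buffer; no tree is built.
fn write_events(input: &[u8], events: &[Event], out: &mut Vec<u8>) {
    out.clear(); // drops old contents but keeps the allocation
    for ev in events {
        match *ev {
            Event::ParagraphStart => out.extend_from_slice(b"<p>"),
            Event::Text(r) => out.extend_from_slice(r.slice(input)),
            Event::ParagraphEnd => out.extend_from_slice(b"</p>\n"),
        }
    }
}
```

Calling `write_events` repeatedly with the same `Vec<u8>` reuses its capacity, which is the same idea as passing one buffer to to_html_into across calls.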
What makes this fast in practice:
- Block scanning runs on memchr for line boundaries. Container state is a compact stack, not a tree.
- Inline parsing has three phases: collect delimiter marks, resolve precedence (code spans, math, links, emphasis, strikethrough, subscript, superscript, highlight), emit. No backtracking.
- Emphasis resolution uses the CommonMark modulo-3 rule with a delimiter stack instead of expensive rescans.
- SIMD scanning (NEON on ARM) detects special characters in inline content.
- Zero-copy references: events carry Range pointers into the input, not copied strings.
- Compact events: 24 bytes each, cache-line friendly.
- Hot/cold annotation: #[inline] on tight loops, #[cold] on error paths, table-driven byte classification.
- Linear time. No regex, no backtracking, no quadratic blowup on adversarial input.
- Low allocation pressure. Compact events, range references, reusable output buffers.
- Operational safety. Depth and size limits guard against pathological nesting.
- Small dependency surface. Minimal crates, straightforward integration.
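The modulo-3 rule mentioned in the emphasis bullet comes from the CommonMark spec: if either delimiter run can both open and close emphasis, the two runs may only pair when the sum of their lengths is not a multiple of 3, unless both lengths are multiples of 3. The check can be sketched as follows; `DelimRun` and `can_pair` are illustrative names, not ferromark's types, and the sketch assumes both runs use the same delimiter character.

```rust
/// A run of `*` or `_` with its flanking-derived capabilities.
#[derive(Clone, Copy)]
struct DelimRun {
    len: usize,
    can_open: bool,
    can_close: bool,
}

/// Applies the CommonMark "rule of 3" to a candidate opener/closer pair.
fn can_pair(opener: DelimRun, closer: DelimRun) -> bool {
    if !opener.can_open || !closer.can_close {
        return false;
    }
    // The restriction only applies when one side can both open and close.
    let either_is_both =
        (opener.can_open && opener.can_close) || (closer.can_open && closer.can_close);
    if either_is_both && (opener.len + closer.len) % 3 == 0 {
        return opener.len % 3 == 0 && closer.len % 3 == 0;
    }
    true
}
```

A delimiter stack can call a check like this while walking back from each closer, so rejected pairs are skipped without rescanning the text.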
Detailed parser comparison
How ferromark compares to the other three top-tier parsers across architecture, features, and output. Ratings use a 4-level heatmap focused on end-to-end Markdown-to-HTML throughput. Scoring is relative per row, so each row has at least one top mark.
Legend: 🟩 strongest 🟨 close behind 🟧 notable tradeoffs 🟥 weakest
Ferromark optimization backlog: docs/arch/ARCH-PLAN-001-performance-opportunities.md
| Feature | ferromark | md4c | pulldown-cmark | comrak |
|---|---|---|---|---|
| Performance-critical architecture and memory | ||||
| Parser model (streaming, no AST) | 🟩 | 🟩 | 🟨 | 🟥 |
| Streaming parsers emit output as they scan, avoiding intermediate trees. ferromark and md4c stream directly; pulldown-cmark uses a pull iterator; comrak builds an AST. | ||||
| API overhead profile | 🟩 | 🟩 | 🟨 | 🟥 |
| Measures overhead on straight Markdown-to-HTML throughput. md4c callbacks and ferromark streaming events are lean; pulldown-cmark pull iterators are close; comrak's AST model adds more overhead for this workload. | ||||
| Parse/render separation | 🟨 | 🟩 | 🟩 | 🟧 |
| Clear separation lets renderers be swapped or tuned. md4c and pulldown-cmark separate parse and render clearly; ferromark is mostly separated; comrak leans on AST-based renderers. | ||||
| Inline parsing pipeline | 🟩 | 🟨 | 🟨 | 🟥 |
| Multi-phase inline parsing (collect, resolve, emit) keeps the hot path linear. ferromark uses this approach; md4c and pulldown-cmark are optimized byte scanners; comrak does more AST bookkeeping. | ||||
| Emphasis matching efficiency | 🟩 | 🟨 | 🟨 | 🟥 |
| Stack-based algorithms reduce rescans on text-heavy documents. ferromark uses modulo-3 stacks; md4c and pulldown-cmark are optimized; comrak pays AST overhead. | ||||
| Link reference processing cost | 🟩 | 🟩 | 🟩 | 🟨 |
| Link labels need normalization. ferromark, md4c, and pulldown-cmark minimize allocations; comrak handles more feature paths. | ||||
| Zero-copy text handling | 🟩 | 🟨 | 🟨 | 🟥 |
| Text slices that point directly into input reduce allocation and copy costs. ferromark uses ranges; md4c and pulldown-cmark borrow slices; comrak allocates AST nodes. | ||||
| Allocation pressure (hot path) | 🟩 | 🟩 | 🟨 | 🟥 |
| Fewer allocations in tight loops means better CPU utilization. Streaming parsers allocate less during parse/render; AST parsers allocate many nodes. | ||||
| Output buffer reuse | 🟩 | 🟩 | 🟨 | 🟥 |
| Reusing buffers avoids repeated allocations across runs. ferromark, md4c, and pulldown-cmark allow reuse; comrak allocates internally. | ||||
| Memory locality | 🟩 | 🟩 | 🟨 | 🟥 |
| A small working set fits in cache. Streaming parsers keep it small; AST-based parsing expands it. | ||||
| Cache friendliness | 🟩 | 🟩 | 🟨 | 🟥 |
| Linear scans and contiguous buffers work well for CPU caches. ferromark and md4c favor linear scans; pulldown-cmark is close; comrak traverses AST allocations. | ||||
| SIMD availability | 🟩 | 🟨 | 🟩 | 🟥 |
| SIMD accelerates scanning for special characters. ferromark and pulldown-cmark have SIMD paths; md4c relies on C compiler optimizations; comrak is not SIMD-focused. | ||||
| Hot-path control | 🟩 | 🟩 | 🟧 | 🟥 |
| Performance headroom from low-level control in inner loops. md4c (C) and ferromark use tighter tuning; pulldown-cmark is mostly safe-Rust hot loops; comrak prioritizes flexibility. | ||||
| Dependency footprint | 🟩 | 🟩 | 🟨 | 🟥 |
| Fewer dependencies simplify builds. md4c and ferromark are minimal; pulldown-cmark is moderate; comrak is heavier. | ||||
| Throughput ceiling (architectural) | 🟩 | 🟩 | 🟨 | 🟥 |
| Streaming architectures with fewer allocations generally allow higher throughput ceilings. ferromark and md4c lead; pulldown-cmark is close; comrak trades throughput for flexibility. | ||||
| Feature coverage and extensibility | ||||
| Extension breadth | 🟩 | 🟧 | 🟨 | 🟩 |
| comrak has the broadest catalog; ferromark implements all 5 GFM extensions plus footnotes, front matter, heading IDs, math, highlight, subscript, superscript, and callouts; pulldown-cmark and md4c support the common GFM features. | ||||
| Spec compliance (CommonMark) | 🟩 | 🟩 | 🟨 | 🟩 |
| All four target CommonMark. Beyond CommonMark and GFM, ferromark, pulldown-cmark, and comrak also support footnotes, heading IDs, math spans, and callouts. | ||||
| Extension configuration surface | 🟨 | 🟩 | 🟨 | 🟨 |
| Fine-grained flags let you disable features to reduce work. md4c has many flags; ferromark has 15 options; pulldown-cmark and comrak use option structs. | ||||
| Raw HTML control | 🟩 | 🟩 | 🟧 | 🟩 |
| md4c and comrak expose explicit switches; ferromark provides allow_html and disallowed_raw_html; pulldown-cmark is more fixed. | ||||
| GFM tables | 🟩 | 🟩 | 🟩 | 🟩 |
| All four support GFM tables. | ||||
| Task lists, strikethrough | 🟩 | 🟨 | 🟨 | 🟩 |
| All four support both. | ||||
| Footnotes | 🟩 | 🟥 | 🟨 | 🟩 |
| ferromark, pulldown-cmark, and comrak support footnotes; md4c does not. | ||||
| Permissive autolinks | 🟩 | 🟩 | 🟧 | 🟨 |
| ferromark and md4c support GFM autolink literals (URL, www, email); comrak has relaxed autolinks; pulldown-cmark focuses on spec defaults. | ||||
| Output safety toggles | 🟨 | 🟩 | 🟧 | 🟩 |
| md4c and comrak provide explicit unsafe/escape switches; ferromark provides allow_html and disallowed_raw_html; pulldown-cmark is more fixed. | ||||
| Rendering and output | ||||
| Output streaming | 🟩 | 🟩 | 🟨 | 🟥 |
| Incremental output lowers peak memory and removes extra passes. ferromark and md4c stream to buffers; pulldown-cmark streams events; comrak renders after AST work. | ||||
| Output customization hooks | 🟧 | 🟩 | 🟨 | 🟩 |
| Callbacks and ASTs are great for custom rendering but add indirection. md4c callbacks and comrak AST are very flexible; pulldown-cmark iterators are easy to transform; ferromark is lower level. | ||||
| Output formats | 🟥 | 🟧 | 🟨 | 🟩 |
| comrak emits HTML, XML, and CommonMark; pulldown-cmark provides HTML plus event streams; md4c has HTML and callbacks; ferromark targets HTML only. | ||||
| Source position support | 🟥 | 🟥 | 🟩 | 🟨 |
| pulldown-cmark has strong source map support; comrak can emit source positions; ferromark and md4c skip this for speed. | ||||
| Source map tooling | 🟥 | 🟥 | 🟩 | 🟨 |
| pulldown-cmark exposes event ranges; comrak can emit source position attributes; ferromark and md4c keep this minimal. | ||||
| IO friendliness | 🟩 | 🟩 | 🟧 | 🟥 |
| md4c and ferromark stream into buffers; pulldown-cmark recommends buffered output; comrak often builds strings after AST work. | ||||
cargo build # development
cargo build --release # optimized (recommended for benchmarks)
cargo test # run tests
cargo test --test commonmark_spec -- --nocapture # CommonMark spec
cargo bench # benchmarks

src/
├── lib.rs # Public API (to_html, to_html_into, parse, Options)
├── main.rs # CLI binary
├── block/ # Block-level parser
│ ├── parser.rs # Line-oriented block parsing
│ └── event.rs # BlockEvent types
├── inline/ # Inline-level parser
│ ├── mod.rs # Three-phase inline parsing
│ ├── marks.rs # Mark collection + SIMD integration
│ ├── simd.rs # NEON SIMD character scanning
│ ├── event.rs # InlineEvent types
│ ├── code_span.rs
│ ├── emphasis.rs # Modulo-3 stack optimization
│ ├── strikethrough.rs # GFM strikethrough resolution
│ ├── subscript.rs # Subscript resolution (~text~)
│ ├── superscript.rs # Superscript resolution (^text^)
│ ├── math.rs # Math span resolution ($/$$ delimiters)
│ └── links.rs # Link/image/autolink parsing
├── mdx/ # MDX segmenter + renderer (feature = "mdx")
│ ├── mod.rs # Public API — Segment enum, segment(), render()
│ ├── render.rs # Assembly layer: segments → HTML body + ESM + front matter
│ ├── splitter.rs # Line-based state machine
│ ├── jsx_tag.rs # JSX tag boundary parser
│ └── expr.rs # Expression boundary parser (brace/string/comment tracking)
├── footnote.rs # Footnote store and rendering
├── link_ref.rs # Link reference definitions
├── cursor.rs # Pointer-based byte cursor
├── range.rs # Compact u32 range type
├── render.rs # HTML writer
├── escape.rs # HTML escaping (memchr-optimized)
└── limits.rs # DoS prevention constants
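To give a feel for what a module like escape.rs does, here is a minimal HTML-escaping sketch that copies unescaped runs in bulk and only stops at special bytes. It is an assumption-laden stand-in: the tree above says the real module is memchr-optimized, while this version uses a plain byte loop from the standard library, and the function name `escape_html_into` is hypothetical.

```rust
/// Appends `input` to `out`, escaping the four HTML-special bytes.
fn escape_html_into(input: &[u8], out: &mut Vec<u8>) {
    let mut start = 0; // beginning of the current run of safe bytes
    for (i, &b) in input.iter().enumerate() {
        let replacement: &[u8] = match b {
            b'&' => b"&amp;",
            b'<' => b"&lt;",
            b'>' => b"&gt;",
            b'"' => b"&quot;",
            _ => continue,
        };
        // Flush the safe run in one copy, then write the entity.
        out.extend_from_slice(&input[start..i]);
        out.extend_from_slice(replacement);
        start = i + 1;
    }
    // Flush whatever follows the last special byte.
    out.extend_from_slice(&input[start..]);
}
```

Writing into a caller-supplied `Vec<u8>` keeps the helper compatible with the buffer-reuse pattern used elsewhere in the crate.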
MIT -- Copyright 2026 Sebastian Software GmbH, Mainz, Germany