Fast Markdown-to-HTML for Rust workloads where throughput and predictable latency matter.
- Built for production paths, not toy inputs: docs pipelines, API rendering, and CLIs.
- Streaming parser design avoids AST overhead on the hot path.
- CommonMark-compliant while still tuned for raw speed.
- Small dependency surface and straightforward integration.
- Linear time behavior: no regex backtracking, no parser surprises on large inputs.
- Low allocation pressure: compact `Range` references into the input instead of copying text.
- Cache-friendly execution: tight scanning loops, lookup tables, and reusable buffers.
- Operational safety: explicit depth/limit guards against pathological nesting.
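
The "compact range instead of copied text" idea can be sketched as follows. This is an illustrative stand-in, not ferromark's actual `Range` type: the field names, the constructor, and the `slice` helper are assumptions made for the example.

```rust
use std::convert::TryFrom;

/// Hypothetical compact range: two u32 byte offsets into the input buffer.
/// Eight bytes total, versus sixteen for a `&[u8]` on 64-bit targets.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Range {
    pub start: u32,
    pub end: u32,
}

impl Range {
    /// Build a range from usize offsets.
    /// Returns None if the offsets are inverted or do not fit in u32.
    pub fn new(start: usize, end: usize) -> Option<Self> {
        if start > end {
            return None;
        }
        Some(Range {
            start: u32::try_from(start).ok()?,
            end: u32::try_from(end).ok()?,
        })
    }

    /// Number of bytes covered by the range.
    pub fn len(self) -> usize {
        (self.end - self.start) as usize
    }

    /// Borrow the referenced bytes from the input; nothing is copied.
    pub fn slice(self, input: &[u8]) -> &[u8] {
        &input[self.start as usize..self.end as usize]
    }
}

fn main() {
    let input = b"# Hello\n\n**World**";
    // Bytes 2..7 hold the heading text.
    let heading = Range::new(2, 7).expect("offsets fit in u32");
    assert_eq!(heading.slice(input), &b"Hello"[..]);
    assert_eq!(heading.len(), 5);
    assert_eq!(std::mem::size_of::<Range>(), 8);
}
```

Events that carry a range like this stay small and `Copy`, and the text is only touched again when the writer emits it.
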
Input bytes (&[u8])
│
▼
Block parser (line-oriented)
│ emits BlockEvent stream
▼
Inline parser (per text range)
│ emits InlineEvent stream
▼
HTML writer (direct buffer writes)
│
▼
Output (Vec<u8>)
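
A toy version of this pipeline, to make the data flow concrete. It is a sketch under heavy assumptions: the event variants, `parse_blocks`, and `write_html` are hypothetical names, only ATX headings and one-line paragraphs are handled, and the inline pass and HTML escaping are left out entirely.

```rust
/// Hypothetical block-level events. Text is referenced by byte offsets
/// into the input rather than copied.
#[derive(Debug, PartialEq)]
enum BlockEvent {
    HeadingStart(u8),
    HeadingEnd(u8),
    ParagraphStart,
    ParagraphEnd,
    Text { start: usize, end: usize },
}

/// Toy block pass: scans one line at a time and pushes events to `emit`.
/// No tree is built; each event is handed off as soon as it is known.
fn parse_blocks(input: &[u8], mut emit: impl FnMut(BlockEvent)) {
    let mut pos = 0;
    for line in input.split(|&b| b == b'\n') {
        let start = pos;
        pos += line.len() + 1; // step over the line and its newline
        if line.is_empty() {
            continue;
        }
        let hashes = line.iter().take_while(|&&b| b == b'#').count();
        if (1..=6).contains(&hashes) && line.get(hashes) == Some(&b' ') {
            let level = hashes as u8;
            emit(BlockEvent::HeadingStart(level));
            emit(BlockEvent::Text { start: start + hashes + 1, end: start + line.len() });
            emit(BlockEvent::HeadingEnd(level));
        } else {
            emit(BlockEvent::ParagraphStart);
            emit(BlockEvent::Text { start, end: start + line.len() });
            emit(BlockEvent::ParagraphEnd);
        }
    }
}

/// Toy writer: appends HTML directly to the caller's buffer.
fn write_html(input: &[u8], out: &mut Vec<u8>) {
    parse_blocks(input, |event| match event {
        BlockEvent::HeadingStart(n) => {
            out.extend_from_slice(b"<h");
            out.push(b'0' + n);
            out.push(b'>');
        }
        BlockEvent::HeadingEnd(n) => {
            out.extend_from_slice(b"</h");
            out.push(b'0' + n);
            out.extend_from_slice(b">\n");
        }
        BlockEvent::ParagraphStart => out.extend_from_slice(b"<p>"),
        BlockEvent::ParagraphEnd => out.extend_from_slice(b"</p>\n"),
        BlockEvent::Text { start, end } => out.extend_from_slice(&input[start..end]),
    });
}

fn main() {
    let mut out = Vec::new();
    write_html(b"# Hello\n\nWorld", &mut out);
    assert_eq!(out, b"<h1>Hello</h1>\n<p>World</p>\n".to_vec());
}
```
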
- Block pass stays simple: cheap line scanning via `memchr`, container stack for quotes/lists.
- Inline pass is staged: collect marks -> resolve precedence (code, links, emphasis) -> emit.
- Hot-path tuning: `#[inline]` where it matters, `#[cold]` for rare paths, table-driven classification.
- CommonMark emphasis done right: modulo-3 delimiter handling without expensive rescans.
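
For readers unfamiliar with the modulo-3 rule: the CommonMark spec says that when either delimiter run can both open and close emphasis, the two runs may only pair if the sum of their lengths is not a multiple of 3, unless both lengths are multiples of 3. A minimal sketch of that check follows. `DelimRun` and `may_pair` are hypothetical names for illustration, not ferromark's internals.

```rust
/// A run of `*` or `_` characters, as a mark-collection pass might record it.
#[derive(Clone, Copy, Debug)]
struct DelimRun {
    /// Original length of the run.
    len: usize,
    /// Left-flanking: may open emphasis.
    can_open: bool,
    /// Right-flanking: may close emphasis.
    can_close: bool,
}

/// CommonMark "rule of 3": decide whether `opener` and `closer` may pair.
fn may_pair(opener: DelimRun, closer: DelimRun) -> bool {
    if !opener.can_open || !closer.can_close {
        return false;
    }
    // The restriction only applies if one of the runs is both opener and closer.
    let ambiguous = opener.can_close || closer.can_open;
    if ambiguous && (opener.len + closer.len) % 3 == 0 {
        // Allowed only when both lengths are themselves multiples of 3.
        return opener.len % 3 == 0 && closer.len % 3 == 0;
    }
    true
}

fn main() {
    let open1 = DelimRun { len: 1, can_open: true, can_close: false };
    let both2 = DelimRun { len: 2, can_open: true, can_close: true };
    let both3 = DelimRun { len: 3, can_open: true, can_close: true };
    let close1 = DelimRun { len: 1, can_open: false, can_close: true };

    assert!(!may_pair(open1, both2)); // 1 + 2 = 3, and 1 and 2 are not multiples of 3
    assert!(may_pair(both3, both3)); // 3 + 3 = 6, but both lengths are multiples of 3
    assert!(may_pair(open1, close1)); // neither run is ambiguous
}
```

Because the rule depends only on `len % 3` and the open/close flags, candidate openers can be grouped by those values, which is one way an implementation can avoid rescanning the delimiter stack for every closer.
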
Benchmarked on Apple Silicon (M-series), latest run: February 8, 2026.
Workload: synthetic wiki-style documents with text-heavy paragraphs, lists, code blocks, and representative CommonMark features (benches/fixtures/commonmark-5k.md, benches/fixtures/commonmark-50k.md).
Method: output buffers are reused for ferromark, md4c, and pulldown-cmark where APIs allow; comrak allocates output internally. Default GFM extensions enabled for ferromark (tables, strikethrough, task lists, disallowed raw HTML; autolink literals are opt-in). Main table uses non-PGO binaries for apples-to-apples defaults.
CommonMark 5KB (GFM extensions enabled, includes tables)
| Parser | Throughput | Relative (vs ferromark) |
|---|---|---|
| ferromark | 289.9 MiB/s | 1.00x |
| pulldown-cmark | 247.7 MiB/s | 0.85x |
| md4c | 242.3 MiB/s | 0.84x |
| comrak | 73.7 MiB/s | 0.25x |
CommonMark 50KB (GFM extensions enabled, includes tables)
| Parser | Throughput | Relative (vs ferromark) |
|---|---|---|
| ferromark | 309.3 MiB/s | 1.00x |
| pulldown-cmark | 271.7 MiB/s | 0.88x |
| md4c | 247.4 MiB/s | 0.80x |
| comrak | 76.0 MiB/s | 0.25x |
All parsers run with GFM tables, strikethrough, and task lists enabled. Other candidates like markdown-rs are far slower in this workload and are omitted from the main tables to keep the comparison focused. Happy to run them on request.
Key results:
- ferromark is ~17% faster than pulldown-cmark at 5KB and ~14% faster at 50KB.
- ferromark is ~20% faster than md4c at 5KB and ~25% faster at 50KB.
- ferromark is ~3.9-4.1x faster than comrak across 5-50KB.
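
The percentages and the "Relative" column follow directly from the throughput numbers. A short check, with `percent_faster` and `relative` as helper names invented for this example:

```rust
/// How much faster `a` is than `b`, in percent, given two throughputs.
fn percent_faster(a: f64, b: f64) -> f64 {
    (a / b - 1.0) * 100.0
}

/// Throughput of `other` relative to `baseline` (the "Relative" column).
fn relative(other: f64, baseline: f64) -> f64 {
    other / baseline
}

fn main() {
    // (5KB, 50KB) throughput in MiB/s, copied from the tables above.
    let ferromark = (289.9, 309.3);
    let others = [
        ("pulldown-cmark", 247.7, 271.7),
        ("md4c", 242.3, 247.4),
        ("comrak", 73.7, 76.0),
    ];

    for &(name, small, large) in others.iter() {
        println!(
            "{}: ferromark is {:.0}% faster at 5KB, {:.0}% faster at 50KB (relative {:.2}x / {:.2}x)",
            name,
            percent_faster(ferromark.0, small),
            percent_faster(ferromark.1, large),
            relative(small, ferromark.0),
            relative(large, ferromark.1),
        );
    }

    // Spot checks against the key results.
    assert_eq!(percent_faster(289.9, 247.7).round(), 17.0);
    assert_eq!(percent_faster(309.3, 247.4).round(), 25.0);
}
```
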
Run benchmarks: cargo bench --bench comparison
These are the four parsers included in the main benchmark. Ratings use a 4-level emoji heatmap focused on end-to-end Markdown-to-HTML throughput in typical workloads.
Legend:
- 🟩 = strongest in this row (ties allowed)
- 🟨 = close behind the row leader
- 🟧 = notable tradeoffs for this row
- 🟥 = weakest for this row's goal
Scoring is relative per row so each row has at least one 🟩. Each feature row is followed by a short plain-language explanation. Ferromark optimization backlog: docs/arch/ARCH-PLAN-001-performance-opportunities.md
| Feature | ferromark | md4c | pulldown-cmark | comrak |
|---|---|---|---|---|
| Performance-Critical Architecture and Memory | ||||
| Parser model (streaming, no AST) | 🟩 | 🟩 | 🟨 | 🟥 |
| Streaming parsers can emit output as they scan input, which avoids building an intermediate tree and keeps memory and cache pressure low. Mapping: ferromark and md4c stream; pulldown-cmark uses a pull iterator; comrak builds an AST. | ||||
| API overhead profile (push / pull / AST) | 🟩 | 🟩 | 🟨 | 🟥 |
| This score reflects API overhead on straight Markdown-to-HTML throughput, not API flexibility. Mapping: md4c callbacks and ferromark streaming events are lean; pulldown-cmark pull iterators are close; comrak's AST model adds more overhead for this workload. | ||||
| Parse/render separation | 🟨 | 🟩 | 🟩 | 🟧 |
| Clear separation lets parsers stay simple and fast, while renderers can be swapped or tuned. Mapping: md4c and pulldown-cmark separate parse and render clearly; ferromark is mostly separated; comrak leans on AST-based renderers. | ||||
| Inline parsing pipeline (multi-phase, delimiter stacks) | 🟩 | 🟨 | 🟨 | 🟥 |
| Multi-phase inline parsing (collect -> resolve -> emit) keeps the hot path linear and avoids backtracking. Mapping: ferromark uses multi-phase inline parsing; md4c and pulldown-cmark are optimized byte scanners; comrak does more AST bookkeeping. | ||||
| Emphasis matching efficiency | 🟩 | 🟨 | 🟨 | 🟥 |
| Efficient emphasis handling reduces rescans and backtracking. Stack-based algorithms tend to win on long text-heavy documents. Mapping: ferromark uses modulo-3 stacks; md4c and pulldown-cmark are optimized; comrak pays AST overhead. | ||||
| Link reference processing cost | 🟩 | 🟩 | 🟩 | 🟨 |
| Link labels need normalization (case folding and entity handling). Optimized implementations reduce allocations and Unicode overhead. Mapping: All four normalize labels; ferromark, md4c, and pulldown-cmark focus on minimizing allocations; comrak handles more feature paths. | ||||
| Zero-copy text handling | 🟩 | 🟨 | 🟨 | 🟥 |
| Zero-copy means most text slices point directly into input, which reduces allocations and copy costs. Mapping: ferromark uses ranges; md4c and pulldown-cmark borrow slices; comrak allocates AST nodes. | ||||
| Allocation pressure (hot path) | 🟩 | 🟩 | 🟨 | 🟥 |
| Fewer allocations in tight loops improves CPU utilization and reduces allocator overhead. Mapping: Streaming parsers allocate less during parse/render; AST parsers allocate many nodes. | ||||
| Output buffer reuse | 🟩 | 🟩 | 🟨 | 🟥 |
| Reusing output buffers avoids repeated allocations across runs and stabilizes performance. Mapping: ferromark, md4c, and pulldown-cmark allow reuse; comrak allocates internally. | ||||
| Memory locality (working set size) | 🟩 | 🟩 | 🟨 | 🟥 |
| A small working set fits in cache and reduces memory traffic. Mapping: Streaming parsers keep the working set small; AST-based parsing expands it. | ||||
| Cache friendliness | 🟩 | 🟩 | 🟨 | 🟥 |
| Linear scans and contiguous buffers are usually best for CPU caches. Mapping: ferromark and md4c favor linear scans; pulldown-cmark is close; comrak traverses AST allocations. | ||||
| SIMD availability (optional) | 🟩 | 🟨 | 🟩 | 🟥 |
| SIMD can accelerate scanning for special characters if the SIMD path is hot enough. Mapping: ferromark and pulldown-cmark have SIMD paths; md4c relies on C optimizations; comrak is not SIMD-focused. | ||||
| Hot-path control (bounds/branch minimization) | 🟩 | 🟩 | 🟧 | 🟥 |
| This row measures performance headroom from low-level control in inner loops. Mapping: md4c (C) and ferromark use tighter low-level tuning where beneficial; pulldown-cmark is mostly safe-Rust hot loops; comrak prioritizes higher-level flexibility. | ||||
| Dependency footprint | 🟩 | 🟩 | 🟨 | 🟥 |
| Fewer dependencies simplify builds and reduce binary bloat. Mapping: md4c and ferromark are minimal; pulldown-cmark is moderate; comrak is heavier. | ||||
| Throughput ceiling (architectural) | 🟩 | 🟩 | 🟨 | 🟥 |
| With fewer allocations and tighter hot loops, streaming architectures generally allow higher throughput ceilings. Mapping: ferromark and md4c lead here; pulldown-cmark is close; comrak trades throughput for flexibility. | ||||
| Core compactness (moving parts) | 🟨 | 🟩 | 🟨 | 🟧 |
| A compact core is easier to tune and reason about. Mapping: md4c is very compact; ferromark is lean; pulldown-cmark is moderate; comrak is larger by design. | ||||
| Feature Coverage and Extensibility | ||||
| Extension breadth (GFM and extras) | 🟩 | 🟧 | 🟨 | 🟩 |
| More extensions increase compatibility but add parsing work. Mapping: comrak offers the broadest extension catalog; ferromark implements all 5 GFM extensions (tables, strikethrough, task lists, autolink literals, disallowed raw HTML); pulldown-cmark and md4c support common GFM features. | ||||
| Spec compliance focus (CommonMark) | 🟩 | 🟩 | 🟨 | 🟩 |
| Full compliance adds edge-case handling. All four are strong here, but more features usually means more code on the hot path. Mapping: All four target CommonMark; comrak and md4c emphasize full compliance; pulldown-cmark adds extensions; ferromark is focused. | ||||
| Unicode handling configurability | 🟧 | 🟩 | 🟧 | 🟧 |
| Configurable Unicode handling can simplify hot paths or support special environments. Mapping: md4c can be built for UTF-8, UTF-16, or ASCII-only; the Rust parsers generally assume UTF-8. | ||||
| Portability | 🟨 | 🟩 | 🟨 | 🟨 |
| Portability matters for embedding and wide deployment. Mapping: md4c compiles almost anywhere with a C toolchain; the Rust crates are broadly portable too. | ||||
| Extension configuration surface | 🟨 | 🟩 | 🟨 | 🟨 |
| Fine-grained flags let you disable features to reduce work. Mapping: md4c has many flags; pulldown-cmark and comrak use options; ferromark has 7 options covering all GFM extensions (allow_html, allow_link_refs, tables, strikethrough, task_lists, autolink_literals, disallowed_raw_html). | ||||
| Raw HTML control (allow/deny) | 🟩 | 🟩 | 🟧 | 🟩 |
| Disabling raw HTML can simplify parsing and output. Mapping: md4c and comrak expose explicit switches; ferromark also exposes an explicit allow_html option; pulldown-cmark is more fixed in defaults. | ||||
| GFM Tables | 🟩 | 🟩 | 🟩 | 🟩 |
| GFM table syntax (header, delimiter, body rows with alignment). Mapping: All four parsers support GFM tables. | ||||
| Task lists, strikethrough | 🟩 | 🟨 | 🟨 | 🟩 |
| These GFM features are common in real-world Markdown. Mapping: All four parsers support task lists and strikethrough. | ||||
| Footnotes | 🟥 | 🟥 | 🟨 | 🟩 |
| Footnotes add extra parsing and rendering complexity. Mapping: pulldown-cmark and comrak support footnotes; ferromark and md4c do not focus on them. | ||||
| Math support | 🟥 | 🟩 | 🟥 | 🟩 |
| Math support often requires custom extensions. Mapping: md4c includes LaTeX math flags; comrak supports math extensions; ferromark and pulldown-cmark do not target math in the core. | ||||
| Permissive autolinks | 🟩 | 🟩 | 🟧 | 🟨 |
| Permissive autolinks trade strictness for convenience. Mapping: ferromark and md4c support GFM autolink literals (URL, www, email); comrak has relaxed autolinks; pulldown-cmark focuses on spec defaults. | ||||
| Wiki links | 🟥 | 🟩 | 🟥 | 🟩 |
| Wiki links are a non-CommonMark extension used in some ecosystems. Mapping: md4c and comrak support wiki links via flags/extensions; pulldown-cmark and ferromark do not. | ||||
| Underline extension | 🟥 | 🟩 | 🟥 | 🟩 |
| Underline is an extension that changes emphasis semantics. Mapping: md4c and comrak include underline extensions; pulldown-cmark and ferromark stick closer to CommonMark emphasis rules. | ||||
| Task list flexibility | 🟧 | 🟧 | 🟧 | 🟩 |
| Relaxed task list parsing can improve compatibility with messy inputs. Mapping: comrak offers relaxed task list options; ferromark, md4c, and pulldown-cmark support task lists with fewer knobs. | ||||
| Output safety toggles | 🟨 | 🟩 | 🟧 | 🟩 |
| Safety toggles control whether raw HTML is emitted or escaped. Mapping: md4c and comrak provide explicit unsafe/escape switches; ferromark provides allow_html and disallowed_raw_html toggles; pulldown-cmark is more fixed in defaults. | ||||
| no_std viability | 🟥 | 🟨 | 🟩 | 🟥 |
| no_std support matters for embedded or constrained environments. Mapping: pulldown-cmark supports no_std builds with features; md4c can be embedded in C environments; ferromark and comrak assume std. | ||||
| Rendering and Output | ||||
| Output streaming (incremental) | 🟩 | 🟩 | 🟨 | 🟥 |
| Output streaming lets you write HTML incrementally, which lowers peak memory and removes extra passes. Mapping: ferromark and md4c stream to buffers or callbacks; pulldown-cmark streams events; comrak often renders after AST work. | ||||
| Output customization hooks | 🟧 | 🟩 | 🟨 | 🟩 |
| Callbacks and ASTs are great for custom rendering but add indirection compared to a single tight rendering loop. Mapping: md4c callbacks and comrak AST are very flexible; pulldown-cmark iterators are easy to transform; ferromark is lower level. | ||||
| Output formats | 🟥 | 🟧 | 🟨 | 🟩 |
| More output formats increase flexibility but add complexity. Mapping: comrak can emit HTML, XML, and CommonMark; pulldown-cmark provides HTML plus event streams; md4c has HTML renderer and callbacks; ferromark targets HTML. | ||||
| Source position support | 🟥 | 🟥 | 🟩 | 🟨 |
| Tracking source positions is useful for diagnostics and tooling, but adds overhead. Mapping: pulldown-cmark has strong source map support; comrak can emit source positions; ferromark and md4c are lighter. | ||||
| Source map tooling (API or CLI) | 🟥 | 🟥 | 🟩 | 🟨 |
| Source maps improve debuggability and tooling integration. Mapping: pulldown-cmark exposes event ranges; comrak can emit source position attributes; ferromark and md4c keep this minimal. | ||||
| IO friendliness (small writes) | 🟩 | 🟩 | 🟧 | 🟥 |
| Many small writes can be expensive without buffering. Mapping: md4c and ferromark stream into buffers or callbacks; pulldown-cmark recommends buffered output; comrak often builds strings after AST work. | ||||
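
The "Output buffer reuse" and "IO friendliness" rows both come down to writing into one caller-owned buffer. A generic sketch of the pattern, using HTML escaping as the workload. `escape_into` is a hypothetical helper written for this example, not ferromark's `escape.rs`, and it scans byte by byte where a tuned version would use `memchr`-style search.

```rust
/// Append `text` to `out`, escaping `&`, `<`, `>`, and `"`.
/// Unescaped stretches are copied in bulk rather than byte by byte.
fn escape_into(text: &[u8], out: &mut Vec<u8>) {
    let mut last = 0;
    for (i, &b) in text.iter().enumerate() {
        let replacement: &[u8] = match b {
            b'&' => b"&amp;",
            b'<' => b"&lt;",
            b'>' => b"&gt;",
            b'"' => b"&quot;",
            _ => continue,
        };
        out.extend_from_slice(&text[last..i]); // flush the clean run before this byte
        out.extend_from_slice(replacement);
        last = i + 1;
    }
    out.extend_from_slice(&text[last..]); // flush the tail
}

fn main() {
    // One buffer for the whole batch; `clear` keeps the allocation.
    let mut buf: Vec<u8> = Vec::with_capacity(64);
    let docs: [&[u8]; 2] = [b"a < b", b"Tom & \"Jerry\""];

    for doc in docs.iter() {
        buf.clear();
        escape_into(doc, &mut buf);
        println!("{}", String::from_utf8_lossy(&buf));
    }

    assert_eq!(buf, b"Tom &amp; &quot;Jerry&quot;".to_vec());
}
```

After the first few documents the buffer has grown to a typical output size, so later calls usually run without touching the allocator, and the finished buffer can go to the sink in a single write.
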
CommonMark: 100% (652/652 tests)
All CommonMark spec tests pass (no filtering).
GFM: all 5 extensions implemented
Tables, strikethrough, task lists, autolink literals, and disallowed raw HTML.
use ferromark::to_html;
let html = ferromark::to_html("# Hello\n\n**World**");
assert!(html.contains("<h1>Hello</h1>"));
assert!(html.contains("<strong>World</strong>"));
let mut buffer = Vec::new();
ferromark::to_html_into("# Reuse me", &mut buffer);
// buffer can be reused for next call
# Development
cargo build
# Optimized release (recommended for benchmarks)
cargo build --release
# Run tests
cargo test
# Run CommonMark spec tests
cargo test --test commonmark_spec -- --nocapture
# Run benchmarks
cargo bench
src/
├── lib.rs # Public API (to_html, to_html_into)
├── block/ # Block-level parser
│ ├── parser.rs # Line-oriented block parsing
│ └── event.rs # BlockEvent types
├── inline/ # Inline-level parser
│ ├── mod.rs # Three-phase inline parsing
│ ├── marks.rs # Mark collection
│ ├── code_span.rs
│ ├── emphasis.rs # Modulo-3 stack optimization
│ ├── strikethrough.rs # GFM strikethrough resolution
│ └── links.rs # Link/image/autolink parsing
├── cursor.rs # Pointer-based byte cursor
├── range.rs # Compact u32 range type
├── render.rs # HTML writer
├── escape.rs # HTML escaping (memchr-optimized)
└── limits.rs # DoS prevention constants
MIT OR Apache-2.0