fix: rendering of Markdown inline formatting and bullet lists (#1156) #1157

Open
gennaroprota wants to merge 5 commits into cppalliance:develop from gennaroprota:fix/rendering_of_markdown_inline_formatting_and_bullet_lists

@gennaroprota
Collaborator

@gennaroprota gennaroprota commented Feb 6, 2026

@alandefreitas: This patch covers the current Capy needs but, as I said on Slack, it is ad-hoc. The only long term solution is what Vinnie suggested: a Clang hook to take over comment processing entirely. A bit reluctantly, given the time I spent on it, I'd say we shouldn't merge it. We can extract the fix to inline.html.hbs from it, though, which is a genuine bug fix and doesn't depend on the Clang hook.


Markdown inline formatting (bold, italic, code) and bullet lists using "- " markers were not rendered in the HTML output.

Inline formatting:

  • Add appendMarkdownInlines() to parse bold, italic, and code spans in text nodes, producing StrongInline, EmphInline, and CodeInline nodes respectively.

  • Call it from visitText() so all text nodes are processed.

  • Add a markup/b.html.hbs partial to render <strong> tags.

  • Add doc/inline.html.hbs to dispatch text, strong, emph, and code inline kinds to their partials without emitting extraneous blank lines.
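
A minimal sketch of the inline step described above, assuming a plain string as input. The names `InlineKind`, `Inline`, and `parseInlinesSketch` are hypothetical and are not the MrDocs API; the real `appendMarkdownInlines()` works on doc-comment nodes. The sketch also ignores nesting, escapes, and the CommonMark delimiter rules: it only pairs each opening `**`, `*`, or backtick with the next identical delimiter and treats unmatched delimiters as literal text.

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical node kinds, loosely mirroring text/strong/emph/code inlines.
enum class InlineKind { Text, Strong, Emph, Code };

struct Inline {
    InlineKind kind;
    std::string text;
};

// Splits `s` into inline spans. Assumption: no nesting and no escapes.
inline std::vector<Inline> parseInlinesSketch(std::string_view s) {
    std::vector<Inline> out;
    std::string plain;  // literal text accumulated since the last span
    auto flush = [&] {
        if (!plain.empty()) {
            out.push_back({InlineKind::Text, plain});
            plain.clear();
        }
    };
    std::size_t i = 0;
    while (i < s.size()) {
        std::string_view delim;
        InlineKind kind = InlineKind::Text;
        if (s.substr(i, 2) == "**") { delim = "**"; kind = InlineKind::Strong; }
        else if (s[i] == '*')       { delim = "*";  kind = InlineKind::Emph; }
        else if (s[i] == '`')       { delim = "`";  kind = InlineKind::Code; }
        if (!delim.empty()) {
            std::size_t start = i + delim.size();
            std::size_t end = s.find(delim, start);
            // Accept only a non-empty span with a matching closing delimiter.
            if (end != std::string_view::npos && end > start) {
                flush();
                out.push_back({kind, std::string(s.substr(start, end - start))});
                i = end + delim.size();
                continue;
            }
        }
        plain += s[i];  // not a delimiter, or no matching close: keep as text
        ++i;
    }
    flush();
    return out;
}
```

With this sketch, an input such as "2 * 3" stays a single text span because the lone `*` has no closing partner.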

List detection and conversion:

  • Add list marker detection functions that recognize "- " at the start of text, after newlines, after ":" or "." punctuation, and at trailing positions within already-started lists.

  • Add convertParagraphWithLists() to split a paragraph containing list markers into a prefix paragraph and a ListBlock.

  • Apply list detection in visitParagraph(), @par blocks, and @li blocks so that Markdown lists are converted in all contexts.
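
The split into a prefix paragraph and a list could look roughly like the sketch below. It is an illustration under simplifying assumptions, not the code in this patch: `SplitResult`, `startsWithMarker`, and `splitParagraphWithList` are hypothetical names, the input is a raw string rather than doc-comment nodes, and only "- " at the start of a line is recognized (markers after ":" or "." are not handled).

```cpp
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical result type; not the MrDocs ListBlock.
struct SplitResult {
    std::string prefix;              // text before the first list marker
    std::vector<std::string> items;  // one entry per "- " item
};

// True if `line` begins with "- " after optional leading spaces;
// on success, `contentPos` is the index of the item text.
inline bool startsWithMarker(std::string_view line, std::size_t& contentPos) {
    std::size_t i = line.find_first_not_of(' ');
    if (i == std::string_view::npos) return false;
    if (line.substr(i, 2) != "- ") return false;
    contentPos = i + 2;
    return true;
}

inline SplitResult splitParagraphWithList(std::string_view text) {
    SplitResult r;
    std::size_t pos = 0;
    while (pos <= text.size()) {
        std::size_t nl = text.find('\n', pos);
        std::string_view line = (nl == std::string_view::npos)
            ? text.substr(pos)
            : text.substr(pos, nl - pos);
        std::size_t contentPos = 0;
        if (startsWithMarker(line, contentPos)) {
            // A new list item starts on this line.
            r.items.emplace_back(line.substr(contentPos));
        } else if (!r.items.empty()) {
            // Continuation line: append it to the current item.
            std::size_t first = line.find_first_not_of(' ');
            if (first != std::string_view::npos) {
                r.items.back() += ' ';
                r.items.back() += line.substr(first);
            }
        } else {
            // Still before the first marker: part of the prefix paragraph.
            if (!r.prefix.empty()) r.prefix += '\n';
            r.prefix += line;
        }
        if (nl == std::string_view::npos) break;
        pos = nl + 1;
    }
    return r;
}
```

Splitting on line boundaries first is the reason this kind of step has to run before any pass that flattens or rewrites the text.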

@github-actions

github-actions bot commented Feb 6, 2026

🚧 Danger.js checks for MrDocs are experimental; expect some rough edges while we tune the rules.

✨ Highlights

  • 🧪 Existing golden tests changed (behavior likely shifted)

🧾 Changes by Scope

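| Scope | Lines Δ | Lines + | Lines - | Files Δ | Files + | Files ~ | Files ↔ | Files - |
|---|---|---|---|---|---|---|---|---|
| 🥇 Golden Tests | 760 | 746 | 14 | 8 | 4 | 4 | - | - |
| 🛠️ Source | 267 | 252 | 15 | 9 | 2 | 7 | - | - |
| 🏗️ Build / Toolchain | 2 | 1 | 1 | 1 | - | 1 | - | - |
| Total | 1029 | 999 | 30 | 18 | 6 | 12 | - | - |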

Legend: Files + (added), Files ~ (modified), Files ↔ (renamed), Files - (removed)

🔝 Top Files

  • test-files/golden-tests/javadoc/markdown-list/markdown-list.html (Golden Tests): 281 lines Δ (+281 / -0)
  • test-files/golden-tests/javadoc/markdown-list/markdown-list.adoc (Golden Tests): 215 lines Δ (+215 / -0)
  • src/lib/Metadata/Finalizers/DocCommentFinalizer.cpp (Source): 198 lines Δ (+192 / -6)

Generated by 🚫 dangerJS against 8463349

@gennaroprota gennaroprota marked this pull request as draft February 6, 2026 15:12
@cppalliance-bot

cppalliance-bot commented Feb 6, 2026

An automated preview of the documentation is available at https://1157.mrdocs.prtest2.cppalliance.org/index.html

If more commits are pushed to the pull request, the docs will rebuild at the same URL.

2026-02-13 11:16:11 UTC

@gennaroprota gennaroprota force-pushed the fix/rendering_of_markdown_inline_formatting_and_bullet_lists branch 2 times, most recently from 24dd8d2 to 4d78ef7 Compare February 6, 2026 17:16
@codecov

codecov bot commented Feb 6, 2026

Codecov Report

❌ Patch coverage is 95.06173% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.47%. Comparing base (5426a0a) to head (8463349).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...rc/lib/Metadata/Finalizers/DocCommentFinalizer.cpp | 94.52% | 2 Missing and 2 partials ⚠️ |

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1157      +/-   ##
===========================================
+ Coverage    76.38%   76.47%   +0.09%     
===========================================
  Files          311      311              
  Lines        29672    29749      +77     
  Branches      5863     5880      +17     
===========================================
+ Hits         22664    22752      +88     
+ Misses        4735     4728       -7     
+ Partials      2273     2269       -4     

@gennaroprota gennaroprota marked this pull request as ready for review February 6, 2026 18:25
@alandefreitas
Collaborator

> The only long-term solution is what Vinnie suggested: a Clang hook to take over comment processing entirely

Yes. In theory, we could just bypass clang and parse everything ourselves, and this would be somewhat equivalent to a hook. The biggest issue blocking this right now is human resources: the current plan is to address #1113 first (although we're opening this exception for capy issues). We'll have reasonable solutions for the parsing problem once we get it. 😊

@alandefreitas
Collaborator

Oh... Regarding your solution, it looks a little verbose (and potentially duplicated), to be honest. 😅

We already parse Markdown elsewhere in the finalize step; this is completely unrelated to the clang parser. The algorithm takes an array of text nodes and converts them inline into nodes with the proper tags (there are even unit tests for that). We just need to complete the plumbing so that the list items also work with this functionality. It shouldn't be much code.

The impression I have is that whatever procedure you used to figure out what to change wasn't aware that the logic already exists in the codebase. And that's understandable, because this codebase has become extremely large and complex, and it's hard to understand the context in which each thing happens. 🙃

But we can come up with a reasonable solution to this problem. It's not that bad. Oh... and we also need tests, of course. 🙂

@gennaroprota gennaroprota force-pushed the fix/rendering_of_markdown_inline_formatting_and_bullet_lists branch 7 times, most recently from 140c3ad to de0c61a Compare February 12, 2026 14:38
Collaborator

@alandefreitas alandefreitas left a comment

Great

bool
/** Return true because Optional<T&> never allocates storage.

@return `true` always.
Collaborator

Not your fault but... weird position for this doc comment.

Collaborator Author

Yeah, several functions in Optional.hpp have the doc comment in that position.

for (auto& I : infos)
{
    MRDOCS_CHECK_OR_CONTINUE(I.doc);
    convertMarkdownLists(*I.doc);
Collaborator

This creates a new Markdown rendering pass instead of integrating with the existing finalizer Markdown rendering pass (IIRC, parseInlines, right?). Doing this is problematic: it duplicates logic, is less efficient, and is hard to maintain. This is especially true if it comes before the first pass, because we're trying to parse internal Markdown nodes before external ones.

Collaborator Author

@gennaroprota gennaroprota Feb 13, 2026

Hmm... I moved the call to convertMarkdownListsInBlocks() into parseInlines(), so there's a single Markdown processing step now. The list conversion runs first within that function because it's a block-level restructure (splitting paragraphs into ListBlock nodes) that needs line boundaries intact, while parse() is a pure inline parser that operates on text strings. I think extending parse() to handle lists wouldn't work since it has no access to block-level structure. Does that make sense?

Collaborator

Ideally, parseInlines would use an algorithm similar to the one in the CommonMark specification: first look for blocks, then for inlines and other blocks inside blocks, then inlines inside inlines, etc. It just becomes recursion after the first level of blocks.

The only difference is we have this mess: we don't have string_view as input because another algorithm already tried to parse it, so we have to keep interpreting sequences of nodes as a single string, and the code becomes spaghetti. 🍝🤣 The spaghetti also makes the first level confusing: another algorithm already parsed the input and found blocks, so we're already operating at the block level.

Anyhow, you're correct to assume lists have higher precedence, because they are block-level constructs. But lists are special nodes in CommonMark, which can seem a little confusing at first. A list is a block-level container, like other block-level containers, so it goes in the higher level of the algorithm. But it contains items instead of inlines (a variation, though not an uncommon one). Then each list item is itself a block container (rather than a container of inlines), and this is the big variation. So when there's only text in a list item, that text is actually a paragraph inside the list item.

For instance, this is a list with one list item with 4 blocks:

- This is a paragraph.

  This is another paragraph.

  > A block quote

  - A nested list

And this is a list with one list item with 1 block (a paragraph).

- Just one line

The algorithm is derived naturally from that:

Document
└── List (block)
    ├── ListItem (block container)
    │   └── Paragraph (block)
    └── ListItem (block container)
        ├── Paragraph (block)
        └── List (block)

As usual, what makes our case complex is that we're actually (re)parsing nodes rather than strings, which is much harder. At some point, we should probably just store the strings in the first step and let the post-processing step do all of the work. The clang parsing step handles only the low-hanging fruit (blocks that are @ref and some text) while making reparsing much harder for the post-processing step, which is the step that actually understands ~80% of the features we want and have open issues about. (@mizvekov this is once again related to the topic we were discussing this week)

Collaborator Author

Thanks for the context — that all makes sense. Just to be clear: are you happy to merge this as-is (with the understanding that the long-term fix is moving to raw strings in the finalizer), or would you rather hold off and address the Capy list rendering as part of that larger rework?

Collaborator

Not really. Perhaps I tried to provide too much detail about how these layers work and didn't make myself clear. 😅

To be short and clear: The design I described (container of blocks -> containers of inlines/blocks) is not only valid for raw strings. It is also valid for reparsing. And it is also the model that the code is currently attempting to implement in this 2nd parsing layer.

Thus, to be clear, I would not like this pattern to be broken by an implementation that bypasses it and introduces an additional 3rd parsing layer. That's what the previous version of the code did. Nor would I like any new code "pretending" to integrate with the 2nd layer, like a hypothetical function that calls a single function to do all the markdownListParsing from somewhere, even if that somewhere is inside the function that triggers the 2nd layer parsing, of course. 🙃

And we don't want that because if we want to merge these two layers into a single layer soon, we do NOT want to create a 3rd layer now. Also, because duplicating code is always a bad idea. 😁

> are you happy to merge this as-is

I don't know. I haven't read the new version of the code yet. I was hoping you could let me know whether the new code meets the criteria we've discussed, and how it does so, before I review it. This way, there's no need to put effort into reverse-engineering the code if you can just tell me how you integrated the layers. 😁

> the long-term fix is moving to raw strings in the finalizer

Yes. That would be great. But it's mostly unrelated to the issue described in this review thread. It's just something I mentioned in passing, so we have it in mind when integrating components into the second layer. It's not blocking or blocked by this issue.

…iance#1156)

Markdown inline formatting (**bold**, `code`), bullet lists using "- "
markers, and nested lists, were not rendered correctly.

This was a pre-existing bug that affected all content inside list items
(not just our new Markdown lists, but also existing `@li` ones).

Exercise empty-paragraph, admonition, and styled-continuation branches
in Markdown list conversion to meet the 90% patch coverage target.

This bug was introduced with commit #51e2b655af43f36bc2fd3e9c369dbd48046d2de6.
@gennaroprota gennaroprota force-pushed the fix/rendering_of_markdown_inline_formatting_and_bullet_lists branch from de0c61a to 8463349 Compare February 13, 2026 11:08

3 participants