fix: rendering of Markdown inline formatting and bullet lists (#1156)#1157
Conversation
✨ Highlights
🧾 Changes by Scope
🔝 Top Files
|
|
An automated preview of the documentation is available at https://1157.mrdocs.prtest2.cppalliance.org/index.html If more commits are pushed to the pull request, the docs will rebuild at the same URL. 2026-02-13 11:16:11 UTC |
24dd8d2 to
4d78ef7
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #1157 +/- ##
===========================================
+ Coverage 76.38% 76.47% +0.09%
===========================================
Files 311 311
Lines 29672 29749 +77
Branches 5863 5880 +17
===========================================
+ Hits 22664 22752 +88
+ Misses 4735 4728 -7
+ Partials 2273 2269 -4 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Yes. In theory, we could just bypass clang and parse everything ourselves, and this would be somewhat equivalent to a hook. The biggest issue blocking this right now is human resources: the current plan is to address #1113 first (although we're opening this exception for capy issues). We'll have reasonable solutions for the parsing problem once we get it. 😊 |
|
Oh... Regarding your solution, it looks a little verbose (and potentially duplicated), to be honest. 😅 We already parse Markdown elsewhere in the finalize step; this is completely unrelated to the clang parser. The algorithm takes an array of text nodes and converts them inline into nodes with the proper tags (there are even unit tests for that). We just need to complete the plumbing so that the list items also work with this functionality. It shouldn't be much code. The impression I have is that whatever procedure you used to figure out what to change wasn't aware that the logic already exists in the codebase. And that's understandable, because this codebase has become extremely large and complex, and it's hard to understand the context in which each thing happens. 🙃 But we can come up with a reasonable solution to this problem. It's not that bad. Oh... and we also need tests, of course. 🙂 |
140c3ad to
de0c61a
Compare
| bool | ||
| /** Return true because Optional<T&> never allocates storage. | ||
|
|
||
| @return `true` always. |
There was a problem hiding this comment.
Not your fault but... weird position for this doc comment.
There was a problem hiding this comment.
Yeah, several functions in Optional.hpp have the doc comment in that position.
| for (auto& I : infos) | ||
| { | ||
| MRDOCS_CHECK_OR_CONTINUE(I.doc); | ||
| convertMarkdownLists(*I.doc); |
There was a problem hiding this comment.
This creates a new Markdown rendering pass instead of integrating with the existing finalizer Markdown rendering pass (IIRC, parseInlines, right?). Doing this is problematic, duplicates logic, is less efficient, and hard to maintain. Especially if it comes before the first pass, because we're trying to parse internal Markdown nodes before external ones.
There was a problem hiding this comment.
Hmm... I moved the call to convertMarkdownListsInBlocks() into parseInlines(), so there's a single Markdown processing step now. The list conversion runs first within that function because it's a block-level restructure (splitting paragraphs into ListBlock nodes) that needs line boundaries intact, while parse() is a pure inline parser that operates on text strings. I think extending parse() to handle lists wouldn't work since it has no access to block-level structure. Does that make sense?
There was a problem hiding this comment.
Ideally, parseInlines would use an algorithm similar to the one in the CommonMark specification: first look for blocks, then for inlines and other blocks inside blocks, then inlines inside inlines, etc. It just becomes recursion after the first level of blocks.
The only difference is we have this mess: we don't have string_view as input because another algorithm already tried to parse it, so we have to keep interpreting sequences of nodes as a single string, and the code becomes spaghetti. 🍝🤣 The spaguetti also makes the first level confusing, because another algorithm already parsed it, so we're already operating at the block level, because the other algorithm does find blocks.
Anyhow, you're correct to assume lists have higher precedence because this is a block level. But lists are special nodes in CommonMark. It seems a little confusing at first. A list is a block level container, like other block level containers so they go in the higher level of the algorithm. But they contain items instead of inlines (it's a variation but this kind of variation is not so uncommon). But then each list item is also a block container (rather than a container of inlines) and this is the big variation. So when there's only text in the list item, that's actually a paragraph container in the list item.
For instance, this is a list with one list item with 4 blocks:
- This is a paragraph.
This is another paragraph.
> A block quote
- A nested list
And this is a list with one list item with 1 block (a paragraph).
- Just one line
The algorithm is derived naturally from that:
Document
└── List (block)
├── ListItem (block container)
│ └── Paragraph (block)
└── ListItem (block container)
├── Paragraph (block)
└── List (block)
As usual, what makes our case complex is that we're actually (re)parsing nodes rather than strings, which is much more complex. At some point, we should probably just store the strings in the first step and let the post-processing step do all of the work. Because the clang parsing step is parsing only low hanging fruits (blocks that are @ref and some text) while making the post-processing step much harder to reparse, which is the step that actually understands ~80% of the features we want and have open issues about. (@mizvekov this is once again related to the topic we were discussing this week)
There was a problem hiding this comment.
Thanks for the context — that all makes sense. Just to be clear: are you happy to merge this as-is (with the understanding that the long-term fix is moving to raw strings in the finalizer), or would you rather hold off and address the Capy list rendering as part of that larger rework?
There was a problem hiding this comment.
Not really. Perhaps I tried to provide too much detail about these layers work and didn't make myself clear. 😅
To be short and clear: The design I described (container of blocks -> containers of inlines/blocks) is not only valid for raw strings. It is also valid for reparsing. And it is also the model that the code is currently attempting to implement in this 2nd parsing layer.
Thus, to be clear, I would not like this pattern to be broken by an implementation that bypasses it and introduces an additional 3rd parsing layer. That's what the previous version of the code did. Or I would not like any new code "pretending" to integrate with the 2nd layer, like a hypothetical function that calls a single function to do all the markdownListParsing from somewhere, even if that somewhere is inside the function that triggers the 2nd layer parsing, of course. 🙃
And we don't want that because if we want to merge these two layers into a single layer soon, we do NOT want to create a 3rd layer now. Also, because duplicating code is always a bad idea. 😁
are you happy to merge this as-is
I don't know. I haven't read the new version of the code yet. I was hoping you could let me know whether the new code meets these criteria we've discussed and how it does it before I review it. This way, there's no need to put effort into reversing engineering code if you can just tell me how you integrated the layers. 😁
the long-term fix is moving to raw strings in the finalizer
Yes. That would be great. But it's mostly unrelated to the issue described in this review thread. It's just something I mentioned in passing, so we have it in mind when integrating components into the second layer. It's not blocking or blocked by this issue.
…iance#1156) Markdown inline formatting (**bold**, `code`), bullet lists using "- " markers, and nested lists, were not rendered correctly.
This was a pre-existing bug that affected all content inside list items (not just our new Markdown lists, but also existing `@li` ones).
Exercise empty-paragraph, admonition, and styled-continuation branches in Markdown list conversion to meet the 90% patch coverage target.
This bug was introduced with commit #51e2b655af43f36bc2fd3e9c369dbd48046d2de6.
de0c61a to
8463349
Compare
@alandefreitas: This patch covers the current Capy needs but, as I said on Slack, it is ad-hoc. The only long term solution is what Vinnie suggested: a Clang hook to take over comment processing entirely. A bit reluctantly, given the time I spent on it, I'd say we shouldn't merge it. We can extract the fix to inline.html.hbs from it, though, which is a genuine bug fix and doesn't depend on the Clang hook.
Markdown inline formatting (bold, italic,
code) and bullet lists using "- " markers were not rendered in the HTML output.Inline formatting:
Add
appendMarkdownInlines()to parse bold, italic, andcodespans in text nodes, producingStrongInline,EmphInline, andCodeInlinenodes respectively.Call it from
visitText()so all text nodes are processed.Add a markup/b.html.hbs partial to render
<strong>tags.Add doc/inline.html.hbs to dispatch text, strong, emph, and code inline kinds to their partials without emitting extraneous blank lines.
List detection and conversion:
Add list marker detection functions that recognize "- " at the start of text, after newlines, after ":" or "." punctuation, and at trailing positions within already-started lists.
Add
convertParagraphWithLists()to split a paragraph containing list markers into a prefix paragraph and aListBlock.Apply list detection in
visitParagraph(),@parblocks, and@liblocks so that Markdown lists are converted in all contexts.