Replace the Markdown parser and HTML renderer Misaka with Mistune #1040#1079
Replace the Markdown parser and HTML renderer Misaka with Mistune #1040#1079jelmer merged 37 commits intoisso-comments:masterfrom
Conversation
Unit tests need to be properly adapted still (and extended).
|
TODOs:
|
jelmer
left a comment
There was a problem hiding this comment.
Thanks! This looks reasonable to me overall. I think the main doubts are around automatically attempting to rewrite extensions.
In general, I see three options:
Option 1 is likely to make people unhappy that don't read release notes carefully and that is probably a significant number of users. I was originally thinking to go for option 2 but it is likely to create additional maintenance overhead in the long run and makes it more difficult to use mistune features. I prefer option 3 now but I would be happy to hear arguments for the other options. I will be traveling the next ten days and won't have time to do any substantial work until the second week of January. |
|
Thanks, I think that's
Thanks, I think that's a good summary. From my perspective, (2) isn't really an option. It creates more maintenance overhead (and confusion!) in the long term, and doesn't provide us with a way to ever get rid of that compatibility code. Even if we try, we won't completely match the behaviour of misaka so some people will experience breakage. (1) I believe could be okay if we manage the transition well. For example, we could consider bumping the major version of isso to signify major changes. I'm curious what you think (3) would look like in practice. I don't think you could prevent breaking existing installations without letting people select the backend (which from the project's perspective doesn't address #1040 even if it helps for *BSD users, and in fact makes maintenance harder in the long run). |
|
@jelmer, I am currently looking into what it would take to make a clean break and I am wondering what to do about the server config: [markup]
options = strikethrough, superscript, autolink, fenced-code
flags =
allowed-elements =
allowed-attributes =The above has the issue that [markup]
allowed-elements =
allowed-attributes =
[markup.mistune]
plugins = strikethrough, superscript, subscript
arguments = escape, hard_wrap |
|
That looks reasonable. It might make sense to support both mistune and misaka for a while (with sections named as you're suggesting), then warn people that misaka is going away and then eventually remove it. |
|
Why not, something like this? [markup]
renderer = misaka|mistune
allowed-elements =
allowed-attributes =
[markup.misaka]
options = strikethrough, superscript, autolink, fenced-code
flags =
[markup.mistune]
plugins = strikethrough, superscript, subscript
arguments = escape, hard_wrapThe |
And implement choice between Misaka and Mistune as rendering backend.
And separate Misaka-specific tests from generic tests.
And fix bugs in handling config.
|
DONE:
TODO:
I wrote earlier:
I am confused where that came from. Doesn't seem to make much sense or at least not as part of this change. |
And assert arguments in correct order. And fix expected rendered text.
And initialize Mistune with a similar set of plugins as Misaka. And always set escape to True because we cannot easily configure boolean False in lists. And add more documentation of differences between Misaka and Mistune.
|
Code is practically done, all tests under |
jelmer
left a comment
There was a problem hiding this comment.
This looks great. Some leftover minor comments, but perhaps this is ready to be switched from Draft to ready for review?
There are three situations I think are important to support well are:
- existing config, no renderer specified - should stay working both warn the user that no renderer is specified and that misaka is used an deprecated
- config that explicitly specifies misaka: should warn that misaka is deprecated
- config that explicitly specifies mistune: should just work :-)
Expose method has_option on ConfigParser so that we can test if an option was set instead of erroring out when it was not set.
I still need to resolve a few comments and need to check what Mistune plugins to enable by default to get closer to Misaka, and the documentation still needs a bit of fine-tuning. Also haven't actually run this for real yet. Would you prefer to get this merged and then create a second PR to finish the work? |
We try to keep master in a working state so we can do a release any time, so I'd prefer to wait until this is properly ready to merge it. I do feel that it's beyond the state of a Draft at this point though, since you're just polishing up the last few things - but whether the Draft bit is set is honestly a minor point :) |
|
Hey @ritzmann , anything I can do to help get this to a mergeable state? It would be great to get it in. |
|
No worries, I had hoped to get this wrapped up before my next holiday trip but too many other things to do. Now I am back from that trip, so hoping to drive this to a conclusion next week. |
Single quotes were not HTML-escaped in the website field due to escape() being called with quote=False. This allowed an attacker to break out of a single-quoted HTML attribute (e.g. href='...') and inject event handlers such as onmouseover. The same escaping was missing entirely from the user edit endpoint and the moderation edit endpoint. Fix by using escape(..., quote=True) for the website field and escape(..., quote=False) for author across all three write paths: POST /new, PUT /id/<id>, and POST /id/<id>/edit/<key>. Reported-by: ByamB4 <byamba4life@gmail.com>
# Conflicts: # CHANGES.rst
|
One minor item to be resolved still: And I still need to do more manual testing but otherwise this is ready from my point of view. |
Done |
Misaka has been abandoned several years ago. Misaka depends on Hoedown,
a Markdown parser implemented in C, which limits platform support.
Mistune provides a similar API and features, is a pure Python implementation and is actively
maintained.
The goal is to maintain backwards compatibility with the existing markdown options where
feasible and render HTML that closely resembles the implementation with Misaka.
Syntax differences
HTML elements
Misaka allows HTML expressions like
<em>Hi</em>in Markdown. This expressionwill be rendered to HTML as is:
<em>Hi</em>. Mistune takes a stricter approachand replaces any HTML element with rendered text like this:
<em>Hi</em>.Strong and emphasis
Misaka interprets asterisks like this:
Mistune is the same on the first two points but does not interpret asterisks inside words:
If you want to have the same rendered result with Mistune, use underscores for emphasis:
Fenced code
In Misaka, fenced code can be marked by enclosing the code block in three backticks
``` or by putting four space characters in front of each code line. Mistune
only understands three backticks.
Plugins
Fenced-code
Mistune always renders fenced code. The behavior is not configurable and no
fenced-code plugin exists for Mistune.
Superscript
This plugin exists in both engines with the same name
superscript. TheMarkdown syntax is different however.
Misaka:
^(superscripted_text)Mistune:
^superscripted_text^