
Commit e466f38

Fix handling of bogus comments.
As with most implementations, we now pass through bogus comments (as defined by the HTML Spec) unaltered, except that they are HTML-escaped. This deviates from the reference implementation, which ignores them completely. As the reference implementation does not appear to have considered their existence at all, it is not used as a reference in this instance. Fixes #1425.
1 parent a2a9c53 commit e466f38


3 files changed, +18 -8 lines changed


docs/changelog.md

+1
@@ -20,6 +20,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 * Fix edge-case crash in `InlineProcessor` with `AtomicString` (#1406).
 * Fix edge-case crash in `codehilite` with an empty `code` tag (#1405).
 * Improve and expand type annotations in the code base (#1401).
+* Fix handling of bogus comments (#1425).
 
 ## [3.5.1] -- 2023-10-31
 

markdown/htmlparser.py

+9
@@ -277,6 +277,15 @@ def parse_html_declaration(self, i: int) -> int:
         self.handle_data('<!')
         return i + 2
 
+    def parse_bogus_comment(self, i: int, report: int = 0) -> int:
+        # Override the default behavior so that bogus comments get passed
+        # through unaltered by setting `report` to `0` (see #1425).
+        pos = super().parse_bogus_comment(i, report)
+        if pos == -1:  # pragma: no cover
+            return -1
+        self.handle_empty_tag(self.rawdata[i:pos], is_block=False)
+        return pos
+
     # The rest has been copied from base class in standard lib to address #1036.
     # As `__startag_text` is private, all references to it must be in this subclass.
     # The last few lines of `parse_starttag` are reversed so that `handle_starttag`
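The new method relies on the `report` argument of the standard library's `html.parser.HTMLParser.parse_bogus_comment()`, an undocumented internal method. With `report` left at its default of `1`, the base class passes the text between the delimiters to `handle_comment()`, which is consistent with the old test expecting `<!*foo*>` to render as `<!--*foo*-->`. The stdlib-only sketch below isolates the same trick. `EventRecorder`, `PassThroughBogus`, and `parse()` are hypothetical names, and where Python-Markdown calls its own `handle_empty_tag()`, the sketch simply records the HTML-escaped text.

```python
from __future__ import annotations

from html import escape
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Hypothetical parser that collects (kind, text) tuples for inspection."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.events: list[tuple[str, str]] = []

    def handle_comment(self, data: str) -> None:
        # By default the stdlib reports bogus comments here (report=1).
        self.events.append(("comment", data))

    def handle_data(self, data: str) -> None:
        self.events.append(("data", data))


class PassThroughBogus(EventRecorder):
    """Hypothetical subclass that keeps bogus comments as escaped text."""

    def parse_bogus_comment(self, i: int, report: int = 0) -> int:
        # Let the base class locate the closing '>' but skip its
        # handle_comment() call by passing report=0.
        pos = super().parse_bogus_comment(i, report)
        if pos == -1:  # unterminated: wait for more input
            return -1
        # rawdata[i:pos] is the whole construct, e.g. '<!invalid>'.
        self.events.append(("escaped", escape(self.rawdata[i:pos])))
        return pos


def parse(parser_cls: type[EventRecorder], markup: str) -> list[tuple[str, str]]:
    """Feed markup to a fresh parser and return the recorded events."""
    parser = parser_cls()
    parser.feed(markup)
    parser.close()
    return parser.events


print(parse(EventRecorder, "<!invalid>"))      # [('comment', 'invalid')]
print(parse(PassThroughBogus, "<!invalid>"))   # [('escaped', '&lt;!invalid&gt;')]
print(parse(PassThroughBogus, "</#invalid>"))  # [('escaped', '&lt;/#invalid&gt;')]
```

Returning `pos` unchanged tells the base parser's main loop where to resume scanning, so text after the bogus comment is still parsed normally.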

tests/test_syntax/blocks/test_html_blocks.py

+8 -8
@@ -782,16 +782,16 @@ def test_raw_comment_trailing_whitespace(self):
             '<!-- *foo* -->'
         )
 
-    # Note: this is a change in behavior for Python-Markdown, which does *not* match the reference
-    # implementation. However, it does match the HTML5 spec. Declarations must start with either
-    # `<!DOCTYPE` or `<![`. Anything else that starts with `<!` is a comment. According to the
-    # HTML5 spec, a comment without the hyphens is a "bogus comment", but a comment nonetheless.
-    # See https://www.w3.org/TR/html52/syntax.html#markup-declaration-open-state.
-    # If we wanted to change this behavior, we could override `HTMLParser.parse_bogus_comment()`.
     def test_bogus_comment(self):
         self.assertMarkdownRenders(
-            '<!*foo*>',
-            '<!--*foo*-->'
+            '<!invalid>',
+            '<p>&lt;!invalid&gt;</p>'
+        )
+
+    def test_bogus_comment_endtag(self):
+        self.assertMarkdownRenders(
+            '</#invalid>',
+            '<p>&lt;/#invalid&gt;</p>'
         )
 
     def test_raw_multiline_comment(self):
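The comment removed from the test file summarizes the HTML rule that decides when `<!` or `</` starts a bogus comment. The sketch below is a simplified rendering of the tokenizer's "markup declaration open" and "end tag open" states; `classify()` is a hypothetical helper, not part of Python-Markdown or `html.parser`, and it ignores end-of-input handling. In the current WHATWG HTML Standard, `<![CDATA[` only opens a CDATA section inside foreign (SVG or MathML) content; elsewhere it also becomes a bogus comment.

```python
def classify(markup: str, in_foreign_content: bool = False) -> str:
    """Classify markup that starts with '<!' or '</' (simplified sketch).

    Hypothetical helper loosely following the HTML tokenizer; it does not
    model end-of-input cases.
    """
    if markup.startswith("<!"):
        rest = markup[2:]
        if rest.startswith("--"):
            return "comment"
        if rest[:7].lower() == "doctype":  # case-insensitive match
            return "doctype"
        if rest.startswith("[CDATA[") and in_foreign_content:
            return "cdata"  # only inside SVG/MathML content
        return "bogus comment"  # e.g. '<!invalid>'
    if markup.startswith("</"):
        first = markup[2:3]
        if first.isascii() and first.isalpha():
            return "end tag"
        if first == ">":
            return "ignored"  # '</>' is dropped entirely
        return "bogus comment"  # e.g. '</#invalid>'
    return "not handled by this sketch"


for sample in ("<!invalid>", "</#invalid>", "<!-- *foo* -->", "<!DOCTYPE html>"):
    print(sample, "->", classify(sample))
```

Both inputs used by the new tests, `<!invalid>` and `</#invalid>`, land in the bogus-comment branch, which is exactly the case the `parse_bogus_comment` override now passes through as escaped text.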
