Fix handling of indented HTML in DOMParser #315

mattmikolay · 2024-08-15T13:07:12Z

This PR fixes a bug that causes DOMParser to throw an error when parsing indented HTML with the default collapsing of whitespace.

The root cause of the bug is in add_text_node. Notice the assignment of node_before on line 571:

prosemirror-py/prosemirror/model/from_dom.py

Line 571 in 34c9614

node_before = top.content[-1]

When top.content is an empty list, the following error is thrown:

IndexError: list index out of range

Compare with the JavaScript implementation of addTextNode in the original ProseMirror:

 let nodeBefore = top.content[top.content.length - 1]

Here, when top.content is an empty array, the index expression evaluates to -1, so nodeBefore is set to top.content[-1]. In JavaScript, reading a nonexistent index of an array returns undefined rather than throwing. In Python, however, indexing an empty list with -1 raises IndexError. Thus, we need to check that top.content is truthy (non-empty) before accessing top.content[-1].
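
For illustration, here is a minimal standalone sketch of the kind of guard described above. It is not the actual patch to from_dom.py: the helper name last_or_none is hypothetical, and a plain list stands in for top.content.

```python
from typing import Any, List, Optional


def last_or_none(content: List[Any]) -> Optional[Any]:
    """Return the last item of content, or None when content is empty.

    Hypothetical helper: it mirrors how the JavaScript expression
    content[content.length - 1] yields undefined for an empty array.
    """
    # An empty list is falsy, so the [-1] lookup is skipped entirely.
    return content[-1] if content else None


# Without the guard, indexing an empty list raises IndexError.
try:
    [][-1]
    raised = False
except IndexError:
    raised = True
```

With this guard, an empty list yields None instead of raising, so any later check on node_before can treat None as "no preceding node".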

Thanks for the great library!

p7g

Thanks for the contribution!

mattmikolay added 2 commits June 10, 2024 10:39

Update add_text_node in from_dom.py

bfac67b

Add UT

8d4bd0f

mattmikolay marked this pull request as ready for review August 15, 2024 13:12

p7g approved these changes Aug 20, 2024

p7g merged commit 4754bf6 into fellowapp:main Aug 20, 2024
3 of 4 checks passed
