Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

divs/spans break WYSIWYG in FireFox #406

Open
btbonval opened this issue Feb 21, 2015 · 27 comments
Open

divs/spans break WYSIWYG in FireFox #406

btbonval opened this issue Feb 21, 2015 · 27 comments
Labels

Comments

@btbonval
Copy link
Member

Particular test case: /note/harvard/reason-and-faith-in-the-west/resumepdf-5-27-43979

That note allows me to open the editor, but it's all raw HTML in the editor. If I try to embolden some plain text, it just removes the editor window, refreshes the page, and nothing changed.

No errors on console besides TypeError: m.getComputedStyle(...) is null ... wysihtml5x-toolbar.min.js:5, but this error is relatively common on pages that work just fine. No errors on the server console.

This problem was originally noticed in #402 but I conflated a few different problems together in that ticket. I thought they might have all been resolved, but it appears not to be the case.

@btbonval
Copy link
Member Author

Looking at the raw code for that link, it's a mess of empty divs and spans that are arbitrarily scattered throughout the text.

@btbonval
Copy link
Member Author

Found another example in the database which has divs and cannot be edited. /note/Brown/management-of-industrial-and-nonprofit-organizations/finalsclub-hccg-reportpdf-5-20-137857

=# SELECT slug FROM notes_notemarkdown INNER JOIN notes_note ON notes_notemarkdown.note_id = notes_note.id WHERE notes_note.category = 'LECTURE_NOTES' AND notes_notemarkdown.html LIKE '%<div>%';
                 slug
---------------------------------------
 resumepdf-5-24-675181
 finalsclub-hccg-reportpdf-5-20-137857
 resumepdf-5-17-387450
 resumepdf-5-24-233477
 resumepdf-5-27-43979
(5 rows)

this might be unrelated to pdf2htmlex, since I don't see any particular markers about /* blahblah pdf2htmlEX */ which I see in some other HTML in the database.

@btbonval btbonval changed the title pdf2htmlex output is not editable some notes are not editable Feb 23, 2015
@btbonval
Copy link
Member Author

The upstream static file is full of random divs and spans. https://karma-beta.s3.amazonaws.com/html/finalsclub-hccg-reportpdf-5-20-137857.html

Given the new sanitizer and upload process, it will be worth trying to upload the PDF as a new note and see if the problem persists. If not, it might be good to download the upstream PDFs and run them through the usual upload process if that is feasible.

If that involves pushing to gdrive and pulling from gdrive and running through pdf2htmlex and sanitizing and sanitizing, that might not be worthwhile as a data migration.

At that point, it might be worthwhile to post process notes that contain span/div in the database after the data migration is completed.

@btbonval
Copy link
Member Author

queue appears to have finished, but I can't upload any notes. Nothing seems to be happening with celery worker or beat when I upload a new note.

@btbonval
Copy link
Member Author

alright, closed a bunch of my zombie AMQP connections and added the BROKER_POOL_LIMIT=0 as per #392 in my local environment and on the staging system (although beta's change will not take effect until it is restarted). That seems to have helped, some text notes I uploaded got processed finally. Still nothing from the PDF uploads.

@btbonval
Copy link
Member Author

ah! but the plain text file that I just uploaded has divs and spans in it! Absolutely confirmed this is not due to pdf2htmlEX.

@btbonval
Copy link
Member Author

Yeah, this uploaded text file with spans cannot be edited either. The WYSIWYG seems non functional, showing straight HTML.

It might be time to follow the work flow and see where those span tags came in.

@btbonval
Copy link
Member Author

I can delete content in the straight HTML editor mode. I deleted all spans, but left <p>blah</p> in place. I saved the file and edited it again. The <p>s were no longer explicitly visible, but were rendered as they are supposed to be. I was able to edit the contents of the file and add bolding.

So the WYSIWYG is back to functional once I remove the <span> tags.

@btbonval
Copy link
Member Author

I added the <span>...</span> tag around something and saved it. It escaped the span with &lt; and &gt;, which is good and expected.

I went to edit the note again, and the editor is back to plain/text mode, no WYSIWYG, and the buttons do not work.

@btbonval
Copy link
Member Author

Removed <span>...</span> and editor is back to WYSIWYG.

Added <a href="place">text</a>;. Again, the tags are escaped. Went back to WYSIWYG. It's still WYSI, but <a href=" and </a> are gone from the WYG, while place">text; remains. Saving it, those escaped tags are totally gone.

This editor is not idempotent or else we are interfering with the results somehow.

@btbonval
Copy link
Member Author

It's google docs that's adding a ton of spans and divs everywhere.

pdb in convert_raw_document (secret: run celery worker with --pool solo) let me interrogate some of the details.

[2015-02-24 01:58:38,277: WARNING/MainProcess] (Pdb) content_dict['html']
[2015-02-24 01:58:42,963: WARNING/MainProcess] '<html><head><title>PS3EYE4.txt</title><meta content="text/html; charset=UTF-8" http-equiv="content-type"><style type="text/css">ol{margin:0;padding:0}.c0{vertical-align:baseline;color:#000000;
...
h5{widows:2;padding-top:11pt;line-height:1.15;orphans:2;text-align:left;color:#000000;font-size:11pt;font-family:"Arial";font-weight:bold;padding-bottom:2pt;page-break-after:avoid}h6{widows:2;padding-top:10pt;line-height:1.15;orphans:2;text-align:left;color:#000000;font-size:10pt;font-family:"Arial";font-weight:bold;padding-bottom:2pt;page-break-after:avoid}</style></head><body class="c3"><p class="c1"><span class="c0">00: 640x480@15</span></p><p class="c1"><span class="c0">01: 640x480@30</span></p><p class="c1"><span class="c0">02: 640x480@40</span></p><p class="c1"><span class="c0">03: 640x480@50</span></p><p class="c1"><span class="c0">04: 640x480@60</span></p><p class="c1"><span class="c0">10: 320x240@30</span></p><p class="c1"><span class="c0">11: 320x240@40</span></p><p class="c1"><span class="c0">12: 320x240@50</span></p><p class="c1"><span class="c0">13: 320x240@60</span></p><p class="c1"><span class="c0">14: 320x240@75</span></p><p class="c1"><span class="c0">15: 320x240@100 (added 15/02/09 V0.3)</span></p><p class="c1"><span class="c0">16: 320x240@125 (added 15/02/09 V0.3)</span></p><p class="c1 c2"><span class="c0"></span></p><p class="c1"><span class="c0">sudo modprobe gspca_ov534 videomode=04</span></p><p class="c1 c2"><span class="c0"></span></p></body></html>'

This is just a simple text file, not even RTF, but Google Docs (which yields the content_dict above) is inserting a span inside each p.

@btbonval
Copy link
Member Author

Alright I need to let this cook on the back burner.

Do we want to strip divs and spans? Will that mess everything up?

I swear I've searched the junk out of the WYSIHTML issues and found nothing about problems with span and div, in fact, spans are used for color according to one issue. Perhaps this is an upstream issue, or an integration issue?

@btbonval
Copy link
Member Author

Also the <TITLE>title</TITLE> is getting concatenated on the front of the document. I'm pretty sure this is a direct result of the sanitizer stripping all tags around the title, but leaving the title text in place.

I kind of like this happy accident, but it should be wrapped with header formatting or something.

@btbonval btbonval changed the title some notes are not editable Google conversion adds spurious divs/spans which break WYSIWYG Feb 24, 2015
@btbonval
Copy link
Member Author

Interestingly, WYSIHTML notes in its Readme that it supports

Graceful Degradation: Users with other browsers will see the textarea and are still able to write plain HTML by themselves.

This sounds almost exactly like the situation when a span or div is present, although that should have nothing to do with browser version.

@btbonval
Copy link
Member Author

Toying with the example found at http://www.wysihtml.com/ . There's a button to switch to raw HTML. When using colors, they have spans with properties. I decided to remove properties from all of the spans so it was <span>text</span>. Switching back to WYSI mode, all the color was removed but everything was as it should be. I was able to add bolding without any issue.

Somehow our implementation of WYSIHTML is flawed, because it cannot handle spans in this way.

@btbonval
Copy link
Member Author

There are different rulesets. This might somehow be a difference. We're using /parser_rules/advanced_and_extended.js.

It isn't clear what the front page http://www.wysihtml.com/ example is using.

http://voog.github.io/wysihtml/examples/simple.html is clearly using advanced_and_extended.js (from looking at network traffic). Switching to HTML view, I'm able to add <span>text</span> and switch back to WYSI mode, no problems.

I notice it says

Uses a custom rule set that allows the following elements: strong, b, em, i, a, span

I'm curious if that refers to the parser_rules (which are provided with the repo, so I don't imagine they are custom), or if there is some kind of switch that controls acceptable elements.

There's a span definition in the parser we're using.
https://github.com/FinalsClub/karmaworld/blob/note-editing/karmaworld/apps/wysihtml5/static/wysihtml5/wysihtml-0.4.17/parser_rules/advanced_and_extended.js#L429-L443

I don't see anything wrong there.

@btbonval
Copy link
Member Author

I injected a WYSI/HTML toggle button. I cannot use it to to HTML, as that action triggers a whole page refresh.

This is oddly familiar from when I see HTML but try to use buttons. Normally, in WYSIHTML's proper HTML view, the buttons are disabled. When I see the accidental HTML view due to a span, the buttons are enabled but using them causes a refresh (as if, perhaps, it is trying to switch from HTML to WYSI mode?).

Might be a red herring, but I'm getting the strong impression that this entire page reload thing is interfering with my troubleshooting.

@btbonval
Copy link
Member Author

When I click Save, I get a 302 POST back to the page in the network traffic. That's expected of a submit button. The POST causes a page refresh since the URL has technically changed.

When I'm seeing HTML view, highlight something, and click B to embolden it, sure enough I see 302 POST back to the page in the network traffic. That's the cause of the page refresh. Why that causes a POST is ... what I have to deduce.

Then again it might be going down the way wrong path. The spans shouldn't cause HTML view in the first place.

@btbonval
Copy link
Member Author

I noticed that textareaClone is undefined here when HTML view happens and the WYSI is broken:
https://github.com/FinalsClub/karmaworld/blob/note-editing/karmaworld/templates/notes/note_base.html#L81

Unfortunately it seems to be undefined when WYSI is working. ugh. grasping at straws. Although it does beg the question why have it?

@btbonval
Copy link
Member Author

I'm getting very odd behavior trying to drop breakpoints in the note_base code. Some breakpoints drop down a few lines when I click them. Any breakpoint added prior to line 81 highlighted above doesn't stop execution at all.

I think I'm going to need to switch to Chrome to poke at this better.

@btbonval
Copy link
Member Author

Google Chrome 40.0.2214.111 on Mint Linux lets me debug all the points the way I would expect.

Yet, the same version does not break on spans. The WYSIWYG is working just fine in Chrome! Kind of. So here's the weird thing: <span> as typed into the WYSIWYG is escaped in the actual note display, but is treated as actual HTML in the editor when going back into it. Going back and forth enough times, the spans disappeared entirely.

FireFox 35.0.1 on Mint Linux does not show the WYSIWYG properly when spans are present. Yet the main website works fine on the same FireFox browser.

@btbonval btbonval changed the title Google conversion adds spurious divs/spans which break WYSIWYG divs/spans which break WYSIWYG in FireFox Feb 25, 2015
@btbonval btbonval changed the title divs/spans which break WYSIWYG in FireFox divs/spans break WYSIWYG in FireFox Feb 25, 2015
@btbonval
Copy link
Member Author

Okay, strange interaction.

In Google, I enclose text with <span> in the WYSI. I reload the page in FireFox, where the spans are visible as escaped in the note. I edit the note, editor works just fine and the span is completely ignored and missing from the WYSI (as desired)! I save the note without changes in FireFox, and the escaped <span> is gone from the note view. Reload the page and click edit, and the <span> is gone from the editor as well. Totally gone.

In FireFox, I enclose text with <span> in the WYSI. I reload the page in FireFox, where the spans are visible as escaped in the note. I edit the note, editor shows HTML and is broken. I save the note (even though it is showing HTML and stuff), and the escaped <span> is gone from the note view. Reload the page and click the editor again, but the <span> is still there breaking the WYSI.

In Chrome, this whole process works start to finish.

The last of the quadfecta:

In FireFox, I enclose text with in the WYSI. I reload the page in Chrome, where the spans are visible as escaped in the note. I edit the note, editor works just fine and the span is completely ignored and missing from the WYSI (as desired)! I save the note without changes in Chrome, and the escaped is gone from the note view. Reload the page and click the editor again, but the is still there breaking the WYSI.

How can entering <span> in one browser or the other make such a huge difference?!

@btbonval
Copy link
Member Author

The first test case in the original comment is also editable in Chrome without any issue.

@btbonval
Copy link
Member Author

Maybe the database will reveal a difference?

No spans. break this stuff<br>...

FireFox added span. &lt;span&gt;break this stuff&lt;/span&gt;<br>...

Chrome added span. &lt;span&gt;break this stuff&lt;/span&gt;<br>...

So why is FireFox okay if Chrome adds the span but not FireFox?

@btbonval
Copy link
Member Author

Maybe there's something being cached somehow in FireFox? If Chrome adds the span, FireFox would not have the cache. If FireFox adds it, it would have the cache.

Close all private windows. Start a new private window with a fresh cache. Login. Edit the note to add spans, save.

Close all private windows. Start a new private window with a fresh cache. Login. Edit the note. Broken WYSI, flat HTML showing.

Alright so it isn't caching.

@btbonval
Copy link
Member Author

I am running Chrome on a separate computer, using a 10.x.x.x network address to access the site. Meanwhile I'm running FireFox on the same computer, using a 127.0.0.1 network address to access the site.

Let's try FireFox on my separate computer using 10.x.x.x. Separate computer is also running FF 35.0.1 on Linux Mint, so that is consistent. New private window. Login to site running at 10.x.x.x. Visiting page with <span> in place. Edit this note. Broken WYSI with flat HTML.

Not related to networking.

@btbonval
Copy link
Member Author

FF 35.0.1 on Windows 8. WYSI is broken due to SPAN. Not limited to Linux FF 35.0.1.

Chrome 40.0.2214.94 on Windows 8. WYSI works as expected even with SPAN.

IE 11.0.9600.17498 on Windows 8. WYSI works as expected even with SPAN.

FireFox just updated to 36.0, so let's give that a go.

FF 36.0 on Windows 8. WYSI is broken due to SPAN.

Alright, so FF, either version 35.0.1 or 36.0 on either Windows or Linux, does not use the WYSIHTML properly if there is a span in the text on our site. However, the wysihtml.com example works just fine with divs and spans in FF.

This is going to be a very specific bug to try tracking down at this point. Need to check in with @AndrewMagliozzi to see if this worth spending any more time on.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant