String arrives with different characters in Firefox #4
I wrote a node.js program to generate HTML code; the core: […]

I gisted the full source, as well as the generated HTML and what Firefox prints in its console when loading the HTML: […]

As you can see, in the first expression the halfbreed characters both became U+FFFD. Probably some Unicode interpolation is messing with JavaScript's UCS-2 characters. Is this a bug? If it's a feature instead, it should be documented more prominently.
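(The gisted core isn't reproduced in this excerpt; below is a minimal sketch of such a generator, assuming the animals object quoted later in the thread. It is an illustration, not the gisted source.)

    // Illustrative sketch only, not the gisted source.
    var jsStringify = require('js-stringify');
    var animals = {
      dog: '\uD83D\uDC15',        // U+1F415 as a surrogate pair
      cow: '\uD83D\uDC04',        // U+1F404 as a surrogate pair
      halfbreed: '\uDC15\uD83D'   // the same surrogate halves, deliberately swapped
    };
    var html = ['\uFEFF<!DOCTYPE html><html><head>',
      '</head><body><script>',
      'console.log(' + jsStringify(animals) + ');',
      '</script></body></html>'].join('\n');
    process.stdout.write(html);   // stdout's default encoding is UTF-8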
You may need to add […]
Good idea! I added console.log(document.characterSet); to verify whether Firefox respects the BOM at the start of my HTML file, and it seems it did: […]

Still, I added to the HTML:

    <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">

Same output in Firefox's console. […]

As you see at the beginning, between the "5c 22" ( […]

PS: For completeness, I'd normally have tested your original suggestion […]
I boiled down the hex test case so it can be tested without Firefox. New core:

    var dog = '\uD83D\uDC15',
        cow = '\uD83D\uDC04',
        halves = dog[1] + cow[0],
        jsStringify = require('js-stringify');
    console.log('orig: <' + halves + '>');
    console.log('JSON: ' + JSON.stringify(halves));
    console.log('js-str:' + jsStringify(halves));

The hex dump shows that they only differ in one byte (edit: yeah, the angle bracket), 2nd column from the right, and that […]
Edit: The angle bracket was a false positive. Gonna investigate more.
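For reference, one way to get such a hex dump without an external tool (my assumption of how the bytes were compared, not necessarily the author's method):

    // Sketch: compare the raw UTF-8 bytes of the two escapings of `halves`.
    var jsStringify = require('js-stringify');
    var dog = '\uD83D\uDC15', cow = '\uD83D\uDC04', halves = dog[1] + cow[0];
    console.log('JSON hex:   ' + Buffer.from(JSON.stringify(halves), 'utf8').toString('hex'));
    console.log('js-str hex: ' + Buffer.from(jsStringify(halves), 'utf8').toString('hex'));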
Looks like Node.js's stdout cannot (edit: with default config) write those UCS-2 characters:
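(The hex output from the gist isn't reproduced here; a minimal sketch of the effect, assuming Node's UTF-8 encoder:)

    // Lone surrogates are valid UTF-16/UCS-2 code units, but they have no UTF-8
    // encoding, so Node's UTF-8 encoder replaces each of them with U+FFFD (ef bf bd).
    var halves = '\uDC15\uD83D';
    console.log(Buffer.from(halves, 'utf8'));
    // => <Buffer ef bf bd ef bf bd>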
… and I assume it's not a Node bug but that the UCS-2 chars just cannot be represented in stdout's default encoding, UTF-8. So what if I change the encoding:
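Presumably something along these lines (a sketch; the actual snippet isn't preserved in this excerpt):

    // Write as UCS-2 / UTF-16LE instead of the default UTF-8, so the lone
    // surrogate code units pass through unchanged.
    process.stdout.write('\uDC15\uD83D\n', 'ucs2');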
That works! So I change it in the HTML generator:

    html = ['\uFEFF<!DOCTYPE html><html><head>',
      // […]
      '</script></body></html>'].join('\n');
    process.stdout.write(html, 'UCS-2');

Thanks to the BOM at the start of the HTML, Firefox now correctly detects the encoding. […]

There has to be an easier way. How about I have js-stringify escape most of the data and then escape the non-UTF-8 chars myself?

    console.log(['\uFEFF<!DOCTYPE html><html><head>',
      '<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">',
      '</head><body><p></p><script>',
      String(charCodes),
      'console.log(document.characterSet);',
      'console.log(charCodes(' +
        require('surrog8').uHHHH(jsStringify(animals)) +
        '));',
      'console.log(' + charCodes(jsStringify(animals)) + ');',
      '</script></body></html>'].join('\n'));

Yeah! That one works! Firefox detects UTF-8 and both objects arrive correctly! … because the HTML says:

    console.log(charCodes("{ \"dog\": \"\uD83D\uDC15\", \"cow\": \"\uD83D\uDC04\", \"halfbreed\": \"\uDC15\uD83D\" }"));

So now we know how to fix the encoding, and we just have to clarify whether data integrity is part of js-stringify's "safely" claim:
… or whether that's meant more as "secure" (defending against injection and similar), and I shall make a new module to combine that with my need for "verbatim".
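For reference, the charCodes helper used in the snippets above isn't shown in this excerpt; a hypothetical stand-in that lists a string's UTF-16 code units might be:

    // Hypothetical helper (not taken from the gist): dump a string's UTF-16 code units.
    function charCodes(str) {
      var codes = [];
      for (var i = 0; i < str.length; i++) {
        codes.push(str.charCodeAt(i).toString(16).toUpperCase());
      }
      return codes.join(' ');
    }
    // Example: charCodes('\uD83D\uDC15') => "D83D DC15"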
For the transition period, I made utf8safe-js-stringify. Still hoping this will become the default, and soon.
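Assuming utf8safe-js-stringify is a drop-in replacement for js-stringify whose output is ASCII-only (an assumption based on its name and this thread), using it would look like:

    // Assumption: drop-in replacement whose output \uHHHH-escapes the surrogate
    // code units, so the generated HTML survives UTF-8 stdout and UTF-8 detection.
    var jsStringify = require('utf8safe-js-stringify');
    var animals = { dog: '\uD83D\uDC15', cow: '\uD83D\uDC04', halfbreed: '\uDC15\uD83D' };
    process.stdout.write('console.log(' + jsStringify(animals) + ');\n');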