Added UTF-8 encoding when opening files. #10

rodjuncode · 2020-08-04T18:55:31Z

Hello! I was trying to train my own vector model, but got decoding errors during the linebreaks removal. I figured out that forcing the encoding (encoding="utf-8") during the file opening would fix it. Added to both input and output files. Also, added "ensure_ascii=False" to the json dump, so the "\u" characters are dumped as human readable chars.

Hope this helps, and thanks for the great work!

Forcing utf 8 guarantees compatibility with non-english languages.

Added UTF-8 encoding when opening files.

495336e

Forcing utf 8 guarantees compatibility with non-english languages.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added UTF-8 encoding when opening files. #10

Added UTF-8 encoding when opening files. #10

rodjuncode commented Aug 4, 2020

Added UTF-8 encoding when opening files. #10

Are you sure you want to change the base?

Added UTF-8 encoding when opening files. #10

Conversation

rodjuncode commented Aug 4, 2020