UnicodeDecodeError when filename includes non ASCII characters #287

Open
davuses opened this issue Mar 31, 2023 · 1 comment

davuses commented Mar 31, 2023

I'm trying to read a file whose name contains non-ASCII characters:

magic.from_file("説明.txt")

This raises the following error:

Traceback (most recent call last):
  File "G:\BaiduNet\unarchive.py", line 64, in <module>
    magic.from_file("説明.txt")
  File "C:\Users\davuses\AppData\Local\Programs\Python\Python311\Lib\site-packages\magic\magic.py", line 135, in from_file
    return m.from_file(filename)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\davuses\AppData\Local\Programs\Python\Python311\Lib\site-packages\magic\magic.py", line 89, in from_file
    return maybe_decode(magic_file(self.cookie, filename))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\davuses\AppData\Local\Programs\Python\Python311\Lib\site-packages\magic\magic.py", line 214, in maybe_decode
    return s.decode('utf-8')
           ^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe6 in position 16: invalid continuation byte

If I rename the file to an ASCII name, say file.txt, the problem disappears.

Also, if I use .from_buffer(), there's no issue:

magic.from_buffer(open("説明.txt", "rb").read(2048), mime=True)
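
The from_buffer() workaround above can be wrapped in a small helper so that Python, not libmagic, opens the file. This is only a sketch: `read_head` and `identify` are hypothetical names that are not part of python-magic, and the detector is passed in as a parameter so the helper does not depend on how magic is installed.

```python
def read_head(path, size=2048):
    """Return the first `size` bytes of the file at `path`.

    Python opens the file itself, so the non-ASCII filename is never
    handed to libmagic.
    """
    with open(path, "rb") as fh:
        return fh.read(size)


def identify(path, detector, size=2048):
    """Hypothetical helper: run `detector` on the first `size` bytes of `path`.

    `detector` is any callable that accepts bytes, for example
    lambda buf: magic.from_buffer(buf, mime=True)
    """
    return detector(read_head(path, size))
```

With python-magic installed, a call would look like identify("説明.txt", lambda buf: magic.from_buffer(buf, mime=True)). The 2048-byte default simply mirrors the read size used in the snippet above.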

Weird. I'm not sure whether this is related to issue #205.

The package was installed with pip install python-magic-bin on Windows 11, Python 3.11.

silente commented Apr 11, 2023

Hi, I have the same problem.

My code is:

magic.from_file(file_path, mime=True)

My error is:

  File "C:\Program Files\Python38\lib\site-packages\magic\magic.py", line 135, in from_file
    return m.from_file(filename)
  File "C:\Program Files\Python38\lib\site-packages\magic\magic.py", line 89, in from_file
    return maybe_decode(magic_file(self.cookie, filename))
  File "C:\Program Files\Python38\lib\site-packages\magic\magic.py", line 214, in maybe_decode
    return s.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 57: invalid continuation byte

I tried editing line 214 of "C:\Program Files\Python38\lib\site-packages\magic\magic.py", changing return s.decode('utf-8') to return s.decode('utf-8', errors='ignore') or return s.decode('utf-8', errors='replace'), but I still encounter the problem.
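
For reference, here is a minimal sketch of how the errors= argument behaves on a byte string with the same kind of defect. The sample bytes are made up for illustration and are not the actual output of libmagic; `try_decode` is a hypothetical helper.

```python
# Made-up sample: 0xe6 starts a three-byte UTF-8 sequence, but the byte
# that follows is plain ASCII rather than a continuation byte.
sample = b"name: \xe6abc"


def try_decode(data, errors="strict"):
    """Decode `data` as UTF-8, returning the failure reason instead of raising."""
    try:
        return data.decode("utf-8", errors=errors)
    except UnicodeDecodeError as exc:
        return f"UnicodeDecodeError: {exc.reason}"


print(try_decode(sample))             # strict mode reports the failure
print(try_decode(sample, "replace"))  # bad byte becomes U+FFFD
print(try_decode(sample, "ignore"))   # bad byte is dropped
```

Since neither 'replace' nor 'ignore' raises on input like this, one possibility (not confirmed in this thread) is that the edited file is not the copy Python actually imports. Printing magic.__file__ shows which file is loaded.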
