Encoding issue in request.py #464

rjstanford · 2024-04-04T13:33:27Z

I'm not entirely sure what the intent is here so hesitate to file a PR. We saw some errors thrown by our webapp (using gunicorn) and traced it to request.encget():

  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/webob/request.py", line 495, in url
    url = self.path_url
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/webob/request.py", line 467, in path_url
    bpath_info = bytes_(self.path_info, self.url_encoding)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/webob/descriptors.py", line 70, in fget
    return req.encget(key, encattr=encattr)
  File "/layers/google.python.pip/pip/lib/python3.9/site-packages/webob/request.py", line 165, in encget
    return bytes_(val, 'latin-1').decode(encoding)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 66: invalid start byte

My read of util.bytes_ is that, when passed a string, it calls val.encode() on it. So the following code in encget():

return bytes_(val, "latin-1").decode(encoding)

is the same as doing:

return val.encode("latin-1", "strict").decode(encoding)

Based on our exception we can see that the value of encoding is "utf-8", which gives us:

return val.encode("latin-1", "strict").decode("utf-8")

or with a specific example that will fail:

x = "À".encode('latin-1').decode('utf-8')
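Splitting that one-liner into its two stages makes it easier to see where it breaks. This is only a sketch of the expression in isolation, not of webob itself; the variable names are made up for illustration.

```python
# Stage 1: encoding U+00C0 as latin-1 yields the single byte 0xC0.
text = "À"
encoded = text.encode("latin-1")
assert encoded == b"\xc0"

# Stage 2: 0xC0 on its own is not a valid UTF-8 start byte, so decoding fails.
try:
    encoded.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(error)
```

The exception raised here has the same reason ("invalid start byte") as the one in the traceback above, just at position 0 instead of 66.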

I'm not sure why we'd ever be explicitly encoding a string as latin-1 and then decoding it as UTF-8 in the first place -- a simpler return val.encode(encoding) would seem more appropriate here -- but again, there's probably nuance that I'm not understanding, hence the issue report.


rjstanford · 2024-04-04T13:46:03Z

This is on released version 1.8.7 btw, I see that there's been some unreleased development since then.

digitalresistor · 2024-08-14T05:04:52Z

This is because HTTP doesn't officially support Unicode in requests/paths. As explained in https://peps.python.org/pep-3333/#unicode-issues, all HTTP paths/URIs should be treated as latin-1.
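To illustrate the round trip that PEP 3333 implies, here is a rough sketch. It assumes the WSGI server exposes the raw request bytes as a str decoded with latin-1; `wsgi_native` and `recover` are hypothetical helper names for this example, not webob functions, and `recover` only mirrors the shape of the expression quoted from encget().

```python
def wsgi_native(raw: bytes) -> str:
    """Hypothetical stand-in: expose raw bytes as a latin-1 decoded str (PEP 3333 style)."""
    return raw.decode("latin-1")


def recover(value: str, encoding: str = "utf-8") -> str:
    """Undo the latin-1 step to get the raw bytes back, then decode them for real."""
    return value.encode("latin-1").decode(encoding)


# A client that sent valid UTF-8 bytes: the round trip restores the text.
good = wsgi_native("/café".encode("utf-8"))
assert good == "/cafÃ©"            # what the application sees in the environ
assert recover(good) == "/café"

# A client that sent a lone 0xC0 byte (latin-1 "À", not valid UTF-8).
bad = wsgi_native(b"/caf\xc0")
try:
    recover(bad)
    failure = None
except UnicodeDecodeError as exc:
    failure = exc

print(failure)
```

Under these assumptions, the encode-as-latin-1 step is lossless and simply recovers the bytes the client sent. The decode only fails when those bytes are not valid in the target encoding, which would match a request path containing a raw 0xC0 byte.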
