JSON Unmarshal of some Unicode escape sequence results in error #1576
Comments
U+DCA0 is a Unicode surrogate, and should not appear in valid UTF-8 text. A conformant UTF-8 decoder is required to reject input containing an unpaired surrogate; see "How do I convert an unpaired UTF-16 surrogate to UTF-8?" here: https://unicode.org/faq/utf_bom.html I'm not sure about the history that led to |
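To make the point about surrogates concrete, here is a small sketch using only the Go standard library. The helper name `describe` is made up for illustration; it simply reports what `unicode/utf16` and `unicode/utf8` say about a code point.

```go
package main

import (
	"fmt"
	"unicode/utf16"
	"unicode/utf8"
)

// describe is a hypothetical helper: it reports whether r is a UTF-16
// surrogate code point and whether r can legally be encoded as UTF-8.
func describe(r rune) (surrogate, encodable bool) {
	return utf16.IsSurrogate(r), utf8.ValidRune(r)
}

func main() {
	s, e := describe(0xDCA0)
	fmt.Println(s, e) // true false: a surrogate, not encodable as UTF-8

	s, e = describe('é')
	fmt.Println(s, e) // false true: an ordinary code point

	// ED B2 A0 is what a naive 3-byte encoding of U+DCA0 would look like;
	// a conformant decoder treats it as invalid UTF-8.
	fmt.Println(utf8.ValidString("\xed\xb2\xa0")) // false
}
```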
I think this is still generating an improper error message. It seems to be complaining that the invalid escape code is |
Yes, you're right; I wasn't looking at the error message. The error is definitely wrong, or at least confusing. Also, the protojson parser expects a surrogate to be followed by |
Yeah… not sure what the proper behavior should be in error conditions. But also, do we check at all that a Hi Surrogate is followed by a Lo Surrogate, and vice versa, or do we just check that surrogates occur in pairs? I smell the familiar scent of a rabbit hole into a pedantic reading of standards… 😩 At least the minimal solution here seems to be that if a surrogate isn’t followed by a surrogate, we report the escape itself as invalid? Or some sort of message that surrogates must be paired? |
The JSON mapping for protobuf follows RFC 7493, which requires strict checking of surrogate halves (see section 2.1 of RFC 7493). The "encoding/json" package only adheres to RFC 8259, which leaves split surrogate halves as undefined behavior (see section 8.2 of RFC 8259). As a side note, the "encoding/json/v2" draft proposal will target compliance with RFC 7493 by default.
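For comparison, the lenient behavior of "encoding/json" can be seen with the input from the report. This is a minimal sketch: the `decodeName` helper and the anonymous struct are made up for illustration and are not the reporter's original snippet.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// decodeName is a hypothetical helper that unmarshals a JSON object with
// a "name" field using the standard library and returns the decoded string.
func decodeName(input string) (string, error) {
	var msg struct {
		Name string `json:"name"`
	}
	if err := json.Unmarshal([]byte(input), &msg); err != nil {
		return "", err
	}
	return msg.Name, nil
}

func main() {
	// The raw string keeps \udca0 as a JSON escape, not a Go escape.
	name, err := decodeName(`{"name":"'unsafe-eval'; object?src\udca0'none'"}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// encoding/json substitutes U+FFFD for the unpaired surrogate
	// instead of returning an error.
	fmt.Println(name == "'unsafe-eval'; object?src\uFFFD'none'")
}
```

The substitution is silent, which is the kind of lossy decoding that the stricter RFC 7493 profile is meant to rule out.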
A standalone surrogate half is invalid and should be rejected. The validity of a surrogate pair is determined using I don't see anything wrong with what
Strictly speaking, this error message is correct as |
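As a rough illustration of strict surrogate checking with an error that names the offending escape, here is a sketch over the raw (still escaped) contents of a JSON string. It is not the protojson implementation; `parseU` and `checkSurrogates` are hypothetical names, the error wording is invented, and using `utf16.DecodeRune` to validate the pair is an assumption.

```go
package main

import (
	"fmt"
	"strconv"
	"unicode"
	"unicode/utf16"
)

// parseU parses a \uXXXX escape at the start of s (hypothetical helper).
func parseU(s string) (rune, bool) {
	if len(s) < 6 || s[0] != '\\' || s[1] != 'u' {
		return 0, false
	}
	v, err := strconv.ParseUint(s[2:6], 16, 16)
	if err != nil {
		return 0, false
	}
	return rune(v), true
}

// checkSurrogates returns an error if raw contains a \u escape that is a
// surrogate half not paired as high-then-low with the following escape.
func checkSurrogates(raw string) error {
	for i := 0; i < len(raw); {
		if raw[i] != '\\' {
			i++
			continue
		}
		r1, ok := parseU(raw[i:])
		if !ok {
			i += 2 // another escape such as \n or \\: skip both bytes
			continue
		}
		if !utf16.IsSurrogate(r1) {
			i += 6
			continue
		}
		// DecodeRune yields U+FFFD unless r1 is a high and r2 a low surrogate.
		r2, ok := parseU(raw[i+6:])
		if !ok || utf16.DecodeRune(r1, r2) == unicode.ReplacementChar {
			return fmt.Errorf("invalid escape code %s: surrogates must be paired", raw[i:i+6])
		}
		i += 12
	}
	return nil
}

func main() {
	fmt.Println(checkSurrogates(`'unsafe-eval'; object?src\udca0'none'`)) // non-nil error
	fmt.Println(checkSurrogates(`\ud83d\ude00`))                          // <nil>
}
```

Reporting the escape itself keeps the message pointed at `\udca0` rather than at whatever text happens to follow it.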
The proposed future "encoding/json/jsontext" package produces a better error message:
as it preserves the context that this is occurring within a surrogate pair. |
Interesting, I see what you mean with the “ I do like the error message from |
Improving the error message seems reasonable to us and we are happy to accept a contribution. |
What version of protobuf and what language are you using?
Golang lib:
What did you do?
Steps to reproduce the behavior:
1. Take a JSON string containing the escape sequence \udca0, e.g. {"name":"'unsafe-eval'; object?src\udca0'none'"}.
2. Unmarshal it using the google.golang.org/protobuf/encoding/protojson package.
3. Observe the error: proto: syntax error (line 1:9): invalid escape code "'none'" in string
Code snippet for reproducing issue:
The example code snippet above produces the following output:
What did you expect to see?
Unmarshalled value name:"'unsafe-eval'; object?src�'none'", i.e. the same unmarshalled string value as in the Golang core JSON lib.
What did you see instead?
Unpopulated message and an error: proto: syntax error (line 1:9): invalid escape code "'none'" in string
Anything else we should know about your project / environment?
Reproduced & confirmed using the following Go versions and environments:
Linux:
go version go1.20.11 linux/amd64
macOS:
go version go1.21.4 darwin/arm64