Open
Labels
enhancement (New feature or request)
Description
For the regular expression `/#{ }\xc2\xa1/e`, which comes from `test_m17n.rb` in the CRuby tests, the flags on the string part are `forced_utf8_encoding` and `forced_binary_encoding` for source encodings UTF-8 and US-ASCII, respectively.
I am not sure if this is correct or not.
We are looking at this in TruffleRuby, and honoring those flags causes us to compute the wrong Regexp encoding.
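To illustrate why the encoding attached to the part matters, here is a minimal sketch that uses plain `Regexp.new` rather than TruffleRuby's translator or Prism itself. It assumes a consumer that tags the part's source text with an encoding before compiling it; the `regexp_encoding_for` helper and the `PART_SOURCE` constant are hypothetical names introduced for this example only.

```ruby
# Source text of the string part: the eight ASCII characters \xc2\xa1,
# with the escapes still unprocessed (as in the node's `unescaped` field).
PART_SOURCE = '\xc2\xa1'

# Hypothetical helper: tag the part with an encoding, compile it, and
# report which encoding the resulting Regexp ends up with.
def regexp_encoding_for(part_encoding)
  tagged = PART_SOURCE.dup.force_encoding(part_encoding)
  Regexp.new(tagged).encoding
end

utf8   = regexp_encoding_for(Encoding::UTF_8)   # following forced_utf8_encoding
binary = regexp_encoding_for(Encoding::BINARY)  # following forced_binary_encoding
eucjp  = regexp_encoding_for(Encoding::EUC_JP)  # what the /e modifier asks for

puts [utf8, binary, eucjp].inspect
```

The escaped bytes are non-ASCII, so the Regexp takes its encoding from whatever the part was tagged with. Only the EUC-JP tagging agrees with the `/e` modifier.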
I guess ideally, because of the `/e`, the parts would be "force_eucjp_encoding".
That seems the best way to avoid mistakes in consumers.
Or maybe no flags at all, letting the consumer attach the encoding correctly. Though more error-prone, that might be better than (arguably) the "wrong" encoding flag.
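For the second option, a consumer-side resolution could look roughly like the sketch below. This is not Prism's API: the symbols only mirror the flag names shown in the parse output, and `REGEXP_FLAG_ENCODINGS` and `part_encoding` are hypothetical. The `/n` modifier is deliberately left out because its semantics differ from a simple forced encoding.

```ruby
# Hypothetical mapping from a regexp-level modifier flag to an encoding.
REGEXP_FLAG_ENCODINGS = {
  euc_jp: Encoding::EUC_JP,
  windows_31j: Encoding::Windows_31J,
  utf_8: Encoding::UTF_8,
}.freeze

# Hypothetical helper: the regexp-level modifier wins over the part's own
# forced-encoding flags, which in turn win over the source encoding.
def part_encoding(regexp_flags, part_flags, source_encoding)
  regexp_flags.each do |flag|
    enc = REGEXP_FLAG_ENCODINGS[flag]
    return enc if enc
  end
  return Encoding::UTF_8 if part_flags.include?(:forced_utf8_encoding)
  return Encoding::BINARY if part_flags.include?(:forced_binary_encoding)
  source_encoding
end

# The case from this issue: /e on the regexp, forced_utf8_encoding on the part.
enc = part_encoding([:euc_jp], [:forced_utf8_encoding], Encoding::UTF_8)
re = Regexp.new('\xc2\xa1'.dup.force_encoding(enc))
puts re.encoding
```

Every consumer would have to reimplement this precedence, which is the error-prone part mentioned above.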
@kddnewton WDYT?
$ bin/parse -e '/#{ }\xc2\xa1/e'
@ ProgramNode (location: (1,0)-(1,15))
├── locals: []
└── statements:
    @ StatementsNode (location: (1,0)-(1,15))
    └── body: (length: 1)
        └── @ InterpolatedRegularExpressionNode (location: (1,0)-(1,15))
            ├── flags: euc_jp
            ├── opening_loc: (1,0)-(1,1) = "/"
            ├── parts: (length: 2)
            │   ├── @ EmbeddedStatementsNode (location: (1,1)-(1,5))
            │   │   ├── opening_loc: (1,1)-(1,3) = "\#{"
            │   │   ├── statements: ∅
            │   │   └── closing_loc: (1,4)-(1,5) = "}"
            │   └── @ StringNode (location: (1,5)-(1,13))
            │       ├── flags: forced_utf8_encoding
            │       ├── opening_loc: ∅
            │       ├── content_loc: (1,5)-(1,13) = "\\xc2\\xa1"
            │       ├── closing_loc: ∅
            │       └── unescaped: "\\xc2\\xa1"
            └── closing_loc: (1,13)-(1,15) = "/e"
$ bin/parse -e '# encoding: US-ASCII
/#{ }\xc2\xa1/e'
...
AST:
@ ProgramNode (location: (2,0)-(2,15))
├── locals: []
└── statements:
    @ StatementsNode (location: (2,0)-(2,15))
    └── body: (length: 1)
        └── @ InterpolatedRegularExpressionNode (location: (2,0)-(2,15))
            ├── flags: euc_jp
            ├── opening_loc: (2,0)-(2,1) = "/"
            ├── parts: (length: 2)
            │   ├── @ EmbeddedStatementsNode (location: (2,1)-(2,5))
            │   │   ├── opening_loc: (2,1)-(2,3) = "\#{"
            │   │   ├── statements: ∅
            │   │   └── closing_loc: (2,4)-(2,5) = "}"
            │   └── @ StringNode (location: (2,5)-(2,13))
            │       ├── flags: forced_binary_encoding
            │       ├── opening_loc: ∅
            │       ├── content_loc: (2,5)-(2,13) = "\\xc2\\xa1"
            │       ├── closing_loc: ∅
            │       └── unescaped: "\\xc2\\xa1"
            └── closing_loc: (2,13)-(2,15) = "/e"