In jruby/jruby#8682 we discovered that the use of IOOutputStream in GeneratorState.generate (for wrapping an IO-like object) is impacted by jruby/jruby#6588, poor handling of encodings in the implementation of byte[]-only OutputStream methods.
Specifying no encoding for IOOutputStream defaults to ASCII-8BIT, which breaks if the target IO has a multibyte (MBC) external encoding and any of the bytes written fall outside the 7-bit ASCII range.
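To illustrate why high bytes are the trigger, here is a loose analogy in plain Java. It does not go through JRuby's IOOutputStream or its transcoding code; the class and method names are made up for this sketch. It only shows that UTF-8 output for a non-ASCII character contains bytes at or above 0x80, which a strict decoder that knows nothing about the real encoding has to reject.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of JRuby or json.
final class HighByteDemo {
    // Returns true if every byte decodes as 7-bit ASCII under strict error reporting.
    static boolean decodesAsAscii(byte[] bytes) {
        try {
            StandardCharsets.US_ASCII.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // A byte >= 0x80 was seen; a strict converter cannot map it.
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] plain = "{\"a\":1}".getBytes(StandardCharsets.UTF_8);
        // U+00E9 (e with acute accent) is encoded in UTF-8 as the two bytes 0xC3 0xA9.
        byte[] accented = "{\"a\":\"\u00e9\"}".getBytes(StandardCharsets.UTF_8);
        System.out.println(decodesAsAscii(plain));    // true
        System.out.println(decodesAsAscii(accented)); // false
    }
}
```

Pure-ASCII JSON passes through untouched, which is presumably why the problem only shows up once a document contains non-ASCII characters.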
Specifying UTF-8 as the encoding should work, but is impacted by jruby/jruby#8686: the byte-encoding path fails to no-op when the provided encoding and the target IO's external encoding are the same, and subsequently errors in the character-transcoding subsystem.
To work around these issues, I have pushed #759 to force slow-path logic in IOOutputStream (dynamic "write" calls with String objects) whenever the target object is an IO with an external encoding. However, we should restore the fast write logic by doing the following:
Detect fixed versions of JRuby and switch to fast-write logic.
Implement a more robust IO-like wrapper that can handle mixed-encoding input, either in JRuby or in json.
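The first item could be sketched roughly as below. This is a minimal illustration, not the json extension's code: `FastWriteGate`, `compareVersions`, and `useFastWrite` are hypothetical names, the version strings in `main` are placeholders rather than the real fixed release, and how the running JRuby version is obtained is left out.

```java
// Hypothetical helper, not part of JRuby or json.
final class FastWriteGate {
    // Compares dotted numeric versions; missing parts count as 0.
    // Anything after a '-' (for example a SNAPSHOT suffix) is ignored.
    static int compareVersions(String a, String b) {
        String[] pa = a.split("-")[0].split("\\.");
        String[] pb = b.split("-")[0].split("\\.");
        int n = Math.max(pa.length, pb.length);
        for (int i = 0; i < n; i++) {
            int x = i < pa.length ? Integer.parseInt(pa[i]) : 0;
            int y = i < pb.length ? Integer.parseInt(pb[i]) : 0;
            if (x != y) return Integer.compare(x, y);
        }
        return 0;
    }

    // Fast byte[] writes are assumed safe when the target has no external
    // encoding, or when the running version is at least the first fixed one.
    static boolean useFastWrite(String running, String firstFixed,
                                boolean targetHasExternalEncoding) {
        if (!targetHasExternalEncoding) return true;
        return compareVersions(running, firstFixed) >= 0;
    }

    public static void main(String[] args) {
        String firstFixed = "2.0.0"; // placeholder, NOT the real fixed version
        System.out.println(useFastWrite("1.9.5", firstFixed, true));
        System.out.println(useFastWrite("2.0.1", firstFixed, true));
        System.out.println(useFastWrite("1.9.5", firstFixed, false));
    }
}
```

Comparing numeric components avoids the usual string-comparison trap where "9.4.10" sorts before "9.4.9". The check would ideally run once at load time rather than on every write.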
headius changed the title from "Switch to fully encoding-aware IO/buffer abstraction for dump" to "Improve buffer abstraction's encoding handling in JRuby dumper" on Mar 10, 2025
Logic in strTranscode evolved over the years to allow same-encoding
requests to be no-ops. Those changes were never applied to
rbByteEncode, resulting in same-encoding requests triggering
errors when the transcoding subsystem saw nothing would be done.
This complicated efforts to solve jruby#8682 by passing an
encoding to the IOOutputStream constructor (ruby/json#759 and
ruby/json#760).
This patch allows using IOOutputStream and the byte[] IO API it
calls with an externally-encoded IO by passing in an expected
encoding for incoming bytes. All bytes will be treated as being
encoded properly, and if the source and destination encodings are
the same, rbByteEncode will return null to indicate no-op.
Note that this misses some functionality of strTranscode in that it
does not scrub the string for same-encoding requests.
Partially addresses ruby/json#760.
Fixes jruby#8686.
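The null-means-no-op contract described in the commit message can be sketched in plain Java as follows. This is only an illustration of the idea using `java.nio.charset`; it is not JRuby's rbByteEncode, and `SameEncodingNoOp`, `byteEncode`, and `bytesToWrite` are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the behavior described above, not JRuby code.
final class SameEncodingNoOp {
    // Returns null when source and target match (nothing to do),
    // otherwise returns the bytes re-encoded into the target charset.
    static byte[] byteEncode(byte[] bytes, Charset source, Charset target) {
        if (source.equals(target)) return null;
        return new String(bytes, source).getBytes(target);
    }

    // Caller side: a null result means the original bytes can be written as-is.
    static byte[] bytesToWrite(byte[] bytes, Charset source, Charset target) {
        byte[] encoded = byteEncode(bytes, source, target);
        return encoded == null ? bytes : encoded;
    }

    public static void main(String[] args) {
        byte[] utf8 = "caf\u00e9".getBytes(StandardCharsets.UTF_8);
        // Same encoding: the very same array comes back, untouched.
        System.out.println(
            bytesToWrite(utf8, StandardCharsets.UTF_8, StandardCharsets.UTF_8) == utf8);
        // Different encoding: the single Latin-1 byte 0xE9 becomes two UTF-8 bytes.
        byte[] latin1 = {(byte) 0xE9};
        System.out.println(
            bytesToWrite(latin1, StandardCharsets.ISO_8859_1, StandardCharsets.UTF_8).length);
    }
}
```

As in the commit message, the same-encoding branch trusts the incoming bytes: invalid sequences are passed through rather than scrubbed.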