Improve buffer abstraction's encoding handling in JRuby dumper #760

Open
3 tasks
headius opened this issue Mar 10, 2025 · 0 comments
headius commented Mar 10, 2025

In jruby/jruby#8682 we discovered that the use of IOOutputStream in GeneratorState.generate (to wrap an IO-like object) is affected by jruby/jruby#6588: the byte[]-only OutputStream methods handle encodings poorly.

If no encoding is specified, IOOutputStream defaults to ASCII-8BIT, which breaks when the target IO has a multibyte-character (MBC) external encoding and any characters fall in the high (non-7-bit) ASCII range.
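
The failure mode can be illustrated in plain Ruby. This is only a sketch: it assumes `String#encode` is a fair stand-in for the transcoding step, and it does not exercise IOOutputStream itself. The helper names `try_transcode` and `demo_binary_default` are made up for the illustration.

```ruby
# Illustration only: String#encode stands in for the transcoding step; this is
# not the IOOutputStream code path itself.

# Returns the transcoded string, or the exception if transcoding fails.
def try_transcode(str, target)
  str.encode(target)
rescue EncodingError => e
  e
end

def demo_binary_default
  utf8  = "caf\u00E9"   # "café": the é is two bytes (0xC3 0xA9) in UTF-8
  bytes = utf8.b        # same bytes, but labeled ASCII-8BIT

  {
    # Bytes >= 0x80 have no defined mapping from ASCII-8BIT to UTF-8.
    high_bytes: try_transcode(bytes, Encoding::UTF_8),
    # Pure-ASCII content transcodes cleanly, which hides the problem.
    ascii_only: try_transcode("cafe".b, Encoding::UTF_8),
    # Relabeling the bytes with their real encoding recovers the text.
    relabeled:  bytes.dup.force_encoding(Encoding::UTF_8) == utf8
  }
end
```

Only the input containing a non-ASCII character fails, which matches the symptom described above: ASCII-only output is unaffected.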

Specifying UTF-8 as the encoding should work, but is affected by jruby/jruby#8686: JRuby fails to no-op when the provided encoding and the target IO's external encoding are the same, and subsequently raises an error in the character-transcoding subsystem.
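
The expected same-encoding behaviour can be sketched as a short-circuit that reports "nothing to do" instead of invoking the transcoder. The Ruby below is a hypothetical analog, not JRuby's code: the real logic is Java, and `byte_encode` and `bytes_to_write` are names invented here.

```ruby
# Hypothetical Ruby analog of a same-encoding short-circuit; the real logic
# lives in JRuby's Java transcoding code.
# Returns nil when no transcoding is needed, otherwise a transcoded copy.
def byte_encode(bytes, source, target)
  source = Encoding.find(source.to_s)
  target = Encoding.find(target.to_s)
  return nil if source == target   # no-op: caller writes the bytes unchanged

  # Label the raw bytes with their declared encoding, then transcode.
  bytes.dup.force_encoding(source).encode(target)
end

# Caller side: fall back to the original bytes when nothing was transcoded.
def bytes_to_write(bytes, source, target)
  byte_encode(bytes, source, target) || bytes
end
```

Note that this sketch, like any pure short-circuit, does not validate or scrub the bytes in the same-encoding case.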

To work around these issues, I have pushed #759, which forces the slow-path logic in IOOutputStream (dynamic "write" calls with String objects) whenever the target object is an IO with an external encoding. However, we should restore the fast write logic by doing the following:

  • Fix the fast write logic downstream from IOOutputStream.write so that it accurately handles all incoming encodings (jruby/jruby#8687, "Handle encoding checks as in strTranscode").
  • Detect fixed versions of JRuby and switch to fast-write logic.
  • Implement a more robust IO-like wrapper that can handle mixed-encoding input, either in JRuby or in json.
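
As a rough sketch of the last two items, the Ruby below shows one possible shape for a mixed-encoding wrapper and a version gate. Everything here is an assumption for illustration: `EncodingAwareSink` and `fast_write_supported?` are invented names, the real implementation would likely be Java, treating binary chunks as already being in the target encoding is a policy choice rather than something this issue specifies, and the fixed JRuby release is not named here, so the threshold is a parameter.

```ruby
require "stringio"

# Hypothetical wrapper: normalizes every chunk to the target's external
# encoding before writing, so callers may mix encodings freely.
class EncodingAwareSink
  def initialize(io, fallback: Encoding::UTF_8)
    @io = io
    # Not every IO-like object reports an external encoding.
    ext = io.respond_to?(:external_encoding) ? io.external_encoding : nil
    @target = ext || fallback
  end

  def write(chunk)
    @io.write(normalize(chunk))
  end

  private

  def normalize(chunk)
    return chunk if chunk.encoding == @target
    # Binary chunks carry no character information; ASSUME they are already
    # in the target encoding and only relabel them.
    return chunk.dup.force_encoding(@target) if chunk.encoding == Encoding::BINARY

    chunk.encode(@target)
  end
end

# Hypothetical gate: `fixed_in` is a placeholder supplied by the caller.
# Compares dotted version strings numerically, component by component.
def fast_write_supported?(jruby_version, fixed_in)
  current  = jruby_version.split(".").map(&:to_i)
  required = fixed_in.split(".").map(&:to_i)
  (current <=> required) >= 0
end
```

A caller would pick the fast path only when the gate returns true and keep the slow String-based path otherwise.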
headius changed the title from "Switch to fully encoding-aware IO/buffer abstraction for dump" to "Improve buffer abstraction's encoding handling in JRuby dumper" on Mar 10, 2025
headius added a commit to headius/jruby that referenced this issue Mar 10, 2025
Logic in strTranscode evolved over the years to allow same-encoding
requests to be no-ops. Those changes were never applied to
rbByteEncode, resulting in same-encoding requests triggering
errors when the transcoding subsystem saw nothing would be done.
This complicated efforts to solve jruby#8682 by passing an
encoding to the IOOutputStream constructor (ruby/json#759 and
ruby/json#760).

This patch allows using IOOutputStream and the byte[] IO API it
calls with an externally-encoded IO by passing in an expected
encoding for incoming bytes. All bytes will be treated as being
encoded properly, and if the source and destination encodings are the
same, rbByteEncode will return null to indicate a no-op.

Note that this misses some functionality of strTranscode in that it
does not scrub the string for same-encoding requests.

Partially addresses ruby/json#760.

Fixes jruby#8686.
byroot added the jruby label on Mar 10, 2025