Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding feature to allow for skipping RS control char #633 #1155

Open
wants to merge 6 commits into
base: 2.18
Choose a base branch
from

Conversation

Creaturism
Copy link

Attempt at fixing #633

@Creaturism
Copy link
Author

Looking for guidance on how the pattern for checking for the feature should be implemented in this context

@@ -2538,8 +2540,8 @@ private final int _skipWSOrEnd() throws IOException
_currInputRowStart = _inputPtr;
} else if (i == INT_CR) {
_skipCR();
} else if (i != INT_TAB) {
_throwInvalidSpace(i);
} else if (i != INT_TAB && ((_features & FEAT_MASK_ALLOW_CONTROL_CHAR) != 0 && i != INT_RS)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think latter && needs to be || (so failure if Not Tab and either Don't Allow RS OR not RS).
Otherwise fails some unit tests on CI (see failures)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@cowtowncoder
Copy link
Member

Aside from the one logic test I pointed out, yes, I think this looks like the way to go.
So:

  1. There are 4 parser backends (Byte-based, Reader-based, DataInput-backed, Async/Non-blocking
  2. Need unit tests for all 4 backends, but can be simple.

One big question is this: should RS be allowed only as white-space, or also in Content?
I think former would be easier, latter little bit more work. But I guess latter is not all that much work.
But maybe start with former, and make description say that it is allowed for ignorable whitespace between tokens. If support for in-content values (as escaped) is needed, can add separate feature.

On converting to TAB -- if only used for ignorable whitespace, no need to do anything: conceptually can be thought of working like TAB of course, but ignorable whitespace is not reported in any way.
But even in "allow in content" case, I'd vote for keeping it as-is with no conversion.

@cowtowncoder
Copy link
Member

I pushed some minor wording changes, addition of @since annotations in Javadocs.

@cowtowncoder
Copy link
Member

Quick note: since 2.17.0 was released, can no longer be merged into 2.17 branch: API changes need to go in new minor releases. So needs rebasing to 2.18.

@Creaturism Creaturism changed the base branch from 2.17 to 2.18 May 6, 2024 18:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants