Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ArgumentNullException in C# tokenizer at EOF when using .NET Regex extensions #9

Open
Stevie-O opened this issue Jul 20, 2019 · 3 comments
Labels

Comments

@Stevie-O
Copy link

I tried to to create a parser and test it (which was a challenge, because I can find no examples documenting how to properly use the generated code), and I get an ArgumentNullException whenever I call Parse with my parser.

Further research indicates that it's because I'm using regex syntax that the Grammatica regex engine doesn't support, so it's automatically switching to using .NET's System.Text.RegularExpressions.RegEx class.

Tracing through a parse of a file consisting of a single blank line, here's what I see:

  • ReaderBuffer.Peek(offset: 0) is called
  • ReadBuffer.Peek calls EnsureBuffered(offset: 1)
  • EnsureBuffered tries to read BLOCK_SIZE characters.
    • It calls input.Read which reads 2 characters (CR and LF)
    • It calls input.Read again which, because we're at EOF, returns 0
    • It then calls input.Close and then sets input to null.
  • Eventually, control returns to Tokenizer.NextToken
  • Tokenizer.NextToken soon calls ReadBuffer.Read(offset: 2) -- which, despite the comments, seems to actually mean "Read from the current stream position until offset 1"
  • ReadBuffer.Read sees that input is null and calls Dispose() which sets buffer to null
  • Tokenizer.NextToken is called again to return the next token
  • NextToken calls regExpMatcher.Match
  • RegExpMatcher.Match calls (in a loop) REHandler.Match (an abstract method)
  • When it hits on the .NET-native RegEx, that method call resolves to SystemRE.Match
  • SystemRE.Match calls buffer.ToString()
  • buffer.ToString() is return new string(buffer, 0, length) (a different buffer)
  • Since that buffer is null, the string constructor throws an ArgumentNullException.

I see several problems here:

  • ReadBuffer.Read calls Dispose on itself. Dispose is supposed to mean "I am done using this object"; since ReadBuffer doesn't own itself, it's disposing an object owned by someone else (the Tokenizer) while the Tokenizer is still using it
  • Tokenizer.NextToken tries to parse at EOF; it should probably return null immediately at EOF without trying to parse an empty buffer
  • ReadBuffer.ToString can throw an exception (never a good thing)
@cederberg
Copy link
Owner

cederberg commented Aug 17, 2019

We are talking about ReaderBuffer.cs here, right?

Think the bug is in ToString(), which should return an empty string if the buffer is null. The Dispose() method is supposed to be safe to call multiple times, in order to ensure that resources are released (eagerly) for each and every corner case.

Regarding Read(), it only guarantees to read at least the number of characters requested. In practice it may read more as it attempts to read at least 1k blocks in advance.

You're probably right that Tokenizer.NextToken (line 350) should be moved up, so that buffer.Peek(0) < 0 is the first thing. Can't say I understand why it doesn't do that already to be honest. Looks like a bug as well.

@cederberg cederberg added the bug label Aug 17, 2019
@slluis
Copy link

slluis commented Oct 25, 2020

I also hit this, is there any workaround for this bug?

@SlobodanVucetic
Copy link

Was there any progress on this issue? Is there any proposed workaround?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

4 participants