-
Notifications
You must be signed in to change notification settings - Fork 2
Add native support for concatenated ClassAds in grammar #5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add native support for concatenated ClassAds in grammar #5
Conversation
Previously, concatenated ClassAds (e.g., "][" without whitespace) were handled by a wrapper reader that inserted newlines. This change modifies the grammar to natively support parsing multiple concatenated ClassAds. Changes: - Modified grammar (classad.y) to accept classad_list with multiple ClassAds - Added ParseMultipleClassAds() function to parser package - Added ParseMultiple() wrapper in classad package - Refactored Reader to use grammar-based parsing instead of wrapper reader - Removed classAdSeparatorReader wrapper implementation - Added comprehensive tests for concatenated ClassAd parsing This provides cleaner architecture, better performance, and native grammar support for the HTCondor format where ClassAds may be concatenated without whitespace between them.
Welcome to Codecov 🎉Once you merge this PR into your default branch, you're all set! Codecov will compare coverage reports and display results in all future pull requests. ℹ️ You can also turn on project coverage checks and project coverage reporting on Pull Request comment Thanks for integrating Codecov - We've got you covered ☂️ |
bbockelm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This buffers everything into memory -- I think we need to keep things more stream-oriented, processing ads one-by-one. Reasonable first attempt, just needs some tuning.
…fixes - Add buffer size limit (10MB) to prevent unbounded memory growth - Fix UTF-8 handling in findCompleteClassAd for proper multi-byte character support - Refactor duplicate EOF handling logic into handleEOF() method - Fix variable shadowing in findCompleteClassAd return values - Add comprehensive edge case tests: * UTF-8 characters in attribute values and strings * Empty brackets handling * Buffer size limit enforcement * Unclosed brackets (malformed input) * Multiple ClassAds with UTF-8 content - Optimize buffer size checks to occur before expensive scans - Add constants for maxBufferSize and readChunkSize for maintainability All existing tests pass, and new edge case tests verify robustness.
bbockelm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're hitting a basic design issue in the current lexer - it wants to be fed a full ClassAd as a string.
That's a dumb design causing you to generate basically a mini-parser in the reader to buffer out a full ad.
I'll try a more fundamental change, allowing the lexer to work on streams.
|
Fixed in #6 |
Previously, concatenated ClassAds (e.g., "][" without whitespace) were handled by a wrapper reader that inserted newlines. This change modifies the grammar to natively support parsing multiple concatenated ClassAds.
Changes:
This provides cleaner architecture, better performance, and native grammar support for the HTCondor format where ClassAds may be concatenated without whitespace between them.