Improved parser performance #372
base: main
Given this is close to completion, @jkronegg, I'll wait until you fix up the last bits of the code generation task before cutting and releasing v32. Quick question: is this really a bug fix or more of a general change? I just noticed you put the changelog entry under "Fixed".
From my point of view, performance issues are bugs (given the non-functional requirement "the application must go as fast as possible" 😁). But someone else could consider this PR an improvement, given that no one asked for that NFR 😅.
As and when it's reviewed by Rien, he can best advise, as this is a big Java improvement. I'm in no way able to advise on Java stuff.
🤔 What's changed?
It started as an attempt to improve `StringUtils` (#361) but ended up including many other small improvements, all driven by JMH micro-benchmarks and the IntelliJ profiler. The following improvements have been made (no public API was modified):
- `GherkinLine`:
  - `getTableCells()`: rewrote the trim + `symbolCount` logic as a single integrated operation that trims and computes the indent at the same time
  - `getTags()`: replaced `split` with a `String` traversal and a compiled regexp
- `GherkinDocumentBuilder`: compiled regexps and simplified the mapping between `TokenType` and `RuleType`
- `GherkinDialect`: precomputed list sizes and removed duplicates
- `EncodingParser`: avoided splitting the whole file; only the first lines are split

The parser is now 1.7x faster (i.e. about 40% less parsing time) on the `very_long.feature` test file, as reflected by the JMH micro-benchmark:

[JMH benchmark results not preserved]

Here is the JMH code to reproduce the results:

[JMH benchmark code not preserved]
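Separately from the benchmark, the `getTableCells()` item above merges trimming and indent computation into one operation. The following is a hypothetical sketch of that idea, not the PR's `GherkinLine` code: the `TableCellSketch` class and its `cells` method are invented for illustration, it assumes a row such as `  | a | bb |`, and it deliberately ignores Gherkin's cell escapes (`\|`, `\\`, `\n`). Each cell's trimmed text and starting column come out of the same scan instead of calling `trim()` and then counting the removed characters separately.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of an integrated "trim + indent" table-row split.
// Not the actual GherkinLine.getTableCells() implementation; escapes are ignored.
final class TableCellSketch {

    /** A cell's trimmed text and the 0-based column where that text starts in the row. */
    static final class Cell {
        final int column;
        final String text;

        Cell(int column, String text) {
            this.column = column;
            this.text = text;
        }
    }

    /**
     * Splits a row such as "  | a | bb |" into trimmed cells in one left-to-right
     * traversal; each cell is trimmed as soon as its closing '|' is found.
     */
    static List<Cell> cells(String row) {
        List<Cell> result = new ArrayList<>();
        int firstPipe = row.indexOf('|');
        if (firstPipe < 0) {
            return result; // not a table row
        }
        int cellStart = firstPipe + 1;
        for (int i = cellStart; i < row.length(); i++) {
            if (row.charAt(i) == '|') {
                result.add(trimmedCell(row, cellStart, i));
                cellStart = i + 1;
            }
        }
        // Text after the last '|' is not a cell and is ignored.
        return result;
    }

    // Finds the first and last non-whitespace chars of row[from, to) directly, so the
    // trimmed text and its column are produced together (no trim() followed by a
    // second pass to count how many characters were removed).
    private static Cell trimmedCell(String row, int from, int to) {
        int first = from;
        while (first < to && Character.isWhitespace(row.charAt(first))) {
            first++;
        }
        int last = to;
        while (last > first && Character.isWhitespace(row.charAt(last - 1))) {
            last--;
        }
        return new Cell(first, row.substring(first, last));
    }
}
```

For `"  | a | bb |"` this yields the cell `a` at column 4 and the cell `bb` at column 8, using only index arithmetic plus one `substring` per cell.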
On a real project with 1000 scenarios, 50 parameter types and 250 step definitions, the IntelliJ profiler gives the following for `GherkinMessagesFeatureParser.parse`:

[IntelliJ profiler results not preserved]

That's 2.1x faster... 😁
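The `EncodingParser` item in the list of improvements avoids splitting the whole file when only the header lines matter. A minimal sketch of that idea follows, assuming the encoding is declared by an `# encoding: <name>` comment that can only appear among the leading comment or blank lines. The `EncodingHeaderSketch` class, its `findEncoding` method and the exact regexp are hypothetical, not the PR's code. It also shows the "compile the regexp once" pattern used elsewhere in the PR.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: find an "# encoding: X" header without splitting the whole file.
final class EncodingHeaderSketch {

    // Compiled once and reused across calls (the header syntax here is an assumption).
    private static final Pattern ENCODING =
            Pattern.compile("^\\s*#\\s*encoding\\s*:\\s*([0-9a-zA-Z\\-]+)", Pattern.CASE_INSENSITIVE);

    /**
     * Scans lines one at a time with indexOf('\n') and stops at the first line that is
     * neither blank nor a comment, so the body of the feature file is never split.
     */
    static Optional<String> findEncoding(String source) {
        int lineStart = 0;
        while (lineStart < source.length()) {
            int lineEnd = source.indexOf('\n', lineStart);
            if (lineEnd < 0) {
                lineEnd = source.length();
            }
            String line = source.substring(lineStart, lineEnd);
            Matcher matcher = ENCODING.matcher(line);
            if (matcher.find()) {
                return Optional.of(matcher.group(1));
            }
            String trimmed = line.trim(); // also drops a trailing '\r'
            if (!trimmed.isEmpty() && !trimmed.startsWith("#")) {
                break; // first real line reached: the header is over
            }
            lineStart = lineEnd + 1;
        }
        return Optional.empty();
    }
}
```

The cost of this lookup depends on the number of header lines rather than on the file size, which matters most for large files such as `very_long.feature`.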
⚡️ What's your motivation?
Fixes #361
🏷️ What kind of change is this?
♻️ Anything particular you want feedback on?
On this PR, we can run the following test:

[test code not preserved]
Below is the IntelliJ profiler flame graph for this test:
data:image/s3,"s3://crabby-images/c095a/c095a25fa643a70fac4f94caa1a67848c9b97cd2" alt="image"
There is still a little room for improvement in:
- `getLocation` (8%): avoid using `Long` values in `cucumber-messages` when primitive types can be used; see messages#283 ("Codegen generates inefficient Java code for `Long` and `Boolean` mandatory parameters")
- `cucumber-messages` (1%): use Java 10+ to avoid recreating immutable lists; see messages#282 ("Codegen generates inefficient Java code for `List` parameters")

I'm not counting `UUID.randomUUID()` because it can easily be solved by configuring Cucumber to use a faster UUID generator (e.g. `IncrementingUuidGenerator`).

📋 Checklist: