Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved parser performance #372

Open
wants to merge 10 commits into
base: main
Choose a base branch
from
Open

Improved parser performance #372

wants to merge 10 commits into from

Conversation

jkronegg
Copy link
Contributor

🤔 What's changed?

It started as a temptative to improve StringUtils (#361), but ended up with many other small improvements, all based on JMH micro-benchmark and IntelliJ profiler.

The following improvements have been done (no public API modified):

  • GherkinLine:
    • constructor/getTableCells(): rewrote trim+symbolCount logic to an integrated operation which trim and intent at the same time
    • getTags(): replaced split by String traversal and using compiled Regexp
  • GherkinDocumentBuilder: compiled regexps and simplified mapping between TokenType and RuleType
  • GherkinDialect: precomputing list size and removed duplicates
  • EncodingParser: avoid split on the while file to split only the first lines

The parser is now 1.7x faster (=40%) on the very_long.feature test file, as reflected by JMH micro-benchmark:

 MyClassBenchmark.original  avgt   25  5514.929 ± 943.017  us/op
 MyClassBenchmark.modified  avgt   25  3265.629 ± 287.454  us/op

There is the JMH code to reproduce the results:

package io.cucumber.gherkin;

import io.cucumber.messages.types.Envelope;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;

import java.io.IOException;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;

public class MyClassBenchmark {
    @Benchmark
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.MICROSECONDS)
    public Stream<Envelope> original() throws IOException {
        return GherkinParser.builder().build().parse(Paths.get("../testdata/good/very_long.feature"));
    }
}

On a real project with 1000 scenarios, 50 parameterTypes and 250 step definitions, the IntelliJ profiler gives for GherkinMessagesFeatureParser.parse:

  • original version (gherkin 31.0.0): 434 ms
  • modified version (this PR): 209 ms

That's 2.1x faster... 😁

⚡️ What's your motivation?

Fixes #361

🏷️ What kind of change is this?

  • 🏦 Refactoring/debt/DX (improvement to code design, tooling, etc. without changing behaviour)

♻️ Anything particular you want feedback on?

On this PR, we can run the following test:

@Test
void test_for_profiler_parser() throws IOException {
    for (int i=0; i<1000; i++) GherkinParser.builder().build().parse( Paths.get("../testdata/good/very_long.feature"));
}

Below is the Intellij profile flame graph for this test:
image

The is still some little room for improvement in:

I'm not counting the UUID.randomUUID() because it can be easily solved by selecting a faster UUID generator (e.g. IncrementingUuidGenerator) by configuring Cucumber properly.

📋 Checklist:

  • I agree to respect and uphold the Cucumber Community Code of Conduct
  • I've changed the behaviour of the code
    • I have added/updated tests to cover my changes.
  • My change requires a change to the documentation.
    • I have updated the documentation accordingly.
  • Users should know about my change
    • I have added an entry to the "Unreleased" section of the CHANGELOG, linking to this pull request.

@mpkorstanje mpkorstanje self-requested a review February 11, 2025 16:21
@luke-hill
Copy link
Contributor

luke-hill commented Feb 12, 2025

Given this is close to completion @jkronegg i'll wait til you fix up the last bits of the codegeneration task, before cutting and releasing v32

Quick question, is this really a bug fix or more of a generic change? Just noticed you popped a changelog in fixed?

@jkronegg
Copy link
Contributor Author

From my point of view, performance issues are bugs (given the non-functional requirement "the application must go as fast as possible"😁). But someone else could consider this PR as an improvement given no one asked for that NFR😅.
@luke-hill as you prefer.

@luke-hill
Copy link
Contributor

As/when it's reviewed by Rien he can best advise as this is a big Java improvement. I'm in no way able to advise on Java stuff

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Performance issue in StringUtils
2 participants