-
Notifications
You must be signed in to change notification settings - Fork 3.8k
Parallel cucumbers #7309
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Parallel cucumbers #7309
Conversation
|
Thanks for working on speeding up the CI! The parallel cucumber approach looks promising. One concern: could we keep the CI matrix changes out of this PR? Switching from macos-13 to macos-15-intel drops test coverage for users on Ventura/Sonoma, and adding clang-20 (which isn't released yet) seems unrelated to the parallelization work. These changes could be discussed separately if there's a reason to drop older platform support. Also spotted a few bugs in osrm_loader.js - there are references to an undefined err variable in the error handlers (lines 45, 118, 132) that would crash if osrm-routed fails. |
|
The PR is still WIP. I will cleanup before releasing it. BTW clang-21 is current. The reason to drop macos-13 is that github does not support it anymore. |
features/lib/osrm_loader.js
Outdated
| ); | ||
| callback(); | ||
| }); | ||
| // process.on('SIGIO', this.resolve); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left-over?
|
Do you think the non-cucumber changes (datastore, routed, etc) can be broken out independently without messing with the tests? It is always quite the challenge to review large changes like this so I would appreciate the extra effort if possible. 🙏 |
|
The shared memory changes could easily go elsewhere, as they are not the primary reason for sync loss between the test suite and routed. But the change in routed is critical (and trivial too). The changes to the github workflow should go elsewhere. I'm just playing around to get windows to run the test suite too. See if any surprises come up. |
e09c32e to
6b55220
Compare
|
Ready for review. I have removed everything that is not strictly necessary. The shared memory fixes for macOS and Windows will go into another PR. Until that macOS tests with datastore will be flaky. |
db91c60 to
0f33530
Compare
| import { setDefaultTimeout } from '@cucumber/cucumber'; | ||
|
|
||
| // Set global timeout for all steps and hooks | ||
| setDefaultTimeout( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have removed the setDefaultTimeout() method and that is why the CUCUMBER_TIMEOUT: 60000 from the conan-macos-arm64-release-node section is not passed anywhere and default timeout of 5000 is used causing the macOS arm64 job to fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The macOS builds fail randomly because of a race condition: see #7320 #7321. The build fails if osrm-datastore exits too early, not too late. Thus a longer timeout would not solve anything here.
And if osrm-routed dies for whatever reason, then a longer timeout would just slow down things because each scenario would take exactly timeout ms to fail.
OTOH the timeout was the reason the builds kept failing for such a long time: each time a build failed somebody just raised the timeout. Then, if the next CI run did by pure chance succeed, the problem was declared solved. Didn't you ever wonder how a totally absurd timeout of 60 seconds was reached?
I'm thinking of removing the timeout setting altogether. It induces people to fiddle with the parameters instead of finding the underlying cause. The default 5 sec is plenty.
|
|
||
| // ensure there is a make a cache directory | ||
| fs.mkdirSync(this.featureCacheDirectory, { recursive: true }); | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semicolon is not needed
|
|
||
| before(scenario) { | ||
| this.current_scenario = scenario; | ||
| this.semaphore = new Promise((resolve) => this.resolve = resolve); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding a timeout fallback:
before(scenario) {
this.current_scenario = scenario;
this.semaphore = new Promise((resolve, reject) => {
this.resolve = resolve;
setTimeout(() => reject(new Error('Timed out waiting for facade update')), env.wp.timeout);
});
// ...
}Or use Promise.race() with the existing waitForConnection() timeout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case Cucumber will timeout. No need to complicate the code.
Issue
Fixes #7307 - CI runs up from 0:45h to 4:30h
Improvements
Parallel execution of cucumber tests tremendously improves running times.
Note: when using
--parallel Nmake sure there areNcontiguous free ports at theconfigured port number (eg. at ports 5000--5000+N).
A lot more can be configured with profiles (in
cucumber.mjs). Usage examples:Github step summaries display the test results:
On github CI full logs, also containing the outputs from all binaries, are published to
https://reports.cucumber.io/.Code refactored and simplified. ~150 LoC less.
Other Fixes
During development other problems were found and fixed:
The datastore load method now works as expected: one and the same
osrm-routedwillbe running in the background for the whole duration of the test. Previous behaviour: for
each scenario an
osrm-routedwould be spawned, loaded with datastore, and then killedagain.
The data watchdog in
osrm-routednow will emit the message: 'updated facade' rightafter the update (previous behaviour: right before the update), so that it can be used
for process synchronization.
Breaking changes
If you run the tests using the
npx cucumber-js ...calling method you must add either-p homeor-p githubas the first profile, eg.:instead of:
Old cached files are not cleaned automatically anymore because that interfered with
concurrency. There is no hook in cucumber that runs before the workers start.
We don't support changing the load method in mid-run anymore. (eg. "Given data is loaded
with datastore" no longer works). Reason: With the datastore load method an
osrm-routedwill be constantly running in the background. No secondosrm-routedcanopen the same port again.
Retired Environment Variables
OSRM_CONNECTION_RETRIESOSRM_CONNECTION_EXP_BACKOFF_COEFOther changes
Removed one cache level. CH and MLD are processed and cached separately.
The outputs of osrm binaries are not written into lots of small files anymore. Instead
use the
--format html:test/logs/testlog.htmlcommandline argument to get one big HTMLfile with all outputs embedded.
New configuration variables
New environment variables and/or cucumber
worldParametersentries were introduced. Thewhole configuration is done in
cucumber.mjs.CUCUMBER_TIMEOUTtimeoutCUCUMBER_HTTP_TIMEOUThttpTimeoutCUCUMBER_TEST_PATHtestPathtestCUCUMBER_PROFILES_PATHprofilesPathprofilesCUCUMBER_LOGS_PATHlogsPathtest/logsCUCUMBER_CACHE_PATHcachePathtest/cacheOSRM_BUILD_DIRbuildPathbuildOSRM_LOAD_METHODloadMethodOSRM_ALGORITHMalgorithmOSRM_IPipOSRM_PORTportNew tags
@isolated@with_(datastore|directly|mmap)@no_(datastore|directly|mmap)@with_(ci|mld)@no_(ci|mld)Tasklist
Requirements / Relations
Link any requirements here. Other pull requests this PR is based on?