parallel execution of a3p-integration tests #11000

turadg · 2025-02-12T23:51:33Z

What is the Problem Being Solved?

The test-docker-build job takes ~45min. In a recent run, that broke down as:

step	time
docker build (sdk)	5m 10s
build proposal tests	9m 34s
run proposal tests	28m 31s

Within run proposal tests:

21:34:37 start
21:34:46 Running test image for proposal beta-fast-usdc
21:36:39 Running test image for proposal upgrade-next
21:44:46 Running test image for proposal acceptance
22:03:06 [z:acceptance] Testing completed.

So ~2min for beta-fast-usdc, ~8min for upgrade-next, and ~18min for acceptance.

If we parallelized those with a matrix job, the wall time for this step would be that of the slowest job (18min) instead of their sum (28min), saving 10min on this workflow. Overall CI would still be bottlenecked on multichain-testing (currently 45min), but re-runs due to flakes would be cheaper.
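The estimate above can be checked directly against the log timestamps. This is only a sketch in POSIX shell: the helper `to_seconds` and the variable names are made up for illustration, and it ignores per-job startup overhead that a real matrix would add.

```shell
# Convert HH:MM:SS (always two digits per field) to seconds since midnight.
to_seconds() {
  # Split the argument on colons into $1 $2 $3.
  old_ifs=$IFS; IFS=:; set -- $1; IFS=$old_ifs
  # Strip one leading zero so fields like "06" are not read as octal.
  echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
}

# Timestamps copied from the run log above.
t_fast_usdc=$(to_seconds 21:34:46)
t_upgrade_next=$(to_seconds 21:36:39)
t_acceptance=$(to_seconds 21:44:46)
t_done=$(to_seconds 22:03:06)

fast_usdc=$(( t_upgrade_next - t_fast_usdc ))
upgrade_next=$(( t_acceptance - t_upgrade_next ))
acceptance=$(( t_done - t_acceptance ))

# Sequential wall time is the sum of the three runs.
sequential=$(( fast_usdc + upgrade_next + acceptance ))

# With a matrix, wall time is bounded by the slowest leg.
parallel=$fast_usdc
for d in "$upgrade_next" "$acceptance"; do
  if [ "$d" -gt "$parallel" ]; then parallel=$d; fi
done

echo "sequential: ${sequential}s, parallel: ${parallel}s, saved: $(( sequential - parallel ))s"
# prints: sequential: 1700s, parallel: 1100s, saved: 600s
```

The 600s difference is the ~10min saving quoted above.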

Moreover, most of z:acceptance is parallelizable because it doesn't perform any evals (it's not a real proposal). If we split that job in two, each half would take ~9min and still be the slowest leg, so the overall workflow would finish in ~24min (5 + 9.5 + 9) instead of 45min, a saving of ~20min.

Doing this we'd also get more granular reports in the PR status checks, showing which tests failed.

Description of the Design

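  # Discovery runs in its own job: a matrix can only read outputs of jobs
  # listed in `needs`, not of steps in the same job.
  get-proposals:
    runs-on: ubuntu-latest
    outputs:
      proposals: ${{ steps.discover.outputs.proposals }}
    steps:
      - uses: actions/checkout@v4
      - name: Discover proposal names
        id: discover
        run: |
          # Emit the proposal directory names as a compact JSON array.
          PROPOSALS=$(find a3p-integration/proposals -mindepth 1 -maxdepth 1 -type d -exec basename {} \; | sort | jq -R -s -c 'split("\n")[:-1]')
          # Job outputs are read from GITHUB_OUTPUT, not GITHUB_ENV.
          echo "proposals=$PROPOSALS" >> "$GITHUB_OUTPUT"

  proposals-test:
    # pre_check must be listed here for needs.pre_check.outputs to resolve.
    needs: [pre_check, build_proposal_tests, get-proposals]
    if: needs.pre_check.outputs.should_run == 'true'
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        proposal: ${{ fromJson(needs.get-proposals.outputs.proposals) }}
    steps:
      - uses: actions/checkout@v4
      - name: Run proposal test
        run: yarn test -m ${{ matrix.proposal }}
        working-directory: a3p-integration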

Because these checks are generated dynamically, we can't list them individually as required checks, so we'll have to find a way to signal to a single required job that they all passed.
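One possible shape for that signal, sketched as the shell body of a gate step. It assumes a hypothetical gate job declared with `needs: proposals-test` and `if: always()`, which would pass `${{ needs.proposals-test.result }}` as the argument. The function name `check_matrix_result` is made up for illustration.

```shell
# Hypothetical gate check: succeed only if the aggregated result of the
# matrix job is "success". GitHub reports one result per needed job
# (success, failure, cancelled, or skipped), covering all matrix legs.
check_matrix_result() {
  result=$1
  case "$result" in
    success)
      echo "all proposal tests passed"
      return 0
      ;;
    *)
      echo "proposal tests did not pass (result: $result)" >&2
      return 1
      ;;
  esac
}

# In the workflow step this would be invoked as:
#   check_matrix_result "${{ needs.proposals-test.result }}"
```

Only the gate job would then be marked as required. Whether `skipped` should also count as passing (for example when pre_check decides the tests should not run) is a policy choice this sketch leaves open.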

Security Considerations

n/a

Scaling Considerations

n/a

Test Plan

n/a

Upgrade Considerations

n/a


mhofman · 2025-02-13T00:41:22Z

I think we'll need a command to reliably list the test layers available to the matrix (or assume folder names in proposals).
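For the "assume folder names" option, a listing command could look roughly like the following. This is a sketch, not an existing a3p-integration command: the function name `list_proposals_json` is hypothetical, it relies on `find` supporting `-mindepth`/`-maxdepth`, and it assumes directory names contain no quotes, backslashes, or newlines.

```shell
# Hypothetical helper: print the immediate subdirectory names of the given
# directory as a compact JSON array, sorted for a stable matrix order.
list_proposals_json() {
  find "$1" -mindepth 1 -maxdepth 1 -type d -exec basename {} \; | sort | {
    json=""
    while IFS= read -r name; do
      # Append the quoted name, comma-separated after the first entry.
      json="${json:+$json,}\"$name\""
    done
    printf '[%s]\n' "$json"
  }
}

# Example invocation from the repository root:
#   list_proposals_json a3p-integration/proposals
```

A dedicated command owned by the test tooling would be more reliable than this if not every folder under proposals corresponds to a runnable test layer.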

We're already planning to create a matrix of test runs for the loadgen revamp work.

Splitting the acceptance layer conflicts with the goal of having a follower run against the test chain during most of the acceptance test (excluding the genesis upgrade step, which is intrinsically incompatible). A split acceptance test would likely involve running a follower in both splits.

turadg added devex developer experience tooling repo-wide infrastructure labels Feb 12, 2025

mhofman mentioned this issue Feb 13, 2025: revamp loadgen Agoric/testnet-load-generator#117 (Open)

turadg mentioned this issue Feb 13, 2025: adopt Depot action runner #11003 (Open)
