fix(l1): restart kurtosis engine before daily snapsync jobs #6203

Open
edg-l wants to merge 1 commit into main from fix/snapsync-kurtosis-engine-restart

@edg-l
Contributor

@edg-l edg-l commented Feb 13, 2026

Summary

  • The daily snapsync CI runs on a self-hosted runner (ethrex-sync) where the kurtosis engine persists across runs
  • After the CLI was bumped to 1.15.2 in #6198 (fix(l1): bump kurtosis version to 1.15.2 to fix assertoor ci), the engine remained at 1.10.2, causing API version mismatches
  • Adds a dedicated engine-restart job that runs kurtosis engine restart before both sync jobs
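
The job layout described above might look roughly like the following. This is a sketch assembled from the job and step names quoted in this conversation, not the actual diff. The `echo` steps and the `ubuntu-latest` runner for prepare are placeholders (assumptions), since the real job bodies are not shown here.

```yaml
# Sketch of the jobs section of .github/workflows/daily_snapsync.yaml after this change.
# Job names come from this PR's discussion; placeholder steps are marked as such.
jobs:
  prepare:
    runs-on: ubuntu-latest  # assumption: the real runner label is not shown in this PR
    steps:
      - run: echo "prepare steps elided"  # placeholder

  engine-restart:
    name: Restart Kurtosis Engine
    runs-on: ethrex-sync
    steps:
      - name: Restart engine to match CLI version
        run: kurtosis engine restart

  sync-lighthouse:
    # Waits for both jobs, so the restart finishes before the sync starts.
    needs: [prepare, engine-restart]
    runs-on: ethrex-sync
    steps:
      - run: echo "lighthouse sync steps elided"  # placeholder

  sync-prysm:
    needs: [prepare, engine-restart]
    runs-on: ethrex-sync
    steps:
      - run: echo "prysm sync steps elided"  # placeholder
```

Because a job listed in `needs` must finish successfully before its dependents start, a failed restart in this layout would stop both sync jobs from running.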

Test plan

  • Trigger a manual workflow dispatch and verify the engine-restart job succeeds
  • Verify both sync-lighthouse and sync-prysm jobs no longer fail with version mismatch

The self-hosted runner's kurtosis engine stays at the old version
across runs, causing API version mismatches when the CLI is updated.
Add a dedicated engine-restart job that both sync jobs depend on.
@edg-l edg-l requested a review from a team as a code owner February 13, 2026 12:46
@github-actions github-actions bot added the L1 Ethereum client label Feb 13, 2026
@github-actions

🤖 Kimi Code Review

Review of PR #6203

This PR adds a new engine-restart job to the daily snapsync workflow to restart the Kurtosis Engine before running sync tests.

Issues Found:

  1. Race condition risk: The engine-restart job runs in parallel with the prepare job, but there's no guarantee the engine restart will complete before the sync jobs start. This could lead to sync jobs attempting to use the engine while it's restarting.

  2. Missing dependency: The engine-restart job should depend on prepare to ensure proper sequencing. Currently, they run in parallel which could cause issues if prepare modifies any Kurtosis-related configuration.

  3. No error handling: The kurtosis engine restart command has no error handling or retry logic. If the restart fails, the workflow will continue and the sync jobs may fail.

Suggested Changes:

engine-restart:
  name: Restart Kurtosis Engine
  runs-on: ethrex-sync
  needs: prepare  # Add this dependency
  steps:
    - name: Restart engine to match CLI version
      run: |
        # Add error handling and retry logic
        for i in {1..3}; do
          if kurtosis engine restart; then
            echo "Engine restart successful"
            sleep 5  # Give engine time to fully initialize
            exit 0
          fi
          echo "Attempt $i failed, retrying..."
          sleep 10
        done
        echo "Engine restart failed after 3 attempts"
        exit 1

Alternative Approach:

Consider moving the engine restart into the sync job steps themselves, ensuring it happens immediately before the sync test runs. This would eliminate the parallel execution concerns.
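
A minimal sketch of that alternative, assuming the sync job keeps its existing steps after the restart. The `Run sync test` step is a placeholder, not the real job body.

```yaml
# Sketch: restart the engine as the first step of each sync job
# instead of in a separate engine-restart job.
sync-lighthouse:
  needs: prepare
  runs-on: ethrex-sync
  steps:
    - name: Restart engine to match CLI version
      run: kurtosis engine restart
    - name: Run sync test
      run: echo "existing sync steps elided"  # placeholder
```

One trade-off of this layout is that the restart step has to be repeated in sync-prysm as well, so the engine would be restarted once per sync job rather than once per workflow run.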


Automated review by Kimi (Moonshot AI) · custom prompt

@github-actions

🤖 Codex Code Review

Summary
Small workflow change adding a Kurtosis engine restart job and making both sync jobs depend on it. No Rust/EVM code touched.

Findings

  1. Correctness / Reliability: The new engine-restart job has no needs dependency and will run in parallel with prepare. If the restart depends on artifacts or environment from prepare, it can race. If that dependency matters, add needs: prepare. Otherwise, consider documenting that independence in the workflow for clarity. .github/workflows/daily_snapsync.yaml:69-80
  2. Failure behavior: kurtosis engine restart failing will block both sync jobs (as intended). If you want sync to continue even when restart fails, you’d need continue-on-error: true on that step or job. As written, this is a hard gate. Confirm this is desired. .github/workflows/daily_snapsync.yaml:69-80
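
For illustration, the non-blocking variant mentioned in the second finding could be sketched as below. It assumes the step name quoted elsewhere in this conversation and is not part of the PR.

```yaml
# Sketch: let the sync jobs proceed even if the restart fails.
engine-restart:
  runs-on: ethrex-sync
  steps:
    - name: Restart engine to match CLI version
      # A failure of this step is reported in the log but does not fail the job,
      # so jobs that list engine-restart in `needs` still start.
      continue-on-error: true
      run: kurtosis engine restart
```

Leaving `continue-on-error` out, as the PR does, keeps the restart as a hard gate.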

Ethereum-specific review
No protocol/consensus/EVM/state changes in this PR.

Overall
Looks good for a CI-only change; just confirm the intended dependency and failure semantics.

Next steps (optional)

  1. Decide whether engine-restart should depend on prepare and update needs accordingly.
  2. Confirm whether restart failure should block sync and adjust continue-on-error if not.

Automated review by OpenAI Codex · custom prompt

@github-actions

🤖 Claude Code Review


PR Review: fix(l1): restart kurtosis engine before daily snapsync jobs

Overall Assessment

This is a well-targeted fix for a version mismatch issue between the Kurtosis CLI and engine. The approach is sound and the implementation is clean.

Positive Points

  1. Correct problem diagnosis: The PR correctly identifies that, after the CLI was bumped to 1.15.2 in PR #6198, the persistent engine on the self-hosted runner remained at 1.10.2, causing API version mismatches.

  2. Clean solution: Adding a dedicated engine-restart job that runs before the sync jobs ensures version alignment without requiring manual intervention.

  3. Proper dependency chain: Both sync-lighthouse and sync-prysm jobs now correctly depend on both prepare and engine-restart, ensuring the engine is restarted before any sync operations begin.

Concerns & Suggestions

1. Missing error handling (Medium priority)

The engine-restart job at .github/workflows/daily_snapsync.yaml:77 runs kurtosis engine restart without any error handling or verification:

- name: Restart engine to match CLI version
  run: kurtosis engine restart

Recommendation: Add error handling and verification to ensure the restart succeeded:

- name: Restart engine to match CLI version
  run: |
    kurtosis engine restart
    # Verify engine is running and responsive
    kurtosis engine status

2. Race condition potential (Low priority)

Both sync-lighthouse and sync-prysm jobs depend on engine-restart but could potentially start concurrently after it completes. Since they run on the same self-hosted runner (ethrex-sync) and share the same Kurtosis engine, this might cause contention.

Current mitigation: The workflow already has concurrency.group: ethrex-sync-server with cancel-in-progress: false, which should prevent jobs from running simultaneously. This is good.

Note: No action needed, but worth verifying the concurrency group behaves as expected with the new job structure.
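
For reference, the setting mentioned above would look like this. The values are the ones quoted in this review; placing the block at the top level of the workflow file is an assumption.

```yaml
# Sketch of the concurrency setting referred to above.
concurrency:
  group: ethrex-sync-server
  cancel-in-progress: false
```

If the group is declared at workflow level, it queues whole workflow runs rather than individual jobs within one run, so it is worth checking where the group is declared when verifying this.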

3. Idempotency consideration (Informational)

The kurtosis engine restart command should be idempotent, but consider what happens if:

  • The engine is already at the correct version
  • The engine is not running at all
  • Multiple workflows trigger simultaneously (though concurrency settings should prevent this)

Recommendation: Consider checking the engine version before restarting, though this might be over-engineering for the current use case:

- name: Restart engine to match CLI version
  run: |
    echo "Current engine status:"
    kurtosis engine status || true
    echo "Restarting engine..."
    kurtosis engine restart
    echo "New engine status:"
    kurtosis engine status

This provides better logging for debugging future issues.

4. Test plan alignment (Informational)

The test plan mentions:

Trigger a manual workflow dispatch and verify the engine-restart job succeeds

Ensure the manual test actually validates that:

  • The engine version matches the CLI version after restart
  • Both sync jobs complete successfully without version mismatch errors
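
One way to make the first check visible in the job log is an extra step after the restart. This is a sketch only: it prints both versions for comparison by eye and does not parse the output, since the exact output format is not shown in this PR.

```yaml
# Sketch: log the CLI and engine versions after the restart.
- name: Show CLI and engine versions after restart
  run: |
    # Print both so they can be compared manually in the job log.
    kurtosis version
    kurtosis engine status
```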

Minor Observations

  • The workflow structure is clean and follows GitHub Actions best practices
  • The naming is descriptive and consistent with the rest of the file
  • The dependency graph is now explicit and correct

Verdict

LGTM with suggestions. The core fix is correct and addresses the immediate problem. The suggested improvements around error handling and logging would make the workflow more robust and easier to debug, but are not blocking issues.

The change is minimal, focused, and appropriate for the problem at hand. Once the test plan is executed successfully, this should be safe to merge.


Automated review by Claude (Anthropic) · custom prompt

@greptile-apps

greptile-apps bot commented Feb 13, 2026

Greptile Overview

Greptile Summary

Adds a dedicated engine-restart job that runs kurtosis engine restart before the sync jobs to resolve API version mismatch issues. After the Kurtosis CLI was bumped to version 1.15.2 in PR #6198, the engine on the self-hosted runner (ethrex-sync) remained at version 1.10.2, causing compatibility problems.

Key changes:

  • Introduces new engine-restart job that runs on the ethrex-sync runner
  • Updates both sync-lighthouse and sync-prysm jobs to depend on engine-restart in addition to prepare
  • Ensures the engine version matches the CLI version before running daily snapsync tests

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The change is straightforward and well-motivated. It adds a single job that restarts the Kurtosis engine to match the CLI version bumped in PR #6198 (fix(l1): bump kurtosis version to 1.15.2 to fix assertoor ci). The dependency chain is correctly configured, with both sync jobs now depending on the engine-restart job. This is a minimal, targeted fix for a known version mismatch issue on the self-hosted runner.
  • No files require special attention

Important Files Changed

Filename | Overview
.github/workflows/daily_snapsync.yaml | Adds engine-restart job to synchronize Kurtosis engine version with CLI before sync jobs run

Last reviewed commit: 0614d43
