Skip to content

feat(workflow): circuit breaker + step validation + explicit complete#281

Open
paralizeer wants to merge 9 commits intosnarktank:mainfrom
paralizeer:auto/feat/circuit-breaker-cron-20260305_193926
Open

feat(workflow): circuit breaker + step validation + explicit complete#281
paralizeer wants to merge 9 commits intosnarktank:mainfrom
paralizeer:auto/feat/circuit-breaker-cron-20260305_193926

Conversation

@paralizeer
Copy link

Summary

Three improvements to workflow automation reliability:

1. Circuit Breaker for Failing Cron Jobs (medic)

  • Added circuit breaker that auto-disables agent cron jobs after 5 consecutive failures
  • Tracks failure count and exposes state in workflow status
  • Prevents token waste from repeatedly failing jobs

2. Step Output Validation

  • Validates required output keys before step completion
  • Prevents workflow advancement with incomplete context (missing repo/branch, build_cmd, etc.)
  • Tightens prompt guidance for step-specific output schemas

3. Explicit Step Complete Instructions

  • Added explicit instructions to all agent AGENTS.md files (bug-fix, coding-sprint, feature-dev)
  • Ensures agents call antfarm step complete after finishing work
  • Addresses story-loop exit issue where sessions exit without completing steps

Testing

  • All changes are additive and follow existing patterns
  • No breaking changes to CLI or API

Related Issues

Auto-generated by Openclaw AutoDev

Claw and others added 9 commits March 2, 2026 01:50
Add dryRunWorkflow() function that:
- Validates workflow YAML via loadWorkflowSpec()
- Builds execution context with placeholder values
- Resolves all step input templates using resolveTemplate()
- Prints execution plan showing all steps with agent assignments
- Returns without creating DB entries or spawning crons

Update CLI to call dryRunWorkflow when --dry-run flag is passed to
'workflow run' command.

Tested with coding-sprint and bug-fix workflows.
…ntfarm workflow CLI. When passed to 'workflow run', it should validate the workflow YAML, resolve all template variables with placeholder values, print the execution plan (steps, agents, order), and exit without actually creating a run or spawning any agents. Should work for all workflows (coding-sprint, bug-fix, feature-dev, security-audit).
- Add safety reset in claimStep: if step is running but has no current_story_id, reset to pending
- Add current_story.* context keys for template usage
- Set defaults for reviewer template keys (commit, test_result)
- Add logging to checkLoopContinuation for debugging
- Update all workflow YAMLs from 'default' to 'minimax/MiniMax-M2.5'
- Add memory access to developer/planner/reviewer/tester agents
- Add new prospector workflows: eps-prospector, local-prospector, job-scout, gran-concepcion-prospector

Addresses: snarktank#272 (story loop stuck), snarktank#266 (stall after Story 1)
Auto-generated by Openclaw AutoDev
The workflow YAMLs were updated to use 'minimax/MiniMax-M2.5' instead
of 'default' (commit 021244b), but the tests still expected 'default'.
This caused 4 test failures in the polling configuration tests.

Updated test expectations in:
- tests/bug-fix-polling.test.ts
- tests/feature-dev-polling.test.ts
- tests/security-audit-polling.test.ts
- tests/polling-timeout-sync.test.ts

Auto-generated by Openclaw AutoDev
The DEFAULT_POLLING_MODEL was set to 'default' which is not a valid
model identifier for sessions_spawn. This caused agent cron jobs to
fail silently - they would fire but the sessions would not complete
because the model was invalid.

Changed both occurrences of 'default' to 'minimax/MiniMax-M2.5'
which matches the default model in OpenClaw config and the workflow YAMLs.

Fixes issue snarktank#217 - Agent cron jobs spawn sessions but work does not complete
Add validation in completeStep to check that step output contains
all required keys specified in the workflow's 'expects' field.

When a step outputs KEY: value pairs, we now validate that all keys
listed in expects are present. If any required keys are missing,
the step fails with a descriptive error message.

This prevents incomplete step output from propagating to downstream
steps and causing confusing failures later.

Issue: snarktank#270 - Workflow may accept incomplete step output and advance
with missing required context keys

Auto-generated by Openclaw AutoDev
After 5 consecutive errors, the medic now auto-disables cron jobs
to prevent wasted tokens on persistently failing jobs (issue snarktank#218).

Changes:
- gateway-api.ts: extract consecutiveErrors and lastStatus from cron list
- gateway-api.ts: add disableCronJob() function for circuit breaker action
- checks.ts: add checkFailingCrons() to detect crons exceeding error threshold
- checks.ts: add disable_cron action type
- medic.ts: handle disable_cron action to auto-disable failing cron jobs

This is part of Resilience Week - making the system handle failure
as elegantly as it handles success.

Auto-generated by Openclaw AutoDev
…NTS.md

The developer/coder/fixer agents were outputting STATUS: done but not
calling the step complete CLI, causing steps to get stuck in 'running'
state indefinitely. This happened because the polling prompt had the
instruction but the agent AGENTS.md did not.

Added explicit step complete instructions to:
- feature-dev/agents/developer/AGENTS.md
- coding-sprint/agents/coder/AGENTS.md
- bug-fix/agents/fixer/AGENTS.md

Each now includes:
- ⚠️ CRITICAL warning header
- Exact command to write output to temp file and pipe to step complete
- Explanation that session will end after this call

This should fix issue snarktank#272 where developer agent sessions exit after
each story without completing the step.

Refs: snarktank#272
@vercel
Copy link

vercel bot commented Mar 5, 2026

@paralizeer is attempting to deploy a commit to the Ryan Team on Vercel.

A member of the Team first needs to authorize it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant