
fix(loop): validate step output before marking loop complete #6

Merged
paralizeer merged 8 commits into main from auto/fix/270-validate-output-on-loop-complete
Mar 7, 2026

@paralizeer
Owner

When a loop step completed all of its stories, it was previously marked 'done' without checking that the required output keys were present. This could leave the workflow with incomplete context (e.g., a missing repo or branch) and cause downstream failures.

The loop now validates the step output against the 'expects' schema before marking the loop complete. If validation fails, both the step and the run are marked as failed with a descriptive error message.
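
The behavior described above can be illustrated with a minimal TypeScript sketch. The `LoopStep` and `Run` shapes and the `finishLoop` name are assumptions made for this example, as is the idea that 'expects' is a flat list of key names; the PR text does not show the actual implementation.

```typescript
// Hypothetical shapes: the real antfarm schema is not shown in this PR.
type Status = "pending" | "running" | "done" | "failed";

interface LoopStep {
  status: Status;
  expects: string[]; // required output keys (assumed to be a flat list)
  output: Record<string, string>; // already-parsed step output
  error?: string;
}

interface Run {
  status: Status;
}

// Called once every story in the loop has finished.
// Validates first, and only then marks the step as done.
function finishLoop(step: LoopStep, run: Run): void {
  const missing = step.expects.filter(
    (key) => !Object.prototype.hasOwnProperty.call(step.output, key),
  );
  if (missing.length > 0) {
    // Fail both the step and the run, naming the missing keys.
    step.status = "failed";
    run.status = "failed";
    step.error = `Loop step output is missing required keys: ${missing.join(", ")}`;
    return;
  }
  step.status = "done";
}
```

The point of the ordering is that the 'done' status is never written for a step whose output would leave later steps without the context they need.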

Fixes: snarktank#270

Auto-generated by Openclaw AutoDev

@paralizeer paralizeer force-pushed the auto/fix/270-validate-output-on-loop-complete branch from 9ad7725 to a45d19c March 7, 2026 05:46

The workflow YAMLs were updated to use 'minimax/MiniMax-M2.5' instead
of 'default' (commit 021244b), but the tests still expected 'default'.
This caused 4 test failures in the polling configuration tests.

Updated test expectations in:
- tests/bug-fix-polling.test.ts
- tests/feature-dev-polling.test.ts
- tests/security-audit-polling.test.ts
- tests/polling-timeout-sync.test.ts

Auto-generated by Openclaw AutoDev

The DEFAULT_POLLING_MODEL was set to 'default' which is not a valid
model identifier for sessions_spawn. This caused agent cron jobs to
fail silently - they would fire but the sessions would not complete
because the model was invalid.

Changed both occurrences of 'default' to 'minimax/MiniMax-M2.5'
which matches the default model in OpenClaw config and the workflow YAMLs.

Fixes issue snarktank#217 - Agent cron jobs spawn sessions but work does not complete

Add validation in completeStep to check that step output contains
all required keys specified in the workflow's 'expects' field.

When a step outputs KEY: value pairs, we now validate that all keys
listed in expects are present. If any required keys are missing,
the step fails with a descriptive error message.

This prevents incomplete step output from propagating to downstream
steps and causing confusing failures later.
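
A rough sketch of this kind of key check, in TypeScript. The function names `parseStepOutput` and `findMissingKeys` are hypothetical, and the parsing rules (one `KEY: value` pair per line, exact case-sensitive key matching, an empty value still counting as present) are assumptions; the real grammar accepted by completeStep is not shown here.

```typescript
// Parse "KEY: value" lines into a map. Lines that do not look like a
// key/value pair are ignored. Assumed grammar, not the project's own.
function parseStepOutput(output: string): Map<string, string> {
  const pairs = new Map<string, string>();
  for (const line of output.split(/\r?\n/)) {
    const idx = line.indexOf(":");
    if (idx <= 0) continue; // no colon, or nothing before it
    const key = line.slice(0, idx).trim();
    const value = line.slice(idx + 1).trim();
    // Only accept identifier-like keys so prose containing a colon is skipped.
    if (/^[A-Za-z_][A-Za-z0-9_]*$/.test(key)) {
      pairs.set(key, value);
    }
  }
  return pairs;
}

// Return the keys listed in `expects` that the output does not provide.
function findMissingKeys(output: string, expects: string[]): string[] {
  const pairs = parseStepOutput(output);
  return expects.filter((key) => !pairs.has(key));
}
```

A caller could fail the step whenever `findMissingKeys` returns a non-empty list and include that list in the error message.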

Issue: snarktank#270 - Workflow may accept incomplete step output and advance
with missing required context keys

Auto-generated by Openclaw AutoDev

After 5 consecutive errors, the medic now auto-disables cron jobs
to prevent wasted tokens on persistently failing jobs (issue snarktank#218).

Changes:
- gateway-api.ts: extract consecutiveErrors and lastStatus from cron list
- gateway-api.ts: add disableCronJob() function for circuit breaker action
- checks.ts: add checkFailingCrons() to detect crons exceeding error threshold
- checks.ts: add disable_cron action type
- medic.ts: handle disable_cron action to auto-disable failing cron jobs
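
The changes listed above could be sketched roughly as follows. Only the names `consecutiveErrors`, `lastStatus`, `checkFailingCrons`, `disableCronJob` and `disable_cron` come from the commit message; every signature, the `CronJob` and `DisableCronAction` shapes, the `applyActions` helper, and the choice of `>=` for the threshold comparison are assumptions made for illustration.

```typescript
// Hypothetical cron entry: id and enabled are assumed fields.
interface CronJob {
  id: string;
  enabled: boolean;
  consecutiveErrors: number;
  lastStatus: "ok" | "error";
}

interface DisableCronAction {
  type: "disable_cron";
  cronId: string;
  reason: string;
}

const MAX_CONSECUTIVE_ERRORS = 5;

// Detect enabled crons that have reached the error threshold.
function checkFailingCrons(
  jobs: CronJob[],
  threshold: number = MAX_CONSECUTIVE_ERRORS,
): DisableCronAction[] {
  return jobs
    .filter((job) => job.enabled && job.consecutiveErrors >= threshold)
    .map((job): DisableCronAction => ({
      type: "disable_cron",
      cronId: job.id,
      reason: `${job.consecutiveErrors} consecutive errors`,
    }));
}

// Apply the actions through an injected function, so this sketch does not
// depend on the real gateway API.
async function applyActions(
  actions: DisableCronAction[],
  disableCronJob: (id: string) => Promise<void>,
): Promise<string[]> {
  const disabled: string[] = [];
  for (const action of actions) {
    await disableCronJob(action.cronId);
    disabled.push(action.cronId);
  }
  return disabled;
}
```

Keeping detection separate from the disabling side effect means the threshold logic can be tested without a running gateway.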

This is part of Resilience Week - making the system handle failure
as elegantly as it handles success.

Auto-generated by Openclaw AutoDev

…NTS.md

The developer/coder/fixer agents were outputting STATUS: done but not
calling the step complete CLI, causing steps to get stuck in 'running'
state indefinitely. This happened because the polling prompt had the
instruction but the agent AGENTS.md did not.

Added explicit step complete instructions to:
- feature-dev/agents/developer/AGENTS.md
- coding-sprint/agents/coder/AGENTS.md
- bug-fix/agents/fixer/AGENTS.md

Each now includes:
- ⚠️ CRITICAL warning header
- Exact command to write output to temp file and pipe to step complete
- Explanation that session will end after this call

This should fix issue snarktank#272 where developer agent sessions exit after
each story without completing the step.

Refs: snarktank#272

The completeStep function referenced 'sessionKey' which is not a
parameter of this function. Fixed by:
1. Adding session_key to the step SELECT query
2. Using step.session_key to preserve the existing session key

This bug was causing TypeScript compilation failures.

Auto-generated by Openclaw AutoDev

When a loop step completes all stories, it previously marked the step
as 'done' without validating that required output keys were present.
This could leave the workflow with incomplete context (e.g., missing
repo/branch) causing downstream failures.

Now validates the step output against the 'expects' schema before
marking the loop complete. If validation fails, the step and run
are marked as failed with a descriptive error message.

Fixes: snarktank#270

Auto-generated by Openclaw AutoDev

Auto-generated by Openclaw AutoDev

@paralizeer paralizeer force-pushed the auto/fix/270-validate-output-on-loop-complete branch from a45d19c to c092596 March 7, 2026 05:49
@paralizeer paralizeer merged commit 905bc8f into main Mar 7, 2026
1 check failed
@paralizeer paralizeer deleted the auto/fix/270-validate-output-on-loop-complete branch March 7, 2026 05:49
paralizeer added a commit that referenced this pull request Mar 8, 2026
* sprint: Add --dry-run flag parsing in CLI

* sprint: Implement dry-run logic in run.ts

Add dryRunWorkflow() function that:
- Validates workflow YAML via loadWorkflowSpec()
- Builds execution context with placeholder values
- Resolves all step input templates using resolveTemplate()
- Prints execution plan showing all steps with agent assignments
- Returns without creating DB entries or spawning crons

Update CLI to call dryRunWorkflow when --dry-run flag is passed to
'workflow run' command.
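
To make the dry-run idea concrete, here is a small TypeScript sketch of resolving step inputs against placeholder values and producing a plan without side effects. The names `renderTemplate` and `planDryRun`, the `StepSpec` shape, and the `{{key}}` template syntax are all assumptions for this example; they are stand-ins, not the project's `resolveTemplate()` or `dryRunWorkflow()`.

```typescript
// Hypothetical step shape for the sketch.
interface StepSpec {
  id: string;
  agent: string;
  input: string; // template text, assumed to use {{key}} placeholders
}

// Replace {{key}} with the context value; unknown keys stay visible so a
// missing value is obvious in the printed plan.
function renderTemplate(template: string, context: Map<string, string>): string {
  return template.replace(
    /\{\{\s*([\w.]+)\s*\}\}/g,
    (whole: string, key: string) => context.get(key) ?? whole,
  );
}

// Build the execution plan as plain strings. Nothing is written to a
// database and no cron is spawned: the function only returns text.
function planDryRun(steps: StepSpec[], context: Map<string, string>): string[] {
  return steps.map(
    (step, i) =>
      `${i + 1}. ${step.id} [agent: ${step.agent}] ${renderTemplate(step.input, context)}`,
  );
}
```

A caller would print each returned line; leaving unresolved placeholders intact is a design choice for this sketch that makes template mistakes easy to spot.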

Tested with coding-sprint and bug-fix workflows.

* fix(story-loop): add safety reset for stuck story iterations

- Add safety reset in claimStep: if step is running but has no current_story_id, reset to pending
- Add current_story.* context keys for template usage
- Set defaults for reviewer template keys (commit, test_result)
- Add logging to checkLoopContinuation for debugging
- Update all workflow YAMLs from 'default' to 'minimax/MiniMax-M2.5'
- Add memory access to developer/planner/reviewer/tester agents
- Add new prospector workflows: eps-prospector, local-prospector, job-scout, gran-concepcion-prospector
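
The safety reset in the first bullet can be pictured with a short TypeScript sketch. The `StepRow` shape and the `resetIfStuck` name are hypothetical; only the rule itself (a running step with no current_story_id goes back to pending) comes from the commit message.

```typescript
// Hypothetical in-memory view of a step row.
interface StepRow {
  status: "pending" | "running" | "done" | "failed";
  current_story_id: string | null;
}

// A step that claims to be running but is not attached to any story cannot
// make progress, so put it back in the queue. Returns true if a reset happened.
function resetIfStuck(step: StepRow): boolean {
  if (step.status === "running" && step.current_story_id === null) {
    step.status = "pending";
    return true;
  }
  return false;
}
```

Running a check like this at claim time lets the next poll pick the step up again instead of leaving it idle.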

Addresses: snarktank#272 (story loop stuck), snarktank#266 (stall after Story 1)
Auto-generated by Openclaw AutoDev

* fix(tests): update polling model tests to match workflow YAML

The workflow YAMLs were updated to use 'minimax/MiniMax-M2.5' instead
of 'default' (commit 021244b), but the tests still expected 'default'.
This caused 4 test failures in the polling configuration tests.

Updated test expectations in:
- tests/bug-fix-polling.test.ts
- tests/feature-dev-polling.test.ts
- tests/security-audit-polling.test.ts
- tests/polling-timeout-sync.test.ts

Auto-generated by Openclaw AutoDev

* chore: remove orphaned backup file

Auto-generated by Openclaw AutoDev

* fix(tests): update expected default model to minimax/MiniMax-M2.5

* chore: add test and typecheck npm scripts

- Added 'test' script to run Node.js built-in test runner
- Added 'typecheck' script for TypeScript type checking
- Enables npm test && npm run typecheck for CI/CD

Auto-generated by Openclaw AutoDev

* fix(developer): add explicit step complete instructions to AGENTS.md

The developer agent was exiting sessions without calling 'antfarm step
complete', causing steps to get stuck in 'running' state for 30+ minutes
until Medic reset them.

This fix adds explicit, highlighted instructions to the developer's
AGENTS.md emphasizing that:
1. step complete MUST be called after finishing work
2. Provides the exact command syntax to use
3. Explains that a fresh session will handle the next story

Fixes: snarktank#272

* fix(medic): use minimax model and simplify prompt

- Use minimax/MiniMax-M2.5 instead of 'default' model
- Simplify prompt to reduce token usage
- Make HEARTBEAT_OK response more explicit

Auto-generated by Openclaw AutoDev

* ci: add GitHub Actions CI workflow for test/typecheck/build

Adds a GitHub Actions workflow that runs:
- npm run typecheck (TypeScript validation)
- npm test (all 162 tests)
- npm run build

This provides an alternative CI check to Vercel for PR validation.

Auto-generated by Openclaw AutoDev

* fix(loop): validate step output before marking loop complete (#6)

* fix(tests): update polling model tests to match workflow YAML

The workflow YAMLs were updated to use 'minimax/MiniMax-M2.5' instead
of 'default' (commit 021244b), but the tests still expected 'default'.
This caused 4 test failures in the polling configuration tests.

Updated test expectations in:
- tests/bug-fix-polling.test.ts
- tests/feature-dev-polling.test.ts
- tests/security-audit-polling.test.ts
- tests/polling-timeout-sync.test.ts

Auto-generated by Openclaw AutoDev

* fix(agent-cron): use valid model for polling instead of 'default'

The DEFAULT_POLLING_MODEL was set to 'default' which is not a valid
model identifier for sessions_spawn. This caused agent cron jobs to
fail silently - they would fire but the sessions would not complete
because the model was invalid.

Changed both occurrences of 'default' to 'minimax/MiniMax-M2.5'
which matches the default model in OpenClaw config and the workflow YAMLs.

Fixes issue snarktank#217 - Agent cron jobs spawn sessions but work does not complete

* fix(step-ops): validate required output keys before step completion

Add validation in completeStep to check that step output contains
all required keys specified in the workflow's 'expects' field.

When a step outputs KEY: value pairs, we now validate that all keys
listed in expects are present. If any required keys are missing,
the step fails with a descriptive error message.

This prevents incomplete step output from propagating to downstream
steps and causing confusing failures later.

Issue: snarktank#270 - Workflow may accept incomplete step output and advance
with missing required context keys

Auto-generated by Openclaw AutoDev

* feat(medic): add circuit breaker for failing cron jobs

After 5 consecutive errors, the medic now auto-disables cron jobs
to prevent wasted tokens on persistently failing jobs (issue snarktank#218).

Changes:
- gateway-api.ts: extract consecutiveErrors and lastStatus from cron list
- gateway-api.ts: add disableCronJob() function for circuit breaker action
- checks.ts: add checkFailingCrons() to detect crons exceeding error threshold
- checks.ts: add disable_cron action type
- medic.ts: handle disable_cron action to auto-disable failing cron jobs

This is part of Resilience Week - making the system handle failure
as elegantly as it handles success.

Auto-generated by Openclaw AutoDev

* fix(agents): add explicit step complete instructions to all agent AGENTS.md

The developer/coder/fixer agents were outputting STATUS: done but not
calling the step complete CLI, causing steps to get stuck in 'running'
state indefinitely. This happened because the polling prompt had the
instruction but the agent AGENTS.md did not.

Added explicit step complete instructions to:
- feature-dev/agents/developer/AGENTS.md
- coding-sprint/agents/coder/AGENTS.md
- bug-fix/agents/fixer/AGENTS.md

Each now includes:
- ⚠️ CRITICAL warning header
- Exact command to write output to temp file and pipe to step complete
- Explanation that session will end after this call

This should fix issue snarktank#272 where developer agent sessions exit after
each story without completing the step.

Refs: snarktank#272

* fix(step-ops): use existing session_key instead of undefined variable

The completeStep function referenced 'sessionKey' which is not a
parameter of this function. Fixed by:
1. Adding session_key to the step SELECT query
2. Using step.session_key to preserve the existing session key

This bug was causing TypeScript compilation failures.

Auto-generated by Openclaw AutoDev

* fix(loop): validate step output before marking loop complete

When a loop step completes all stories, it previously marked the step
as 'done' without validating that required output keys were present.
This could leave the workflow with incomplete context (e.g., missing
repo/branch) causing downstream failures.

Now validates the step output against the 'expects' schema before
marking the loop complete. If validation fails, the step and run
are marked as failed with a descriptive error message.

Fixes: snarktank#270

Auto-generated by Openclaw AutoDev

* chore: remove orphaned backup file

Auto-generated by Openclaw AutoDev

* fix(agent-cron): remove leftover merge conflict marker

The build was failing due to a leftover merge conflict marker
in agent-cron.ts (line 128). Removed the conflict marker and
verified typecheck and all 162 tests pass.

Auto-generated by Openclaw AutoDev

---------

Co-authored-by: Claw <claw@openclaw>

paralizeer added a commit that referenced this pull request Mar 8, 2026
…#10)

* feat(reviewer): add bot review comment handling (issue snarktank#139)

The reviewer agent now checks for automated review comments from
bots like Copilot, Gemini Code Assist, Dependabot, etc. before
making approval decisions.

If bots flagged issues, the reviewer:
1. Fixes the issues in code
2. Commits and pushes the fixes
3. Re-reviews after fixes
4. Then proceeds with approval or requests more changes

This completes the automation for issue snarktank#139 - the reviewer now
handles both bot comments AND auto-merges after approval.
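
One way to picture the decision the reviewer makes, as a TypeScript sketch. The `ReviewComment` shape, its `resolved` flag, and the function names are assumptions for illustration; the reviewer agent's actual prompt and tooling are not shown in this commit message.

```typescript
// Simplified, hypothetical comment shape. The authorType values mirror the
// "User" / "Bot" account types that GitHub reports for comment authors.
interface ReviewComment {
  author: string;
  authorType: "User" | "Bot";
  body: string;
  resolved: boolean; // assumed flag: true once the issue has been addressed
}

type ReviewerAction = "fix_and_rereview" | "proceed_to_decision";

// Bot comments that still need attention.
function pendingBotComments(comments: ReviewComment[]): ReviewComment[] {
  return comments.filter((c) => c.authorType === "Bot" && !c.resolved);
}

// Fix flagged issues first; only decide on approval once none remain.
function nextReviewerAction(comments: ReviewComment[]): ReviewerAction {
  return pendingBotComments(comments).length > 0
    ? "fix_and_rereview"
    : "proceed_to_decision";
}
```

Calling `nextReviewerAction` again after each fix-and-push cycle matches the re-review step in the list above.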

Auto-generated by Openclaw AutoDev

---------

Co-authored-by: Claw <claw@openclaw>

Development

Successfully merging this pull request may close these issues.

Workflow may accept incomplete step output and advance with missing required context keys

1 participant