fix(flywheel): phase-1 repairs for broken flywheel loop #41

Merged
robotlearning123 merged 16 commits into agent-next:main from robotlearning123:pr/flywheel-phase1
Mar 5, 2026

@robotlearning123
Member

Summary

8 bug fixes from the full flywheel audit, addressing critical failures in every stage of the Plan → Execute → Merge → Learn loop:

  • Worktree: remove hardcoded v1/ from node_modules symlink; release() defaults to merged:false
  • Agent runner: meta tasks skip commit instructions and build verification
  • Pipeline: runMetaTask() helper with status checking; per-run state isolation (Map + scoped dirs); cancel aborts running tasks and cleans all Maps; N→1 DB writes per wave
  • Scheduler: review rejection and merge conflict now set task.status="failed"; cancelledTasks guard prevents overwriting failure status; mergeGate populated on all review/merge paths
  • Store: MergeGateState type + persistence via merge_gate column
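The per-run isolation and cancel behaviour listed under Pipeline can be pictured with a minimal sketch. It assumes one state record per run, held in a Map keyed by run ID, and a cancel that aborts every tracked task before dropping that run's entry. `RunRegistry`, `RunState`, and `TaskHandle` are hypothetical names for illustration, not identifiers from this PR.

```typescript
// Sketch only: RunRegistry, RunState and TaskHandle are hypothetical names,
// not identifiers from this PR.
interface TaskHandle {
  aborted: boolean;
}

interface RunState {
  dir: string;                    // run-scoped artifact directory
  tasks: Map<string, TaskHandle>; // taskId -> handle, for this run only
}

class RunRegistry {
  private runs = new Map<string, RunState>();

  start(runId: string, baseDir: string): RunState {
    const state: RunState = { dir: `${baseDir}/${runId}`, tasks: new Map() };
    this.runs.set(runId, state);
    return state;
  }

  track(runId: string, taskId: string): TaskHandle {
    const state = this.runs.get(runId);
    if (!state) throw new Error(`unknown run: ${runId}`);
    const handle: TaskHandle = { aborted: false };
    state.tasks.set(taskId, handle);
    return handle;
  }

  // Abort every tracked task of one run, then drop all of its state.
  cancel(runId: string): number {
    const state = this.runs.get(runId);
    if (!state) return 0;
    let count = 0;
    state.tasks.forEach((handle) => {
      handle.aborted = true;
      count++;
    });
    this.runs.delete(runId);
    return count;
  }

  has(runId: string): boolean {
    return this.runs.has(runId);
  }
}
```

Because each run owns its own Map and directory, cancelling one run cannot touch tasks or files that belong to another.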

Test plan

  • npx tsc --noEmit — zero errors
  • Full test suite: 384 pass / 0 fail / 3 skip
  • New tests for: meta task handling, per-run isolation, cancel+abort, mergeGate persistence, review rejection status, merge conflict status, worktree release default
  • Manual: start pipeline run, verify meta tasks skip commit instructions
  • Manual: cancel mid-pipeline, verify all running tasks aborted

🤖 Generated with Claude Code

robotlearning123 and others added 16 commits March 5, 2026 14:10
v0.1.1: Production-quality stage prompts
- Add getRepoContext() helper (file tree, language, deps detection)
- ResearchPlan: architect-grade prompt with repo awareness
- Decompose: wave ordering by import deps, self-contained tasks
- Verify: actionable errors with file paths and line numbers
- Verify→Execute feedback: generate fix tasks from errors

v0.1.2: Inter-task coherence + observability
- allowLongPrompt option in scheduler.submit()
- Wave tasks get plan context ("read .cc-pipeline/plan.md")
- verifyResults field on PipelineRun for dashboard
- markStaleRunsFailed() crash recovery on startup

v0.1.3: API polish + test coverage
- Pipeline config overrides from POST /api/pipeline
- Pipeline endpoints added to /api/docs
- 3 new tests: verify-fix grouping, crash recovery, plan context
- Per-run PipelineConfig (stage methods use cfg param)

329/329 tests pass, TSC clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add _runConfigs.delete() to drive().catch() error handler
- Refresh RepoContext before doVerify() (execute stage modifies repo)
- Cache cfg() locally in doResearchPlan and doExecute inner loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…op (v0.1.4)

P0-a: Retry with error context — inject previous error into prompt on retry
  instead of clearing it. Both auto-retry (executeAndRelease) and manual
  requeue() now append error context (capped at 500 chars).
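As a rough illustration of P0-a, the sketch below appends a truncated copy of the previous error to the task prompt. The 500-character cap comes from the commit message. The function name, the wording of the appended section, and the choice to keep the first 500 characters rather than the last are all assumptions.

```typescript
// Sketch only: buildRetryPrompt and the appended wording are hypothetical.
const MAX_ERROR_CONTEXT = 500; // cap stated in the commit message

function buildRetryPrompt(prompt: string, previousError: string): string {
  // Assumption: keep the head of the error; the real code may keep the tail.
  const context = previousError.slice(0, MAX_ERROR_CONTEXT);
  if (context.length === 0) return prompt;
  return `${prompt}\n\nPrevious attempt failed with:\n${context}`;
}
```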

P0-b: Wave file conflict validation — extractFilePaths() extracts file paths
  from task prompts, validateWaves() detects intra-wave file conflicts and
  moves conflicting tasks to subsequent waves. Prevents parallel agents from
  editing the same file.
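One way the conflict handling in P0-b could work is sketched below. It assumes file paths have already been extracted into a `files` array per task, and that a conflicting task is deferred to the next wave, with a new wave created if none exists. `WaveTask` and this body of `validateWaves` are illustrative and may differ from the project's implementation.

```typescript
// Sketch only: WaveTask is a hypothetical shape, and this validateWaves is
// an illustration of the idea, not the project's code.
interface WaveTask {
  id: string;
  files: string[]; // paths already extracted from the task prompt
}

function validateWaves(waves: WaveTask[][]): WaveTask[][] {
  const queue: WaveTask[][] = waves.map((w) => [...w]);
  const result: WaveTask[][] = [];
  let carry: WaveTask[] = []; // tasks deferred from the previous wave

  while (queue.length > 0 || carry.length > 0) {
    const wave: WaveTask[] = [...carry, ...(queue.shift() ?? [])];
    carry = [];
    const claimed = new Set<string>();
    const kept: WaveTask[] = [];
    for (const task of wave) {
      if (task.files.some((f) => claimed.has(f))) {
        carry.push(task); // another task in this wave owns one of its files
      } else {
        task.files.forEach((f) => claimed.add(f));
        kept.push(task);
      }
    }
    result.push(kept);
  }
  return result;
}
```

The first task to claim a file stays in its wave, so every non-empty wave keeps at least one task and the loop always terminates.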

P1-a: Budget-based retry loop with dead-loop detection — verify stage now
  considers three stop conditions: budget exhausted, same errors repeated
  (dead loop), or max iterations reached. Added totalBudget field to
  PipelineConfig (default $50).
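The three stop conditions in P1-a can be expressed as a small pure function. This is a sketch under assumptions: the state shape, the field names other than `totalBudget`, and the rule that a "dead loop" means the current error set equals the previous one are all hypothetical.

```typescript
// Sketch only: VerifyLoopState and shouldStop are hypothetical names.
type StopReason = "budget_exhausted" | "dead_loop" | "max_iterations" | null;

interface VerifyLoopState {
  spentUsd: number;
  totalBudget: number;      // default $50 per the commit message
  iteration: number;
  maxIterations: number;
  previousErrors: string[];
  currentErrors: string[];
}

function sameErrors(a: string[], b: string[]): boolean {
  if (a.length !== b.length) return false;
  const left = [...a].sort();
  const right = [...b].sort();
  return left.every((msg, i) => msg === right[i]);
}

function shouldStop(s: VerifyLoopState): StopReason {
  if (s.spentUsd >= s.totalBudget) return "budget_exhausted";
  // Assumption: identical non-empty error sets twice in a row mean no progress.
  if (s.currentErrors.length > 0 && sameErrors(s.previousErrors, s.currentErrors)) {
    return "dead_loop";
  }
  if (s.iteration >= s.maxIterations) return "max_iterations";
  return null;
}
```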

Tests: 341 pass (was 329), TSC clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Task Classifier: classifyTask() pure function auto-assigns model/timeout/
budget based on prompt analysis. quick (<200 chars, ≤1 file) → haiku/120s/$1,
deep (refactor/redesign/architect, 3+ files) → opus/600s/$10, standard →
sonnet/300s/$5. Integrated into scheduler.submit() with caller-override
priority.
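The tiers described above could be encoded roughly as follows. The thresholds, models, timeouts, and budgets come from the commit message. Passing the file count as a parameter, treating both conditions of each tier as required (AND), and the `Classification` shape are assumptions, so the real `classifyTask()` may behave differently.

```typescript
// Sketch only: the signature and Classification shape are hypothetical.
interface Classification {
  tier: "quick" | "standard" | "deep";
  model: "haiku" | "sonnet" | "opus";
  timeoutSec: number;
  budgetUsd: number;
}

const DEEP_KEYWORDS = /\b(refactor|redesign|architect)/i;

function classifyTask(prompt: string, fileCount: number): Classification {
  // Assumption: deep needs a keyword AND 3+ files.
  if (DEEP_KEYWORDS.test(prompt) && fileCount >= 3) {
    return { tier: "deep", model: "opus", timeoutSec: 600, budgetUsd: 10 };
  }
  // Assumption: quick needs a short prompt AND at most one file.
  if (prompt.length < 200 && fileCount <= 1) {
    return { tier: "quick", model: "haiku", timeoutSec: 120, budgetUsd: 1 };
  }
  return { tier: "standard", model: "sonnet", timeoutSec: 300, budgetUsd: 5 };
}
```

Keeping the classifier a pure function of its inputs makes it straightforward to unit-test and lets the caller override any field afterwards.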

Model Fallback: on retry, swap agent (claude↔codex) via
AgentRunner.pickFallbackAgent() for better chance of success.

Pipeline note: task-classifier.ts, types.ts model field, and
agent-runner.ts pickFallbackAgent were created by cc-manager's own
pipeline (first successful self-hosted run). Scheduler integration,
tests, and classifier bug fixes done manually.

Tests: 357 pass (was 341), TSC clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…base, multi-dep DAG (v0.1.6)

Phase 1 Critical Path:
- Fix prompt accumulation: use _originalPrompt to rebuild from original + latest error
- Model escalation: retryCount >= 2 upgrades to opus via modelOverride
- Staged rebase: after merge, rebase all active worktrees onto new main
- Dependency DAG: check ALL deps in array, not just first element
- agent-runner respects task.modelOverride in both CLI and SDK paths
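Two of the items above lend themselves to a short sketch: checking every dependency rather than only the first, and escalating the model once `retryCount` reaches 2. The status values, the function names, and the exact prompt wording are assumptions. Only `retryCount >= 2`, the opus upgrade, and the rebuild from the original prompt plus latest error are taken from the commit message.

```typescript
// Sketch only: DepStatus, depsSatisfied, pickModel and rebuildPrompt are
// hypothetical names.
type DepStatus = "pending" | "running" | "success" | "failed";

// A task is runnable only when ALL of its dependencies have succeeded.
function depsSatisfied(deps: string[], statusOf: Map<string, DepStatus>): boolean {
  return deps.every((id) => statusOf.get(id) === "success");
}

// From the third attempt on (retryCount >= 2), escalate to opus.
function pickModel(baseModel: string, retryCount: number): string {
  return retryCount >= 2 ? "opus" : baseModel;
}

// Rebuild from the original prompt plus only the latest error, so errors
// from earlier retries do not accumulate in the prompt.
function rebuildPrompt(originalPrompt: string, latestError: string): string {
  return `${originalPrompt}\n\nPrevious attempt failed with:\n${latestError}`;
}
```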

Pipeline generated: types, store, worktree-pool, tests
Manual fixes: scheduler integration, multi-dep check, prompt accumulation, rebase wiring

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… session resume, codex config

F1: Detect empty commits after agent exits — fail task instead of silent success
F2: CRITICAL commit enforcement in task prompt
F3: Post-merge working directory sync (syncMainWorktree)
F4: Complete pricing table with all 6 models (haiku, sonnet, opus, gpt-5.4, gpt-5.4-wide, o4-mini)
F5: Capture sessionId from Claude stream-json, --resume on retry
F6: --json-schema structured output for review agent
F7: Codex GPT-5.4 routing for deep/integration tasks, classifier outputs agent+contextProfile
F9: Codex config.toml profile management (default + wide 1M context)

372 tests pass, TSC clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- docs/STRATEGY.md: competitive landscape, four pillars, borrowed patterns
- docs/3-agents-reference.md: Claude CLI, SDK, Codex features + gaps
- docs/research/: agent landscape, model pricing, NeurIPS findings
- docs/plans/: v0.1.6, v0.1.7, v0.2 implementation plans
- docs/ROADMAP.md, GAP-ANALYSIS.md, COMPETITIVE-ANALYSIS.md, etc.
- .cc-pipeline/: pipeline artifacts from self-hosting runs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- CLAUDE.md: v0.1.0→v0.1.7, 282→372 tests, add pipeline modules, honest flywheel status (NOT WORKING)
- CONFIGURATION.md: claude-opus-4-5→claude-opus-4-6
- ROADMAP.md: update current state to v0.1.7, mark completed features, honest assessment

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
8 bug fixes from the full flywheel audit (B1-B8):

- fix(worktree): remove hardcoded v1/ from node_modules symlink (P0-W1)
- fix(worktree): release() defaults to merged:false (P0-W2)
- fix(agent-runner): meta tasks skip commit instructions and build verify (P0-A1)
- fix(pipeline): ensureMetaTaskSucceeded / runMetaTask helper (P0-A2)
- fix(scheduler): review rejection and merge conflict set status="failed" (P0-S1)
- fix(scheduler): populate mergeGate on all review/merge paths
- fix(pipeline): cancel aborts running tasks, cleans all Maps (P0-S2)
- fix(pipeline): per-run state isolation with Maps + scoped dirs (P0-S3)

Additional quality fixes from simplify review:
- cancelledTasks guard: don't overwrite "failed" with "cancelled"
- N→1 DB writes: pipelineStore.save per-wave not per-task
- Extract runMetaTask() to deduplicate 3x meta-task pattern
- MergeGateState type + persistence in store
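The cancelledTasks guard mentioned above amounts to a one-line rule, sketched here as a pure function. `TaskStatus` and `statusAfterCancel` are hypothetical names. The only behaviour taken from the commit is that "failed" must not be overwritten with "cancelled".

```typescript
// Sketch only: TaskStatus and statusAfterCancel are hypothetical names.
type TaskStatus = "pending" | "running" | "success" | "failed" | "cancelled";

function statusAfterCancel(current: TaskStatus): TaskStatus {
  // Keep the failure visible: a cancel arriving after a review rejection or
  // merge conflict must not hide why the task actually ended.
  if (current === "failed") return "failed";
  return "cancelled";
}
```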

384 tests pass, 0 fail, TSC clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GitHub Actions runners lack git user.name/email config,
causing `git commit --allow-empty` to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@robotlearning123 robotlearning123 merged commit 4fe7212 into agent-next:main Mar 5, 2026
2 of 3 checks passed