
Releases: Whisker17/claude-codex-loop

v2.3.1

31 Mar 01:23

Changes

  • Add root .claude-plugin/plugin.json manifest required for marketplace submission
  • Sync plugin version to 2.3.1 across all manifests
  • Fix old plugins/review-loop/ path references in design specs
  • Remove external tooling artifacts (docs/superpowers/plans/)
  • Update README with version history and marketplace info

v2.3.0: Autooptimize Skill & Test Harness

27 Mar 04:09

What's New

Autooptimize Skill (.claude/skills/autooptimize/)

A generalized version of Karpathy's autoresearch methodology for optimizing any project artifact: prompts, configs, orchestration logic, or code. It runs autonomous experiment loops (execute → eval → mutate → keep/discard).
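The keep/discard loop can be sketched as a greedy hill-climb. This is a minimal illustration, not the skill's actual code: `optimize`, `run_and_eval`, and `mutate` are hypothetical names, and the assumption is that "eval" yields a single number where higher is better.

```python
from __future__ import annotations

import random
from typing import Callable, TypeVar

T = TypeVar("T")


def optimize(
    baseline: T,
    run_and_eval: Callable[[T], float],
    mutate: Callable[[T, random.Random], T],
    iterations: int = 20,
    seed: int = 0,
) -> tuple[T, float]:
    """Greedy keep/discard loop: only strictly better candidates replace the current best."""
    rng = random.Random(seed)
    best = baseline
    best_score = run_and_eval(best)  # execute + eval the unmodified artifact
    for _ in range(iterations):
        candidate = mutate(best, rng)  # mutate: propose a variant of the current best
        score = run_and_eval(candidate)  # execute + eval the variant
        if score > best_score:  # keep: strict improvement only
            best, best_score = candidate, score
        # discard: otherwise the candidate is dropped and the next mutation starts from `best`
    return best, best_score


# Usage: nudge an integer toward 10, one step per mutation.
best, score = optimize(0, lambda x: -abs(x - 10), lambda x, rng: x + 1)
```

Requiring a strict improvement means a noisy eval that ties the baseline never displaces it; a real harness might instead demand a margin or repeat runs.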

Test Harness (autooptimize-harness/)

Comprehensive test infrastructure for measuring review-loop plugin effectiveness:

  • Single-prompt testing — Test individual prompts (design-review, code-review, code-implement) via codex exec with fixtures containing known defects. ~1-5 min per run.
  • End-to-end pipeline testing — Run the full review-loop via claude -p in isolated temp dirs, then execute produced code and tests automatically.
  • LLM-as-judge eval — Binary pass/fail scoring using Claude as evaluator.
  • 3 test scenarios — CLI calculator (simple), rate limiter (medium), KV store (complex).
  • Fixtures — Design docs, code samples, and specs with known subtle defects for evaluating review quality.
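Binary LLM-as-judge scoring reduces each judge reply to pass/fail. The sketch below assumes a hypothetical contract in which the judge ends its reply with `VERDICT: PASS` or `VERDICT: FAIL`; the harness's real output format may differ. Unparseable replies count as failures so that a malformed judge response cannot inflate the score.

```python
from __future__ import annotations

import re

# Hypothetical judge contract: the reply contains a line "VERDICT: PASS" or "VERDICT: FAIL".
_VERDICT_LINE = re.compile(r"^\s*VERDICT:\s*(PASS|FAIL)\s*$", re.IGNORECASE | re.MULTILINE)


def parse_verdict(judge_output: str) -> bool:
    """Map a judge reply to a binary score; anything unparseable counts as FAIL."""
    verdicts = _VERDICT_LINE.findall(judge_output)
    if not verdicts:
        return False  # fail closed: a malformed reply never inflates the score
    return verdicts[-1].upper() == "PASS"  # if the judge repeats itself, the last verdict wins


def pass_rate(judge_outputs: list[str]) -> float:
    """Fraction of evaluated cases the judge marked PASS."""
    if not judge_outputs:
        return 0.0
    return sum(parse_verdict(out) for out in judge_outputs) / len(judge_outputs)
```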

Baseline Evaluation Results

| Scenario | Duration | Design Rounds | Code Rounds | Tests | Status |
|---|---|---|---|---|---|
| CLI calculator | 6 min | 1 | 1 | 22/22 | PASS |
| Rate limiter | 95 min | 6 | 2 | 32/32 | PASS |
| KV store | ~10 hr | 6 | 2 | 88/88 | PASS |

Key findings:

  • All review prompts score 100% on known-issue detection — prompt quality is already strong
  • E2E pipeline produces correct, tested code across all complexity levels (142 total tests passing)
  • Design stage convergence speed is the primary optimization target for complex tasks

review-loop v2.2.0: Independent Validation Round

23 Mar 06:52

What's New

Independent Validation Round

After the existing Claude-Codex review loop converges (or exhausts its 5-round limit), a fresh Codex instance with no shared review history re-examines the output using blind-spot-focused prompts. Any issues it finds are fed back into the regular loop for fixing and re-validation (at most 2 validation cycles per stage).

This addresses the false-convergence problem: Claude and Codex agree that no issues remain, yet an independent review still finds entirely new categories of problems. These are shared blind spots that arise because both agents operate under the same prompt framework.
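The control flow described above can be sketched as follows. The round and cycle limits come from these notes; `run_stage`, `review`, `fix`, and `independent_review` are hypothetical stand-ins for the plugin's Codex/Claude invocations, and the retry-on-timeout handling is omitted for brevity.

```python
from __future__ import annotations

from typing import Callable

MAX_REVIEW_ROUNDS = 5  # regular Claude-Codex loop limit stated in the release notes
MAX_VALIDATION_CYCLES = 2  # validation cycles per stage stated in the release notes


def run_stage(
    review: Callable[[int], list[str]],
    fix: Callable[[list[str]], None],
    independent_review: Callable[[], list[str]],
) -> dict:
    """Regular loop, then independent validation with findings fed back into the loop."""

    def regular_loop() -> None:
        for round_no in range(1, MAX_REVIEW_ROUNDS + 1):
            issues = review(round_no)
            if not issues:
                return  # converged: reviewer found nothing
            fix(issues)
        # round budget exhausted: validation still runs afterwards

    regular_loop()
    for cycle in range(1, MAX_VALIDATION_CYCLES + 1):
        findings = independent_review()  # fresh reviewer, no shared review history
        if not findings:
            return {"validated": True, "cycles": cycle}
        fix(findings)
        regular_loop()  # feed findings back into the regular loop
    # cycles exhausted with findings still open: report it, never claim a pass
    return {"validated": False, "cycles": MAX_VALIDATION_CYCLES}
```

Returning `validated: False` after the last cycle mirrors the rule that the final output must never falsely claim validation passed.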

New Modes

| Mode | Role | Purpose |
|---|---|---|
| independent-design-review | Codex (READ-ONLY) | Blind-spot review of design docs |
| independent-code-review | Codex (READ-ONLY) | Blind-spot review of code changes |
| validation-design-fix | Codex (READ-ONLY) | Review design fixes after validation |
| validation-fix | Codex (implementer) | Fix code issues found by validation |

Key Details

  • Context isolation: prompt-assembly layer injects zero review history; prompt-instruction layer explicitly prohibits reading specs/reviews/ and .claude/
  • Composite round tokens: c1/c2 for validation cycles and c1f1/c2f2 for fix rounds, so artifact names never collide
  • Failure handling: retry once on timeout/failure, skip + log on double failure, surface in final output (never falsely claim validation passed)
  • Ordering: validation runs before verify round (verify remains the terminal pass)
  • Role separation: AGENTS.md distinguishes independent reviewers, design fix reviewers, and code fixers with distinct constraints
  • 20 new tests (37 total), all passing
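Composite round tokens can be validated with a single pattern. The grammar below is inferred from the examples in these notes (plain numbers for regular rounds, `cN` for validation cycles, `cNfM` for fix rounds); the `specs/reviews/<stage>/round-<token>.md` layout is a hypothetical illustration, not the plugin's confirmed naming scheme.

```python
from __future__ import annotations

import re

# Hypothetical token grammar inferred from the release notes:
#   "1", "2", ...    regular review rounds
#   "c1", "c2"       independent validation cycles
#   "c1f1", "c2f2"   fix rounds inside a validation cycle
_ROUND_TOKEN = re.compile(r"[1-9][0-9]*|c[1-9][0-9]*(?:f[1-9][0-9]*)?")


def is_valid_round_token(token: str) -> bool:
    return _ROUND_TOKEN.fullmatch(token) is not None


def artifact_path(stage: str, token: str) -> str:
    """Hypothetical artifact layout; distinct tokens always map to distinct paths."""
    if not is_valid_round_token(token):
        raise ValueError(f"invalid round token: {token!r}")
    return f"specs/reviews/{stage}/round-{token}.md"
```

Because the token is embedded verbatim in the file name, distinct tokens can never produce the same artifact path.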

File Changes

  • 4 new prompt templates in review-loop/prompts/
  • Modified: common.sh (4 new modes, composite round validation, append_diff_section with temp git index), review-loop.md (validation + verify + output flows), AGENTS.md (role separation), tests/review-loop.test.sh
  • No changes to run-review-bg.sh, check-review.sh, kill-review.sh, hooks, or state file schema

See specs/design.md for the full implementation design.

v2.1.0

21 Mar 09:06

review-loop v2.1.0

Addresses two production issues discovered in v2:

Fresh Independent Reviews

  • Every review round now mandates a full audit of the entire document/diff
  • Verify rounds strip all prior review context and prepend a "FULL INDEPENDENT REVIEW" header
  • Eliminates attention narrowing where successive rounds focused only on previously found issues
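A sketch of how prompt assembly might differ between regular and verify rounds. Only the "FULL INDEPENDENT REVIEW" header text comes from these notes; `build_review_prompt` and the exact full-audit wording are hypothetical.

```python
from __future__ import annotations

VERIFY_HEADER = "FULL INDEPENDENT REVIEW"  # header text named in the release notes
FULL_AUDIT_RULE = "Audit the entire document/diff, not only previously reported issues."  # hypothetical wording


def build_review_prompt(
    instructions: str, document: str, prior_reviews: list[str], verify: bool
) -> str:
    parts: list[str] = []
    if verify:
        parts.append(VERIFY_HEADER)  # verify rounds start with the independent-review header
    parts.append(instructions)
    parts.append(FULL_AUDIT_RULE)  # every round mandates a full audit
    if not verify:
        # regular rounds may see earlier findings; verify rounds get none of them
        parts.extend(f"Prior review {i}:\n{text}" for i, text in enumerate(prior_reviews, 1))
    parts.append(document)
    return "\n\n".join(parts)
```

Dropping prior findings from verify prompts is what prevents the reviewer from simply re-checking old issues instead of auditing everything.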

Optional Brainstorming Stage

  • If superpowers:brainstorming is available, the user is asked whether to brainstorm before designing
  • Output saved to specs/brainstorm.md as supplementary context (task description stays authoritative)
  • Session-scoped via brainstorm_done flag — stale artifacts from previous sessions are ignored
  • Brainstorming suppressed in all subsequent Codex prompts
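Session scoping via the brainstorm_done flag can be sketched as a pure function. Here `state` stands in for the session state file and `brainstorm_md` for the contents of specs/brainstorm.md (None when absent); the function name and prompt wording are hypothetical.

```python
from __future__ import annotations


def brainstorm_section(state: dict, brainstorm_md: str | None) -> str:
    """Return prompt text for specs/brainstorm.md, or "" when it must be ignored."""
    if not state.get("brainstorm_done"):
        return ""  # no brainstorm this session: any existing file is a stale artifact
    if not brainstorm_md:
        return ""  # flag set but file missing or empty
    return (
        "Supplementary context from brainstorming "
        "(the task description remains authoritative):\n" + brainstorm_md
    )
```

Keying on a per-session flag rather than on the file's existence is what lets a leftover specs/brainstorm.md from an earlier run be ignored safely.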

Improved Cancellation

  • Records start_branch and start_sha in state file
  • Cancel restores starting branch/commit and deletes session branch
  • Handles detached HEAD correctly
  • git reset -- . && git checkout -- . && git clean -fd clears staged, unstaged, and untracked changes
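The cancel sequence can be sketched as a function that builds the git commands from the recorded state without running them. The first three commands match the notes; the `session_branch` key and the convention that an empty `start_branch` means a detached HEAD are assumptions.

```python
from __future__ import annotations


def cancel_commands(state: dict) -> list[list[str]]:
    """Return the git commands a cancel would run, in order (nothing is executed here)."""
    cmds = [
        ["git", "reset", "--", "."],  # unstage everything
        ["git", "checkout", "--", "."],  # discard unstaged edits to tracked files
        ["git", "clean", "-fd"],  # delete untracked files and directories
    ]
    start_branch = state.get("start_branch")  # assumed empty/None when the run began on a detached HEAD
    if start_branch:
        cmds.append(["git", "checkout", start_branch])
    else:
        cmds.append(["git", "checkout", "--detach", state["start_sha"]])
    session_branch = state.get("session_branch")  # hypothetical key
    if session_branch and session_branch != start_branch:
        cmds.append(["git", "branch", "-D", session_branch])
    return cmds
```

Building the list separately keeps the logic testable; a caller could pass each entry to `subprocess.run(cmd, check=True)`. Cleaning the working tree first avoids conflicts when switching back to the starting branch.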

Other Changes

  • specs/brainstorm.md added to code-stage protected paths and staging exclusions
  • Code-stage verify is Claude-only (no Codex invocation)
  • Updated tests (18 total, all passing)
  • Updated AGENTS.md for three-stage workflow

Design Review Audit Trail

The implementation spec went through 5 rounds of Codex design review plus a verification pass. All review artifacts are preserved in specs/reviews/design/.

Full Changelog: 6630b26...v2.1.0