Releases · Whisker17/claude-codex-loop

31 Mar 01:23

Whisker17

v2.3.1

d10b6d7

v2.3.1

Changes

Add root `.claude-plugin/plugin.json` manifest required for marketplace submission
Sync plugin version to 2.3.1 across all manifests
Fix old `plugins/review-loop/` path references in design specs
Remove external tooling artifacts (`docs/superpowers/plans/`)
Update README with version history and marketplace info

27 Mar 04:09

Whisker17

v2.3.0

2442ef8

v2.3.0: Autooptimize Skill & Test Harness

What's New

Autooptimize Skill (`.claude/skills/autooptimize/`)

Generalized version of Karpathy's autoresearch methodology for optimizing any project artifact — prompts, configs, orchestration logic, or code. Runs autonomous experiment loops: execute → eval → mutate → keep/discard.
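The execute → eval → mutate → keep/discard cycle can be sketched as a plain hill-climbing loop. This is a minimal illustration, not the skill's actual implementation: `optimize`, `evaluate`, and `mutate` are hypothetical names, and the toy artifact here is a number rather than a prompt or config.

```python
import random
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def optimize(
    artifact: T,
    evaluate: Callable[[T], float],
    mutate: Callable[[T, random.Random], T],
    iterations: int = 50,
    seed: int = 0,
) -> Tuple[T, float]:
    """Hill-climb: keep a mutation only if it scores strictly higher."""
    rng = random.Random(seed)
    best, best_score = artifact, evaluate(artifact)  # execute + eval the baseline
    for _ in range(iterations):
        candidate = mutate(best, rng)   # mutate the current best
        score = evaluate(candidate)     # execute + eval the candidate
        if score > best_score:          # keep
            best, best_score = candidate, score
        # otherwise discard: `best` stays unchanged
    return best, best_score


# Toy usage (hypothetical): the score peaks when the artifact equals 10.
def toy_eval(x: float) -> float:
    return -abs(x - 10.0)


def toy_mutate(x: float, rng: random.Random) -> float:
    return x + rng.uniform(-1.0, 1.0)


best, score = optimize(0.0, toy_eval, toy_mutate, iterations=200)
```

Because a candidate is kept only when it beats the current best, the returned score can never be worse than the baseline. A real run would replace `toy_eval` with whatever measurement the project uses.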

Test Harness (`autooptimize-harness/`)

Comprehensive test infrastructure for measuring review-loop plugin effectiveness:

Single-prompt testing: test individual prompts (design-review, code-review, code-implement) via `codex exec` against fixtures containing known defects. ~1-5 min per run.
End-to-end pipeline testing: run the full review-loop via `claude -p` in isolated temp dirs, then execute the produced code and tests automatically.
LLM-as-judge eval: binary pass/fail scoring using Claude as the evaluator.
3 test scenarios: CLI calculator (simple), rate limiter (medium), KV store (complex).
Fixtures: design docs, code samples, and specs with known subtle defects for evaluating review quality.
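One way to picture binary scoring against fixtures with known defects is sketched below. Everything here is an assumption for illustration: the `Fixture` shape, the `detection_rate` helper, and the fixture contents are hypothetical, and `keyword_judge` is an offline stand-in for the Claude-based judge the harness describes.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class Fixture:
    name: str
    known_defects: Tuple[str, ...]  # defects planted in the fixture


# A judge answers one yes/no question: does the review report this defect?
Judge = Callable[[str, str], bool]


def detection_rate(fixture: Fixture, review_text: str, judge: Judge) -> float:
    """Fraction of known defects the judge marks as detected (binary per defect)."""
    if not fixture.known_defects:
        return 1.0
    hits = sum(judge(review_text, defect) for defect in fixture.known_defects)
    return hits / len(fixture.known_defects)


def keyword_judge(review_text: str, defect: str) -> bool:
    """Stand-in judge for offline runs: case-insensitive substring match."""
    return defect.lower() in review_text.lower()


# Hypothetical fixture and review text.
fixture = Fixture("rate-limiter-design", ("race condition", "integer overflow"))
review = "Found a race condition in the token refill path."
rate = detection_rate(fixture, review, keyword_judge)  # 0.5: one of two defects found
```

Keeping the judge behind a callable means the same scoring code can run with a cheap deterministic judge in tests and an LLM-backed one in real evaluations.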

Baseline Evaluation Results

| Scenario | Duration | Design Rounds | Code Rounds | Tests | Status |
| --- | --- | --- | --- | --- | --- |
| CLI calculator | 6 min | 1 | 1 | 22/22 | PASS |
| Rate limiter | 95 min | 6 | 2 | 32/32 | PASS |
| KV store | ~10 hr | 6 | 2 | 88/88 | PASS |

Key findings:

All review prompts score 100% on known-issue detection — prompt quality is already strong
E2E pipeline produces correct, tested code across all complexity levels (142 total tests passing)
Design stage convergence speed is the primary optimization target for complex tasks

23 Mar 06:52

Whisker17

v2.2.0

7fab8e1

review-loop v2.2.0: Independent Validation Round

What's New

Independent Validation Round

After the existing Claude-Codex review loop converges (or exhausts its 5 rounds), a fresh Codex instance with zero shared review history re-examines the output using blind-spot-focused prompts. If issues are found, they're fed back into the regular loop for fix + re-validation (max 2 cycles per stage).

This addresses the false convergence problem: Claude and Codex agree there are no remaining issues, but independent review still finds entirely new categories of problems — shared blind spots from operating under the same prompt framework.
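The control flow described above might look roughly like the following sketch. The names `run_validation`, `independent_review`, and `fix_and_reconverge` are hypothetical, and returning `False` when the cycle limit is reached is an assumption about how an unvalidated result would be reported.

```python
from typing import Callable, List

MAX_VALIDATION_CYCLES = 2  # "max 2 cycles per stage" per the release notes


def run_validation(
    independent_review: Callable[[int], List[str]],
    fix_and_reconverge: Callable[[List[str]], None],
) -> bool:
    """Return True only if an independent review comes back clean within the limit."""
    for cycle in range(1, MAX_VALIDATION_CYCLES + 1):
        issues = independent_review(cycle)  # fresh reviewer, no shared history
        if not issues:
            return True                     # validated clean
        fix_and_reconverge(issues)          # feed findings back into the regular loop
    return False                            # limit reached without a clean review


# Scripted example: the first cycle finds an issue, the second is clean.
findings = {1: ["missing input validation"], 2: []}
fixed: List[str] = []
ok = run_validation(lambda cycle: findings[cycle], fixed.extend)
```

The sketch never reports success unless a review actually returned no issues, which mirrors the stated goal of not claiming convergence that an independent pass has not confirmed.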

New Modes

| Mode | Role | Purpose |
| --- | --- | --- |
| `independent-design-review` | Codex (READ-ONLY) | Blind-spot review of design docs |
| `independent-code-review` | Codex (READ-ONLY) | Blind-spot review of code changes |
| `validation-design-fix` | Codex (READ-ONLY) | Review design fixes after validation |
| `validation-fix` | Codex (implementer) | Fix code issues found by validation |

Key Details

Context isolation: the prompt-assembly layer injects zero review history; the prompt-instruction layer explicitly prohibits reading `specs/reviews/` and `.claude/`
Composite round tokens: `c1`/`c2` for validation cycles, `c1f1`/`c2f2` for fix rounds, so artifact names never collide
Failure handling: retry once on timeout/failure, skip + log on double failure, and surface the skip in the final output (never falsely claim validation passed)
Ordering: validation runs before the verify round (verify remains the terminal pass)
Role separation: `AGENTS.md` distinguishes independent reviewers, design fix reviewers, and code fixers with distinct constraints
20 new tests (37 total), all passing
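As an illustration of the composite round tokens, the sketch below parses tokens such as `c1` and `c2f2`. The grammar is inferred from the examples in these notes and is an assumption; the actual validation lives in `common.sh` and may differ. `parse_round_token` is a hypothetical helper.

```python
import re
from typing import Optional, Tuple

# Assumed grammar: "c<cycle>" for a validation cycle,
# "c<cycle>f<fix>" for a fix round inside that cycle.
TOKEN_RE = re.compile(r"c(?P<cycle>[1-9]\d*)(?:f(?P<fix>[1-9]\d*))?")


def parse_round_token(token: str) -> Tuple[int, Optional[int]]:
    """Split a composite round token into (cycle, fix); fix is None for a review round."""
    match = TOKEN_RE.fullmatch(token)
    if match is None:
        raise ValueError(f"not a composite round token: {token!r}")
    fix = match.group("fix")
    return int(match.group("cycle")), (int(fix) if fix is not None else None)


print(parse_round_token("c1"))    # (1, None)
print(parse_round_token("c2f2"))  # (2, 2)
```

Since the cycle number and the fix number occupy separate fields, a validation round and a fix round in the same cycle always map to different tokens.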

File Changes

4 new prompt templates in `review-loop/prompts/`
Modified: `common.sh` (4 new modes, composite round validation, `append_diff_section` with temp git index), `review-loop.md` (validation + verify + output flows), `AGENTS.md` (role separation), `tests/review-loop.test.sh`
No changes to `run-review-bg.sh`, `check-review.sh`, `kill-review.sh`, hooks, or state file schema

See `specs/design.md` for the full implementation design.

21 Mar 09:06

Whisker17

v2.1.0

8f538e7

v2.1.0

review-loop v2.1.0

Addresses two production issues discovered in v2:

Fresh Independent Reviews

Every review round now mandates a full audit of the entire document/diff
Verify rounds strip all prior review context and prepend a "FULL INDEPENDENT REVIEW" header
Eliminates attention narrowing where successive rounds focused only on previously found issues
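A small sketch of how prompt assembly could differ between regular and verify rounds follows. Only the "FULL INDEPENDENT REVIEW" header text comes from these notes; the `build_review_prompt` function, the "Prior review" section format, and the idea that regular rounds append earlier reviews are assumptions made for illustration.

```python
from typing import List

VERIFY_HEADER = "FULL INDEPENDENT REVIEW"


def build_review_prompt(base_prompt: str, prior_reviews: List[str], verify: bool) -> str:
    """Assemble a review prompt; verify rounds drop all prior review context."""
    if verify:
        # Strip prior context entirely and prepend the header.
        return f"{VERIFY_HEADER}\n\n{base_prompt}"
    sections = [base_prompt]
    # Hypothetical format for carrying earlier findings into regular rounds.
    sections += [
        f"Prior review (round {number}):\n{text}"
        for number, text in enumerate(prior_reviews, start=1)
    ]
    return "\n\n".join(sections)


base = "Audit the entire document for defects."
history = ["Round 1 flagged an unclear retry policy."]
regular_prompt = build_review_prompt(base, history, verify=False)
verify_prompt = build_review_prompt(base, history, verify=True)
```

With this shape, the verify prompt contains nothing that could steer the reviewer toward previously reported issues.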

Optional Brainstorming Stage

If `superpowers:brainstorming` is available, the user is asked whether to brainstorm before designing
Output saved to `specs/brainstorm.md` as supplementary context (task description stays authoritative)
Session-scoped via the `brainstorm_done` flag, so stale artifacts from previous sessions are ignored
Brainstorming suppressed in all subsequent Codex prompts

Improved Cancellation

Records `start_branch` and `start_sha` in the state file
Cancel restores the starting branch/commit and deletes the session branch
Handles detached HEAD correctly
`git reset -- . && git checkout -- . && git clean -fd` clears staged, unstaged, and untracked changes
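The cancellation steps can be laid out as an ordered command plan. This sketch only builds the commands and runs nothing. The ordering, the use of `git checkout --detach` for the detached HEAD case, and `git branch -D` for removing the session branch are assumptions, and `cancel_commands` plus the branch names are hypothetical.

```python
from typing import List, Optional


def cancel_commands(
    start_branch: Optional[str], start_sha: str, session_branch: str
) -> List[List[str]]:
    """Build the git commands for a cancel; nothing is executed here."""
    commands = [
        ["git", "reset", "--", "."],     # unstage everything
        ["git", "checkout", "--", "."],  # discard unstaged changes to tracked files
        ["git", "clean", "-fd"],         # remove untracked files and directories
    ]
    if start_branch:
        # The session started on a named branch: return to it.
        commands.append(["git", "checkout", start_branch])
    else:
        # The session started on a detached HEAD: return to the recorded commit.
        commands.append(["git", "checkout", "--detach", start_sha])
    commands.append(["git", "branch", "-D", session_branch])
    return commands


plan = cancel_commands("main", "abc1234", "review-loop/session")
detached_plan = cancel_commands(None, "abc1234", "review-loop/session")
```

Cleaning the working tree before switching branches avoids checkout failures caused by local modifications left behind by the session.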

Other Changes

`specs/brainstorm.md` added to code-stage protected paths and staging exclusions
Code-stage verify is Claude-only (no Codex invocation)
Updated tests (18 total, all passing)
Updated `AGENTS.md` for the three-stage workflow

Design Review Audit Trail

The implementation spec went through 5 rounds of Codex design review plus a verification pass. All review artifacts are preserved in `specs/reviews/design/`.

Full Changelog: 6630b26...v2.1.0
