## Context
Analysis of an 18-round RLCR session (3 mainline + 15 review rounds) that produced a high-quality result but exhibited an efficiency gap in the review phase. The external reviewer was the session's highest-value component (zero false positives, caught 5 blocking defects missed by standard tests), but the one-finding-per-round cadence inflated the round count.
## Findings
### 1. Batch Review Findings Per Round (High Impact)
**Pattern:** Each review round produced 1-3 findings, but the worker addressed only one per round, leading to a 1:1 round-to-issue ratio for 15 consecutive rounds.

**Suggestion:** When a review produces N findings, the next round's contract should enumerate all N as a checklist. The current contract scoping appears to default to a single mainline objective. Making batching the default could reduce review-phase rounds by ~60%.
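As a sketch of what batching could look like (every name here — `Finding`, `Contract`, `build_batched_contract` — is hypothetical, not part of any existing RLCR API):

```python
from dataclasses import dataclass, field

@dataclass
class Finding:
    severity: str  # e.g. "P1", "P2", "P3"
    summary: str

@dataclass
class Contract:
    objective: str
    checklist: list[str] = field(default_factory=list)

def build_batched_contract(findings: list[Finding]) -> Contract:
    """Fold all N review findings into the next round's contract
    as a checklist, instead of carrying one finding per round."""
    items = [f"[{f.severity}] {f.summary}" for f in findings]
    return Contract(
        objective=f"Resolve all {len(findings)} review findings",
        checklist=items,
    )
```

The point of the sketch is only that the contract carries N checklist items, so one round can close out everything the reviewer surfaced.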
### 2. Add End-to-End Walkthrough Acceptance Criteria (Medium Impact)
**Pattern:** Component-level ACs (API responds, tests pass, build succeeds) passed, but 15 post-completion rounds found user-journey issues: root route 404, dollar/cents form mismatch, navigation dead-ends, cross-service cookie collision.

**Suggestion:** Introduce a complementary AC type that verifies user-journey-level correctness: given the instruction text, can an agent following the described steps reach the correct end state? This would catch integration and UX issues that component tests miss.
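One minimal shape such a walkthrough AC could take (the structure and `run_walkthrough` helper are illustrative assumptions, not an existing harness): an ordered list of journey steps, each paired with a check, executed in order and stopped at the first failure since later steps depend on earlier state.

```python
from typing import Callable

# A walkthrough AC: ordered (step description, check) pairs.
WalkthroughAC = list[tuple[str, Callable[[], bool]]]

def run_walkthrough(ac: WalkthroughAC) -> list[str]:
    """Run each step's check in order; report the first failing
    step, since the rest of the journey is unreachable past it."""
    failures = []
    for description, check in ac:
        if not check():
            failures.append(description)
            break
    return failures
```

A journey like "open the root route, submit the form, see the receipt" would surface the root-route 404 on step one — before the review phase ever sees it.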
### 3. Multi-Pass Review Within Rounds (Medium Impact)
**Pattern:** The reviewer found issues layer-by-layer (large first, then medium, then small), with each layer requiring its own round.

**Suggestion:** Allow the reviewer to perform 2-3 passes within one round. After finding a P2 issue in the first pass, the reviewer could immediately re-examine the fixed code for adjacent issues, collapsing 2-3 rounds into 1 for closely related findings.
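The review-fix-re-review loop above can be sketched as follows; `review_once` and `apply_fix` stand in for whatever the real reviewer and worker hooks are, and are assumptions for illustration only:

```python
def multi_pass_review(review_once, apply_fix, max_passes=3):
    """Run up to max_passes review passes inside a single round.
    review_once() returns the current list of findings (empty
    list = clean); apply_fix(finding) patches the code under
    review before the next pass re-examines it."""
    all_findings = []
    for _ in range(max_passes):
        findings = review_once()
        if not findings:
            break  # converged: no more layers to peel
        all_findings.extend(findings)
        for f in findings:
            apply_fix(f)
    return all_findings
```

With `max_passes=3`, the large/medium/small layers that previously consumed three rounds resolve inside one.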
### 4. Known Risk Surface in Plans (Low Impact)
**Pattern:** 15 rounds of correctness fixes after ACs passed, all in predictable categories (cross-service auth, form/API parity, verifier strictness, navigation, registration).

**Suggestion:** Plans could include a "known risk surface" section enumerating likely integration issue categories, enabling workers to proactively self-audit before the reviewer finds them.
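A "known risk surface" section might be as simple as a category-to-audit-prompt mapping, seeded from the categories observed in this session. The structure and `self_audit_checklist` helper are hypothetical; the audit prompts are paraphrased from this session's actual defects:

```python
# Hypothetical plan section: each risk category maps to a
# self-audit question the worker answers before handoff.
KNOWN_RISK_SURFACE = {
    "cross-service auth": "Do cookies/sessions survive hops between services?",
    "form/API parity": "Do form field units (e.g. dollars vs. cents) match the API?",
    "verifier strictness": "Does the verifier accept every state the UI can produce?",
    "navigation": "Can every page be reached and exited without dead-ends?",
    "registration": "Can a brand-new user complete the full journey?",
}

def self_audit_checklist(categories):
    """Select the audit prompts relevant to the categories a plan touches."""
    return [KNOWN_RISK_SURFACE[c] for c in categories if c in KNOWN_RISK_SURFACE]
```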
### 5. BitLesson Trigger Calibration (Low Impact)
**Pattern:** The BitLesson field was NONE in every round of an 18-round session, suggesting the lesson capture mechanism is not triggering for review-phase rounds.

**Suggestion:** Verify that lesson capture triggers are calibrated for sessions dominated by correctness fixes rather than new feature development.
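One possible widened trigger, purely as an assumption about how calibration could work (the predicate and its parameters are not an existing BitLesson API): capture a lesson not only for mainline rounds, but also for any review round that resolves a finding in a category not yet seen this session.

```python
def should_capture_lesson(round_type, findings_resolved, novel_category):
    """Hypothetical trigger predicate. Mainline rounds always
    qualify; review rounds qualify when they resolve at least
    one finding in a category new to this session."""
    if round_type == "mainline":
        return True
    return round_type == "review" and findings_resolved > 0 and novel_category
```

Under this rule, a 15-round review phase spanning five distinct defect categories would have produced five lessons instead of zero.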
## Summary
| # | Improvement | Expected Impact |
|---|---|---|
| 1 | Batch review findings per round | ~60% reduction in review-phase rounds |
| 2 | End-to-end walkthrough ACs | Catch integration issues before review phase |
| 3 | Multi-pass review within rounds | Collapse 2-3 related rounds into 1 |
| 4 | Known risk surface in plans | Proactive self-audit reduces round count |
| 5 | BitLesson trigger calibration | Prevent lesson mechanism from becoming a no-op |