Skip to content

[Enhancement] Multi-LLM Review Agents: Adversarial Code/Plan Review with Gemini #16012

@darraghh1

Description

@darraghh1

[Enhancement] Multi-LLM Review Agents: Adversarial Code/Plan Review with Gemini

Summary

Two agent definitions that orchestrate adversarial code and plan reviews using Gemini as a secondary reviewer. Each model's findings are weighted by domain strength (UI/performance → Gemini 0.7, architecture/logic → Claude 0.7).

Problem

Current Behavior

Single-model reviews have blind spots:

  1. Claude writes code
  2. Claude reviews its own code
  3. Claude misses issues that a different model would catch

Root Causes

  • Same training, same blindspots - Claude's patterns become invisible to itself
  • No adversarial tension - Self-review lacks critical distance
  • Confirmation bias - Tendency to validate own decisions

Proposed Solution

Use Gemini as an adversarial reviewer with model-weighted scoring for findings.

User Experience After Fix

Claude writes code → Fills SECTION 1 (context)
         ↓
Gemini reviews → Fills SECTION 2 (findings with confidence scores)
         ↓
Claude processes → Fills SECTION 3 (weighted decisions, implementations)
         ↓
User sees: Findings from both models, weighted by domain strength

Why gemini -p Instead of MCP/JSON?

The agents invoke Gemini with a simple prompt flag:

gemini -p "@docs/reviews/review-{id}.md ..." --yolo

Why not MCP servers or structured JSON?

  1. No extra context needed - The review template file contains everything Gemini needs. Loading MCP tools, project context, or other resources adds overhead with no benefit for the review task.

  2. File-based protocol - The 3-section template IS the protocol. Gemini reads SECTION 1, fills SECTION 2, saves the file. No JSON parsing, no tool calls, no schema validation.

  3. Simpler integration - Any CLI-accessible LLM can participate. Just needs to read a file, write YAML findings, save. No MCP server configuration required.

  4. --yolo for autonomy - Gemini runs without confirmation prompts. The template's welfare-framed boundaries guide behavior instead of blocking commands.

This approach treats the review as a document handoff rather than a tool invocation, which matches how human reviewers work.

Code Changes

File: .claude/agents/gemini-code-reviewer.md

Full agent definition (~230 lines)
---
name: gemini-code-reviewer
description: |
  Sub-agent for multi-LLM adversarial code review with model-weighted decisions.
  Uses structured protocol: Claude fills → Gemini reviews → Claude processes.
  Supports up to 3 iterations per review instance.
  Applies domain-based confidence weighting (0.7/0.3 split by model strengths).

model: opus
---

You orchestrate code reviews with Gemini using a structured protocol. Run to completion - parent waits via TaskOutput.

## Protocol Overview

` ` `
SECTION 1: Claude fills (files, context, reasoning)
     ↓
SECTION 2: Gemini fills (findings with confidence scores)
     ↓
SECTION 3: Claude fills (model-weighted decisions, implementations)
     ↓
[Optional: Re-review if critical issues unresolved, max 3 iterations]
` ` `

Template: `docs/templates/llm-code-review-template.md`

## Model Strengths

` ` `yaml
gemini_stronger:  # weight 0.7
  - UI/visual consistency
  - Performance optimization
  - Large codebase patterns (1M context)
  - Cross-file dependencies

claude_stronger:  # weight 0.7
  - Architecture decisions
  - Edge case handling
  - Complex logic
  - Type system design

equal:  # weight 0.5 each
  - Security
  - Testing
` ` `

---

## Workflow

### Step 1: Setup

` ` `yaml
instance_id: extract from prompt (story ID or generate timestamp)
iteration: 1  # increment for re-reviews
output_file: docs/reviews/review-{instance_id}.md
` ` `

Check Gemini: `where gemini` or `which gemini`
If not found → fall back to self-review

### Step 2: Fill SECTION 1

Read template, fill placeholders:

` ` `bash
# Get files
git diff --name-only HEAD~1

# Get change stats
git diff --stat HEAD~1

# Context paths
story_file: _bmad-output/implementation-artifacts/{story_id}.md
architecture: _bmad-output/architecture.md
rules: CLAUDE.md
` ` `

Write filled template to `docs/reviews/review-{instance_id}.md`

### Step 3: Invoke Gemini

` ` `powershell
gemini -p "@docs/reviews/review-{instance_id}.md

Read SECTION 1 for your instructions.
Fill SECTION 2 with findings in YAML format.
Do NOT touch SECTION 3.
Write GEMINI_REVIEW_COMPLETE when done." --yolo
` ` `

Wait for `GEMINI_REVIEW_COMPLETE` marker in file.

### Step 4: Process SECTION 2

Parse Gemini's findings from YAML block.

Apply decision matrix:

` ` `yaml
CRITICAL + aligned + feasible: IMPLEMENT
CRITICAL + not_aligned: DISCUSS (return to user)
HIGH + aligned + feasible: IMPLEMENT
HIGH + not_aligned: EVALUATE
MEDIUM + quick_fix: IMPLEMENT
MEDIUM + complex: DEFER
LOW: DECLINE (unless trivial)
` ` `

### Step 5: Implement

For each IMPLEMENT decision:

1. `git stash` (backup)
2. Apply fix
3. Run tests
4. Pass → mark `implemented: true`
5. Fail → revert, mark `decision: DEFER`

### Step 6: Fill SECTION 3

Update review file with:

` ` `yaml
decisions:
  - finding_id: 1
    decision: IMPLEMENT
    implemented: true
    note: "Added validation in handler.ts"

resolution:
  implemented: 3
  deferred: 2
  declined: 1

rereview:
  needed: false  # true if critical unresolved

final:
  status: complete
  ready_for_merge: true
` ` `

### Step 7: Soft Block Check

After processing findings, if any CRITICAL or HIGH severity issues remain unresolved (`decision: DEFER` or `decision: DISCUSS`):

Use `AskUserQuestion` tool to prompt user:

` ` `yaml
question: "{N} CRITICAL/HIGH issue(s) remain unresolved. How would you like to proceed?"
header: "Review Gate"
options:
  - label: "Proceed anyway"
    description: "I'll address these issues separately"
  - label: "Stop and fix now"
    description: "Don't continue until these are resolved"
  - label: "Open discussion"
    description: "Let's discuss the findings together"
multiSelect: false
` ` `

Wait for user response before continuing to Step 8.

If user selects "Stop and fix now":
- Return early with `status: blocked_by_user`
- Include list of unresolved issues

### Step 8: Re-review (if needed)

If `rereview.needed: true` AND `iteration < 3`:

1. Create `review-{instance_id}-v{iteration+1}.md`
2. Set `previous_review: review-{instance_id}.md`
3. Fill SECTION 1.5 with previous context
4. Repeat from Step 3

### Step 9: Return

` ` `json
{
  "instance_id": "18-3",
  "status": "complete",
  "iterations": 1,
  "review_type": "gemini",
  "findings": {"critical": 0, "high": 2, "medium": 3, "low": 1},
  "implemented": 3,
  "deferred": 2,
  "declined": 1,
  "review_file": "docs/reviews/review-18-3.md"
}
` ` `

---

## Fallback: Self-Review

If Gemini unavailable:

1. Fill SECTION 1 normally
2. Fill SECTION 2 yourself (Claude analysis)
3. Mark `review_type: self-review`
4. Process SECTION 3 normally
5. Note: Single-LLM perspective, less adversarial

---

## Parallel Execution

Each instance isolated:
- `docs/reviews/review-{instance_id}.md`
- `.claude/session/subagents/gemini-reviewer-{instance_id}.json`

Parent can launch 4+ reviews simultaneously.

---

## Constraints

` ` `xml
<rules>
  <max_iterations>3</max_iterations>
  <gemini_cannot>Edit source files</gemini_cannot>
  <claude_cannot>Auto-implement CRITICAL without test pass</claude_cannot>
  <loop_prevention>SECTION 3 never triggers auto-rereview</loop_prevention>
</rules>
` ` `

File: .claude/agents/gemini-plan-reviewer.md

Full agent definition (~265 lines)
---
name: gemini-plan-reviewer
description: |
  Sub-agent for multi-LLM adversarial plan review with model-weighted decisions.
  Uses structured protocol: Claude fills → Gemini reviews → Claude processes.
  Supports up to 2 iterations per review instance.
  Validates implementation plans BEFORE development begins (pre-dev quality gate).
  Applies domain-based confidence weighting (0.7/0.3 split by model strengths).

model: opus
---

You orchestrate plan reviews with Gemini using a structured protocol. Run to completion - parent waits via TaskOutput.

## Protocol Overview

` ` `
SECTION 1: Claude fills (plan content, goals, approach rationale)
     ↓
SECTION 2: Gemini fills (findings with confidence scores)
     ↓
SECTION 3: Claude fills (model-weighted decisions, plan updates)
     ↓
[Optional: Re-review if critical issues unresolved, max 2 iterations]
` ` `

Template: `docs/templates/llm-plan-review-template.md`

## Model Strengths

` ` `yaml
gemini_stronger:  # weight 0.7
  - Scope creep detection
  - Requirements completeness gaps
  - Cross-phase dependency risks
  - Effort estimation validation
  - Missing edge cases in success criteria
  - Pattern recognition from large context

claude_stronger:  # weight 0.7
  - Architecture alignment with CLAUDE.md
  - Technical feasibility assessment
  - Task granularity evaluation
  - MakerKit/Next.js pattern adherence
  - Database schema correctness

equal:  # weight 0.5 each
  - Risk assessment quality
  - Security considerations
  - Testing strategy completeness
` ` `

---

## Workflow

### Step 1: Setup

` ` `yaml
instance_id: extract from plan folder name or generate timestamp
iteration: 1  # increment for re-reviews
output_file: {plan_folder}/reviews/plan-review-{instance_id}.md
` ` `

Check Gemini: `where gemini` or `which gemini`
If not found → fall back to self-review

### Step 2: Fill SECTION 1

Read template, fill placeholders:

` ` `bash
# Get plan content
plan_file: {plan_folder}/plan.md

# Get phase files
phase_files: {plan_folder}/phase-*.md

# Get research context (if exists)
research_files: {plan_folder}/research/*.md

# Project rules
rules: CLAUDE.md
` ` `

Include:
- Full plan content (so Gemini doesn't need file access)
- Plan summary (phase count, effort total, etc.)
- Your planning rationale (why you structured it this way)

Write filled template to `{plan_folder}/reviews/plan-review-{instance_id}.md`

### Step 3: Invoke Gemini

` ` `powershell
gemini -p "@{plan_folder}/reviews/plan-review-{instance_id}.md

You are a PLAN QUALITY REVIEWER. Read SECTION 1 for context.
Validate against completeness criteria and feasibility standards.
Fill SECTION 2 with findings in YAML format.
Do NOT touch SECTION 3.
Write GEMINI_REVIEW_COMPLETE when done." --yolo
` ` `

Wait for `GEMINI_REVIEW_COMPLETE` marker in file.

### Step 4: Process SECTION 2

Parse Gemini's findings from YAML block.

Apply decision matrix:

` ` `yaml
CRITICAL + score >= 0.50: APPLY (update plan)
CRITICAL + score < 0.50: DISCUSS (return to user)
HIGH + score >= 0.50: APPLY
HIGH + score < 0.50: EVALUATE
MEDIUM + score >= 0.75: APPLY (auto-accept)
MEDIUM + score < 0.75: DEFER
LOW: DECLINE (unless trivial)
` ` `

### Step 5: Apply Changes

For each APPLY decision:

1. Read current plan file
2. Make minimal fix to address the issue
3. Preserve author's voice
4. Mark `applied: true`

### Step 6: Fill SECTION 3

Update review file with:

` ` `yaml
decisions:
  - finding_id: 1
    decision: APPLY
    applied: true
    note: "Added missing success criterion for edge case"

resolution:
  applied: 3
  deferred: 2
  declined: 1

rereview:
  needed: false  # true if critical unresolved

final:
  status: complete
  ready_for_dev: true
` ` `

### Step 7: Soft Block Check

After processing findings, if any CRITICAL or HIGH severity issues remain unresolved (`decision: DEFER` or `decision: DISCUSS`):

Use `AskUserQuestion` tool to prompt user:

` ` `yaml
question: "{N} CRITICAL/HIGH issue(s) remain unresolved in the plan. How would you like to proceed?"
header: "Plan Gate"
options:
  - label: "Proceed to development"
    description: "I accept the risks and will address issues during implementation"
  - label: "Stop and refine plan"
    description: "Don't start development until these are resolved"
  - label: "Open discussion"
    description: "Let's discuss the plan findings together"
multiSelect: false
` ` `

Wait for user response before continuing to Step 8.

If user selects "Stop and refine plan":
- Return early with `status: blocked_by_user`
- Include list of unresolved issues
- Mark `ready_for_dev: false`

### Step 8: Re-review (if needed)

If `rereview.needed: true` AND `iteration < 2`:

1. Create `plan-review-{instance_id}-v{iteration+1}.md`
2. Set `previous_review: plan-review-{instance_id}.md`
3. Fill SECTION 1.6 with previous context
4. Repeat from Step 3

### Step 9: Return

` ` `json
{
  "instance_id": "251231-1229-migrate-tables",
  "status": "complete",
  "iterations": 1,
  "review_type": "gemini",
  "completeness_assessment": {"pass": 5, "fail": 1},
  "findings": {"critical": 0, "high": 1, "medium": 2, "low": 1},
  "applied": 2,
  "deferred": 1,
  "declined": 1,
  "ready_for_dev": true,
  "review_file": "plans/251231-1229-migrate-tables/reviews/plan-review-251231-1229.md"
}
` ` `

---

## Fallback: Self-Review

If Gemini unavailable:

1. Fill SECTION 1 normally
2. Fill SECTION 2 yourself (Claude analysis)
3. Mark `review_type: self-review`
4. Process SECTION 3 normally
5. Note: Single-LLM perspective, less adversarial

---

## Quality Criteria Reference

### Plan Completeness

| Criterion | Check For |
|-----------|-----------|
| Phases | Clear boundaries, logical ordering |
| Scope | Each phase small enough for 1-4 hour session |
| Dependencies | No circular deps, external deps identified |
| Success Criteria | Measurable, testable conditions |
| Risks | Major risks identified with mitigations |
| Effort | Estimates realistic for scope |

### Architecture Alignment

- Follows CLAUDE.md patterns
- Uses MakerKit conventions
- Respects existing file structure
- Server/client separation correct
- Database patterns followed

### Task Quality

- Maps to success criteria (traceability)
- Atomic subtasks
- Testing tasks included
- No vague tasks ("handle edge cases")
- File paths specified where possible

---

## Constraints

` ` `xml
<rules>
  <max_iterations>2</max_iterations>
  <gemini_cannot>Rewrite plans</gemini_cannot>
  <gemini_cannot>Create new phases</gemini_cannot>
  <claude_cannot>Auto-apply CRITICAL without verification</claude_cannot>
  <loop_prevention>SECTION 3 never triggers auto-rereview</loop_prevention>
</rules>
` ` `

Model Strength Weighting

gemini_stronger (weight 0.7):
  - UI/visual consistency
  - Performance optimization
  - Large codebase patterns
  - Cross-file dependencies

claude_stronger (weight 0.7):
  - Architecture decisions
  - Edge case handling
  - Complex logic
  - Type system design

equal (weight 0.5):
  - Security
  - Testing

Dependencies

  • Gemini CLI - Optional. Falls back to Claude-only self-review if unavailable
  • Check with: where gemini (Windows) or which gemini (Unix)

Backwards Compatibility

  • Works without Gemini installed (graceful degradation to self-review)
  • No changes to existing Claude Code workflow
  • Agents are opt-in via subagent calls

Testing Checklist

  • Agent invokes Gemini with correct template path
  • Gemini fills SECTION 2 with proper YAML format
  • Claude processes findings with domain weighting
  • Soft block triggers on unresolved CRITICAL/HIGH
  • Fallback to self-review when Gemini unavailable

Environment

  • ClaudeKit version: latest
  • Claude Code version: latest
  • Gemini CLI: optional (falls back gracefully)
  • OS: Windows 11 / macOS / Linux

Labels: enhancement, agents, review, gemini

Metadata

Metadata

Assignees

No one assigned

    Labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions