Cross-model adversarial review for Claude Code.
Spawn a second LLM to challenge your plans, diagnoses, and code.
You write a plan or diagnose a bug in Claude Code. Instead of trusting yourself, you run /debate and a different model — ChatGPT, Gemini, or Claude Sonnet — reviews your work with read-only codebase access. Cross-model diversity is the point: different model families catch different blind spots. The reviewer gives a verdict: APPROVED or REVISE. If REVISE, the orchestrator fixes the issues and resubmits — up to 3 rounds.
```
You (Claude Opus)                  Reviewer (ChatGPT / Gemini / Sonnet)
│ orchestrator                     │ read-only challenger
│                                  │
├─ Write plan                      │
├─ /debate ───────────────────────►│ Read codebase
│                                  │ Analyze plan
│◄──── VERDICT: REVISE ────────────┤
├─ Fix issues                      │
├─ Resubmit ──────────────────────►│ Re-review
│◄──── VERDICT: APPROVED ──────────┤
└─ Continue with confidence
```
1. Copy into your project:
```bash
# Clone and copy
git clone https://github.com/AlessioZazzarini/debate-kit.git
cp -r debate-kit/ your-project/.claude/skills/debate/
```

2. Describe your project (the only required step):
Open .claude/skills/debate/architecture-brief.md and fill in your tech stack, directory structure, and key patterns. A commented-out example is included at the bottom of the template.
3. Use it:
```bash
/debate                      # Review the plan you're writing
/debate debug                # Challenge a bug diagnosis
/debate review               # Code review recent git changes
/debate review src/lib/      # Code review a specific path
/debate --provider codex     # Use ChatGPT / GPT-5.2
/debate --provider gemini    # Use Gemini 3 Pro
/debate --provider claude    # Use Claude Sonnet (same-family fallback)
```
| Requirement | Notes |
|---|---|
| Claude Code | The CLI tool — you're probably already using it |
| `ANTHROPIC_API_KEY` | Set in your environment |
| At least one reviewer CLI | See Providers below |
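A minimal shell setup might look like this (placeholder values; the OpenAI and Gemini keys are only needed if you use those providers):

```bash
# In ~/.bashrc, ~/.zshrc, or your session environment
export ANTHROPIC_API_KEY="sk-ant-..."   # required by Claude Code
export OPENAI_API_KEY="sk-..."          # only for --provider codex
export GEMINI_API_KEY="..."             # only for --provider gemini
```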
The skill operates in three modes, each tailored to a different stage of development:
| Mode | Command | What the Reviewer Does |
|---|---|---|
| Plan Review | `/debate` | Challenges architecture decisions, finds security gaps, spots race conditions |
| Debug Review | `/debate debug` | Proposes alternative root causes, identifies untested assumptions |
| Code Review | `/debate review [path]` | Finds logic bugs, missing error handling, performance issues |
Plan and debug modes loop (up to 3 rounds) until the reviewer approves or the limit is hit. Code review is single-pass — findings are presented once.
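Conceptually, the plan and debug loop behaves like this sketch (illustrative pseudocode; `run_reviewer` and `fix_issues` are hypothetical stand-ins for steps the orchestrator performs through the reviewer CLI):

```bash
# Illustrative sketch of the review loop, not the skill's actual implementation
run_reviewer() { echo "REVISE"; }        # stub: would invoke codex/gemini/claude
fix_issues()   { echo "$1 (revised)"; }  # stub: orchestrator applies feedback

plan="initial plan"
for round in 1 2 3; do                   # plan/debug modes cap at 3 rounds
  verdict=$(run_reviewer "$plan")
  [ "$verdict" = "APPROVED" ] && break   # early exit on approval
  plan=$(fix_issues "$plan")             # revise and resubmit
done
```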
For the full technical flow with diagrams, see docs/how-debate-works.md.
architecture-brief.md is the single file the reviewer reads to understand your project. The more specific you are, the better the reviews:
"We use Prisma middleware for tenant isolation on all queries" > "Standard patterns"
Drop specialized context files into domain-context/ for deeper reviews of specific subsystems. Each file declares when it should be included via YAML frontmatter:
```yaml
---
name: "API Layer"
triggers:
  paths: ["src/api/", "src/routes/"]
  keywords: ["endpoint", "route", "middleware"]
---
```

When a plan or review touches matching paths or keywords, the context is automatically included. Two examples are provided:
- `example-web-app.md` — Full-stack TypeScript app (routes, auth, DB)
- `example-data-pipeline.md` — Python ML pipeline (DAGs, features, models)
Copy the closest example, rename it, and customize. Files starting with _ are templates and are excluded from matching.
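For instance, to add context for a hypothetical payments subsystem (the file name is invented):

```bash
cd .claude/skills/debate/domain-context
cp example-web-app.md payments.md    # start from the closest reference
# edit payments.md: update its name, trigger paths, and keywords
# (leave _template.md alone; files starting with _ are excluded from matching)
```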
The whole point of /debate is that a different model family reviews your work. Claude reviewing Claude catches fewer blind spots than ChatGPT or Gemini reviewing Claude.
| Provider | Model | Flag | Install | Why |
|---|---|---|---|---|
| Codex CLI (recommended) | GPT-5.2 | `--provider codex` | `npm install -g @openai/codex` + `OPENAI_API_KEY` | Different family, strong on logic and edge cases |
| Gemini CLI | Gemini 3 Pro | `--provider gemini` | Install Gemini CLI + `GEMINI_API_KEY` | Different family, strong on data flow and API design |
| Claude CLI (fallback) | Sonnet | `--provider claude` | Already installed | Same family — less diverse, but zero setup |
Auto-detection: When no --provider flag is given, the skill checks which CLIs are installed and picks the first available in the order above (codex → gemini → claude).
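In shell terms, the detection is equivalent to this sketch (illustrative; the skill performs the check itself):

```bash
# Probe for reviewer CLIs in priority order and take the first hit
for provider in codex gemini claude; do
  if command -v "$provider" >/dev/null 2>&1; then
    echo "Auto-detected provider: $provider"
    break
  fi
done
```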
| Mode | Max Rounds | Cost per Round | Max Cost |
|---|---|---|---|
| Plan review | 3 | ~$0.50 | ~$1.50 |
| Debug review | 3 | ~$0.50 | ~$1.50 |
| Code review | 1 | ~$0.50 | ~$0.50 |
Early exit on APPROVED — if round 1 passes, you pay for 1 round only.
```
.claude/skills/debate/           # ← Where it lives in your project
├── SKILL.md                     # Executable skill definition
├── architecture-brief.md        # Your project description (edit this)
├── docs/
│   ├── how-debate-works.md      # Technical deep-dive with diagrams
│   └── TRIAGE-2026-02-22.md     # Case study: the skill reviewing itself
└── domain-context/
    ├── _template.md             # Blank template for new domains
    ├── example-web-app.md       # Reference: TypeScript web app
    └── example-data-pipeline.md # Reference: Python ML pipeline
```
This skill was built for the Roger project and improved by its own review system. We ran /debate review on the original skill using GPT-5.2 as the reviewer. It found 11 issues — after triage, 4 were real correctness bugs (broken session resume, unreliable exit codes, ambiguous verdict parsing) and 4 were security theater inappropriate for a local dev tool.
The full triage is preserved in docs/TRIAGE-2026-02-22.md — a useful reference for evaluating AI code reviews critically.
| Problem | Solution |
|---|---|
| "No architecture-brief.md found" | Fill in the template — the skill works without it but reviews will be generic |
| "Reviewer process failed (exit N)" | Auth issue. Verify ANTHROPIC_API_KEY is set, or run codex login for Codex |
| "Reviewer produced no output" | Rate limit or expired token. Wait a minute, retry. For Codex: codex login |
| Reviews are too generic | Add more detail to your architecture brief. Add domain context files for key subsystems |
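A quick environment check can rule most of these out (a small sketch using only the names mentioned above):

```bash
# Confirm the orchestrator key is set and see which reviewer CLIs are installed
[ -n "$ANTHROPIC_API_KEY" ] || echo "ANTHROPIC_API_KEY is not set"
for cli in codex gemini claude; do
  command -v "$cli" >/dev/null 2>&1 && echo "$cli: installed" || echo "$cli: not found"
done
```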