
claude-debug

Stop guessing. Start debugging.

When you ask your AI coding agent to fix a bug, it glances at the error and immediately starts editing code. It guesses wrong, edits again, guesses wrong again, and spirals for 30 minutes. You've seen this. It's the single biggest source of wasted time with AI coding agents.

claude-debug fixes this with one structural change: your agent cannot edit code until it has found the root cause.

This is enforced by hooks, not prompts. The agent physically cannot write to your files during the analysis phases. It must reproduce the bug, isolate the location with diagnostic evidence, and present its root cause analysis to you for confirmation — only then do the edit gates open.

The result: bugs that used to take 30 minutes of back-and-forth guessing now take 10-15 minutes of directed investigation. The fix is correct on the first try because it targets a confirmed root cause, not a guess.

Quick Start

Zero-install — copy one file and your agent follows the protocol in the next session:

mkdir -p .claude/rules
curl -o .claude/rules/debug.md https://raw.githubusercontent.com/krabat-l/claude-debug/master/rules/debug.md

Full plugin — adds phase gate hooks, specialized agents, and slash commands:

claude plugin add claude-debug

Also works with Cursor and Codex (adapters included).

How It Works

flowchart LR
    A["1. REPRODUCE\n🔒 edits blocked"] --> B["2. ISOLATE\n🔒 // DEBUG only"]
    B --> C["3. ROOT CAUSE\n🔒 user confirms"]
    C --> D["4. FIX\n🔓 edits allowed"]
    D --> E["5. VERIFY\n✅ tests pass"]
    E -->|"fail"| B
    style A fill:#fee,stroke:#c33
    style B fill:#fef,stroke:#93c
    style C fill:#ffe,stroke:#c93
    style D fill:#efe,stroke:#3c3
    style E fill:#eef,stroke:#33c

When you run /debug "test 15 times out", the agent enters a 5-phase protocol:

Phase 1: REPRODUCE — Run the failing test. See it fail. Capture the exact error. Don't read code, don't hypothesize, don't touch anything. Just observe.

Phase 2: ISOLATE — Now read the code. Add diagnostic logging (marked // DEBUG so the gate allows it). Re-run with diagnostics. Use binary search to halve the search space each iteration. Pinpoint the exact file, function, and line.

Phase 3: ROOT CAUSE — This is the critical phase. The agent must explain why the code fails — not where, but why. What assumption is violated? What state is unexpected? It presents its analysis and asks: "Do you agree, or should I investigate further?" The agent waits for your confirmation. This is the checkpoint that prevents wrong-direction fixes.

Phase 4: FIX — Edit gates open. Remove all // DEBUG lines. Apply the minimal code change that addresses the confirmed root cause. Nothing more.

Phase 5: VERIFY — Run the original failing test. Run related tests. For flaky bugs, run 5+ times. If verification fails, the root cause was wrong — back to Phase 2.
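For flaky bugs, the repeated-run check in Phase 5 can be sketched like this (the helper and the sample command are illustrative stand-ins, not part of the plugin):

```python
# Sketch of the Phase 5 reliability check: run the test command several
# times and report the failure rate. Replace the command with your real
# test runner; the one below is a hypothetical stand-in.
import subprocess
import sys

def failure_rate(cmd, runs=5):
    """Run `cmd` `runs` times; return the fraction of non-zero exits."""
    failures = sum(
        subprocess.run(cmd, capture_output=True).returncode != 0
        for _ in range(runs)
    )
    return failures / runs

# A command that always passes reports 0.0; a flaky one lands in between.
always_passes = failure_rate([sys.executable, "-c", "pass"], runs=3)
```

A rate of 0.0 across 5+ runs is the evidence Phase 5 asks for before the session can end.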

The Gate Mechanism

A PreToolUse hook intercepts every Edit and Write call:

flowchart TD
    E["Agent calls Edit/Write"] --> H{"Hook reads\n.claude/debug-session.json"}
    H -->|"phase: reproduce\nphase: root_cause"| D["❌ DENY\nsystemMessage: what to do instead"]
    H -->|"phase: isolate"| C{"new_string contains\n// DEBUG ?"}
    C -->|yes| A1["✅ ALLOW\n+ reminder to clean up later"]
    C -->|no| D
    H -->|"phase: fix\nphase: verify"| A2["✅ ALLOW"]
    H -->|"no session file"| A3["✅ ALLOW\n(not in debug mode)"]
    D --> E2["Agent reads denial message\nadjusts approach"]
    style D fill:#fee,stroke:#c33
    style A1 fill:#efe,stroke:#3c3
    style A2 fill:#efe,stroke:#3c3
    style A3 fill:#eef,stroke:#33c

The agent reads the denial message and adjusts its approach. No workaround, no override (unless the user explicitly runs /debug:skip).
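The decision logic in the flowchart can be sketched in Python. This is a simplified model, not the plugin's actual debug_gate.py: the real hook also reads the tool call from stdin and the phase from .claude/debug-session.json, and its message wording differs.

```python
# Simplified model of the phase gate (not the plugin's real debug_gate.py).
# Mirrors the flowchart: deny edits in reproduce/root_cause, allow only
# DEBUG-marked edits in isolate, allow everything in fix/verify or when
# no debug session is active.
def gate_decision(phase, new_string=""):
    """Return an (allow, message) pair for an attempted Edit/Write call."""
    if phase is None:                      # no session file: not in debug mode
        return True, ""
    if phase in ("fix", "verify"):         # edit gates open
        return True, ""
    if phase == "isolate":
        if "// DEBUG" in new_string or "# DEBUG" in new_string:
            return True, "Diagnostic edit allowed; remove DEBUG lines in Phase 4."
        return False, "Phase 2 (ISOLATE): only edits marked // DEBUG are allowed."
    # reproduce, root_cause: observation only
    return False, f"Phase '{phase}': no edits yet; keep investigating."
```

The denial message is the feedback channel: the agent sees it as a system message and redirects its next action instead of retrying the edit.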

Bug-Type Strategies

The agent classifies the bug and loads a specialized strategy:

Crash / Panic — Read the stack trace backward. Find the first frame in your code. Trace the bad value (the None, the out-of-bounds index, the dangling reference) to its source.

Wrong Output — Binary search through the data pipeline. Log the value at the midpoint. Correct there? Bug is downstream. Wrong? Upstream. Halve the space each iteration. Three steps to find the corruption in a 10-stage pipeline.
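The halving step can be illustrated with a toy pipeline; the stages and the correctness oracle below are hypothetical stand-ins for real pipeline stages and real intermediate checks:

```python
# Toy illustration of binary-searching a pipeline for the corrupting stage.
def find_bad_stage(stages, initial, is_correct):
    """Return the index of the first stage whose output is wrong.

    stages: list of functions applied in order to `initial`.
    is_correct(stage_index, value): oracle for the intermediate value.
    """
    lo, hi = 0, len(stages)
    while lo < hi:
        mid = (lo + hi) // 2
        value = initial
        for stage in stages[:mid + 1]:     # run the pipeline up to `mid`
            value = stage(value)
        if is_correct(mid, value):
            lo = mid + 1                   # bug is downstream of mid
        else:
            hi = mid                       # bug is at mid or upstream
    return lo

# Hypothetical 10-stage pipeline where stage 6 corrupts the value.
stages = [lambda x, i=i: x + 1 if i != 6 else x - 100 for i in range(10)]

def expected(i, v):
    return v == i + 1                      # correct output after stage i
```

Here `find_bad_stage(stages, 0, expected)` converges on stage 6 after four probes, each one a single diagnostic log at the midpoint.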

Intermittent / Flaky — Run 10 times, capture logs from a passing run AND a failing run. Diff the logs. The first point where event ordering diverges is the race condition. Identify the shared state, the missing synchronization, the ordering assumption.
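The log-diff step can be sketched as follows; the log lines are hypothetical, and a real session would diff captured test output instead:

```python
# Find the first point where event ordering diverges between a passing
# and a failing run. Log contents below are hypothetical.
def first_divergence(pass_log, fail_log):
    """Return (index, pass_line, fail_line) for the first differing line."""
    for i, (p, f) in enumerate(zip(pass_log, fail_log)):
        if p != f:
            return i, p, f
    return None  # identical up to the shorter log

pass_log = ["spawn worker", "acquire lock", "write cache", "release lock"]
fail_log = ["spawn worker", "write cache", "acquire lock", "release lock"]

# The divergence at index 1 shows the failing run wrote the cache before
# taking the lock -- a candidate missing-synchronization point.
```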

Regression — git bisect. Binary search through commit history. Find the exact commit that broke it. Read the diff. Understand the side effect. Fix it while preserving the original commit's intent. Use /debug:bisect for automated bisect sessions.

Performance / Timeout — Add timing instrumentation at stage boundaries. Find which stage takes 95% of the time. Drill into that stage. Is it a tight loop? A deadlock? An O(n^2) algorithm that used to work at small scale?
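Stage-boundary timing can be sketched with a small context manager; the stage names and sleeps below are hypothetical placeholders for real pipeline work:

```python
# Sketch of stage-boundary timing instrumentation for a slow pipeline.
import time
from contextlib import contextmanager

timings = {}

@contextmanager
def timed(stage):
    """Accumulate wall-clock time spent in a named stage."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[stage] = timings.get(stage, 0.0) + time.perf_counter() - start

# Wrap each stage, then look at which one dominates the total.
with timed("parse"):
    time.sleep(0.01)       # stand-in for real work
with timed("transform"):
    time.sleep(0.05)       # stand-in for the slow stage

slowest = max(timings, key=timings.get)
```

Once the dominant stage is known, the same instrumentation is pushed one level deeper inside it, narrowing the search the same way the other strategies do.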

Test Failure — Trace the assertion backward. Expected X, got Y. Where does Y come from? Follow the value through the call chain. Is the code wrong or is the test wrong?

Each strategy is documented in detail in skills/debug-methodology/references/.

Specialized Agents

Three read-only agents handle investigation without risking accidental code changes:

  • Reproducer — Runs the test, captures output, checks if it fails consistently. Uses only Bash + Read.
  • Investigator — Traces code paths, finds callers, maps dependencies. Uses only Read + Grep + Glob.
  • Verifier — Runs tests after the fix, checks for regressions, validates reliability. Uses only Bash + Read.

They're read-only by design. An agent that can edit code during investigation might "fix" the bug before understanding it — which is exactly the problem we're solving.

Commands

| Command | What it does |
| --- | --- |
| /debug <description> | Start a debug session with phase gates |
| /debug:status | Show current phase and what to do next |
| /debug:skip | Override the current phase gate |
| /debug:report | Generate a debug report (for PRs, postmortems) |
| /debug:history | View past debug sessions and their root causes |
| /debug:bisect <good-ref> <test-cmd> | Automated git bisect for regression bugs |
| /debug:log add <file> / clean | Add or remove // DEBUG diagnostic logging |
| /debug:end | End the debug session and remove phase gates |

What's Inside

claude-debug/
├── CLAUDE.md                              Project context for Claude Code contributors
├── rules/debug.md                         Zero-install: copy this one file
├── commands/
│   ├── debug.md                           /debug — start a session with phase gates
│   ├── debug-status.md                    /debug:status — show current phase
│   ├── debug-skip.md                      /debug:skip — override phase gate
│   ├── debug-report.md                    /debug:report — generate debug report
│   ├── debug-history.md                   /debug:history — view past sessions
│   ├── debug-bisect.md                    /debug:bisect — automated git bisect
│   ├── debug-log.md                       /debug:log — add/remove diagnostic logging
│   └── debug-end.md                       /debug:end — end session, remove gates
├── examples/
│   ├── rust-panic.md                      Walkthrough: Rust unwrap panic (15 min)
│   ├── python-flaky-test.md               Walkthrough: Python race condition (10+ runs)
│   ├── typescript-wrong-output.md         Walkthrough: API field mismatch (8 min)
│   └── go-regression.md                   Walkthrough: git bisect finds JSON tag rename
├── skills/
│   ├── debug-methodology/
│   │   ├── SKILL.md                       Core 5-phase protocol
│   │   └── references/                    7 bug-type strategies
│   ├── debug-isolate/SKILL.md             Isolation techniques
│   └── debug-root-cause/SKILL.md          Root cause analysis patterns
├── agents/
│   ├── reproducer.md                      Read-only: runs tests, captures output
│   ├── investigator.md                    Read-only: traces code paths
│   └── verifier.md                        Read-only: runs tests after fix
├── hooks/
│   ├── hooks.json                         Hook configuration (PreToolUse + Stop)
│   └── scripts/
│       ├── debug_gate.py                  Phase gate enforcement (tested)
│       └── debug_cleanup.py               Session cleanup on completion
├── tests/
│   └── test_debug_gate.py                Hook script tests (9 tests)
├── .claude-plugin/plugin.json             Claude Code adapter
├── .cursor-plugin/plugin.json             Cursor adapter
└── .codex/instructions.md                 Codex adapter

Comparison with Other Approaches

vs. CLAUDE.md rules alone

You can write "always reproduce before fixing" in your CLAUDE.md. The agent will try to follow it. But rules are suggestions — the agent can and will skip steps when it thinks it sees the answer. claude-debug makes this structurally impossible: the PreToolUse hook denies the edit call, and the agent gets a system message telling it what phase it's in. Rules say "please." Hooks say "no."

Use CLAUDE.md rules when: the bug is simple, you trust the agent to follow process, or you don't want to install a plugin.

Use claude-debug when: the bug is complex (flaky, race condition, multi-file), the agent has a history of guessing wrong, or you want a structured investigation record.

vs. Superpowers' systematic-debugging skill

Superpowers pioneered the idea of using hooks and skills to enforce workflow discipline in AI agents. Its systematic-debugging skill provides a solid protocol. claude-debug builds on this foundation with three additions:

  1. Mechanical enforcement: Superpowers documents the protocol in skill files. claude-debug enforces it with PreToolUse hooks that deny edits. The agent cannot bypass the protocol even if it wants to.
  2. Bug-type specialization: six detailed strategies (crash, wrong-output, intermittent, regression, performance, test-failure) with specific techniques for each, rather than a single general protocol.
  3. Dedicated commands: /debug:bisect for regressions, /debug:log for diagnostic logging management, /debug:report for postmortems — purpose-built tools for common debugging subtasks.

If you already use Superpowers, claude-debug can complement it — the hook-based gates add enforcement to Superpowers' protocol-based approach.

vs. manual debugging

The 5-phase protocol is how experienced developers debug. The difference is that experienced developers have internalized the discipline; AI agents have not. claude-debug externalizes the discipline as mechanical constraints. If you would tell a junior developer "stop guessing and actually run the test first," this plugin does the same thing for your agent.

FAQ

Does this work with any language? Yes. The protocol is language-agnostic. The diagnostic logging markers (// DEBUG, # DEBUG) cover C-style and Python-style comments. The bug-type strategies reference Rust examples but the techniques (binary search, stack trace analysis, git bisect) apply to any language.

What if I want to skip a phase? Run /debug:skip. The agent advances to the next phase immediately. The session records which phases were skipped so you can see if shortcuts correlated with fix quality. Use it when you already know the root cause and just need the agent to fix it.

Does it slow down simple bugs? Yes, slightly. For a typo or obvious one-liner, the full protocol is overkill. Use /debug:skip to fast-forward, or just don't start a debug session — fix it directly. The plugin only activates when you run /debug. Normal coding is unaffected.

What happens if the agent's root cause is wrong? Phase 5 (VERIFY) catches this: the test still fails. The session transitions back to Phase 2 (ISOLATE) with a retry counter. The agent must re-investigate with fresh diagnostic logging rather than piling patches on a wrong analysis.

Can I use the zero-install rules file with the full plugin? Yes, but it's redundant. The plugin includes the rules file content in its skills. If you install the full plugin, you don't need the separate rules/debug.md in your project.

How do I see what phase I'm in? Run /debug:status. It reads .claude/debug-session.json and shows the current phase with guidance on what to do next.

Does this work in headless/non-interactive mode? Partially. Phases 1, 2, 4, and 5 work fully in headless mode. Phase 3 (ROOT CAUSE) requires user confirmation by design — in headless mode, the agent will present its analysis and wait. You can pre-authorize skipping Phase 3 confirmation by including instructions in your prompt, but this removes the key safety checkpoint.

Where is session state stored? In .claude/debug-session.json, which is gitignored by default. Debug reports are saved to .claude/debug-reports/. Both are local to your project.
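A session file of roughly this shape would support the behavior described above. The field names are illustrative guesses, not the plugin's documented schema; only the phase value itself is referenced by the gate hook:

```json
{
  "phase": "isolate",
  "bug_description": "test 15 times out",
  "retries": 1,
  "skipped_phases": []
}
```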

Philosophy

  • Observe before you hypothesize. Run the test. See it fail. Don't assume.
  • Measure before you conclude. Add logging. Get data. Don't guess.
  • Confirm before you fix. Present your analysis. Get a yes. Don't rush.
  • Verify before you ship. Run the tests. All of them. Don't hope.

Every phase produces evidence. Every transition requires that evidence. Every fix addresses a confirmed root cause. This is debugging as a systematic process, not a guessing game.

Inspired By

  • Superpowers — proved that hooks + skills can enforce workflow discipline in AI coding agents
  • Why Programs Fail — Andreas Zeller's scientific approach to debugging
  • The scientific method — observe, hypothesize, test, conclude

Contributing

See CONTRIBUTING.md. We especially welcome new bug-type strategies, multi-platform adaptations, and improvements to the phase gate logic.

License

MIT