garrytan · soviero · Mar 12, 2026
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -1,38 +1,74 @@
-# gstack development
+# CLAUDE.md
+
+This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
+
+## What is gstack
+
+An AI engineering workflow toolkit that turns Claude Code into specialized skills. The core component is a persistent headless Chromium browser daemon accessed via a compiled CLI binary. Additional skills (ship, review, bugfix, plan, retro) are prompt-only SKILL.md files.
 
 ## Commands
 
 ```bash
-bun install          # install dependencies
-bun test             # run integration tests (browse + snapshot)
-bun run dev <cmd>    # run CLI in dev mode, e.g. bun run dev goto https://example.com
-bun run build        # compile binary to browse/dist/browse
+bun install              # install dependencies + Playwright Chromium
+bun test                 # run all integration tests (~3s)
+bun test browse/test/commands    # command integration tests only
+bun test browse/test/snapshot    # snapshot tests only
+bun test -t "Navigation"         # run tests whose names match a pattern
+bun run dev <cmd>        # run CLI from source (no compile step)
+bun run build            # compile binary to browse/dist/browse (~58MB)
+bun run server           # start server directly (for debugging)
 ```
 
-## Project structure
+## Architecture
+
+**Client-server split**: The browse tool is a thin CLI client (`cli.ts`) that sends HTTP POST requests to a persistent Bun HTTP server (`server.ts`). The server manages Chromium via Playwright.
 
 ```
-gstack/
-├── browse/          # Headless browser CLI (Playwright)
-│   ├── src/         # CLI + server + commands
-│   ├── test/        # Integration tests + fixtures
-│   └── dist/        # Compiled binary
-├── ship/            # Ship workflow skill
-├── review/          # PR review skill
-├── plan-ceo-review/ # /plan-ceo-review skill
-├── plan-eng-review/ # /plan-eng-review skill
-├── retro/           # Retrospective skill
-├── setup            # One-time setup: build binary + symlink skills
-├── SKILL.md         # Browse skill (Claude discovers this)
-└── package.json     # Build scripts for browse
+CLI (compiled binary) ──HTTP POST──► Bun server (localhost:9400-9410) ──► Playwright ──► Chromium
 ```
 
-## Deploying to the active skill
+- State file at `/tmp/browse-server.json` stores PID, port, and bearer token (UUID per session)
+- CLI auto-starts server on first call; server auto-shuts down after 30 min idle
+- Chromium crash causes server exit; CLI detects and auto-restarts on next call
+
+**Snapshot/ref system**: The key abstraction for web interaction. `snapshot.ts` parses Playwright's accessibility tree (`page.locator().ariaSnapshot()`), assigns `@e1`, `@e2`... refs to elements, and builds a `Map<string, Locator>`. Commands like `click @e3` resolve the ref to a Playwright Locator. Refs are invalidated on navigation.
+
+**Command organization**: Commands are split by mutation semantics:
+- `read-commands.ts` — non-mutating (text, html, links, js, css, forms, console, network, etc.)
+- `write-commands.ts` — mutating (goto, click, fill, select, scroll, viewport, etc.)
+- `meta-commands.ts` — server/tab management (status, stop, restart, tabs, screenshot, pdf, chain, diff)
+
+New commands are registered as routes in `server.ts`.
+
+**Buffers** (`buffers.ts`): Ring buffers (50k cap) capture console messages and network requests in memory, flushed to disk every 1s.
+
+## Skills
 
-The active skill lives at `~/.claude/skills/gstack/`. After making changes:
+Each skill directory contains a `SKILL.md` that Claude discovers. Skills other than browse are prompt-only (no code):
+- `ship/` — merge → test → review → version bump → commit → push → PR
+- `review/` — two-pass pre-landing review checklist
+- `bugfix/` — test-driven bug fixing: reproduce → prove → fix → verify → improve
+- `plan-ceo-review/` — founder-mode planning (expansion/hold/reduction scopes)
+- `plan-eng-review/` — engineering architecture review with diagrams
+- `retro/` — weekly retrospective from commit history
 
-1. Push your branch
-2. Fetch and reset in the skill directory: `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main`
-3. Rebuild: `cd ~/.claude/skills/gstack && bun run build`
+## Adding a new command
+
+1. Add handler in `read-commands.ts` (non-mutating) or `write-commands.ts` (mutating)
+2. Register route in `server.ts`
+3. Add test in `browse/test/commands.test.ts` with HTML fixture if needed
+4. Run `bun test`, then `bun run build`
+
+## Testing
+
+Tests use Bun's native test runner. Integration tests spin up a local HTTP server (`browse/test/test-server.ts`) serving fixtures from `browse/test/fixtures/`, then exercise commands against real Playwright browser instances.
+
+## Deploying changes to the active skill
+
+The active skill lives at `~/.claude/skills/gstack/`. After changes:
+
+```bash
+cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main && bun run build
+```
 
 Or copy the binary directly: `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse`
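The snapshot/ref system described in the CLAUDE.md changes above can be illustrated with a minimal sketch. This is not the code in `snapshot.ts`: `RefRegistry` and `RefTarget` are hypothetical names, `RefTarget` is a plain stand-in for a Playwright `Locator`, and the parser only handles simple `- role "name"` lines of an aria-snapshot-style listing. It shows refs being assigned in document order, resolved by name, and invalidated on navigation.

```typescript
// Hypothetical stand-in for a Playwright Locator (the source stores real Locators).
interface RefTarget {
  role: string;
  name: string;
}

class RefRegistry {
  private refs = new Map<string, RefTarget>();

  // Parse an aria-snapshot-style listing and assign @e1, @e2, ... in document order.
  // Returns the listing with each element line annotated by its ref.
  assign(snapshot: string): string {
    this.refs.clear();
    let counter = 0;
    const annotated: string[] = [];
    for (const line of snapshot.split("\n")) {
      const match = /^(\s*)- (\w+)(?: "([^"]*)")?/.exec(line);
      if (!match) {
        annotated.push(line); // not an element line, keep as-is
        continue;
      }
      counter += 1;
      const ref = `@e${counter}`;
      this.refs.set(ref, { role: match[2], name: match[3] ?? "" });
      annotated.push(`${line} ${ref}`);
    }
    return annotated.join("\n");
  }

  // Resolve a ref such as "@e3"; unknown or stale refs are an error.
  resolve(ref: string): RefTarget {
    const target = this.refs.get(ref);
    if (!target) {
      throw new Error(`Unknown or stale ref ${ref}; take a new snapshot`);
    }
    return target;
  }

  // Assumed to be called on navigation, since refs do not survive a page change.
  invalidate(): void {
    this.refs.clear();
  }
}
```

In this sketch a command such as `click @e3` would call `resolve("@e3")` and act on the result; after `invalidate()`, the same call throws until a new snapshot is taken.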
diff --git a/README.md b/README.md
@@ -2,7 +2,7 @@
 
 **gstack turns Claude Code from one generic assistant into a team of specialists you can summon on demand.**
 
-Six opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, code review, one-command shipping, browser automation, and engineering retrospectives — all as slash commands.
+Seven opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, code review, test-driven bug fixing, one-command shipping, browser automation, and engineering retrospectives — all as slash commands.
 
 ### Without gstack
 
@@ -20,6 +20,7 @@ Six opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/
 | `/plan-ceo-review` | Founder / CEO | Rethink the problem. Find the 10-star product hiding inside the request. |
 | `/plan-eng-review` | Eng manager / tech lead | Lock in architecture, data flow, diagrams, edge cases, and tests. |
 | `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Not a style nitpick pass. |
+| `/bugfix` | Disciplined debugger | Reproduce the bug, prove the root cause, fix it with a failing test that goes RED then GREEN. Never guess. |
 | `/ship` | Release engineer | Sync main, run tests, push, open PR. For a ready branch, not for deciding what to build. |
 | `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. |
 | `/retro` | Engineering manager | Analyze commit history, work patterns, and shipping velocity for the week. |
@@ -82,11 +83,11 @@ This is not a prompt pack for beginners. It is an operating system for people wh
 
 Open Claude Code and paste this. Claude will do the rest.
 
-> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /retro. Then ask the user if they also want to add gstack to the current project so teammates get it.
+> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /bugfix, /ship, /browse, /retro. Then ask the user if they also want to add gstack to the current project so teammates get it.
 
 ### Step 2: Add to your repo so teammates get it (optional)
 
-> Add gstack to this project: run `cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /retro, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
+> Add gstack to this project: run `cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /bugfix, /ship, /browse, /retro, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
 
 Real files get committed to your repo (not a submodule), so `git clone` just works. The binary and node\_modules are gitignored — teammates just need to run `cd .claude/skills/gstack && ./setup` once to build (or `/browse` handles it automatically on first use).
 
@@ -261,6 +262,39 @@ I want the model imagining the production incident before it happens.
 
 ---
 
+## `/bugfix`
+
+This is my **disciplined debugger mode**.
+
+When something breaks, the model's instinct is to guess at the cause and start patching. Half the time the guess is wrong. Now you have two bugs instead of one.
+
+`/bugfix` enforces a strict workflow: **Reproduce → Prove → Fix → Verify → Improve.**
+
+The model cannot skip steps. It cannot fix a bug it has not reproduced. It cannot implement a fix without first proving its hypothesis about the root cause. And after the fix, it must answer the most important question: **why did the existing tests miss this?**
+
+That is the part most developers skip. That is the part that matters most.
+
+### Example
+
+Take the same listing app. `/review` flagged a race condition: two tabs can overwrite cover-photo selection.
+
+Without `/bugfix`, the model might add a lock and move on. Maybe it works. Maybe it does not. No proof either way.
+
+With `/bugfix`:
+
+1. Run existing tests — they all pass. No concurrency test exists for cover-photo selection.
+2. Write a test that simulates two concurrent updates to the same listing's cover photo — it fails. Race confirmed.
+3. Fix: atomic conditional update (`UPDATE ... SET cover_photo = :new WHERE cover_photo = :old`) instead of read-then-write.
+4. Verify: concurrency test passes. All existing tests still pass.
+5. Improve: no test existed for any concurrent state mutation on listings. Add concurrency tests for title editing and price updates too.
+
+The output is a RED → GREEN proof that the fix works, a root cause analysis, and the test gap that let it through.
+
+I do not want the model guessing and patching.
+I want it proving and preventing.
+
+---
+
 ## `/ship`
 
 This is my **release machine mode**.
@@ -392,7 +426,7 @@ Run `cd ~/.claude/skills/gstack && ./setup` (or `cd .claude/skills/gstack && ./s
 Run `cd ~/.claude/skills/gstack && bun install && bun run build`. This compiles the browser binary. Requires Bun v1.0+.
 
 **Project copy is stale?**
-Re-copy from global: `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack && cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`
+Re-copy from global: `for s in browse bugfix plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack && cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`
 
 **`bun` not installed?**
 Install it: `curl -fsSL https://bun.sh/install | bash`
@@ -401,15 +435,15 @@ Install it: `curl -fsSL https://bun.sh/install | bash`
 
 Paste this into Claude Code:
 
-> Update gstack: run `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main && ./setup`. If this project also has gstack at .claude/skills/gstack, update it too: run `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack && cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`
+> Update gstack: run `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main && ./setup`. If this project also has gstack at .claude/skills/gstack, update it too: run `for s in browse bugfix plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack && cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`
 
 The `setup` script rebuilds the browser binary and re-symlinks skills. It takes a few seconds.
 
 ## Uninstalling
 
 Paste this into Claude Code:
 
-> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.
+> Uninstall gstack: remove the skill symlinks by running `for s in browse bugfix plan-ceo-review plan-eng-review review ship retro; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse bugfix plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.
 
 ## Development
 

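The cover-photo fix in the `/bugfix` example above relies on a conditional (compare-and-set) update rather than read-then-write. The sketch below illustrates that idea under loud assumptions: `ListingStore`, `Listing`, and `simulateTwoTabs` are hypothetical, the store is an in-memory stand-in for a database table, and the two "tabs" are simulated by interleaving calls in a single thread rather than by real concurrency.

```typescript
// Hypothetical in-memory stand-in for one row of a listings table.
interface Listing {
  id: number;
  coverPhoto: string;
}

class ListingStore {
  private rows = new Map<number, Listing>();

  constructor(listings: Listing[]) {
    for (const listing of listings) {
      this.rows.set(listing.id, { ...listing });
    }
  }

  read(id: number): string {
    const row = this.rows.get(id);
    if (!row) throw new Error(`No listing ${id}`);
    return row.coverPhoto;
  }

  // Read-then-write style: overwrites unconditionally, so the last writer wins silently.
  overwrite(id: number, next: string): void {
    const row = this.rows.get(id);
    if (!row) throw new Error(`No listing ${id}`);
    row.coverPhoto = next;
  }

  // Conditional update, analogous to
  //   UPDATE listings SET cover_photo = :next WHERE id = :id AND cover_photo = :expected
  // Returns true only if the row still held the expected value.
  compareAndSet(id: number, expected: string, next: string): boolean {
    const row = this.rows.get(id);
    if (!row || row.coverPhoto !== expected) return false;
    row.coverPhoto = next;
    return true;
  }
}

// Simulates two tabs that both loaded the listing before either one saved.
function simulateTwoTabs(store: ListingStore, id: number): { tabA: boolean; tabB: boolean } {
  const seenByA = store.read(id);
  const seenByB = store.read(id);
  const tabA = store.compareAndSet(id, seenByA, "b.jpg");
  const tabB = store.compareAndSet(id, seenByB, "c.jpg"); // stale expectation, rejected
  return { tabA, tabB };
}
```

With `overwrite`, the second tab would silently replace the first tab's choice. With `compareAndSet`, the second save is rejected and the caller can reload and retry, which is the behavior a concurrency regression test would assert.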
diff --git a/bugfix/SKILL.md b/bugfix/SKILL.md
@@ -0,0 +1,152 @@
+---
+name: bugfix
+version: 1.0.0
+description: |
+  Test-driven bug fixing. Reproduce the bug, validate the root cause, write a
+  failing test, fix it, verify, then close the test gap that let it through.
+allowed-tools:
+  - Bash
+  - Read
+  - Edit
+  - Write
+  - Grep
+  - Glob
+  - AskUserQuestion
+---
+
+# /bugfix — Test-Driven Bug Fixing
+
+You are running the `/bugfix` workflow. This is a structured, disciplined approach to fixing bugs. The goal is not just to fix the bug — it is to fix it provably and ensure the same class of bug cannot recur.
+
+**Core principle: Reproduce → Prove → Fix → Verify → Improve.**
+
+**Only stop for:**
+- Cannot reproduce the bug (ask the user for more information)
+- Hypothesis disproved twice with no clear alternative (ask the user for guidance)
+- No test runner or test infrastructure detected (ask the user how to run tests)
+
+**Never stop for:**
+- Messy or unfamiliar code (read it, understand it, proceed)
+- Large number of related tests to run (run them all)
+- The fix being small (small fixes still need reproduction and verification)
+
+---
+
+## Step 1: Understand the Bug
+
+Before touching any code:
+
+1. Identify the **expected behavior** vs **actual behavior**.
+2. Identify the file(s) and function(s) involved.
+3. Check git blame and recent commits on the affected files — was this a regression?
+
+```bash
+git log --oneline -20 -- <suspected-file>
+```
+
+If the bug report is vague, **STOP** and use AskUserQuestion to get exact reproduction steps, expected vs actual behavior, and any error messages.
+
+---
+
+## Step 2: Run Existing Tests
+
+**MANDATORY FIRST ACTION.** Before changing anything, run the tests related to the affected code.
+
+This tells you three things:
+- Whether tests exist for this code at all
+- Whether existing tests already catch the bug (if they do, they fail now)
+- If tests pass, there is a coverage gap for this scenario — note it for Step 8
+
+---
+
+## Step 3: Reproduce the Bug
+
+**You MUST reproduce the bug before making any code changes.**
+
+Try in order:
+1. Write a failing test that triggers the exact scenario
+2. Run the code and observe the failure directly
+3. Inspect state (logs, data, config) to confirm the conditions
+
+**If you cannot reproduce it, STOP.** Tell the user what you tried and use AskUserQuestion to request more information. Do NOT guess. Fixing a bug you cannot reproduce leads to wrong fixes.
+
+---
+
+## Step 4: Validate Your Hypothesis
+
+Form a hypothesis about the root cause, then **prove it before writing the fix.**
+
+The validation must produce evidence — not "I think this is the cause" but "I confirmed this is the cause because X." Add a log or assertion that confirms the bad state, write a targeted test, inspect the data directly, or trace the code path.
+
+If your hypothesis is **disproved**, form a new one and repeat. Do NOT proceed to implementation on a wrong hypothesis.
+
+---
+
+## Step 5: Write a Failing Test (RED)
+
+Write a test that reproduces the exact bug scenario:
+
+- The test MUST **fail** before your fix, proving it catches the bug
+- Name it descriptively so the bug scenario is documented in the test name
+- Use realistic inputs that mirror the actual failure
+
+Run the test and confirm it fails.
+
+---
+
+## Step 6: Implement the Fix (GREEN)
+
+Fix the bug. Minimal change only.
+
+- Do not refactor surrounding code.
+- Do not add features.
+- Do not "improve" unrelated things.
+
+---
+
+## Step 7: Verify
+
+Run the reproduction test — it MUST pass.
+
+Run all related tests — no regressions.
+
+If any previously passing test now fails, you introduced a regression. Fix it before proceeding.
+
+---
+
+## Step 8: Close the Test Gap
+
+**Do not skip this step.** After the fix, answer: **why did existing tests not catch this?**
+
+Common gaps: missing scenario, weak assertion (checked "not null" but not the value), test data that did not trigger the boundary condition, over-mocking that hid the real behavior.
+
+Based on your analysis, add tests that prevent this **class** of bug — not just this instance. If the bug was a boundary issue, add boundary tests. If it was a missing edge case, add edge cases for the same function.
+
+---
+
+## Step 9: Summary
+
+Output a brief summary:
+
+```
+## Bug Fix
+
+**Bug**: [description]
+**Root cause**: [what was actually wrong]
+**Fix**: [what was changed]
+**Reproduction test**: [test name] — RED before fix, GREEN after
+**Regression check**: [suite] — all passing
+**Test gap**: [why tests missed it, what was added]
+**Files changed**: [list]
+```
+
+---
+
+## Important Rules
+
+- **Never fix a bug you cannot reproduce.** If you cannot trigger it, ask for help.
+- **Never implement a fix without validating your hypothesis.** Prove the root cause first.
+- **Never skip the test gap analysis.** Understanding WHY tests missed it is as valuable as the fix.
+- **Never finish without improving test coverage.** The same bug class should be caught next time.
+- **Never fix unrelated code.** Stay focused on the bug.
+- **Always show RED → GREEN proof.** The user should see the test fail before the fix and pass after.
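Step 8 of the skill above asks for tests that cover a class of bug and that assert exact values instead of "not null". One way to do that is a small table of boundary cases. The sketch below is only an illustration: `clampPage`, `BoundaryCase`, and `runBoundaryCases` are hypothetical and not part of gstack, and a real project would express the same table in its own test runner.

```typescript
// Hypothetical function under test: clamps a 1-based page number into [1, totalPages].
function clampPage(page: number, totalPages: number): number {
  if (totalPages < 1) return 1;
  return Math.min(Math.max(page, 1), totalPages);
}

interface BoundaryCase {
  name: string; // descriptive, so the scenario is documented by the case name
  page: number;
  totalPages: number;
  expected: number; // exact value, not just "not null"
}

// One row per boundary, so the whole class of off-by-one errors is covered.
const boundaryCases: BoundaryCase[] = [
  { name: "page below range clamps to first page", page: 0, totalPages: 5, expected: 1 },
  { name: "first page is unchanged", page: 1, totalPages: 5, expected: 1 },
  { name: "last page is unchanged", page: 5, totalPages: 5, expected: 5 },
  { name: "page above range clamps to last page", page: 6, totalPages: 5, expected: 5 },
  { name: "empty result set still yields page 1", page: 3, totalPages: 0, expected: 1 },
];

// Returns the names of failing cases; an empty array means every boundary holds.
function runBoundaryCases(cases: BoundaryCase[]): string[] {
  return cases
    .filter((c) => clampPage(c.page, c.totalPages) !== c.expected)
    .map((c) => c.name);
}
```

Adding a row to the table is cheap, which makes it easier to cover neighboring edge cases of the same function once a single boundary bug has been found.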