From 94b64df593b821ec2b552b69291e7db8e0e03ddc Mon Sep 17 00:00:00 2001 From: rodri Date: Thu, 12 Mar 2026 13:11:15 -0300 Subject: [PATCH] When you present Claude Code with a bug, it tends to assume what the problem is and jump straight to implementation. Sometimes it will be wrong. It can take a lot of back and forth, and for each hypothesis Claude will make changes to the code that it won't undo after it eventually gets the solution right. /bugfix forces a disciplined workflow: reproduce the bug first, prove the root cause before writing any code, then fix it with a failing test as proof. No guessing, no leftover changes from wrong attempts. And after the fix, it closes the test gap that let the bug through in the first place. --- CLAUDE.md | 84 ++++++++++++++++++-------- README.md | 46 +++++++++++++-- bugfix/SKILL.md | 152 ++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 252 insertions(+), 30 deletions(-) create mode 100644 bugfix/SKILL.md diff --git a/CLAUDE.md b/CLAUDE.md index 0fb4879..9e88b05 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,38 +1,74 @@ -# gstack development +# CLAUDE.md + +This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository. + +## What is gstack + +An AI engineering workflow toolkit that turns Claude Code into specialized skills. The core component is a persistent headless Chromium browser daemon accessed via compiled CLI binary. Additional skills (ship, review, plan, retro) are prompt-only SKILL.md files. ## Commands ```bash -bun install # install dependencies -bun test # run integration tests (browse + snapshot) -bun run dev # run CLI in dev mode, e.g. 
bun run dev goto https://example.com -bun run build # compile binary to browse/dist/browse +bun install # install dependencies + Playwright Chromium +bun test # run all integration tests (~3s) +bun test browse/test/commands # command integration tests only +bun test browse/test/snapshot # snapshot tests only +bun test --match "*Navigation*" # run tests matching a pattern +bun run dev # run CLI from source (no compile step) +bun run build # compile binary to browse/dist/browse (~58MB) +bun run server # start server directly (for debugging) ``` -## Project structure +## Architecture + +**Client-server split**: The browse tool is a thin CLI client (`cli.ts`) that sends HTTP POST requests to a persistent Bun HTTP server (`server.ts`). The server manages Chromium via Playwright. ``` -gstack/ -├── browse/ # Headless browser CLI (Playwright) -│ ├── src/ # CLI + server + commands -│ ├── test/ # Integration tests + fixtures -│ └── dist/ # Compiled binary -├── ship/ # Ship workflow skill -├── review/ # PR review skill -├── plan-ceo-review/ # /plan-ceo-review skill -├── plan-eng-review/ # /plan-eng-review skill -├── retro/ # Retrospective skill -├── setup # One-time setup: build binary + symlink skills -├── SKILL.md # Browse skill (Claude discovers this) -└── package.json # Build scripts for browse +CLI (compiled binary) ──HTTP POST──► Bun server (localhost:9400-9410) ──► Playwright ──► Chromium ``` -## Deploying to the active skill +- State file at `/tmp/browse-server.json` stores PID, port, and bearer token (UUID per session) +- CLI auto-starts server on first call; server auto-shuts down after 30 min idle +- Chromium crash causes server exit; CLI detects and auto-restarts on next call + +**Snapshot/ref system**: The key abstraction for web interaction. `snapshot.ts` parses Playwright's accessibility tree (`page.locator().ariaSnapshot()`), assigns `@e1`, `@e2`... refs to elements, and builds a `Map`. Commands like `click @e3` resolve the ref to a Playwright Locator. 
Refs are invalidated on navigation. + +**Command organization**: Commands are split by mutation semantics: +- `read-commands.ts` — non-mutating (text, html, links, js, css, forms, console, network, etc.) +- `write-commands.ts` — mutating (goto, click, fill, select, scroll, viewport, etc.) +- `meta-commands.ts` — server/tab management (status, stop, restart, tabs, screenshot, pdf, chain, diff) + +New commands are registered as routes in `server.ts`. + +**Buffers** (`buffers.ts`): Ring buffers (50k cap) capture console messages and network requests in memory, flushed to disk every 1s. + +## Skills -The active skill lives at `~/.claude/skills/gstack/`. After making changes: +Each skill directory contains a `SKILL.md` that Claude discovers. Skills other than browse are prompt-only (no code): +- `ship/` — merge → test → review → version bump → commit → push → PR +- `review/` — two-pass pre-landing review checklist +- `bugfix/` — test-driven bug fixing: discover → reproduce → fix → verify → improve +- `plan-ceo-review/` — founder-mode planning (expansion/hold/reduction scopes) +- `plan-eng-review/` — engineering architecture review with diagrams +- `retro/` — weekly retrospective from commit history -1. Push your branch -2. Fetch and reset in the skill directory: `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main` -3. Rebuild: `cd ~/.claude/skills/gstack && bun run build` +## Adding a new command + +1. Add handler in `read-commands.ts` (non-mutating) or `write-commands.ts` (mutating) +2. Register route in `server.ts` +3. Add test in `browse/test/commands.test.ts` with HTML fixture if needed +4. `bun test` then `bun run build` + +## Testing + +Tests use Bun's native test runner. Integration tests spin up a local HTTP server (`browse/test/test-server.ts`) serving fixtures from `browse/test/fixtures/`, then exercise commands against real Playwright browser instances. 
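The ring buffers mentioned under Architecture can be pictured with a minimal sketch. This is an illustration only, not the code in `buffers.ts`: the class name `RingBuffer`, its methods, and the tiny capacity of 3 are assumptions chosen for readability, and the periodic flush to disk is left out.

```typescript
// Hypothetical sketch of a fixed-capacity ring buffer; NOT the repo's buffers.ts.
class RingBuffer<T> {
  private readonly items: T[] = [];
  private readonly capacity: number;
  private start = 0; // index of the oldest entry once the buffer is full

  constructor(capacity: number) {
    if (!Number.isInteger(capacity) || capacity <= 0) {
      throw new RangeError("capacity must be a positive integer");
    }
    this.capacity = capacity;
  }

  // Append an entry, overwriting the oldest one when the buffer is full.
  push(item: T): void {
    if (this.items.length < this.capacity) {
      this.items.push(item);
      return;
    }
    this.items[this.start] = item;
    this.start = (this.start + 1) % this.capacity;
  }

  // Return the retained entries, oldest first.
  toArray(): T[] {
    return [...this.items.slice(this.start), ...this.items.slice(0, this.start)];
  }
}

// Usage: with a cap of 3, only the three most recent messages survive.
const consoleMessages = new RingBuffer<string>(3);
for (const msg of ["a", "b", "c", "d", "e"]) {
  consoleMessages.push(msg);
}
const retained = consoleMessages.toArray();
console.log(retained);
```

Overwriting in place keeps memory bounded no matter how chatty the page is, which is presumably why a cap is used for console and network capture.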
+ +## Deploying changes to the active skill + +The active skill lives at `~/.claude/skills/gstack/`. After changes: + +```bash +cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main && bun run build +``` Or copy the binary directly: `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse` diff --git a/README.md b/README.md index f458eb5..79bb912 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,7 @@ **gstack turns Claude Code from one generic assistant into a team of specialists you can summon on demand.** -Six opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, code review, one-command shipping, browser automation, and engineering retrospectives — all as slash commands. +Seven opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, code review, test-driven bug fixing, one-command shipping, browser automation, and engineering retrospectives — all as slash commands. ### Without gstack @@ -20,6 +20,7 @@ Six opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/ | `/plan-ceo-review` | Founder / CEO | Rethink the problem. Find the 10-star product hiding inside the request. | | `/plan-eng-review` | Eng manager / tech lead | Lock in architecture, data flow, diagrams, edge cases, and tests. | | `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Not a style nitpick pass. | +| `/bugfix` | Disciplined debugger | Reproduce the bug, prove the root cause, fix it with a failing test that goes RED then GREEN. Never guess. | | `/ship` | Release engineer | Sync main, run tests, push, open PR. For a ready branch, not for deciding what to build. | | `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. 
| | `/retro` | Engineering manager | Analyze commit history, work patterns, and shipping velocity for the week. | @@ -82,11 +83,11 @@ This is not a prompt pack for beginners. It is an operating system for people wh Open Claude Code and paste this. Claude will do the rest. -> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /retro. Then ask the user if they also want to add gstack to the current project so teammates get it. +> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /bugfix, /ship, /browse, /retro. Then ask the user if they also want to add gstack to the current project so teammates get it. ### Step 2: Add to your repo so teammates get it (optional) -> Add gstack to this project: run `cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /retro, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills. 
+> Add gstack to this project: run `cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /bugfix, /ship, /browse, /retro, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills. Real files get committed to your repo (not a submodule), so `git clone` just works. The binary and node\_modules are gitignored — teammates just need to run `cd .claude/skills/gstack && ./setup` once to build (or `/browse` handles it automatically on first use). @@ -261,6 +262,39 @@ I want the model imagining the production incident before it happens. --- +## `/bugfix` + +This is my **disciplined debugger mode**. + +When something breaks, the model's instinct is to guess at the cause and start patching. Half the time the guess is wrong. Now you have two bugs instead of one. + +`/bugfix` enforces a strict workflow: **Reproduce → Prove → Fix → Verify → Improve.** + +The model cannot skip steps. It cannot fix a bug it has not reproduced. It cannot implement a fix without first proving its hypothesis about the root cause. And after the fix, it must answer the most important question: **why did the existing tests miss this?** + +That is the part most developers skip. That is the part that matters most. + +### Example + +Take the same listing app. `/review` flagged a race condition: two tabs can overwrite cover-photo selection. + +Without `/bugfix`, the model might add a lock and move on. Maybe it works. Maybe it does not. No proof either way. + +With `/bugfix`: + +1. Run existing tests — they all pass. No concurrency test exists for cover-photo selection. +2. 
Write a test that simulates two concurrent updates to the same listing's cover photo. It fails. Race confirmed. +3. Fix: an atomic conditional update (`UPDATE ... SET cover_photo = :new_photo WHERE cover_photo = :old_photo`) instead of read-then-write. +4. Verify: concurrency test passes. All existing tests still pass. +5. Improve: no test existed for any concurrent state mutation on listings. Add concurrency tests for title editing and price updates too. + +The output is a RED → GREEN proof that the fix works, a root cause analysis, and the test gap that let it through. + +I do not want the model guessing and patching. +I want it proving and preventing. + +--- + ## `/ship` This is my **release machine mode**. @@ -392,7 +426,7 @@ Run `cd ~/.claude/skills/gstack && ./setup` (or `cd .claude/skills/gstack && ./s Run `cd ~/.claude/skills/gstack && bun install && bun run build`. This compiles the browser binary. Requires Bun v1.0+. **Project copy is stale?** -Re-copy from global: `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack && cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` +Re-copy from global: `for s in browse bugfix plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack && cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` **`bun` not installed?** Install it: `curl -fsSL https://bun.sh/install | bash` @@ -401,7 +435,7 @@ Install it: `curl -fsSL https://bun.sh/install | bash` Paste this into Claude Code: -> Update gstack: run `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main && ./setup`. 
If this project also has gstack at .claude/skills/gstack, update it too: run `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack && cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` +> Update gstack: run `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main && ./setup`. If this project also has gstack at .claude/skills/gstack, update it too: run `for s in browse bugfix plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack && cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` The `setup` script rebuilds the browser binary and re-symlinks skills. It takes a few seconds. @@ -409,7 +443,7 @@ The `setup` script rebuilds the browser binary and re-symlinks skills. It takes Paste this into Claude Code: -> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too. +> Uninstall gstack: remove the skill symlinks by running `for s in browse bugfix plan-ceo-review plan-eng-review review ship retro; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. 
If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse bugfix plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too. ## Development diff --git a/bugfix/SKILL.md b/bugfix/SKILL.md new file mode 100644 index 0000000..612934a --- /dev/null +++ b/bugfix/SKILL.md @@ -0,0 +1,152 @@ +--- +name: bugfix +version: 1.0.0 +description: | + Test-driven bug fixing. Reproduce the bug, validate the root cause, write a + failing test, fix it, verify, then close the test gap that let it through. +allowed-tools: + - Bash + - Read + - Edit + - Write + - Grep + - Glob + - AskUserQuestion +--- + +# /bugfix — Test-Driven Bug Fixing + +You are running the `/bugfix` workflow. This is a structured, disciplined approach to fixing bugs. The goal is not just to fix the bug — it is to fix it provably and ensure the same class of bug cannot recur. + +**Core principle: Reproduce → Prove → Fix → Verify → Improve.** + +**Only stop for:** +- Cannot reproduce the bug (ask the user for more information) +- Hypothesis disproved twice with no clear alternative (ask the user for guidance) +- No test runner or test infrastructure detected (ask the user how to run tests) + +**Never stop for:** +- Messy or unfamiliar code (read it, understand it, proceed) +- Large number of related tests to run (run them all) +- The fix being small (small fixes still need reproduction and verification) + +--- + +## Step 1: Understand the Bug + +Before touching any code: + +1. Identify the **expected behavior** vs **actual behavior**. +2. Identify the file(s) and function(s) involved. +3. Check git blame and recent commits on the affected files — was this a regression? + +```bash +git log --oneline -20 -- +``` + +If the bug report is vague, **STOP** and use AskUserQuestion to get exact reproduction steps, expected vs actual behavior, and any error messages. 
+ +--- + +## Step 2: Run Existing Tests + +**MANDATORY FIRST ACTION.** Before changing anything, run the tests related to the affected code. + +This tells you three things: +- Whether tests exist for this code at all +- Whether existing tests already catch the bug (they should fail) +- If tests pass, there is a coverage gap for this scenario — note it for Step 8 + +--- + +## Step 3: Reproduce the Bug + +**You MUST reproduce the bug before making any code changes.** + +Try in order: +1. Write a failing test that triggers the exact scenario +2. Run the code and observe the failure directly +3. Inspect state (logs, data, config) to confirm the conditions + +**If you cannot reproduce it, STOP.** Tell the user what you tried and use AskUserQuestion to request more information. Do NOT guess. Fixing a bug you cannot reproduce leads to wrong fixes. + +--- + +## Step 4: Validate Your Hypothesis + +Form a hypothesis about the root cause, then **prove it before writing the fix.** + +The validation must produce evidence — not "I think this is the cause" but "I confirmed this is the cause because X." Add a log or assertion that confirms the bad state, write a targeted test, inspect the data directly, or trace the code path. + +If your hypothesis is **disproved**, form a new one and repeat. Do NOT proceed to implementation on a wrong hypothesis. + +--- + +## Step 5: Write a Failing Test (RED) + +Write a test that reproduces the exact bug scenario: + +- The test MUST **fail** before your fix, proving it catches the bug +- Name it descriptively so the bug scenario is documented in the test name +- Use realistic inputs that mirror the actual failure + +Run the test and confirm it fails. + +--- + +## Step 6: Implement the Fix (GREEN) + +Fix the bug. Minimal change only. + +- Do not refactor surrounding code. +- Do not add features. +- Do not "improve" unrelated things. + +--- + +## Step 7: Verify + +Run the reproduction test — it MUST pass. + +Run all related tests — no regressions. 
+ +If any previously-passing test now fails, you introduced a regression. Fix it before proceeding. + +--- + +## Step 8: Close the Test Gap + +**Do not skip this step.** After the fix, answer: **why did existing tests not catch this?** + +Common gaps: missing scenario, weak assertion (checked "not null" but not the value), test data that did not trigger the boundary condition, over-mocking that hid the real behavior. + +Based on your analysis, add tests that prevent this **class** of bug — not just this instance. If the bug was a boundary issue, add boundary tests. If it was a missing edge case, add edge cases for the same function. + +--- + +## Step 9: Summary + +Output a brief summary: + +``` +## Bug Fix + +**Bug**: [description] +**Root cause**: [what was actually wrong] +**Fix**: [what was changed] +**Reproduction test**: [test name] — RED before fix, GREEN after +**Regression check**: [suite] — all passing +**Test gap**: [why tests missed it, what was added] +**Files changed**: [list] +``` + +--- + +## Important Rules + +- **Never fix a bug you cannot reproduce.** If you cannot trigger it, ask for help. +- **Never implement a fix without validating your hypothesis.** Prove the root cause first. +- **Never skip the test gap analysis.** Understanding WHY tests missed it is as valuable as the fix. +- **Never finish without improving test coverage.** The same bug class should be caught next time. +- **Never fix unrelated code.** Stay focused on the bug. +- **Always show RED → GREEN proof.** The user should see the test fail before the fix and pass after.
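As an illustration of the RED → GREEN proof these rules ask for, here is a minimal sketch. Everything in it is hypothetical: the pagination bug, the function names `pageCountBuggy` and `pageCountFixed`, and the tiny `run` helper stand in for a real project's code and test runner. The point is only that the same reproduction test is run against the code before and after the fix.

```typescript
import { strict as assert } from "node:assert";

// Hypothetical buggy implementation: silently drops a final partial page.
function pageCountBuggy(totalItems: number, perPage: number): number {
  return Math.floor(totalItems / perPage);
}

// Hypothetical minimal fix: round up so a partial page still counts.
function pageCountFixed(totalItems: number, perPage: number): number {
  return Math.ceil(totalItems / perPage);
}

// Reproduction test, named after the scenario it documents.
function testPartialLastPageIsCounted(
  pageCount: (totalItems: number, perPage: number) => number,
): void {
  // 21 items at 10 per page must yield 3 pages, not 2.
  assert.equal(pageCount(21, 10), 3);
}

// Run a test and report RED or GREEN instead of throwing.
function run(test: () => void): "RED" | "GREEN" {
  try {
    test();
    return "GREEN";
  } catch {
    return "RED";
  }
}

const before = run(() => testPartialLastPageIsCounted(pageCountBuggy));
const after = run(() => testPartialLastPageIsCounted(pageCountFixed));
console.log(`before fix: ${before}, after fix: ${after}`);
// prints "before fix: RED, after fix: GREEN"
```

In a real run the two results come from two separate test invocations, one before the change and one after; the sketch puts them side by side only to make the proof visible in a single place.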