84 changes: 60 additions & 24 deletions CLAUDE.md
@@ -1,38 +1,74 @@
# gstack development
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## What is gstack

An AI engineering workflow toolkit that turns Claude Code into specialized skills. The core component is a persistent headless Chromium browser daemon accessed via a compiled CLI binary. Additional skills (ship, review, plan, retro) are prompt-only SKILL.md files.

## Commands

```bash
bun install # install dependencies
bun test # run integration tests (browse + snapshot)
bun run dev <cmd> # run CLI in dev mode, e.g. bun run dev goto https://example.com
bun run build # compile binary to browse/dist/browse
bun install # install dependencies + Playwright Chromium
bun test # run all integration tests (~3s)
bun test browse/test/commands # command integration tests only
bun test browse/test/snapshot # snapshot tests only
bun test --test-name-pattern "Navigation" # run tests whose names match a pattern (regex)
bun run dev <cmd> # run CLI from source (no compile step)
bun run build # compile binary to browse/dist/browse (~58MB)
bun run server # start server directly (for debugging)
```

## Project structure
## Architecture

**Client-server split**: The browse tool is a thin CLI client (`cli.ts`) that sends HTTP POST requests to a persistent Bun HTTP server (`server.ts`). The server manages Chromium via Playwright.

```
gstack/
├── browse/ # Headless browser CLI (Playwright)
│ ├── src/ # CLI + server + commands
│ ├── test/ # Integration tests + fixtures
│ └── dist/ # Compiled binary
├── ship/ # Ship workflow skill
├── review/ # PR review skill
├── plan-ceo-review/ # /plan-ceo-review skill
├── plan-eng-review/ # /plan-eng-review skill
├── retro/ # Retrospective skill
├── setup # One-time setup: build binary + symlink skills
├── SKILL.md # Browse skill (Claude discovers this)
└── package.json # Build scripts for browse
CLI (compiled binary) ──HTTP POST──► Bun server (localhost:9400-9410) ──► Playwright ──► Chromium
```

## Deploying to the active skill
- State file at `/tmp/browse-server.json` stores PID, port, and bearer token (UUID per session)
- CLI auto-starts server on first call; server auto-shuts down after 30 min idle
- Chromium crash causes server exit; CLI detects and auto-restarts on next call
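
The lifecycle in the bullets above could be sketched roughly as follows. This is a minimal illustration, not the actual `cli.ts`: the field names (`pid`, `port`, `token`) and all function names are assumptions; only the state file path and the kinds of values it stores come from the notes above.

```typescript
// Hypothetical shape of /tmp/browse-server.json; field names are assumed.
interface ServerState {
  pid: number;
  port: number;
  token: string;
}

// Parse the state file contents, returning null if they are missing or malformed.
function parseServerState(raw: string | null): ServerState | null {
  if (raw === null) return null;
  try {
    const data = JSON.parse(raw);
    if (
      typeof data?.pid === "number" &&
      typeof data?.port === "number" &&
      typeof data?.token === "string"
    ) {
      return { pid: data.pid, port: data.port, token: data.token };
    }
  } catch {
    // Fall through: unparsable state is treated as "no server".
  }
  return null;
}

// Decide whether the CLI should start a server before sending a command.
// `isAlive` is injected so the decision can be exercised without real processes.
function needsServerStart(
  state: ServerState | null,
  isAlive: (pid: number) => boolean,
): boolean {
  return state === null || !isAlive(state.pid);
}

// Build the auth header for a request to the running server.
function authHeaders(state: ServerState): Record<string, string> {
  return { Authorization: `Bearer ${state.token}` };
}
```

In Node or Bun, calling `process.kill(pid, 0)` inside a try/catch is one common way to implement `isAlive`, since signal 0 only checks that the process exists.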

**Snapshot/ref system**: The key abstraction for web interaction. `snapshot.ts` parses Playwright's accessibility tree (`page.locator().ariaSnapshot()`), assigns `@e1`, `@e2`... refs to elements, and builds a `Map<string, Locator>`. Commands like `click @e3` resolve the ref to a Playwright Locator. Refs are invalidated on navigation.
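
As a rough sketch of the ref bookkeeping described above (not the code in `snapshot.ts`), the table below assigns `@e1`, `@e2`, ... in order and drops every ref on navigation. The class name `RefTable` and its methods are hypothetical, and a generic type stands in for Playwright's `Locator` so the example runs on its own.

```typescript
// Generic stand-in for the ref table; in the real tool T would be a Playwright Locator.
class RefTable<T> {
  private refs = new Map<string, T>();

  // Assign @e1, @e2, ... to elements in snapshot order, replacing any old refs.
  assign(elements: T[]): string[] {
    this.refs.clear();
    return elements.map((el, i) => {
      const ref = `@e${i + 1}`;
      this.refs.set(ref, el);
      return ref;
    });
  }

  // Resolve a ref such as "@e3"; throw a descriptive error if it is stale or unknown.
  resolve(ref: string): T {
    const target = this.refs.get(ref);
    if (target === undefined) {
      throw new Error(`Unknown or stale ref ${ref}; take a new snapshot`);
    }
    return target;
  }

  // Called on navigation: every previously issued ref becomes invalid.
  invalidate(): void {
    this.refs.clear();
  }
}
```

Failing loudly on a stale ref, rather than silently acting on the wrong element, is the design choice this sketch is meant to highlight.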

**Command organization**: Commands are split by mutation semantics:
- `read-commands.ts` — non-mutating (text, html, links, js, css, forms, console, network, etc.)
- `write-commands.ts` — mutating (goto, click, fill, select, scroll, viewport, etc.)
- `meta-commands.ts` — server/tab management (status, stop, restart, tabs, screenshot, pdf, chain, diff)

New commands are registered as routes in `server.ts`.
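
One way to picture the split is a dispatcher that checks three registries in turn. This is only an illustration under assumptions: the `Handler` signature, the registry maps, and the `dispatch` function are hypothetical and do not reflect how `server.ts` actually routes requests.

```typescript
type Handler = (args: string[]) => string;

// Hypothetical registries standing in for the three command modules.
const readCommands = new Map<string, Handler>([
  ["text", () => "page text"],
]);
const writeCommands = new Map<string, Handler>([
  ["goto", (args) => `navigated to ${args[0]}`],
]);
const metaCommands = new Map<string, Handler>([
  ["status", () => "running"],
]);

// Look a command up in each registry and report which kind handled it.
function dispatch(name: string, args: string[]): { kind: string; output: string } {
  const registries: Array<[string, Map<string, Handler>]> = [
    ["read", readCommands],
    ["write", writeCommands],
    ["meta", metaCommands],
  ];
  for (const [kind, registry] of registries) {
    const handler = registry.get(name);
    if (handler) return { kind, output: handler(args) };
  }
  throw new Error(`Unknown command: ${name}`);
}
```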

**Buffers** (`buffers.ts`): Ring buffers (50k cap) capture console messages and network requests in memory, flushed to disk every 1s.
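
For readers unfamiliar with the data structure, a capped ring buffer can be sketched as below. This is a generic illustration, not the contents of `buffers.ts`; the class and method names are assumptions, and the disk flush is left out.

```typescript
// Fixed-capacity buffer: once full, each push overwrites the oldest entry.
class RingBuffer<T> {
  private items: T[] = [];
  private start = 0; // index of the oldest entry once the buffer is full
  private capacity: number;

  constructor(capacity: number) {
    if (capacity <= 0) throw new Error("capacity must be positive");
    this.capacity = capacity;
  }

  push(item: T): void {
    if (this.items.length < this.capacity) {
      this.items.push(item);
    } else {
      this.items[this.start] = item;
      this.start = (this.start + 1) % this.capacity;
    }
  }

  // Return entries oldest-first.
  toArray(): T[] {
    return [...this.items.slice(this.start), ...this.items.slice(0, this.start)];
  }

  get size(): number {
    return this.items.length;
  }
}
```

With a cap in place, memory use stays bounded no matter how chatty the page's console or network activity is; the cost is that the oldest entries are dropped first.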

## Skills

The active skill lives at `~/.claude/skills/gstack/`. After making changes:
Each skill directory contains a `SKILL.md` that Claude discovers. Skills other than browse are prompt-only (no code):
- `ship/` — merge → test → review → version bump → commit → push → PR
- `review/` — two-pass pre-landing review checklist
- `bugfix/` — test-driven bug fixing: discover → reproduce → fix → verify → improve
- `plan-ceo-review/` — founder-mode planning (expansion/hold/reduction scopes)
- `plan-eng-review/` — engineering architecture review with diagrams
- `retro/` — weekly retrospective from commit history

1. Push your branch
2. Fetch and reset in the skill directory: `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main`
3. Rebuild: `cd ~/.claude/skills/gstack && bun run build`
## Adding a new command

1. Add handler in `read-commands.ts` (non-mutating) or `write-commands.ts` (mutating)
2. Register route in `server.ts`
3. Add test in `browse/test/commands.test.ts` with HTML fixture if needed
4. `bun test` then `bun run build`

## Testing

Tests use Bun's native test runner. Integration tests spin up a local HTTP server (`browse/test/test-server.ts`) serving fixtures from `browse/test/fixtures/`, then exercise commands against real Playwright browser instances.

## Deploying changes to the active skill

The active skill lives at `~/.claude/skills/gstack/`. After changes:

```bash
cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main && bun run build
```

Or copy the binary directly: `cp browse/dist/browse ~/.claude/skills/gstack/browse/dist/browse`
46 changes: 40 additions & 6 deletions README.md
@@ -2,7 +2,7 @@

**gstack turns Claude Code from one generic assistant into a team of specialists you can summon on demand.**

Six opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, code review, one-command shipping, browser automation, and engineering retrospectives — all as slash commands.
Seven opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/docs/claude-code). Plan review, code review, test-driven bug fixing, one-command shipping, browser automation, and engineering retrospectives — all as slash commands.

### Without gstack

@@ -20,6 +20,7 @@ Six opinionated workflow skills for [Claude Code](https://docs.anthropic.com/en/
| `/plan-ceo-review` | Founder / CEO | Rethink the problem. Find the 10-star product hiding inside the request. |
| `/plan-eng-review` | Eng manager / tech lead | Lock in architecture, data flow, diagrams, edge cases, and tests. |
| `/review` | Paranoid staff engineer | Find the bugs that pass CI but blow up in production. Not a style nitpick pass. |
| `/bugfix` | Disciplined debugger | Reproduce the bug, prove the root cause, fix it with a failing test that goes RED then GREEN. Never guess. |
| `/ship` | Release engineer | Sync main, run tests, push, open PR. For a ready branch, not for deciding what to build. |
| `/browse` | QA engineer | Give the agent eyes. It logs in, clicks through your app, takes screenshots, catches breakage. Full QA pass in 60 seconds. |
| `/retro` | Engineering manager | Analyze commit history, work patterns, and shipping velocity for the week. |
@@ -82,11 +83,11 @@ This is not a prompt pack for beginners. It is an operating system for people wh

Open Claude Code and paste this. Claude will do the rest.

> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /retro. Then ask the user if they also want to add gstack to the current project so teammates get it.
> Install gstack: run `git clone https://github.com/garrytan/gstack.git ~/.claude/skills/gstack && cd ~/.claude/skills/gstack && ./setup` then add a "gstack" section to CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, and lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /bugfix, /ship, /browse, /retro. Then ask the user if they also want to add gstack to the current project so teammates get it.

### Step 2: Add to your repo so teammates get it (optional)

> Add gstack to this project: run `cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /ship, /browse, /retro, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.
> Add gstack to this project: run `cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup` then add a "gstack" section to this project's CLAUDE.md that says to use the /browse skill from gstack for all web browsing, never use mcp\_\_claude-in-chrome\_\_\* tools, lists the available skills: /plan-ceo-review, /plan-eng-review, /review, /bugfix, /ship, /browse, /retro, and tells Claude that if gstack skills aren't working, run `cd .claude/skills/gstack && ./setup` to build the binary and register skills.

Real files get committed to your repo (not a submodule), so `git clone` just works. The binary and node\_modules are gitignored — teammates just need to run `cd .claude/skills/gstack && ./setup` once to build (or `/browse` handles it automatically on first use).

@@ -261,6 +262,39 @@ I want the model imagining the production incident before it happens.

---

## `/bugfix`

This is my **disciplined debugger mode**.

When something breaks, the model's instinct is to guess at the cause and start patching. Half the time the guess is wrong. Now you have two bugs instead of one.

`/bugfix` enforces a strict workflow: **Reproduce → Prove → Fix → Verify → Improve.**

The model cannot skip steps. It cannot fix a bug it has not reproduced. It cannot implement a fix without first proving its hypothesis about the root cause. And after the fix, it must answer the most important question: **why did the existing tests miss this?**

That is the part most developers skip. That is the part that matters most.

### Example

Take the same listing app. `/review` flagged a race condition: two tabs can overwrite cover-photo selection.

Without `/bugfix`, the model might add a lock and move on. Maybe it works. Maybe it does not. No proof either way.

With `/bugfix`:

1. Run existing tests — they all pass. No concurrency test exists for cover-photo selection.
2. Write a test that simulates two concurrent updates to the same listing's cover photo — it fails. Race confirmed.
3. Fix: atomic conditional update (`UPDATE ... SET cover_photo = :new_photo WHERE cover_photo = :old_photo`) instead of read-then-write.
4. Verify: concurrency test passes. All existing tests still pass.
5. Improve: no test existed for any concurrent state mutation on listings. Add concurrency tests for title editing and price updates too.
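
The difference between read-then-write and a conditional update can be shown with an in-memory stand-in. This is a sketch under assumptions: the `Listing` class, its methods, and the file names are hypothetical, and a real fix would be a conditional SQL `UPDATE` rather than this object.

```typescript
// In-memory stand-in for a listings row.
class Listing {
  coverPhoto = "a.jpg";

  // Read-then-write style: the caller's earlier read is ignored, so stale writers win.
  setCoverPhoto(next: string): void {
    this.coverPhoto = next;
  }

  // Conditional update: only write if the value is still what the caller read.
  // Returns false when another writer got there first.
  compareAndSetCoverPhoto(expected: string, next: string): boolean {
    if (this.coverPhoto !== expected) return false;
    this.coverPhoto = next;
    return true;
  }
}

// Two tabs both load the page (seeing "a.jpg") before either one saves.
function simulateReadThenWrite(): string {
  const listing = new Listing();
  listing.setCoverPhoto("b.jpg"); // tab A saves
  listing.setCoverPhoto("c.jpg"); // tab B silently overwrites A
  return listing.coverPhoto;
}

function simulateConditionalUpdate(): { a: boolean; b: boolean; final: string } {
  const listing = new Listing();
  const seenByA = listing.coverPhoto;
  const seenByB = listing.coverPhoto;
  const a = listing.compareAndSetCoverPhoto(seenByA, "b.jpg");
  const b = listing.compareAndSetCoverPhoto(seenByB, "c.jpg"); // stale read: rejected
  return { a, b, final: listing.coverPhoto };
}
```

In the first simulation tab A's choice is lost without any signal; in the second, tab B's write is rejected and the caller can reload and retry.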

The output is a RED → GREEN proof that the fix works, a root cause analysis, and the test gap that let it through.

I do not want the model guessing and patching.
I want it proving and preventing.

---

## `/ship`

This is my **release machine mode**.
@@ -392,7 +426,7 @@ Run `cd ~/.claude/skills/gstack && ./setup` (or `cd .claude/skills/gstack && ./s
Run `cd ~/.claude/skills/gstack && bun install && bun run build`. This compiles the browser binary. Requires Bun v1.0+.

**Project copy is stale?**
Re-copy from global: `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack && cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`
Re-copy from global: `for s in browse bugfix plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack && cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`

**`bun` not installed?**
Install it: `curl -fsSL https://bun.sh/install | bash`
@@ -401,15 +435,15 @@ Install it: `curl -fsSL https://bun.sh/install | bash`

Paste this into Claude Code:

> Update gstack: run `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main && ./setup`. If this project also has gstack at .claude/skills/gstack, update it too: run `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack && cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`
> Update gstack: run `cd ~/.claude/skills/gstack && git fetch origin && git reset --hard origin/main && ./setup`. If this project also has gstack at .claude/skills/gstack, update it too: run `for s in browse bugfix plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack && cp -Rf ~/.claude/skills/gstack .claude/skills/gstack && rm -rf .claude/skills/gstack/.git && cd .claude/skills/gstack && ./setup`

The `setup` script rebuilds the browser binary and re-symlinks skills. It takes a few seconds.

## Uninstalling

Paste this into Claude Code:

> Uninstall gstack: remove the skill symlinks by running `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.
> Uninstall gstack: remove the skill symlinks by running `for s in browse bugfix plan-ceo-review plan-eng-review review ship retro; do rm -f ~/.claude/skills/$s; done` then run `rm -rf ~/.claude/skills/gstack` and remove the gstack section from CLAUDE.md. If this project also has gstack at .claude/skills/gstack, remove it by running `for s in browse bugfix plan-ceo-review plan-eng-review review ship retro; do rm -f .claude/skills/$s; done && rm -rf .claude/skills/gstack` and remove the gstack section from the project CLAUDE.md too.

## Development

152 changes: 152 additions & 0 deletions bugfix/SKILL.md
@@ -0,0 +1,152 @@
---
name: bugfix
version: 1.0.0
description: |
Test-driven bug fixing. Reproduce the bug, validate the root cause, write a
failing test, fix it, verify, then close the test gap that let it through.
allowed-tools:
- Bash
- Read
- Edit
- Write
- Grep
- Glob
- AskUserQuestion
---

# /bugfix — Test-Driven Bug Fixing

You are running the `/bugfix` workflow. This is a structured, disciplined approach to fixing bugs. The goal is not just to fix the bug — it is to fix it provably and ensure the same class of bug cannot recur.

**Core principle: Reproduce → Prove → Fix → Verify → Improve.**

**Only stop for:**
- Cannot reproduce the bug (ask the user for more information)
- Hypothesis disproved twice with no clear alternative (ask the user for guidance)
- No test runner or test infrastructure detected (ask the user how to run tests)

**Never stop for:**
- Messy or unfamiliar code (read it, understand it, proceed)
- Large number of related tests to run (run them all)
- The fix being small (small fixes still need reproduction and verification)

---

## Step 1: Understand the Bug

Before touching any code:

1. Identify the **expected behavior** vs **actual behavior**.
2. Identify the file(s) and function(s) involved.
3. Check git blame and recent commits on the affected files — was this a regression?

```bash
git log --oneline -20 -- <suspected-file>
```

If the bug report is vague, **STOP** and use AskUserQuestion to get exact reproduction steps, expected vs actual behavior, and any error messages.

---

## Step 2: Run Existing Tests

**MANDATORY FIRST ACTION.** Before changing anything, run the tests related to the affected code.

This tells you three things:
- Whether tests exist for this code at all
- Whether existing tests already catch the bug (if they do, they will fail)
- If tests pass, there is a coverage gap for this scenario — note it for Step 8

---

## Step 3: Reproduce the Bug

**You MUST reproduce the bug before making any code changes.**

Try in order:
1. Write a failing test that triggers the exact scenario
2. Run the code and observe the failure directly
3. Inspect state (logs, data, config) to confirm the conditions

**If you cannot reproduce it, STOP.** Tell the user what you tried and use AskUserQuestion to request more information. Do NOT guess. Fixing a bug you cannot reproduce leads to wrong fixes.

---

## Step 4: Validate Your Hypothesis

Form a hypothesis about the root cause, then **prove it before writing the fix.**

The validation must produce evidence — not "I think this is the cause" but "I confirmed this is the cause because X." Add a log or assertion that confirms the bad state, write a targeted test, inspect the data directly, or trace the code path.

If your hypothesis is **disproved**, form a new one and repeat. Do NOT proceed to implementation on a wrong hypothesis.

---

## Step 5: Write a Failing Test (RED)

Write a test that reproduces the exact bug scenario:

- The test MUST **fail** before your fix, proving it catches the bug
- Name it descriptively so the bug scenario is documented in the test name
- Use realistic inputs that mirror the actual failure

Run the test and confirm it fails.
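
As an illustration of a RED test, the sketch below uses a hypothetical buggy helper, `lastN`, and a tiny hand-rolled harness so it runs without a test framework. In a real project the same test would live in the project's own test runner; every name here is an assumption made for the example.

```typescript
// Hypothetical buggy function: meant to return the last `n` items, but
// `slice(-0)` behaves like `slice(0)`, so n = 0 returns everything instead of nothing.
function lastN<T>(items: T[], n: number): T[] {
  return items.slice(-n);
}

interface TestResult {
  name: string;
  passed: boolean;
}

// Minimal harness: a test passes unless its body throws.
function runTest(name: string, body: () => void): TestResult {
  try {
    body();
    return { name, passed: true };
  } catch {
    return { name, passed: false };
  }
}

// The test name documents the bug scenario; the input mirrors the reported failure.
const red = runTest("lastN returns an empty array when n is 0", () => {
  const result = lastN(["a", "b", "c"], 0);
  if (result.length !== 0) {
    throw new Error(`expected [], got ${JSON.stringify(result)}`);
  }
});
```

Here `red.passed` is `false` against the buggy implementation, which is the evidence this step asks for before any fix is written.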

---

## Step 6: Implement the Fix (GREEN)

Fix the bug. Minimal change only.

- Do not refactor surrounding code.
- Do not add features.
- Do not "improve" unrelated things.

---

## Step 7: Verify

Run the reproduction test — it MUST pass.

Run all related tests — no regressions.

If any previously-passing test now fails, you introduced a regression. Fix it before proceeding.

---

## Step 8: Close the Test Gap

**Do not skip this step.** After the fix, answer: **why did existing tests not catch this?**

Common gaps: missing scenario, weak assertion (checked "not null" but not the value), test data that did not trigger the boundary condition, over-mocking that hid the real behavior.

Based on your analysis, add tests that prevent this **class** of bug — not just this instance. If the bug was a boundary issue, add boundary tests. If it was a missing edge case, add edge cases for the same function.
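
Covering a class of bug rather than one instance often looks like a small table of boundary cases. The sketch below assumes a hypothetical, already-fixed helper `takeLast` whose original bug was at `n = 0`; the function, the cases, and the comparison helper are all illustrative.

```typescript
// Hypothetical fixed function: return the last `n` items, or nothing when n <= 0.
function takeLast<T>(items: T[], n: number): T[] {
  if (n <= 0) return [];
  return items.slice(-n);
}

// Boundary cases around the original failure (n = 0), not just the failure itself.
const boundaryCases: Array<{ name: string; n: number; expected: string[] }> = [
  { name: "n is 0", n: 0, expected: [] },
  { name: "n is negative", n: -1, expected: [] },
  { name: "n is 1", n: 1, expected: ["c"] },
  { name: "n equals length", n: 3, expected: ["a", "b", "c"] },
  { name: "n exceeds length", n: 10, expected: ["a", "b", "c"] },
];

// Run every case and collect the names of the ones that fail.
function failingCases(): string[] {
  const input = ["a", "b", "c"];
  return boundaryCases
    .filter((c) => JSON.stringify(takeLast(input, c.n)) !== JSON.stringify(c.expected))
    .map((c) => c.name);
}
```

An empty result from `failingCases()` means every boundary is covered; any name it returns points directly at the scenario that regressed.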

---

## Step 9: Summary

Output a brief summary:

```
## Bug Fix

**Bug**: [description]
**Root cause**: [what was actually wrong]
**Fix**: [what was changed]
**Reproduction test**: [test name] — RED before fix, GREEN after
**Regression check**: [suite] — all passing
**Test gap**: [why tests missed it, what was added]
**Files changed**: [list]
```

---

## Important Rules

- **Never fix a bug you cannot reproduce.** If you cannot trigger it, ask for help.
- **Never implement a fix without validating your hypothesis.** Prove the root cause first.
- **Never skip the test gap analysis.** Understanding WHY tests missed it is as valuable as the fix.
- **Never finish without improving test coverage.** The same bug class should be caught next time.
- **Never fix unrelated code.** Stay focused on the bug.
- **Always show RED → GREEN proof.** The user should see the test fail before the fix and pass after.