Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
14 changes: 14 additions & 0 deletions ARCHITECTURE.md
Original file line number Diff line number Diff line change
Expand Up @@ -166,6 +166,20 @@ resolve against the wrong DOM. If the DOM changed and you run
`click @e3`, Playwright fails with a clear "no element" error — fix by
re-snapshotting.

`ghax batch` skips that re-snapshotting ceremony for you: when a step
inside a batch plan references an `@e<n>` ref, the daemon auto-runs a
fresh snapshot first and resolves the ref against the current DOM.
That's the main reason batch exists — on framework-heavy forms where
an earlier step (like opening a combobox) reshuffles the ARIA tree,
the ref lookup inside the same batch still lands on the intended
element. Opt out with `--no-auto-snapshot`.

`ghax snapshot` is also **dialog-aware**: when a modal is open (by
`[role=dialog]`, `[role=alertdialog]`, `<dialog open>`, or
`[aria-modal=true]`), the walker treats the top-most visible modal
as the new root instead of inheriting `aria-hidden="true"` from the
outer app. `--no-dialog-scope` falls back to body-rooted.

Shadow DOM: the cursor-interactive pass walks open shadow roots and
emits Playwright chain selectors (`host >> inner`). This is the only
form of selector Playwright accepts for descending into shadow trees
Expand Down
23 changes: 21 additions & 2 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -138,6 +138,25 @@ and type commands at the prompt. `exit`/`quit`/Ctrl-D to leave. Blank
lines and `#` lines are ignored. Quoting works like a real shell:
`try --css 'body { color: red }'` passes the whole CSS intact.

### Run a plan in one round-trip (stable refs across the sequence)

```bash
ghax batch '[
{"cmd":"goto","args":["https://app.example.com/settings"]},
{"cmd":"snapshot","opts":{"interactive":true}},
{"cmd":"click","args":["@e7"]},
{"cmd":"wait","args":["200"]},
{"cmd":"fill","args":["@e9","new-value"]},
{"cmd":"click","args":["@e11"]}
]'
```

Unlike `chain` (reads stdin, N round-trips), `batch` ships the whole
plan in one RPC. Between steps that reference `@e<n>` refs, the
daemon auto-re-snapshots so opening a combobox mid-plan doesn't
reindex refs out from under you. Pass `--no-auto-snapshot` for
strict one-shot semantics.

### Share the browser with a user who's actively working

```bash
Expand Down Expand Up @@ -172,7 +191,7 @@ Every change must pass:
cargo build --release # compile Rust CLI (crates/cli/)
npm run typecheck # tsc --noEmit (daemon TS + tests)
npm run build # bundle daemon → dist/ghax-daemon.mjs (esbuild)
npm run test:smoke # 70-check smoke suite against a live Edge session
npm run test:smoke # 95-check smoke suite against a live Edge session
```

For bigger changes also run:
Expand Down Expand Up @@ -212,7 +231,7 @@ design discussion:
another machine" case.
- **Skill acceptance eval harness** — scripted Claude API calls
against the skills with tool-call assertions. Deferred indefinitely
because the 70-check E2E smoke catches the same regressions at zero
because the 95-check E2E smoke catches the same regressions at zero
API cost.
- ~~Source-map resolution for stack frames.~~ Shipped — opt-in via
`ghax console --source-maps`.
Expand Down
32 changes: 24 additions & 8 deletions CONTRIBUTING.md
Original file line number Diff line number Diff line change
Expand Up @@ -9,16 +9,24 @@ moving parts so you can land one without friction.
```
ghax/
bin/ghax Shell shim — launches the Rust binary from target/release/ghax
crates/cli/ Rust CLI — argv parsing, dispatch, daemon RPC. All user-facing verbs.
src/main.rs Entry point + verb dispatch table
src/dispatch.rs Per-verb routing to daemon RPC or local orchestration
src/attach.rs Daemon spawn, CDP probe, port scan, bundle resolution
src/qa.rs QA orchestrator (parallel URL crawl, screenshots, report)
src/canary.rs Post-deploy canary monitor
src/ship.rs Ship workflow (bump, changelog, commit, push, PR)
src/rpc.rs HTTP+JSON client with transient-error retry
src/state.rs State file resolution + daemon liveness
src/shell.rs Interactive REPL (ghax shell)
src/
cli.ts Argv → daemon RPC. Verb dispatcher + attach/detach specials.
daemon.ts Node HTTP daemon. Playwright connectOverCDP + raw CDP pool.
browser-launch.ts Browser detect + CDP probe + scan/findFreePort + --launch/--headless.
cdp-client.ts /json/list target discovery + per-target WebSocket pool.
config.ts State file resolution (git root → .ghax/ghax.json).
buffers.ts CircularBuffer<T>, ConsoleEntry, NetworkEntry, parseStack().
snapshot.ts aria tree → @e<n> refs, cursor-interactive + shadow-DOM pass.
snapshot.ts aria tree → @e<n> refs, cursor-interactive + shadow-DOM + dialog-scope.
test/
smoke.ts Live-browser harness (70 checks, ~30s).
smoke.ts Live-browser harness (95 checks, ~30s).
cross-browser.ts Iterate every detected Chromium browser; run smoke on each.
benchmark.ts Headless CLI benchmark vs gstack-browse, playwright-cli, agent-browser.
hot-reload-smoke.ts Scripted MV3 hot-reload probe against test/fixtures/test-extension/.
Expand Down Expand Up @@ -84,9 +92,17 @@ Both scripts share the same install path. Idempotent — safe to re-run.
## Adding a new command

1. Register a handler in `daemon.ts` via `register('name', async (ctx, args, opts) => {...})`.
2. Add a CLI case in `src/cli.ts` — usually one line with `makeSimple('name')`.
3. Update the HELP constant + `README.md` + `design/plan/03-commands.md`.
4. If it should be recorded by `ghax record`, do nothing (it's recorded
2. Wire the Rust dispatch in `crates/cli/src/dispatch.rs`. For trivial
verbs (parse args → POST /rpc → print), add the verb name to one
of the existing `match` arms — `simple()` does the rest. For verbs
with CLI-side logic (custom print, multi-RPC, shell-out), add a
new module under `crates/cli/src/<verb>.rs` exposing
`pub fn cmd_<verb>(parsed: &Parsed) -> Result<i32>`, then wire it
in `dispatch.rs::dispatch_inner` and declare `mod <verb>;` in
`main.rs`. See `qa.rs`, `ship.rs`, `attach.rs` for templates.
3. Update `crates/cli/src/help.rs` + `README.md` + `design/plan/03-commands.md`.
4. Add a smoke check in `test/smoke.ts`.
5. If it should be recorded by `ghax record`, do nothing (it's recorded
by default). If it's meta / read-only, add the name to `NEVER_RECORD`
in `daemon.ts`.
5. If it has a custom exit code, throw `new DaemonError(msg, code)` and
Expand Down Expand Up @@ -126,7 +142,7 @@ npm run test:perf # perf budget test — FAILS if P50 regresses past t
```

The smoke test requires a running Chromium-family browser on
`--remote-debugging-port=9222`. It attaches, runs **70 non-destructive
`--remote-debugging-port=9222`. It attaches, runs **95 non-destructive
commands** (navigation, snapshots, interaction, extensions, orchestrated
verbs, `try`, `perf`, console dedup, network status/HAR, new-window
workflow, `shell` mode tokenising), and detaches. Takes ~30s end-to-end.
Expand Down
20 changes: 15 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ instead of spinning up sandboxed copies.
**Status**: v0.4 complete. Flagship `ghax browse` plus an orchestrated
layer (`qa`, `perf`, `profile`, `diff-state`, `ship`, `canary`,
`review`, `pair`, `try`) and a background-window workflow
(`find`, `new-window`, `tab --quiet`) for multi-agent use. 70/70 smoke
(`find`, `new-window`, `tab --quiet`) for multi-agent use. 95 smoke
checks on Edge + Chrome. Repo is private under `kepptic` for now;
open-source release paused.

Expand Down Expand Up @@ -57,10 +57,20 @@ Attach to a running Chrome or Edge over CDP, then drive it:
message instead of a raw Playwright stack trace.
- **Responsive testing**: `ghax responsive` snaps mobile / tablet / desktop
widths; `ghax viewport WxH` for one-offs.
- **Batch + record + render**: pipe JSON to `ghax chain` for scripted flows;
`ghax record start / stop` captures every command into a replayable
`.ghax/recordings/<name>.json`; `ghax gif <recording>` stitches the
frames via ffmpeg.
- **Batch + record + render**: `ghax batch '[{"cmd":"click","args":["@e7"]}, …]'`
ships a whole plan in one round-trip and auto-re-snapshots between
ref-using steps (so clicking a combobox that reshuffles the ARIA tree
doesn't wreck later refs); `ghax chain` reads the same shape from
stdin for ad-hoc flows; `ghax record start / stop` captures every
command into a replayable `.ghax/recordings/<name>.json`; `ghax gif
<recording>` stitches the frames via ffmpeg.
- **Dialog-aware snapshots**: when a modal is open (`[role=dialog]`,
`<dialog open>`, `[aria-modal=true]`), `ghax snapshot` walks the
dialog instead of the body. Fall back with `--no-dialog-scope`.
- **Framework-safe `fill`**: native-setter + `input` for React,
explicit `blur` for Angular validators, and `contenteditable` paths
for Material chip inputs and rich editors — so `fill @e5 "hello"`
actually updates state across every framework you'd hit in the wild.

## Install

Expand Down
69 changes: 10 additions & 59 deletions TODOS.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,66 +10,9 @@ belong here — either flesh it out or close it.

## Open

### Rewrite the CLI in Rust (public-release gate)

**What:** Replace `src/cli.ts` (~2,071 lines) with a Rust crate that
produces platform-specific binaries via `cargo-dist`. Daemon stays
Node/Playwright. Full design + phasing in
[`design/plan/06-rust-cli-rewrite.md`](./design/plan/06-rust-cli-rewrite.md).

**Why:** Distribution. The Bun-compiled CLI is 61MB because it embeds
the Bun runtime. A stripped Rust binary is ~10MB per platform. This
is the last concrete friction between current ghax and a public
release we'd be satisfied shipping.

Secondary wins: ~2-5ms cold start (vs 37ms Bun), no runtime
dependency, standard `cargo install` / `brew install` distribution,
no per-platform Bun builds needed (one `cargo build --release --target
<triple>` per OS × arch).

**Pros:**
- 6x smaller binary per platform (~10MB vs 61MB)
- 7-15x faster cold start for single-command invocations
- Standard Rust cross-compile toolchain via cargo-dist handles
macOS/Linux/Windows × x64/ARM in one CI workflow
- Opens clean install paths: Homebrew tap, `cargo install ghax`, npm
wrapper, direct GitHub Release download
- Rust binary is a more inviting open-source artifact than a 60MB blob

**Cons:**
- 3-4 days active dev time (per the phasing plan)
- Dual-language repo during the rewrite window (mitigated by a parity
diff test in CI)
- Contributor pool shifts slightly — JS/TS folks contributing to CLI
vs Rust folks. Daemon stays TS so JS contributors still have turf.
- Node remains a runtime dependency (for daemon) — we can't eliminate
it without replacing Playwright, which is out of scope.

**Context:**
- Decision recorded 2026-04-19 after a perf deep-dive showed the
stack is already at its physical floor for single-command
invocations (~30ms, dominated by Bun CLI spawn).
- The design doc covers architecture, dependency choices, per-verb
porting plan, distribution story, phasing (4 phases), risks, and
success criteria (8 green checks gate the switch).
- Phase 1 is template work: 45 trivial verbs that are pure RPC +
print. Fast.
- Phase 2 is the real work: attach, qa, canary, ship, review — 8
verbs with CLI-side orchestration logic.
- Phase 3 is SSE + REPL (console/network --follow, ghax shell).
- Phase 4 flips `bin/ghax` to prefer the Rust binary.

**Depends on / blocked by:** Nothing. The Rust CLI and Bun CLI can
coexist during the rewrite. Dual-maintenance window lasts ~1-2 weeks.

**Effort:** ~3-4 days active, spread over 2-3 weeks calendar.

**Success criteria:** All 8 gates in `06-rust-cli-rewrite.md` green
(binary sizes, smoke parity, perf floor, parity diff, Homebrew
install, docs, cargo-dist release workflow).

### Split `src/daemon.ts` by domain


**What:** Extract handler groups into domain-specific files. Approved in
plan-eng-review on 2026-04-19.

Expand Down Expand Up @@ -127,4 +70,12 @@ smoke re-verification.

## Completed

(Items move here from "Open" once they ship, with commit reference.)
- **Rewrite the CLI in Rust (public-release gate)** — shipped across
phases 1-4. `src/cli.ts` deleted in `b2748e7` (refactor: remove the
Bun CLI source — Rust is the single source of truth). `bin/ghax`
shim now prefers `target/release/ghax`; installed users run the
Rust binary directly. Bun runtime fully removed in `8d1deb5`;
esbuild bundles the daemon, tsx runs the tests. All 8 success
gates green: ~2.6 MB stripped Apple Silicon binary (under the 10MB
target), 70/70 smoke parity, cold-start floor hit, cross-browser
green on Edge + Chrome, install-link/install-release flows live.
Loading