diff --git a/AGENTS.md b/AGENTS.md new file mode 100644 index 0000000..774c029 --- /dev/null +++ b/AGENTS.md @@ -0,0 +1,170 @@ +# AGENTS.md + +This is the generic agent memory file for the ghax repo. Any AI coding agent working on this codebase (Codex, Cursor, Aider, Continue, Windsurf, or Claude Code) should read this before editing anything. + +Claude Code users: [CLAUDE.md](./CLAUDE.md) extends this file with Claude-specific workflow notes; both apply. + +## What this project is + +ghax is a CLI that attaches to the user's real running Chrome or Edge over Chrome DevTools Protocol. It drives tabs, takes accessibility-tree snapshots with `@e` refs, works with MV3 extension internals, and captures console/network traffic. The CLI is Rust (small, fast binary); the daemon is Node (because Chromium automation needs a Node runtime). + +If you just arrived and need to install: see [llms.txt](./llms.txt) for the install + verify sequence. + +## Hard invariants (violating any of these has broken the tool) + +1. **CLI is Rust. Daemon is Node.** The CLI calls the daemon via HTTP RPC on 127.0.0.1. Do not add browser-automation calls to the Rust side. Do not add daemon bundling to the Rust side. The split is load-bearing. + +2. **Single daemon per state file.** `.ghax/ghax.json` at the git root stores `{pid, port, browserKind, browserUrl, cwd}`. Never spawn a second daemon pointing at the same state file. For parallel agents, use `GHAX_STATE_FILE=/tmp/ghax-.json`. + +3. **Refs survive only until the next snapshot, only on the tab they were taken on.** `ghax click @e3` looks up `@e3` against the daemon's last snapshot ref map. If the DOM changed, re-snapshot first. The `tab` and `new-window` handlers clear the ref map when the active page changes. + +4. **Daemon restart required after editing `src/daemon.ts`.** The daemon bundle is loaded once at attach time. Changes to `src/daemon.ts` don't take effect until `ghax detach && npm run build && ghax attach`. + +5. **The Rust CLI and daemon do not share source.** Rust uses `serde_json::Value` for daemon responses. When the daemon changes an RPC return shape, update whatever Rust dispatch code reads `data.get("foo")` if the field name changed. The smoke suite catches breakage. + +## Build and test + +```bash +# Daily dev loop +npm install # Node deps (playwright + source-map) +npm run build # bundles daemon → dist/ghax-daemon.mjs (esbuild, ~50 ms) +npm run build:rust # compiles Rust CLI → target/release/ghax +npm run build:all # both of the above +npm run typecheck # tsc --noEmit + +# Install to ~/.local/bin (idempotent) +npm run install-link + +# Smoke suite (requires Edge or Chrome running with --remote-debugging-port=9222) +npm run test:smoke # 95 checks, ~30 s, drives a real browser +npm run test:cross-browser # runs smoke against every detected Chromium family browser +npm run test:perf # enforces P50 latency budgets on critical ops +npm run test:benchmark # compare latency vs other CLIs (reference only, not a gate) +``` + +To run the smoke suite against a specific binary: `GHAX_BIN=$PWD/target/release/ghax npm run test:smoke`. + +## Project layout + +``` +ghax/ +├── bin/ghax Shim that execs target/release/ghax +├── crates/cli/ Rust CLI source (single crate, workspace root at repo root) +│ ├── Cargo.toml +│ └── src/ +│ ├── main.rs Entry point + verb dispatch +│ ├── dispatch.rs Per-verb routing to daemon RPC or local orchestration +│ ├── args.rs Argv → Parsed struct (mirrors TS parseArgs exactly) +│ ├── rpc.rs HTTP+JSON client with transient-error retry +│ ├── state.rs State file resolution + daemon liveness +│ ├── attach.rs Daemon spawn, CDP probe, port scan, bundle resolution +│ ├── shell.rs Interactive REPL (ghax shell) +│ ├── help.rs --help text (single source of truth) +│ ├── small.rs Medium verbs: status, chain, batch, replay, gif, pair, diff-state +│ ├── qa.rs QA orchestrator (parallel URL crawl) +│ ├── canary.rs Post-deploy canary monitor +│ ├── ship.rs Ship workflow (bump, changelog, commit, push, PR) +│ ├── review.rs Diff-aware code review emitter +│ ├── sse.rs Server-Sent Events client (live tail) +│ ├── output.rs JSON pretty-print +│ ├── qa_common.rs Shared filters between qa.rs and canary.rs +│ └── time_util.rs ISO-8601 / days-to-ymd (shared by qa/canary/ship) +├── src/ Node daemon source +│ ├── daemon.ts Main HTTP server + all RPC handlers (~2,500 lines, 72 verbs) +│ ├── snapshot.ts ARIA tree walker, cursor-interactive pass, shadow-DOM + dialog-aware +│ ├── cdp-client.ts Raw CDP WebSocket pool for SW/popup/option/sidepanel targets +│ ├── buffers.ts CircularBuffer, ConsoleEntry, NetworkEntry, parseStack +│ ├── config.ts State file resolution +│ └── source-maps.ts Opt-in source-map resolver for --source-maps +├── test/ +│ ├── smoke.ts 95 live-browser checks (the main safety net) +│ ├── cross-browser.ts Iterate every detected Chromium and smoke each +│ ├── benchmark.ts Latency comparison vs other CLIs +│ ├── perf-bench.ts P50 budget enforcer +│ ├── capture-bodies-smoke.ts Body-capture end-to-end test +│ └── hot-reload-smoke.ts MV3 hot-reload test against fixtures/test-extension/ +├── .claude/skills/ Claude Code skills (ghax + ghax-browse) +├── docs/ +│ ├── BENCHMARK.md Perf numbers +│ ├── design/ Design history (why the current architecture exists) +│ └── sessions/ Field reports from production agent runs +├── scripts/ +│ ├── install-link.sh Symlink into ~/.local/bin + bootstrap daemon node_modules +│ ├── install-release.sh Download latest GitHub release + install +│ ├── release.sh Cut a release (refuses to run with dirty tree) +│ └── bootstrap-daemon-runtime.sh Shared npm install step +├── ARCHITECTURE.md CLI/daemon split, CDP model, ref resolution +├── CHANGELOG.md Per-version changes +├── CLAUDE.md Claude Code specific notes (auto-discovered) +├── CONTRIBUTING.md Full contributor guide +├── LICENSE MIT +├── SECURITY.md Threat model +├── llms.txt Discovery + install guide for AI agents +└── README.md Human-facing overview +``` + +## Adding a new CLI verb + +Three steps; the smoke suite catches most mistakes. + +1. **Register the handler in `src/daemon.ts`**: + + ```ts + register('myVerb', async (ctx, args, opts) => { + const page = await activePage(ctx); + // ... + return { ...result }; + }); + ``` + +2. **Wire the Rust dispatch in `crates/cli/src/dispatch.rs`**. For trivial verbs (parse args → POST /rpc → print), add the verb name to an existing `match` arm and let `simple()` handle it. For verbs with CLI-side logic (custom print, multi-RPC, shell-out), add a new module under `crates/cli/src/.rs` exposing `pub fn cmd_(parsed: &Parsed) -> Result`, then wire it in `dispatch_inner` and declare `mod ;` in `main.rs`. See `qa.rs`, `ship.rs`, `attach.rs` as templates. + +3. **Add a smoke check in `test/smoke.ts`**. Shape: + + ```ts + c('my-verb does the expected thing', async () => { + const r = await run(['my-verb', 'arg', '--json']); + const data = parseJson<...>(r.stdout); + assert(...); + }); + ``` + +4. **Update `crates/cli/src/help.rs`** — the --help output is byte-authoritative for what we claim the tool does. + +Rebuild: `npm run build:all`. Restart the daemon: `ghax detach && ghax attach`. Run the smoke check: `GHAX_BIN=$PWD/target/release/ghax npm run test:smoke`. + +## Code style + +- **Rust**: rustfmt (run `cargo fmt` before committing). Prefer `anyhow::Result` in CLI modules, `thiserror` for domain-specific errors. No `unsafe` except the POSIX `kill` shim in `state.rs`. +- **TypeScript**: strict mode (see `tsconfig.json`). No `any` unless crossing an external-library boundary. Handlers return plain JSON-serializable objects — no class instances that serialize oddly. +- **Bash scripts**: `set -euo pipefail` at the top. Use `[ -f "$path" ]` test forms. Quote everything. +- **Comments**: explain *why*, not *what*. If a function does something subtle (an invariant, a workaround, a perf reason), leave a short note. Don't narrate what `if (x.length === 0)` already tells the reader. +- **Errors from the daemon**: throw `new DaemonError(message, exitCode)` where exitCode is a documented code (0, 2, 4). Plain `throw new Error(...)` maps to exit code 4. + +## Voice and writing + +When touching user-facing text (README, CHANGELOG, --help output, error messages), match the repo's existing voice: direct, concrete, builder-to-builder. Short sentences. Name specifics (real file paths, real numbers, real scenarios). No AI vocabulary (no "delve", "robust", "comprehensive", "leverage", "pivotal"). No em dashes. Avoid corporate tone. + +Error messages should name the problem and the fix: +- Bad: `Error: connection failed` +- Good: `daemon at :52321 not responding to /health — run 'ghax attach'` + +## Confusion protocol + +If you hit ambiguity with meaningful blast radius (two plausible architectures, a destructive operation with unclear scope, a request that contradicts existing patterns), stop and ask. Do not guess at architectural decisions. Routine coding, small features, and obvious changes don't need permission. + +## Commits and PRs + +- **Imperative subject, 70 char limit.** `feat(ext): reload re-injects content scripts` not `Fixed the extension reload thing`. +- **Reference the area in parens**: `feat(daemon)`, `fix(rust-cli)`, `docs(readme)`, `refactor(snapshot)`. +- **Body explains motivation** — the *why*, not a restatement of the subject. Link field-report or issue reference if applicable. +- **Don't claim `Co-Authored-By`** unless a human co-authored the change. +- **No force-push to `main`.** Feature branches are fine to rebase. + +Before opening a PR: typecheck clean, daemon bundle builds, Rust builds release, smoke passes, CHANGELOG updated under `## [Unreleased]`. Details in [CONTRIBUTING.md](./CONTRIBUTING.md). + +## Secrets + +The daemon binds to `127.0.0.1` only — no auth token. This is correct for single-user localhost (see [SECURITY.md](./SECURITY.md) for the rationale). Don't add features that expose the daemon over the network without a full security review. + +`chrome.storage.local` and cookie capture often contain auth tokens. Treat output from `ghax ext storage`, `ghax cookies`, and capture-bodies like `localStorage.getItem` — do not echo into commit messages, logs, or chat context. diff --git a/CLAUDE.md b/CLAUDE.md index 9ae3d80..ff48a2c 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -1,9 +1,13 @@ # ghax — project instructions for Claude Code -This file tells Claude (and other agents) how to work on this repo without -friction. Human contributors should read [CONTRIBUTING.md](./CONTRIBUTING.md) -instead — it has the same info plus the bits that don't matter to agents -(code of conduct, PR style, issue reporting). +Claude Code auto-loads this file when it opens the ghax repo. Other AI +agents (Codex, Cursor, Aider, Continue) should read [AGENTS.md](./AGENTS.md) +first — it covers the same ground in a vendor-neutral form. This file +extends that with Claude-specific workflow examples. + +Human contributors: start with [CONTRIBUTING.md](./CONTRIBUTING.md) — +it has the same info plus bits that don't matter to agents (code of +conduct, PR style, issue reporting). ## What this repo is diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 9293f75..0261762 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -1,9 +1,15 @@ # Contributing to ghax Thanks for your interest. ghax is pre-v1; expect rough edges and -opinionated pushback. That said, PRs are welcome — this doc lists the +opinionated pushback. That said, PRs are welcome. This doc lists the moving parts so you can land one without friction. +**Working with an AI coding agent?** Point it at [llms.txt](./llms.txt) +(install/usage) and [AGENTS.md](./AGENTS.md) (project memory). Those +two files contain everything an agent needs to work on this repo; this +document covers the same ground plus human-oriented bits (PR style, +code of conduct, issue reporting). + ## Repo layout ``` diff --git a/README.md b/README.md index d1cdac3..5fe3e9f 100644 --- a/README.md +++ b/README.md @@ -1,231 +1,198 @@ # ghax -Drive your **real** running Chrome or Edge from the command line. Not a -sandboxed copy. Your actual browser, with your actual auth, your actual -extensions, and your actual open tabs. +**Drive your real Chrome or Edge from the command line.** Not a sandboxed copy. Your actual browser, with your actual auth, your actual extensions, and your actual open tabs. ```bash ghax attach ghax goto https://app.example.com -ghax snapshot -i # aria tree with @e1, @e2, ... refs +ghax snapshot -i # aria tree with @e1, @e2, … refs ghax click @e3 ghax fill @e5 "hello" ``` -That's it. No separate browser to install. No fresh Chromium. No -"please log in again." The browser you already have open is the -browser you drive. +That's it. No separate browser to install. No fresh Chromium. No *please log in again*. Or use a scratch profile if that's what you want. Your call. -Prefer a scratch browser? That's one flag away — `ghax attach --launch ---headless` spawns a fresh Chromium in its own profile for CI-style -runs without touching your daily driver. +[Install →](#install) · [Quickstart →](#quickstart) · [Use with AI agents →](#install-with-an-ai-agent) · [Full commands →](#command-reference) · [License](#license) + +--- ## Why this exists -Every AI coding agent and every browser-automation script out there has -the same problem: they launch their own browser. Which means they don't -have your SSO session, don't have your Chrome extensions, don't know -which tabs you're already working in, and will happily trigger -Cloudflare bot protection on every SaaS dashboard worth QAing. +Every AI coding agent and every browser-automation script has the same problem: they launch their own browser. So they don't have your SSO session, don't have your Chrome extensions, don't know which tabs you're already working in, and will happily trip Cloudflare bot protection on any SaaS dashboard. -`ghax` attaches over CDP. One command. Real browser. Real state. +ghax attaches over CDP. One command. Real browser. Real state. -Or a scratch profile if that's what you want. Your call. +--- -## What it does +## Install with an AI agent -- **Accessibility-tree snapshots** with `@e` refs. Interact by role - and name, not fragile CSS selectors. Walks open shadow roots for - custom-element apps (Lit, Shoelace, web components) and emits - chain selectors (`host >> inner`) that descend into shadow trees. -- **Dialog-aware**. When a modal is open, snapshots walk the modal, not - the `aria-hidden="true"` app behind it. Saves you from empty trees - on Radix / Headless UI / Material dialogs. -- **MV3 extension internals**. List extensions, reload them, eval JS in - service workers, read/write `chrome.storage.*`, interact with side - panels, popups, options pages. Hot-reload on rebuild so `pnpm build` - gives you new code in 5 seconds without losing tab state. -- **Real user gestures** via CDP `Input.dispatch*`. Because - `chrome.sidePanel.open()` and friends refuse synthetic clicks. -- **Console + network capture** from the moment you attach. Rolling 5k - buffers, `--errors` and `--pattern` filters, request + response - headers, HAR 1.2 export, stack-frame parsing, dedup grouping, and - **source-map resolution** (`main.abc123.js:1:48291` → - `src/AuthForm.tsx:42:12`). -- **Core Web Vitals** (`ghax perf`). LCP (with the element that hit - it), FCP, CLS, TTFB, full nav timing. Buffered observers catch - entries that fired before you asked. -- **Live fix-preview** (`ghax try`). Inject CSS or JS against the - running page, measure the result, screenshot it, all in one call. - Revert = reload. -- **Framework-safe `fill`**. Native-setter + `input` for React, - explicit `blur` for Angular validators, `contenteditable` paths for - Material chip inputs and rich editors. Works on every framework you - actually hit. -- **Batch execution**. `ghax batch '[{"cmd":"click","args":["@e7"]}, - ...]'` ships a whole plan in one round-trip and auto-re-snapshots - between steps that use refs, so a mid-plan combobox reshuffle - doesn't break the rest of your sequence. -- **Background-window workflow**. `new-window`, `find`, `tab --quiet` - give an agent its own window in your browser without stealing focus - from the window you're working in. Multi-agent isolation via - `GHAX_STATE_FILE`. - -Full command reference in [ARCHITECTURE.md](./ARCHITECTURE.md). +Got Claude Code, Cursor, Codex, Aider, Continue, ChatGPT, or any other AI coding agent? Paste this into the chat: -## Install +> **Clone `https://github.com/kepptic/ghax` and follow the install steps in its `llms.txt` file. Verify with `ghax --version` and report success.** -ghax ships as a platform-specific Rust binary. Under 3 MB stripped on -Apple Silicon. Distribution is GitHub Releases only. No registries, -no taps, no accounts. +The agent reads [llms.txt](./llms.txt), runs three build commands, verifies the install works, and tells you when it's done. Total time: under a minute on modern hardware. -```bash -# macOS / Linux -curl --proto '=https' --tlsv1.2 -LsSf \ - https://github.com/kepptic/ghax/releases/latest/download/ghax-installer.sh | sh +If you're running Claude Code, the repo ships with two skills under [.claude/skills/](./.claude/skills/) that light up automatically when Claude opens this directory. See [Use with AI coding agents](#use-with-ai-coding-agents) below for how to surface them in every session. -# Windows PowerShell -irm https://github.com/kepptic/ghax/releases/latest/download/ghax-installer.ps1 | iex -``` +--- -Runtime needs **Node 20+** for the daemon. Most developer laptops -already have it. +## Install -Build from source (Rust 1.80+, Node 20+): +Prerequisites: **Node 20+**, **Rust 1.80+**, git. ```bash git clone https://github.com/kepptic/ghax.git cd ghax npm install -npm run build:all -npm run install-link # symlinks → ~/.local/bin/ghax +npm run build:all # compiles the Rust CLI + bundles the Node daemon +npm run install-link # symlinks target/release/ghax → ~/.local/bin/ghax ``` +Ensure `~/.local/bin` is on `PATH`. Then verify: + +```bash +ghax --version # → ghax 0.4.2 +ghax --help # prints the full command surface (71 verbs) +``` + +To uninstall: `npm run uninstall-link`. + +**Pre-built release archives** (macOS, Linux, Windows) are published on [GitHub Releases](https://github.com/kepptic/ghax/releases) when CI is green. Install the latest with `npm run install-release`. + +--- + ## Quickstart -1. Launch your browser with CDP enabled: +### 1. Launch your browser with CDP enabled - ```bash - # macOS Edge - "/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge" \ - --remote-debugging-port=9222 & +```bash +# macOS Edge +"/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge" \ + --remote-debugging-port=9222 & + +# macOS Chrome v113+ — also needs an explicit profile path +"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \ + --remote-debugging-port=9222 \ + --user-data-dir="$HOME/.config/chrome-ghax" & +``` - # macOS Chrome (v113+ also needs an explicit profile path — see - # CONTRIBUTING.md "Known browser quirks") - "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \ - --remote-debugging-port=9222 \ - --user-data-dir="$HOME/.config/chrome-ghax" & - ``` +Linux and Windows launch commands are in [CONTRIBUTING.md](./CONTRIBUTING.md). -2. Attach. Ghax scans ports 9222–9230 and picks the running browser: +### 2. Attach - ```bash - ghax attach - ``` +```bash +ghax attach # scans :9222–:9230, picks the running browser +``` -3. Drive it: +### 3. Drive it + +```bash +ghax tabs # list open tabs +ghax goto https://example.com +ghax snapshot -i # interactive @e refs +ghax click @e3 +ghax fill @e5 "hello" +ghax screenshot --path /tmp/shot.png +ghax perf # Core Web Vitals +``` - ```bash - ghax tabs # list open tabs - ghax goto https://example.com - ghax snapshot -i # get @e refs - ghax click @e3 - ghax fill @e5 "hello" - ghax screenshot --path /tmp/shot.png - ghax perf # Core Web Vitals - ``` +### 4. Detach -4. Detach when done: +```bash +ghax detach +``` - ```bash - ghax detach - ``` +--- ## Which profile? -You pick. Three modes: +You pick. Three modes, one flag each. + +| Mode | Command | When | +|------|---------|------| +| **Your real profile** | Launch your browser yourself with `--remote-debugging-port=9222`, then `ghax attach` | Default. Keeps your SSO, extensions, and open tabs. | +| **A dedicated ghax profile** | Same as above, but add `--user-data-dir=` to the browser launch | You want to keep ghax traffic separate from your daily driver. | +| **A throwaway scratch profile** | `ghax attach --launch` (add `--headless` for no window) | CI-style runs, reproducible environments, or you just don't want to launch the browser yourself. | + +--- + +## What it does + +- **Accessibility-tree snapshots** with `@e` refs. Interact by role and name, not fragile CSS selectors. Walks open shadow roots for custom-element apps (Lit, Shoelace, web components) and emits chain selectors (`host >> inner`) that descend into shadow trees. +- **Dialog-aware snapshots.** When a modal is open, `ghax snapshot` walks the modal instead of the `aria-hidden="true"` app behind it. Saves you from empty trees on Radix, Headless UI, and Material dialogs. +- **MV3 extension internals.** List extensions, reload them, eval JS in service workers, read/write `chrome.storage.*`, interact with side panels, popups, and options pages. **Hot-reload** on rebuild: `pnpm build` → new code running in 5 seconds without losing tab state. +- **Real user gestures** via CDP `Input.dispatch*`. Needed for APIs like `chrome.sidePanel.open()` that refuse synthetic clicks. +- **Console + network capture** from the moment you attach. Rolling 5k buffers, `--errors` and `--pattern` filters, request + response headers, HAR 1.2 export, stack-frame parsing, dedup grouping, and source-map resolution (`main.abc123.js:1:48291` → `src/AuthForm.tsx:42:12`). +- **Core Web Vitals** (`ghax perf`). LCP with the element that caused it, FCP, CLS, TTFB, full nav timing. Buffered observers catch entries that fired before you asked. +- **Live fix-preview** (`ghax try`). Inject CSS or JS against the running page, measure the result, screenshot it, all in one call. Revert = reload. +- **Framework-safe `fill`.** Native-setter + `input` for React, explicit `blur` for Angular validators, `contenteditable` paths for Material chip inputs and rich editors. +- **Batch execution.** `ghax batch '[{"cmd":"click","args":["@e7"]}, …]'` ships a whole plan in one round-trip and auto-re-snapshots between steps that use refs, so a mid-plan combobox reshuffle doesn't break the rest of your sequence. +- **Background-window workflow.** `new-window`, `find`, `tab --quiet` give an agent its own window in your browser without stealing focus. -- **Your real profile** — launch Edge or Chrome with - `--remote-debugging-port=9222` (the quickstart above). You keep your - extensions, your SSO cookies, and your open tabs. Ghax just drives - what's already there. -- **A dedicated ghax profile** — pass `--user-data-dir=` to your - browser launch to keep ghax's tabs separate from your daily driver. - Same browser binary, different profile directory. Useful if you - don't want an agent touching your personal tabs. -- **A fresh scratch profile** — `ghax attach --launch` spawns your - browser with a throwaway profile under `~/.ghax/-profile/`. - Add `--headless` for a no-window CI-style run. Zero overlap with - your daily driver. +--- -The quickstart covers option 1. Options 2 and 3 are one flag each. +## Command reference + +ghax ships 71 verbs. The full surface lives in `ghax --help` — no man pages, `--help` is authoritative. + +```bash +ghax --help # full command surface +ghax --help | less # scroll it +ghax --help # some verbs have per-verb help +``` + +Add `--json` to any command for machine-readable output. + +**Exit codes:** `0` ok · `1` usage error · `2` not attached · `4` CDP error · `10` build/bootstrap failure. + +--- ## Use with AI coding agents -Ghax is a plain CLI. Any agent that can run shell commands can drive -a browser through it. Setup below is per-agent; the capabilities are -the same everywhere. +Ghax is a plain CLI. Any agent that can run shell commands can drive a browser through it. ### Claude Code -The repo ships two skills under [`.claude/skills/`](./.claude/skills/): +Two skills ship under [.claude/skills/](./.claude/skills/): -- [`ghax.md`](./.claude/skills/ghax.md) — top-level router. Claude - picks it up when you say "attach to my browser", "test the - extension", "snapshot the dashboard", and routes to the right - sub-skill. -- [`ghax-browse.md`](./.claude/skills/ghax-browse.md) — the flagship - skill with full workflow examples: QA runs, extension hot-reload, - SaaS-dashboard automation, snapshot-interact-assert loops. +- [`ghax.md`](./.claude/skills/ghax.md) — top-level router. Claude picks it up when you say *"attach to my browser"*, *"test the extension"*, *"snapshot the dashboard"*. +- [`ghax-browse.md`](./.claude/skills/ghax-browse.md) — flagship skill with full workflow examples (QA, hot-reload, SaaS automation, snapshot-interact-assert loops). -**Install the skills** (one of three ways): +Install them: ```bash -# 1. User-global — available in every Claude Code session on this machine +# Option 1: user-global (available in every Claude Code session) mkdir -p ~/.claude/skills cp .claude/skills/ghax.md .claude/skills/ghax-browse.md ~/.claude/skills/ -# 2. Project-local — available only in this repo -# (They already live here. Claude Code picks up .claude/skills/*.md -# automatically when it opens this directory.) +# Option 2: project-local (auto-discovered when Claude opens this repo) +# They already live here — nothing to do. -# 3. Symlink, so future skill updates flow in without copying +# Option 3: symlink so future updates flow in ln -s "$PWD/.claude/skills/ghax.md" ~/.claude/skills/ghax.md ln -s "$PWD/.claude/skills/ghax-browse.md" ~/.claude/skills/ghax-browse.md ``` -**Verify:** - -```bash -claude /skills # should list `ghax` and `ghax-browse` -``` +Verify: `claude /skills` should list `ghax` and `ghax-browse`. -Then just talk to Claude naturally: *"attach to my edge and snapshot -the settings page"*, *"hot-reload the extension I just built"*, *"QA -my staging deploy and screenshot any console errors"*. The skills -describe the right ghax commands for each intent. +### Codex · Cursor · Aider · Continue · Windsurf -### Codex (OpenAI) / Cursor / Aider / Continue / any shell-driving agent +These agents read a project memory file. For Codex and Cursor, the convention is `AGENTS.md` at the repo root. This repo already ships one: [AGENTS.md](./AGENTS.md). For Cursor specifically you can also symlink to `.cursorrules`, for Aider to `CONVENTIONS.md`. -These agents don't have a plug-in skill system like Claude Code. The -clean integration is a project-level `AGENTS.md` (or equivalent -memory file — Cursor reads `.cursorrules`, Aider reads -`CONVENTIONS.md`, etc.) that tells the agent when to use ghax. - -**Drop this into your project's `AGENTS.md` / `.cursorrules`:** +If you're adding ghax to **another project**, drop this snippet into that project's `AGENTS.md`: ```markdown ## Browser automation: use ghax -When a task requires a real browser (QA, dashboard automation, -extension testing, anything behind SSO), use the `ghax` CLI instead -of asking the user to paste screenshots or step through clicks. +When a task requires a real browser (QA, dashboard automation, extension +testing, anything behind SSO), use the `ghax` CLI instead of asking the +user for screenshots or step-by-step clicks. -Three patterns cover 95% of use cases: +Three patterns cover almost every case: 1. Attach: - ghax attach # scans :9222-9230 for a running browser - ghax attach --launch # or spawn a scratch Chromium + ghax attach # scans :9222-9230 for a running browser + ghax attach --launch # or spawn a scratch browser 2. Snapshot-then-interact (refs survive until next snapshot): ghax snapshot -i --json @@ -235,64 +202,48 @@ Three patterns cover 95% of use cases: 3. One-round-trip batch (auto re-snapshots between ref-using steps): ghax batch '[{"cmd":"click","args":["@e7"]}, …]' -Add `--json` to any command for machine-readable output. -Full command surface: run `ghax --help`. +Full surface: `ghax --help`. JSON on any verb with `--json`. ``` -That's the integration. The agent reads the memory file at session -start, knows ghax exists, and reaches for it when appropriate. - -### No-memory / raw-shell agents +### Raw-shell / scripted harnesses -If you're scripting against an agent with no memory file -(one-off API calls, custom harnesses), pass the same three-pattern -brief as the system prompt. Ghax exits with standard codes (`0` ok, -`2` not attached, `4` CDP error) and every verb takes `--json` — it -behaves like any well-formed Unix CLI under an agent. +Ghax is a well-formed Unix CLI: documented exit codes (`0` ok, `2` not attached, `4` CDP error), `--json` on every verb, stable argv. Script against it the same way you'd script against `curl` or `rg`. See `ghax --help` for the full interface. ### Multi-agent on one browser Two agents, one browser, zero stepping on each other: ```bash -# Agent A -GHAX_STATE_FILE=/tmp/ghax-a.json ghax attach -GHAX_STATE_FILE=/tmp/ghax-a.json ghax new-window https://app-a.com +# Agent A shell +export GHAX_STATE_FILE=/tmp/ghax-a.json +ghax attach +ghax new-window https://app-a.com -# Agent B (separate shell) -GHAX_STATE_FILE=/tmp/ghax-b.json ghax attach -GHAX_STATE_FILE=/tmp/ghax-b.json ghax new-window https://app-b.com +# Agent B shell +export GHAX_STATE_FILE=/tmp/ghax-b.json +ghax attach +ghax new-window https://app-b.com ``` -Same browser, same profile, same auth. Different windows and separate -daemon state. Neither agent sees the other's active-tab pointer. - -## When to reach for ghax - -- You're running an AI agent against a SaaS dashboard behind SSO. - Fresh-browser tools break on login. Ghax uses the session you - already have. -- You're developing a Chrome extension and want `pnpm build` to hot- - reload your service worker + content scripts without losing tab - state. No other tool does this. -- You want Core Web Vitals on your real app with your real user - profile, not a headless clean-room. -- You're QAing a deploy and need screenshots + console errors + - failed-request list in one report. `ghax qa --url ` does the - whole thing. -- You need to automate a dashboard that actively refuses headless - browsers but you still want CI-style repeatability — attach to a - real visible browser locally, drive it the same way an agent would - in prod. -- You want a clean-room disposable browser for CI. `ghax attach - --launch --headless` gives you one in its own scratch profile. - -## When not to reach for ghax - -- You need cross-browser testing on Firefox or Safari. Ghax is CDP- - only — Chrome family (Edge, Chrome, Chromium, Brave, Arc). -- You want codegen from a UI recorder. Ghax records into its own JSON - format for replay, not for generating test code. +Same browser process, same profile, same auth. Different windows and separate daemon state. Neither agent sees the other's active-tab pointer. + +--- + +## When ghax is the right call + +- You're running an AI agent against a SaaS dashboard behind SSO. Fresh-browser tools break on login; ghax uses the session you already have. +- You're developing a Chrome extension and want `pnpm build` to hot-reload your service worker and content scripts without losing tab state. +- You want Core Web Vitals on your real app with your real user profile, not a headless clean-room. +- You need screenshots + console errors + failed requests in one deploy-verify report. `ghax qa --url ` does the whole thing in one shot. +- You need to automate a site that refuses headless browsers, but you still want CI-style repeatability. + +## When ghax is the wrong call + +- You need cross-browser testing on Firefox or Safari. Ghax is CDP-only — Chromium family only (Edge, Chrome, Chromium, Brave, Arc). +- You want UI-recorder codegen. Ghax records into its own JSON format for replay, not for generating test code. +- You need a fully isolated clean-room browser. Use `ghax attach --launch --headless` for disposable runs, but recognize that it still uses your system Chromium. + +--- ## Architecture @@ -307,37 +258,36 @@ ghax daemon (Node ESM bundle, ~80 KB) Your running Chrome / Edge (--remote-debugging-port=9222) ``` -The CLI is a thin HTTP client so the binary stays small. The daemon -owns every CDP session and auto-shuts after 30 minutes idle. Full -notes in [ARCHITECTURE.md](./ARCHITECTURE.md). +The CLI is a thin HTTP client so the binary stays small. The daemon owns every CDP session and auto-shuts after 30 minutes idle. Deeper notes: [ARCHITECTURE.md](./ARCHITECTURE.md). + +--- ## Security -The daemon binds to `127.0.0.1` only. No auth token — this is a -single-user, localhost tool. State lives in `.ghax/` relative to the -current git root, or `~/.ghax/` with `GHAX_GLOBAL=1`. +The daemon binds to `127.0.0.1` only. No auth token — this is a single-user localhost tool. State lives in `.ghax/` relative to the current git root, or `~/.ghax/` with `GHAX_GLOBAL=1`. -`chrome.storage.local` often contains auth tokens. Treat -`ghax ext storage` output like `localStorage.getItem` — don't paste -it into chat. +`chrome.storage.local` often contains auth tokens. Treat `ghax ext storage` output like `localStorage.getItem` — don't paste it into chat context, commit messages, or logs. -See [SECURITY.md](./SECURITY.md) for the threat model and disclosure -process. +Full threat model: [SECURITY.md](./SECURITY.md). + +--- ## Contributing -Issues and PRs welcome. Start with [CONTRIBUTING.md](./CONTRIBUTING.md) -— it covers the Rust + Node split, the 95-check live-browser smoke -suite, and the hard invariants that'll bite you if you skip them. +Issues and PRs welcome. Start with [CONTRIBUTING.md](./CONTRIBUTING.md) — it covers the Rust + Node split, the 95-check live-browser smoke suite, and the hard invariants. If you're a coding agent working on the repo, read [AGENTS.md](./AGENTS.md) first. + +This project follows the [Contributor Covenant v2.1](./CODE_OF_CONDUCT.md). + +--- ## License -MIT. Portions adapted from [gstack](https://github.com/garrytan/gstack) -by Garry Tan (also MIT): `buffers.ts`, `config.ts`, and the -accessibility-snapshot algorithm in `snapshot.ts`. +MIT. See [LICENSE](./LICENSE). + +Portions adapted from [gstack](https://github.com/garrytan/gstack) by Garry Tan (also MIT): `buffers.ts`, `config.ts`, and the accessibility-snapshot algorithm in `snapshot.ts`. + +--- ## Credits -Shaped by months of running AI agents against real dashboards and -logging every papercut. Field reports in -[`docs/sessions/`](./docs/sessions/) if you want the receipts. +Shaped by months of running AI agents against real browsers and logging every papercut. Field reports in [docs/sessions/](./docs/sessions/) for the receipts. diff --git a/llms.txt b/llms.txt new file mode 100644 index 0000000..c8601a1 --- /dev/null +++ b/llms.txt @@ -0,0 +1,126 @@ +# ghax + +> A CLI that attaches to your real running Chrome or Edge via Chrome DevTools Protocol. Drive tabs, snapshot the accessibility tree, click and fill by role-name refs, work with MV3 extension internals, capture console and network traffic. No separate browser to install. Purpose-built for AI coding agents that need to drive a real browser session. + +ghax reaches into a Chromium-family browser the user already has running (with their SSO cookies, extensions, and open tabs) instead of launching a disposable one. Three modes: attach to the real profile, attach to a dedicated ghax profile, or launch a scratch profile (add `--headless` for CI). The CLI is Rust (~3 MB, ~20 ms cold start); the daemon is a Node bundle that speaks CDP. + +This file is written for AI coding agents. If you just arrived and the user asked you to install ghax, run the steps in "Install from source" top-to-bottom, verify, then ask the user what browser task they want done. + +## Install from source + +Prerequisites (check these first): + +```bash +node --version # need v20.0 or newer +cargo --version # need Rust 1.80 or newer +git --version # any recent version +``` + +If any are missing, tell the user which one and stop. Do not attempt to install system packages without the user's permission. + +Build and install: + +```bash +npm install # installs esbuild, tsx, typescript, playwright +npm run build:all # compiles Rust CLI + bundles Node daemon +npm run install-link # symlinks target/release/ghax → ~/.local/bin/ghax +``` + +`install-link` also drops the daemon bundle at `~/.local/share/ghax/ghax-daemon.mjs` and bootstraps its node_modules (playwright + source-map). All idempotent, safe to re-run. + +Ensure `~/.local/bin` is on PATH. If `which ghax` comes back empty, tell the user to add this to their shell rc: + +```bash +export PATH="$HOME/.local/bin:$PATH" +``` + +Verify the install: + +```bash +ghax --version # should print "ghax 0.4.x" +ghax --help # should print the full command surface +``` + +If both succeed, report to the user: "ghax installed. Run `ghax --help` to see the full surface." + +## First-time usage pattern + +ghax attaches to a Chrome or Edge running with `--remote-debugging-port=9222`. If the user doesn't have one running, either tell them to launch one: + +```bash +# macOS Edge — most common setup +"/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge" --remote-debugging-port=9222 & + +# macOS Chrome (v113+ requires an explicit profile path) +"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \ + --remote-debugging-port=9222 \ + --user-data-dir="$HOME/.config/chrome-ghax" & +``` + +Or offer to launch a scratch browser for them: + +```bash +ghax attach --launch # spawns a scratch Chromium with CDP +ghax attach --launch --headless # no visible window (CI-style) +``` + +Then attach and drive: + +```bash +ghax attach # scans :9222-9230, attaches to what's running +ghax tabs --json # list open tabs +ghax goto https://example.com +ghax snapshot -i --json # aria tree with @e refs +ghax click @e3 # refs survive until next snapshot +ghax fill @e5 "hello" +ghax screenshot --path /tmp/shot.png +``` + +Refs (@e1, @e2, ...) come from `snapshot -i` and are only valid until the next snapshot on the same tab. If you mutate the DOM (click anything, navigate, etc.), take a fresh snapshot before reusing refs. The `ghax batch` verb handles this automatically between steps: + +```bash +ghax batch '[ + {"cmd":"goto","args":["https://app.example.com"]}, + {"cmd":"snapshot","opts":{"interactive":true}}, + {"cmd":"click","args":["@e7"]}, + {"cmd":"fill","args":["@e9","new-value"]} +]' +``` + +## Command reference + +Ghax has no man pages. `ghax --help` prints the full command surface — currently 71 verbs covering attach/detach, tab operations, navigation, snapshot + interact, file uploads, real user gestures, MV3 extension internals (service workers, popups, options, side panels, chrome.storage), console + network capture, Core Web Vitals, screenshots, XPath, live CSS injection, batch execution, recording + replay, and orchestrated flows (qa, canary, perf, profile, ship, review). + +Add `--json` to any command for machine-readable output. Exit codes: `0` ok, `1` usage error, `2` not attached, `4` CDP error, `10` build/bootstrap failure. + +For specific task patterns: + +```bash +ghax --help | less # browse the full surface +ghax attach --help # per-verb help +ghax ext --help # extension sub-commands +``` + +## Key files for deeper context + +- [README.md](./README.md): human-facing overview with use cases and examples. +- [AGENTS.md](./AGENTS.md): generic agent memory — invariants, build commands, how to add a verb. +- [CLAUDE.md](./CLAUDE.md): Claude Code specific instructions and workflow examples. +- [ARCHITECTURE.md](./ARCHITECTURE.md): CLI/daemon split, CDP connection model, ref resolution. +- [CONTRIBUTING.md](./CONTRIBUTING.md): dev loop, code style, smoke suite, PR conventions. +- [SECURITY.md](./SECURITY.md): threat model (local-only daemon, no auth token, rationale). +- [CHANGELOG.md](./CHANGELOG.md): per-version changes. + +## Optional + +- [.claude/skills/ghax.md](./.claude/skills/ghax.md): Claude Code skill (router) — auto-discovered when Claude Code opens this repo. +- [.claude/skills/ghax-browse.md](./.claude/skills/ghax-browse.md): Claude Code skill (flagship) — full workflow examples. +- [docs/BENCHMARK.md](./docs/BENCHMARK.md): latency numbers vs other CLIs. +- [docs/sessions/](./docs/sessions/): field reports from production agent runs (useful for understanding edge cases). + +## Things not to do + +- Do not attempt to run the smoke suite (`npm run test:smoke`) without first confirming with the user — it drives a real browser for ~30 seconds and requires an Edge/Chrome running on port 9222. +- Do not modify `Cargo.toml` version or push tags without the user's explicit request — releases are gated by `scripts/release.sh`. +- Do not run destructive git operations (force-push, reset --hard, branch deletions on main). +- The daemon auto-shuts after 30 minutes idle. If commands start failing with "not attached", run `ghax attach` again.