Merged
75 changes: 75 additions & 0 deletions CHANGELOG.md
@@ -128,6 +128,81 @@ Format inspired by [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).

### Added

- **Bucket B — architectural fixes** (sourced from the 2026-04-20
jnremache field report):
- `ghax batch '<json-array>'` — one-round-trip sequence executor
(TOK-09). Unlike `chain` (stdin, N round-trips), `batch` parses
the inline JSON client-side, ships the whole plan in a single
RPC, and **auto-re-snapshots between steps that reference
`@e<n>` refs** so the ref map always resolves against the
current DOM. That directly fixes the JNR-03 mid-sequence
ref-shift pattern observed on Material / React forms (comboboxes
opening mid-plan and reindexing the ARIA tree). Opt out of the
auto-snapshot with `--no-auto-snapshot`; `--no-stopOnError`
keeps running past a failed step. Results always emit as JSON.
- `snapshot` is now **dialog-aware by default** (JNR-06). When an
open modal is present (`[role=dialog]`, `[role=alertdialog]`,
native `<dialog open>`, or `[aria-modal=true]`), the walker
treats the top-most visible modal as the new root — so the
outer app's `aria-hidden="true"` no longer swallows every
interactive element inside the modal. Fall back to the old
body-rooted behavior with `--no-dialog-scope`.
- `fill` expands the framework-safe path to cover Angular and
Material (JNR-04). React's native-setter + `input` pattern was
already there; now the handler also dispatches `blur` (so
Angular's `FormControl.markAsTouched` runs and pristine/dirty
validators fire) and handles `contenteditable` hosts (Material
chip inputs, rich editors) via `textContent` + a proper
`InputEvent('insertText')`.
- `state.rs::require_daemon` gives a more actionable message when
state is stale (JNR-01): if a ghax daemon is alive on the
9222–9230 scan range but our state file is missing, the "no
daemon state" error now hints at the live port and says
`ghax attach` will re-pair with it; the pid-mismatch branch
spells out `ghax detach && ghax attach` as the fix.

- **Bucket C papercut bundle** — five quality-of-life fixes for LLM
operators driving ghax (sourced from the 2026-04-20 jnremache field
report):
- `ghax attach` is now silent on fresh success (POSIX convention).
Pass `--verbose` or set `GHAX_VERBOSE=1` to restore the
`attached — pid / port / browser` one-liner. `already attached`
keeps printing because that's informational, not success.
- `ghax status` surfaces the active tab id + first 60 chars of its
title as a new `active` row — matters most in multi-agent sessions
where `new-window` parked the agent on a non-obvious tab.
`status --json` gains `activeTabId`, `activeTabTitle`,
`activeTabUrl` fields alongside the existing counts.
- `ghax eval` auto-retries once past a navigation-in-flight
(`Execution context was destroyed` / `Target closed` / frame
detached). The daemon waits up to 3s for the next `load` event
and re-issues the evaluate — matches what a human would do
manually with `wait --load && eval …`.
- Rust CLI's RPC client single-retries transient transport errors
(connection refused/reset/timeout) after a 50 ms pause, so a
daemon that briefly blinks (post-spawn warm-up, GC pause, hot
reload) doesn't bubble up a user-visible failure. Semantic
errors (daemon answered with `ok: false`) are not retried — those
are real command failures, not flake.
- `ghax --help` splits the overloaded `wait` line into three:
`wait <selector>` (most common), `wait <ms>`, and
`wait --networkidle | --load`. `eval` gains a `# auto-retries
once past a nav-in-flight` inline note. `attach` lists
`[--verbose]`.
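
The auto-re-snapshot rule that `batch` applies to `@e<n>` refs can be illustrated with a small sketch. This is not the daemon's implementation: the function names `is_snapshot_ref` and `resnapshot_plan` are hypothetical, and the policy shown (take a fresh snapshot before any step after the first whose arguments contain a ref, unless `--no-auto-snapshot` is in effect) is an assumption inferred from the description above.

```rust
/// Returns true when `arg` looks like a snapshot ref such as `@e7`:
/// the literal prefix `@e` followed by one or more ASCII digits.
fn is_snapshot_ref(arg: &str) -> bool {
    match arg.strip_prefix("@e") {
        Some(digits) => !digits.is_empty() && digits.bytes().all(|b| b.is_ascii_digit()),
        None => false,
    }
}

/// For each step (given as its argument list), decide whether a fresh
/// snapshot would be taken before running it.
/// Assumed policy, not confirmed by the source: only steps after the first
/// are candidates, and only when at least one argument is a ref.
fn resnapshot_plan(step_args: &[Vec<&str>], auto_snapshot: bool) -> Vec<bool> {
    step_args
        .iter()
        .enumerate()
        .map(|(i, args)| auto_snapshot && i > 0 && args.iter().any(|a| is_snapshot_ref(a)))
        .collect()
}
```

Under this assumed policy, selector-only steps never pay for an extra snapshot, so a plan that mixes CSS selectors and refs only re-walks the tree where a stale ref could otherwise resolve against the wrong element.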

### Docs

- **Known browser quirks** section in `CONTRIBUTING.md` covers two
not-a-ghax-bug patterns that surface when driving a real browser:
Chrome 113+ ignores `--remote-debugging-port` on the default
user-data-dir (fix: pass `--user-data-dir=<path>`); and Google's
anti-bot on sensitive pages refuses to render when
`navigator.webdriver` is set (mitigation: launch with
`--disable-blink-features=AutomationControlled`; for flows where
even that fails, detach / do the step manually / re-attach).

### Added

- `ghax xpath <expression> [--limit N]` — query the page's DOM with an
XPath expression, return every matching element with its tag, text
preview, and bounding box. XPath is also usable via Playwright's
56 changes: 56 additions & 0 deletions CONTRIBUTING.md
@@ -159,6 +159,62 @@ QA that needs a dedicated fixture rather than the real web.
it before and after.
5. Updated `CHANGELOG.md` under `## [Unreleased]`.

## Known browser quirks

These are not ghax bugs — they're browser / site behaviors that surface
when driving a real browser over CDP. Document them here so the next
person doesn't re-discover them.

### Chrome v113+ refuses CDP on the default profile

As of Chrome 113, `--remote-debugging-port` is ignored when the browser
is using the default `--user-data-dir`. Launched without an explicit
profile path, Chrome starts normally but never opens the DevTools
endpoint, so `ghax attach` fails to find `/json/version`.

Workaround: point at a writable profile directory.

```bash
# Chrome — explicit profile
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
--remote-debugging-port=9222 \
--user-data-dir="$HOME/.config/chrome-ghax" &
```

Edge is not affected (still honors CDP on its default profile as of
2026-Q1). If you want Edge + a clean profile anyway, the same
`--user-data-dir=<path>` flag works.
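
For tooling that launches the browser itself, the same two flags can be assembled programmatically. The following is a minimal sketch using only the Rust standard library; `cdp_launch_args` and `cdp_launch_command` are hypothetical helper names, not part of ghax, and the browser binary path and profile directory are left to the caller.

```rust
use std::path::Path;
use std::process::Command;

/// Flags for a CDP-enabled launch with an explicit profile directory,
/// mirroring the shell command above.
fn cdp_launch_args(port: u16, profile_dir: &Path) -> Vec<String> {
    vec![
        format!("--remote-debugging-port={port}"),
        format!("--user-data-dir={}", profile_dir.display()),
    ]
}

/// Builds (but does not spawn) the browser command, so the caller decides
/// when to start the process and how to handle its stdio.
fn cdp_launch_command(browser_bin: &Path, port: u16, profile_dir: &Path) -> Command {
    let mut cmd = Command::new(browser_bin);
    cmd.args(cdp_launch_args(port, profile_dir));
    cmd
}
```

Keeping the argument list in its own function makes the explicit-profile requirement easy to check in a unit test without starting a browser.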

### Google anti-bot on sensitive flows

Chrome / Edge launched with `--remote-debugging-port` sets
`navigator.webdriver = true` plus a few related fingerprintable flags.
Google's anti-bot on sensitive pages (Business Profile verification,
Drive sharing consent, some OAuth challenges, Google Ads campaign
edits) refuses to render, throws a "disconnected" modal, or logs you
out mid-flow.

Cheap mitigation — add `--disable-blink-features=AutomationControlled`
to the launch command:

```bash
"/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge" \
--remote-debugging-port=9222 \
--disable-blink-features=AutomationControlled &
```

This clears the `navigator.webdriver` bit and unblocks most flows. It
won't defeat determined server-side fingerprinting — for flows where
even the mitigation fails (e.g. rapid form submits on Google Ads that
trigger a "session disconnected" modal), the documented pattern is:

1. `ghax detach`
2. Do the Google-specific step manually in the browser.
3. `ghax attach` and resume.

Full stealth-mode JS injection is explicitly out of scope:
cat-and-mouse maintenance isn't worth it for a dev tool.
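
If the detach / manual step / re-attach pattern comes up often, it can be wrapped in a small helper. The sketch below is illustrative only: `manual_handoff`, `run_ghax_verb`, and `wait_for_enter` are hypothetical names, and aborting the hand-off when `ghax detach` reports failure is a design assumption rather than documented ghax behavior.

```rust
use std::io;
use std::process::Command;

/// Runs the hand-off: detach, let the operator work in the browser, attach.
/// `run_ghax` executes one ghax verb and reports whether it succeeded;
/// `manual_step` blocks until the operator is done.
fn manual_handoff<R, M>(mut run_ghax: R, manual_step: M) -> io::Result<bool>
where
    R: FnMut(&str) -> io::Result<bool>,
    M: FnOnce() -> io::Result<()>,
{
    if !run_ghax("detach")? {
        // Assumed policy: if detach failed, skip the manual step entirely.
        return Ok(false);
    }
    manual_step()?;
    run_ghax("attach")
}

/// One possible runner: shells out to the `ghax` binary on PATH.
fn run_ghax_verb(verb: &str) -> io::Result<bool> {
    Ok(Command::new("ghax").arg(verb).status()?.success())
}

/// One possible manual step: wait for the operator to press Enter.
fn wait_for_enter() -> io::Result<()> {
    let mut line = String::new();
    io::stdin().read_line(&mut line)?;
    Ok(())
}
```

A call such as `manual_handoff(run_ghax_verb, wait_for_enter)` would drive the real CLI; passing closures instead keeps the sequencing testable without a browser or daemon.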

## Reporting issues

Include:
15 changes: 11 additions & 4 deletions crates/cli/src/attach.rs
@@ -1053,10 +1053,17 @@ pub fn cmd_attach(parsed: &Parsed, cfg: &Config) -> Result<i32> {

     let ep = endpoint.unwrap(); // always Some at this point
     let state = spawn_daemon(cfg, &ep, &kind, capture_bodies_ref)?;
-    println!(
-        "attached — pid {}, port {}, browser {}",
-        state.pid, state.port, state.browser_kind
-    );
+    // POSIX convention — stay quiet on fresh success. `--verbose` restores
+    // the pid/port/browser one-liner for humans; the `already attached`
+    // branch above still prints because that's informational, not success.
+    let verbose = matches!(parsed.flags.get("verbose"), Some(serde_json::Value::Bool(true)))
+        || std::env::var("GHAX_VERBOSE").is_ok();
+    if verbose {
+        println!(
+            "attached — pid {}, port {}, browser {}",
+            state.pid, state.port, state.browser_kind
+        );
+    }
     Ok(EXIT_OK)
 }

1 change: 1 addition & 0 deletions crates/cli/src/dispatch.rs
@@ -46,6 +46,7 @@ fn dispatch_inner(cfg: &Config, verb: &str, rest: &[String]) -> Result<i32> {
"pair" => return small::cmd_pair(rest),
"diff-state" => return small::cmd_diff_state(rest),
"chain" => return small::cmd_chain(rest),
"batch" => return small::cmd_batch(rest),
"replay" => return small::cmd_replay(rest),
"gif" => return small::cmd_gif(rest),
"qa" => return qa::cmd_qa(&args::parse(rest)),
11 changes: 8 additions & 3 deletions crates/cli/src/help.rs
@@ -6,11 +6,12 @@ pub const HELP: &str = r#"ghax — attach to your real Chrome/Edge via CDP and d
  Connection:
  attach [--port <n>] [--browser edge|chrome|chromium|brave|arc] [--launch]
  [--headless] [--load-extension <path>] [--data-dir <path>]
- [--capture-bodies[=<url-glob>]]
+ [--capture-bodies[=<url-glob>]] [--verbose]
  # Without --port, scans :9222-9230. Multiple running → picker.
  # With --launch and no --port, auto-picks first free port in range.
  # --capture-bodies records JSON/text response bodies (opt-in,
  # 32KB cap per body). Glob filters by URL (e.g. '*/api/*').
+ # --verbose prints pid/port/browser on success (default: silent).
  status [--json]
  detach
  restart
@@ -22,7 +23,7 @@ Tab:
  new-window [url] # new background window, same profile
  goto <url>
  back | forward | reload
- eval <js>
+ eval <js> # auto-retries once past a nav-in-flight
  try [<js>] [--css <rules>] [--selector <sel>] [--measure <expr>] [--shot <path>]
  text
  html [<selector>]
@@ -35,7 +36,9 @@ Snapshot & interact:
  upload <@ref|selector> <path>[,<path>…] # wraps setInputFiles
  press <key>
  type <text>
- wait <selector|ms|--networkidle|--load>
+ wait <selector> # wait until selector appears (most common)
+ wait <ms> # fixed delay in milliseconds
+ wait --networkidle | --load # wait for a navigation event
  viewport <WxH>
  responsive [prefix] [--fullPage]
  diff <url1> <url2>
@@ -72,6 +75,8 @@ Real user gestures:

  Batch / recording:
  chain < steps.json (JSON array of {cmd, args?, opts?})
+ batch '<json-array>' (one round-trip; auto re-snapshots between
+ steps that use @e<n> refs)
  record start [name]
  record stop
  record status
37 changes: 36 additions & 1 deletion crates/cli/src/rpc.rs
@@ -29,8 +29,28 @@ impl std::fmt::Display for RpcError {
 impl std::error::Error for RpcError {}

 pub fn call(port: u16, cmd: &str, args: Value, opts: Value) -> Result<Value> {
+    // Single-retry shim for transient-looking errors — connection
+    // refused/reset, broken pipe, request build failure — so a daemon
+    // that's briefly unresponsive (post-spawn warm-up, GC pause,
+    // mid-reload) doesn't bubble up a user-visible failure. Semantic
+    // errors (daemon answered with ok:false) are NOT retried — those
+    // are real command failures, not flake.
+    match call_once(port, cmd, &args, &opts) {
+        Ok(v) => Ok(v),
+        Err(e) => {
+            if is_transient(&e) {
+                std::thread::sleep(std::time::Duration::from_millis(50));
+                call_once(port, cmd, &args, &opts)
+            } else {
+                Err(e)
+            }
+        }
+    }
+}
+
+fn call_once(port: u16, cmd: &str, args: &Value, opts: &Value) -> Result<Value> {
     let url = format!("http://127.0.0.1:{port}/rpc");
-    let body = Request { cmd, args: &args, opts: &opts };
+    let body = Request { cmd, args, opts };
     let client = reqwest::blocking::Client::builder()
         // No global timeout: long verbs (qa, perf, snapshot with --wait) can run for minutes.
         .build()?;
@@ -49,3 +69,18 @@ pub fn call(port: u16, cmd: &str, args: Value, opts: Value) -> Result<Value> {
     }
     Ok(envelope.get("data").cloned().unwrap_or(Value::Null))
 }
+
+/// Transient = transport-layer hiccup we'd retry. A daemon-side semantic
+/// failure (wrapped in `RpcError`) is never transient — it ran, it failed.
+fn is_transient(err: &anyhow::Error) -> bool {
+    if err.downcast_ref::<RpcError>().is_some() {
+        return false;
+    }
+    if let Some(re) = err.downcast_ref::<reqwest::Error>() {
+        // Connection refused / reset / broken pipe / timeout all look
+        // like the daemon blinked. `is_request` catches everything except
+        // a completed response.
+        return re.is_connect() || re.is_timeout() || re.is_request();
+    }
+    false
+}
63 changes: 63 additions & 0 deletions crates/cli/src/small.rs
@@ -75,6 +75,18 @@ pub fn cmd_status(rest: &[String]) -> Result<i32> {
     println!("attached {} ({})", daemon_state.browser_kind, browser_url_short);
     println!("daemon pid {}, port {}, up {}m", daemon_state.pid, daemon_state.port, up_min);
     println!("tabs {}", data.get("tabCount").and_then(|v| v.as_u64()).unwrap_or(0));
+    // Surface the active tab so operators can sanity-check which page
+    // they're about to drive before issuing clicks / fills. Silently
+    // skipped if the daemon didn't send one (older daemon on new CLI).
+    if let Some(title) = data.get("activeTabTitle").and_then(|v| v.as_str()) {
+        let url = data.get("activeTabUrl").and_then(|v| v.as_str()).unwrap_or("");
+        let id = data.get("activeTabId").and_then(|v| v.as_str()).unwrap_or("");
+        if !id.is_empty() {
+            let title_trim = title.chars().take(60).collect::<String>();
+            let label = if title_trim.is_empty() { url.to_string() } else { title_trim };
+            println!("active {} — {}", id, label);
+        }
+    }
     println!("targets {}", data.get("targetCount").and_then(|v| v.as_u64()).unwrap_or(0));
     println!("extensions {}", data.get("extensionCount").and_then(|v| v.as_u64()).unwrap_or(0));
     println!("cwd {}", daemon_state.cwd);
@@ -416,6 +428,57 @@ pub fn cmd_chain(rest: &[String]) -> Result<i32> {
     Ok(if any_failed { EXIT_CDP_ERROR } else { EXIT_OK })
 }

+// ─── batch ───────────────────────────────────────────────────────────────────
+
+/// `ghax batch '<json-array>'` — one-round-trip sequence executor.
+///
+/// Unlike `chain` (which reads stdin and does N round-trips), `batch` parses
+/// the positional JSON argument client-side, ships the whole plan in one RPC,
+/// and re-snapshots between steps that reference `@e<n>` refs. That fixes
+/// the mid-sequence ref-shift on framework-heavy forms where the ARIA tree
+/// reindexes mid-plan.
+pub fn cmd_batch(rest: &[String]) -> Result<i32> {
+    let parsed = args::parse(rest);
+    let Some(json_src) = parsed.positional.first() else {
+        eprintln!("Usage: ghax batch '[{{\"cmd\":\"click\",\"args\":[\"@e7\"]}}, …]'");
+        return Ok(EXIT_USAGE);
+    };
+    let steps: Value = match serde_json::from_str(json_src) {
+        Ok(v) => v,
+        Err(e) => {
+            eprintln!("ghax batch: invalid JSON — {e}");
+            return Ok(EXIT_USAGE);
+        }
+    };
+    if !matches!(&steps, Value::Array(_)) {
+        eprintln!("ghax batch: expected a top-level JSON array of {{cmd, args?, opts?}} steps");
+        return Ok(EXIT_USAGE);
+    }
+
+    let cfg = state::resolve_config();
+    let port = match state::require_daemon(&cfg) {
+        Ok(p) => p,
+        Err(e) => {
+            eprintln!("ghax: {e}");
+            return Ok(EXIT_NOT_ATTACHED);
+        }
+    };
+
+    // The daemon handler reads `args[0]` as the step array.
+    let args_payload = Value::Array(vec![steps]);
+    let data = rpc::call(port, "batch", args_payload, parsed.opts_without_json())?;
+    // `batch` results are always JSON — printing them any other way
+    // would defeat the machine-readability that motivates the verb.
+    output::print(&data, true);
+
+    // Exit non-zero if any step failed, mirroring `chain`.
+    let any_failed = data
+        .as_array()
+        .map(|arr| arr.iter().any(|r| r.get("ok").and_then(|v| v.as_bool()) != Some(true)))
+        .unwrap_or(false);
+    Ok(if any_failed { EXIT_CDP_ERROR } else { EXIT_OK })
+}
+
 // ─── replay ──────────────────────────────────────────────────────────────────

 /// `ghax replay <file>` — mirrors `cmdReplay`.