diff --git a/CHANGELOG.md b/CHANGELOG.md
index ef24d73..358d7e2 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -128,6 +128,81 @@ Format inspired by [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
 
 ### Added
 
+- **Bucket B — architectural fixes** (sourced from the 2026-04-20
+  jnremache field report):
+  - `ghax batch ''` — one-round-trip sequence executor
+    (TOK-09). Unlike `chain` (stdin, N round-trips), `batch` parses
+    the inline JSON client-side, ships the whole plan in a single
+    RPC, and **auto-re-snapshots between steps that reference
+    `@e` refs** so the ref map always resolves against the
+    current DOM. That directly fixes the JNR-03 mid-sequence
+    ref-shift pattern observed on Material / React forms (comboboxes
+    opening mid-plan and reindexing the ARIA tree). Opt out of the
+    auto-snapshot with `--no-auto-snapshot`; `--no-stopOnError`
+    keeps running past a failed step. Results always emit as JSON.
+  - `snapshot` is now **dialog-aware by default** (JNR-06). When an
+    open modal is present (`[role=dialog]`, `[role=alertdialog]`,
+    native ``, or `[aria-modal=true]`), the walker
+    treats the top-most visible modal as the new root — so the
+    outer app's `aria-hidden="true"` no longer swallows every
+    interactive element inside the modal. Fall back to the old
+    body-rooted behavior with `--no-dialog-scope`.
+  - `fill` expands the framework-safe path to cover Angular and
+    Material (JNR-04). React's native-setter + `input` pattern was
+    already there; now the handler also dispatches `blur` (so
+    Angular's `FormControl.markAsTouched` runs and pristine/dirty
+    validators fire) and handles `contenteditable` hosts (Material
+    chip inputs, rich editors) via `textContent` + a proper
+    `InputEvent('insertText')`.
+  - `state.rs::require_daemon` gives a more actionable message when
+    state is stale (JNR-01): if a ghax daemon is alive on the
+    9222–9230 scan range but our state file is missing, the "no
+    daemon state" error now hints at the live port and says
+    `ghax attach` will re-pair with it; the pid-mismatch branch
+    spells out `ghax detach && ghax attach` as the fix.
+
+- **Bucket C papercut bundle** — five quality-of-life fixes for LLM
+  operators driving ghax (sourced from the 2026-04-20 jnremache field
+  report):
+  - `ghax attach` is now silent on fresh success (POSIX convention).
+    Pass `--verbose` or set `GHAX_VERBOSE=1` to restore the
+    `attached — pid / port / browser` one-liner. `already attached`
+    keeps printing because that's informational, not success.
+  - `ghax status` surfaces the active tab id + first 60 chars of its
+    title as a new `active` row — matters most in multi-agent sessions
+    where `new-window` parked the agent on a non-obvious tab.
+    `status --json` gains `activeTabId`, `activeTabTitle`,
+    `activeTabUrl` fields alongside the existing counts.
+  - `ghax eval` auto-retries once past a navigation-in-flight
+    (`Execution context was destroyed` / `Target closed` / frame
+    detached). The daemon waits up to 3s for the next `load` event
+    and re-issues the evaluate — matches what a human would do
+    manually with `wait --load && eval …`.
+  - Rust CLI's RPC client single-retries transient transport errors
+    (connection refused/reset/timeout) after a 50 ms pause, so a
+    daemon that briefly blinks (post-spawn warm-up, GC pause, hot
+    reload) doesn't bubble up a user-visible failure. Semantic
+    errors (daemon answered with `ok: false`) are not retried — those
+    are real command failures, not flake.
+  - `ghax --help` splits the overloaded `wait` line into three:
+    `wait ` (most common), `wait `, and
+    `wait --networkidle | --load`. `eval` gains a `# auto-retries
+    once past a nav-in-flight` inline note. `attach` lists
+    `[--verbose]`.
+
+### Docs
+
+- **Known browser quirks** section in `CONTRIBUTING.md` covers two
+  not-a-ghax-bug patterns that surface when driving a real browser:
+  Chrome 113+ ignores `--remote-debugging-port` on the default
+  user-data-dir (fix: pass `--user-data-dir=`); and Google's
+  anti-bot on sensitive pages refuses to render when
+  `navigator.webdriver` is set (mitigation: launch with
+  `--disable-blink-features=AutomationControlled`; for flows where
+  even that fails, detach / do the step manually / re-attach).
+
+### Added
+
 - `ghax xpath [--limit N]` — query the page's DOM with an XPath
   expression, return every matching element with its tag, text
   preview, and bounding box. XPath is also usable via Playwright's
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 13fc5a2..cf43130 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -159,6 +159,62 @@ QA that needs a dedicated fixture rather than the real web.
    it before and after.
 5. Updated `CHANGELOG.md` under `## [Unreleased]`.
 
+## Known browser quirks
+
+These are not ghax bugs — they're browser / site behaviors that surface
+when driving a real browser over CDP. Document them here so the next
+person doesn't re-discover them.
+
+### Chrome v113+ refuses CDP on the default profile
+
+As of Chrome 113, `--remote-debugging-port` is ignored when the browser
+is using the default `--user-data-dir`. Launching Chrome without an
+explicit profile path silently opens DevTools-less — `ghax attach`
+will fail to find the `/json/version` endpoint.
+
+Workaround: point at a writable profile directory.
+
+```bash
+# Chrome — explicit profile
+"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" \
+  --remote-debugging-port=9222 \
+  --user-data-dir="$HOME/.config/chrome-ghax" &
+```
+
+Edge is not affected (still honors CDP on its default profile as of
+2026-Q1). If you want Edge + a clean profile anyway, the same
+`--user-data-dir=` flag works.
+
+### Google anti-bot on sensitive flows
+
+Chrome / Edge launched with `--remote-debugging-port` sets
+`navigator.webdriver = true` plus a few related fingerprintable flags.
+Google's anti-bot on sensitive pages (Business Profile verification,
+Drive sharing consent, some OAuth challenges, Google Ads campaign
+edits) refuses to render, throws a "disconnected" modal, or logs you
+out mid-flow.
+
+Cheap mitigation — add `--disable-blink-features=AutomationControlled`
+to the launch command:
+
+```bash
+"/Applications/Microsoft Edge.app/Contents/MacOS/Microsoft Edge" \
+  --remote-debugging-port=9222 \
+  --disable-blink-features=AutomationControlled &
+```
+
+This clears the `navigator.webdriver` bit and unblocks most flows. It
+won't defeat determined server-side fingerprinting — for flows where
+even the mitigation fails (e.g. rapid form submits on Google Ads that
+trigger a "session disconnected" modal), the documented pattern is:
+
+1. `ghax detach`
+2. Do the Google-specific step manually in the browser.
+3. `ghax attach` and resume.
+
+Full stealth-mode JS injection is explicitly out of scope — cat-and-mouse
+maintenance isn't worth it for a dev tool.
+
 ## Reporting issues
 
 Include:
diff --git a/crates/cli/src/attach.rs b/crates/cli/src/attach.rs
index 2c739df..37394bd 100644
--- a/crates/cli/src/attach.rs
+++ b/crates/cli/src/attach.rs
@@ -1053,10 +1053,17 @@ pub fn cmd_attach(parsed: &Parsed, cfg: &Config) -> Result {
 
     let ep = endpoint.unwrap(); // always Some at this point
     let state = spawn_daemon(cfg, &ep, &kind, capture_bodies_ref)?;
-    println!(
-        "attached — pid {}, port {}, browser {}",
-        state.pid, state.port, state.browser_kind
-    );
+    // POSIX convention — stay quiet on fresh success. `--verbose` restores
+    // the pid/port/browser one-liner for humans; the `already attached`
+    // branch above still prints because that's informational, not success.
+    let verbose = matches!(parsed.flags.get("verbose"), Some(serde_json::Value::Bool(true)))
+        || std::env::var("GHAX_VERBOSE").is_ok();
+    if verbose {
+        println!(
+            "attached — pid {}, port {}, browser {}",
+            state.pid, state.port, state.browser_kind
+        );
+    }
     Ok(EXIT_OK)
 }
 
diff --git a/crates/cli/src/dispatch.rs b/crates/cli/src/dispatch.rs
index b49c2ec..540512f 100644
--- a/crates/cli/src/dispatch.rs
+++ b/crates/cli/src/dispatch.rs
@@ -46,6 +46,7 @@ fn dispatch_inner(cfg: &Config, verb: &str, rest: &[String]) -> Result {
         "pair" => return small::cmd_pair(rest),
         "diff-state" => return small::cmd_diff_state(rest),
         "chain" => return small::cmd_chain(rest),
+        "batch" => return small::cmd_batch(rest),
         "replay" => return small::cmd_replay(rest),
         "gif" => return small::cmd_gif(rest),
         "qa" => return qa::cmd_qa(&args::parse(rest)),
diff --git a/crates/cli/src/help.rs b/crates/cli/src/help.rs
index c01162c..7b59f9f 100644
--- a/crates/cli/src/help.rs
+++ b/crates/cli/src/help.rs
@@ -6,11 +6,12 @@ pub const HELP: &str = r#"ghax — attach to your real Chrome/Edge via CDP and d
 Connection:
   attach [--port ] [--browser edge|chrome|chromium|brave|arc] [--launch]
          [--headless] [--load-extension ] [--data-dir ]
-         [--capture-bodies[=]]
+         [--capture-bodies[=]] [--verbose]
          # Without --port, scans :9222-9230. Multiple running → picker.
          # With --launch and no --port, auto-picks first free port in range.
          # --capture-bodies records JSON/text response bodies (opt-in,
          # 32KB cap per body). Glob filters by URL (e.g. '*/api/*').
+         # --verbose prints pid/port/browser on success (default: silent).
   status [--json]
   detach
   restart
@@ -22,7 +23,7 @@ Tab:
   new-window [url]  # new background window, same profile
   goto
   back | forward | reload
-  eval
+  eval  # auto-retries once past a nav-in-flight
   try [] [--css ] [--selector ] [--measure ] [--shot ]
   text
   html []
@@ -35,7 +36,9 @@ Snapshot & interact:
   upload <@ref|selector> [,…]  # wraps setInputFiles
   press
   type
-  wait
+  wait  # wait until selector appears (most common)
+  wait  # fixed delay in milliseconds
+  wait --networkidle | --load  # wait for a navigation event
   viewport
   responsive [prefix] [--fullPage]
   diff
@@ -72,6 +75,8 @@ Real user gestures:
 
 Batch / recording:
   chain < steps.json (JSON array of {cmd, args?, opts?})
+  batch '' (one round-trip; auto re-snapshots between
+        steps that use @e refs)
   record start [name]
   record stop
   record status
diff --git a/crates/cli/src/rpc.rs b/crates/cli/src/rpc.rs
index dc6d9a7..b72d6b6 100644
--- a/crates/cli/src/rpc.rs
+++ b/crates/cli/src/rpc.rs
@@ -29,8 +29,28 @@ impl std::fmt::Display for RpcError {
 impl std::error::Error for RpcError {}
 
 pub fn call(port: u16, cmd: &str, args: Value, opts: Value) -> Result {
+    // Single-retry shim for transient-looking errors — connection
+    // refused/reset, broken pipe, request build failure — so a daemon
+    // that's briefly unresponsive (post-spawn warm-up, GC pause,
+    // mid-reload) doesn't bubble up a user-visible failure. Semantic
+    // errors (daemon answered with ok:false) are NOT retried — those
+    // are real command failures, not flake.
+    match call_once(port, cmd, &args, &opts) {
+        Ok(v) => Ok(v),
+        Err(e) => {
+            if is_transient(&e) {
+                std::thread::sleep(std::time::Duration::from_millis(50));
+                call_once(port, cmd, &args, &opts)
+            } else {
+                Err(e)
+            }
+        }
+    }
+}
+
+fn call_once(port: u16, cmd: &str, args: &Value, opts: &Value) -> Result {
     let url = format!("http://127.0.0.1:{port}/rpc");
-    let body = Request { cmd, args: &args, opts: &opts };
+    let body = Request { cmd, args, opts };
     let client = reqwest::blocking::Client::builder()
         // No global timeout: long verbs (qa, perf, snapshot with --wait) can run for minutes.
         .build()?;
@@ -49,3 +69,18 @@ pub fn call(port: u16, cmd: &str, args: Value, opts: Value) -> Result {
     }
     Ok(envelope.get("data").cloned().unwrap_or(Value::Null))
 }
+
+/// Transient = transport-layer hiccup we'd retry. A daemon-side semantic
+/// failure (wrapped in `RpcError`) is never transient — it ran, it failed.
+fn is_transient(err: &anyhow::Error) -> bool {
+    if err.downcast_ref::().is_some() {
+        return false;
+    }
+    if let Some(re) = err.downcast_ref::() {
+        // Connection refused / reset / broken pipe / timeout all look
+        // like the daemon blinked. `is_request` catches everything except
+        // a completed response.
+        return re.is_connect() || re.is_timeout() || re.is_request();
+    }
+    false
+}
diff --git a/crates/cli/src/small.rs b/crates/cli/src/small.rs
index 252786c..54f513b 100644
--- a/crates/cli/src/small.rs
+++ b/crates/cli/src/small.rs
@@ -75,6 +75,18 @@ pub fn cmd_status(rest: &[String]) -> Result {
     println!("attached {} ({})", daemon_state.browser_kind, browser_url_short);
     println!("daemon pid {}, port {}, up {}m", daemon_state.pid, daemon_state.port, up_min);
     println!("tabs {}", data.get("tabCount").and_then(|v| v.as_u64()).unwrap_or(0));
+    // Surface the active tab so operators can sanity-check which page
+    // they're about to drive before issuing clicks / fills. Silently
+    // skipped if the daemon didn't send one (older daemon on new CLI).
+    if let Some(title) = data.get("activeTabTitle").and_then(|v| v.as_str()) {
+        let url = data.get("activeTabUrl").and_then(|v| v.as_str()).unwrap_or("");
+        let id = data.get("activeTabId").and_then(|v| v.as_str()).unwrap_or("");
+        if !id.is_empty() {
+            let title_trim = title.chars().take(60).collect::();
+            let label = if title_trim.is_empty() { url.to_string() } else { title_trim };
+            println!("active {} — {}", id, label);
+        }
+    }
     println!("targets {}", data.get("targetCount").and_then(|v| v.as_u64()).unwrap_or(0));
     println!("extensions {}", data.get("extensionCount").and_then(|v| v.as_u64()).unwrap_or(0));
     println!("cwd {}", daemon_state.cwd);
@@ -416,6 +428,57 @@ pub fn cmd_chain(rest: &[String]) -> Result {
     Ok(if any_failed { EXIT_CDP_ERROR } else { EXIT_OK })
 }
 
+// ─── batch ───────────────────────────────────────────────────────────────────
+
+/// `ghax batch ''` — one-round-trip sequence executor.
+///
+/// Unlike `chain` (which reads stdin and does N round-trips), `batch` parses
+/// the positional JSON argument client-side, ships the whole plan in one RPC,
+/// and re-snapshots between steps that reference `@e` refs. That fixes
+/// the mid-sequence ref-shift on framework-heavy forms where the ARIA tree
+/// reindexes mid-plan.
+pub fn cmd_batch(rest: &[String]) -> Result {
+    let parsed = args::parse(rest);
+    let Some(json_src) = parsed.positional.first() else {
+        eprintln!("Usage: ghax batch '[{{\"cmd\":\"click\",\"args\":[\"@e7\"]}}, …]'");
+        return Ok(EXIT_USAGE);
+    };
+    let steps: Value = match serde_json::from_str(json_src) {
+        Ok(v) => v,
+        Err(e) => {
+            eprintln!("ghax batch: invalid JSON — {e}");
+            return Ok(EXIT_USAGE);
+        }
+    };
+    if !matches!(&steps, Value::Array(_)) {
+        eprintln!("ghax batch: expected a top-level JSON array of {{cmd, args?, opts?}} steps");
+        return Ok(EXIT_USAGE);
+    }
+
+    let cfg = state::resolve_config();
+    let port = match state::require_daemon(&cfg) {
+        Ok(p) => p,
+        Err(e) => {
+            eprintln!("ghax: {e}");
+            return Ok(EXIT_NOT_ATTACHED);
+        }
+    };
+
+    // The daemon handler reads `args[0]` as the step array.
+    let args_payload = Value::Array(vec![steps]);
+    let data = rpc::call(port, "batch", args_payload, parsed.opts_without_json())?;
+    // `batch` results are always JSON — printing them any other way
+    // would defeat the machine-readability that motivates the verb.
+    output::print(&data, true);
+
+    // Exit non-zero if any step failed, mirroring `chain`.
+    let any_failed = data
+        .as_array()
+        .map(|arr| arr.iter().any(|r| r.get("ok").and_then(|v| v.as_bool()) != Some(true)))
+        .unwrap_or(false);
+    Ok(if any_failed { EXIT_CDP_ERROR } else { EXIT_OK })
+}
+
 // ─── replay ──────────────────────────────────────────────────────────────────
 
 /// `ghax replay ` — mirrors `cmdReplay`.
diff --git a/crates/cli/src/state.rs b/crates/cli/src/state.rs
index 1f25ad3..8833ea1 100644
--- a/crates/cli/src/state.rs
+++ b/crates/cli/src/state.rs
@@ -127,8 +127,15 @@ unsafe fn libc_kill(_pid: i32, _sig: i32) -> i32 {
 /// standard "attach first" message that mirrors cli.ts.
 pub fn require_daemon(cfg: &Config) -> Result {
     let state = read_state(cfg).ok_or_else(|| {
+        // When state is missing but a ghax daemon is already alive on the
+        // scan-range ports, tell the operator — `ghax attach` will pair
+        // with it instead of launching a new one, which is almost
+        // certainly what they want.
+        let hint = probe_live_daemon_ports()
+            .map(|p| format!(" (a ghax daemon is live on :{p} — `ghax attach` will pair with it)"))
+            .unwrap_or_default();
         anyhow!(
-            "no daemon state at {} — run `ghax attach` first",
+            "no daemon state at {} — run `ghax attach` first{hint}",
             cfg.state_file.display()
         )
     })?;
@@ -137,8 +144,16 @@ pub fn require_daemon(cfg: &Config) -> Result {
     // pointing at a port now reused by a different ghax daemon (different
     // project, colliding port) would silently route RPCs to the wrong
     // browser session.
-    if health_check(state.port, state.pid).is_ok() {
-        return Ok(state.port);
+    match health_check(state.port, state.pid) {
+        Ok(()) => return Ok(state.port),
+        Err(e) if e.to_string().contains("stale state") => {
+            // Explicit stale-state path: different daemon answered on our
+            // port. Tell the user the exact fix.
+            return Err(anyhow!(
+                "{e} — run `ghax detach && ghax attach` to re-pair with the running browser"
+            ));
+        }
+        Err(_) => {}
     }
     if !is_process_alive(state.pid) {
         return Err(anyhow!(
@@ -152,6 +167,29 @@ pub fn require_daemon(cfg: &Config) -> Result {
     ))
 }
 
+/// Scan 9222..=9230 for a live ghax daemon `/health` — used only to enrich
+/// the "no daemon state" error with an auto-reattach hint. Returns the
+/// first responsive port, or None.
+fn probe_live_daemon_ports() -> Option {
+    for port in 9222..=9230 {
+        let url = format!("http://127.0.0.1:{port}/health");
+        let client = reqwest::blocking::Client::builder()
+            .timeout(std::time::Duration::from_millis(200))
+            .build()
+            .ok()?;
+        if let Ok(resp) = client.get(&url).send() {
+            if resp.status().is_success() {
+                if let Ok(body) = resp.json::() {
+                    if body.get("ok").and_then(|v| v.as_bool()) == Some(true) {
+                        return Some(port);
+                    }
+                }
+            }
+        }
+    }
+    None
+}
+
 fn health_check(port: u16, expected_pid: i32) -> Result<()> {
     let url = format!("http://127.0.0.1:{port}/health");
     let client = reqwest::blocking::Client::builder()
diff --git a/src/daemon.ts b/src/daemon.ts
index 2ab54e1..fb8740d 100644
--- a/src/daemon.ts
+++ b/src/daemon.ts
@@ -121,6 +121,26 @@ async function pageTargetId(page: Page): Promise {
   }
 }
 
+// Retry once past a navigation-in-flight. Playwright's `page.evaluate`
+// throws `Execution context was destroyed` / `Target closed` if the
+// active frame navigates mid-call; the pragmatic fix is to wait for
+// the next load state and retry once. Matches what a human would do
+// manually — `wait --load && eval …`.
+function isNavTransient(err: unknown): boolean {
+  const msg = String((err as { message?: string } | null)?.message ?? '');
+  return /Execution context was destroyed|Target closed|frame was detached|Navigation failed because/i.test(msg);
+}
+
+async function evalWithNavRetry(page: Page, js: string, maxWaitMs = 3000): Promise {
+  try {
+    return await page.evaluate(js);
+  } catch (err) {
+    if (!isNavTransient(err)) throw err;
+    await page.waitForLoadState('load', { timeout: maxWaitMs }).catch(() => {});
+    return await page.evaluate(js);
+  }
+}
+
 async function activePage(ctx: Ctx): Promise {
   const pages = await allPages(ctx);
   if (pages.length === 0) throw new Error('No tabs open in attached browser.');
@@ -293,11 +313,95 @@ function resolveRef(ctx: Ctx, target: string, page: Page): Locator {
 
 // ─── Command handlers ──────────────────────────────────────────
 
+// Batch — execute N steps in a single daemon round-trip. Between steps
+// that reference `@e` refs, re-snapshot automatically so the ref
+// map resolves against the *current* DOM. That's the core fix for
+// JNR-03: mid-click-sequence ARIA shifts (Material / React comboboxes
+// opening and reindexing) used to silently mis-resolve refs under
+// `ghax chain`. Auto-snapshot can be disabled with `--no-auto-snapshot`
+// for callers that want strict one-shot semantics.
+register('batch', async (ctx, args, opts) => {
+  const steps = Array.isArray(args[0]) ? (args[0] as unknown[]) : null;
+  if (!steps || steps.length === 0) {
+    throw new Error('Usage: batch \'[{"cmd":"click","args":["@e7"]}, …]\'');
+  }
+  const stopOnError = opts.stopOnError !== false;
+  const autoSnapshot = opts['auto-snapshot'] !== false && opts.autoSnapshot !== false;
+  const snapshotHandler = handlers.get('snapshot');
+  const results: Array> = [];
+
+  const usesRef = (step: { args?: unknown[]; opts?: Record }) => {
+    const inArgs = Array.isArray(step.args)
+      ? step.args.some((v) => typeof v === 'string' && v.startsWith('@e'))
+      : false;
+    const inOpts = step.opts
+      ? Object.values(step.opts).some((v) => typeof v === 'string' && v.startsWith('@e'))
+      : false;
+    return inArgs || inOpts;
+  };
+
+  for (const raw of steps) {
+    if (!raw || typeof raw !== 'object') {
+      results.push({ cmd: '', ok: false, error: 'step must be an object' });
+      if (stopOnError) break;
+      continue;
+    }
+    const step = raw as { cmd?: unknown; args?: unknown; opts?: unknown };
+    const cmd = typeof step.cmd === 'string' ? step.cmd : null;
+    if (!cmd) {
+      results.push({ cmd: '', ok: false, error: 'step missing cmd' });
+      if (stopOnError) break;
+      continue;
+    }
+    const stepArgs = Array.isArray(step.args) ? (step.args as unknown[]) : [];
+    const stepOpts = (step.opts && typeof step.opts === 'object') ? (step.opts as Record) : {};
+    const handler = handlers.get(cmd);
+    if (!handler) {
+      results.push({ cmd, ok: false, error: `unknown cmd: ${cmd}` });
+      if (stopOnError) break;
+      continue;
+    }
+    // Refresh the ref map before any step that uses `@e` — so the
+    // caller doesn't have to interleave manual snapshots.
+    if (autoSnapshot && snapshotHandler && usesRef({ args: stepArgs, opts: stepOpts })) {
+      try {
+        await snapshotHandler(ctx, [], { interactive: true });
+      } catch {
+        // A snapshot failure is informational — the step itself will
+        // surface the concrete "ref not found" error if it's still bad.
+      }
+    }
+    try {
+      const data = await handler(ctx, stepArgs, stepOpts);
+      results.push({ cmd, ok: true, data });
+    } catch (err) {
+      results.push({ cmd, ok: false, error: String((err as { message?: string } | null)?.message ?? err) });
+      if (stopOnError) break;
+    }
+  }
+  return results;
+});
+
 register('status', async (ctx) => {
   const pages = await allPages(ctx);
   const targets = await ctx.pool.list();
   const extIds = new Set();
   for (const t of targets) if (t.extensionId) extIds.add(t.extensionId);
+  // Surface the active tab's id + title so `ghax status` can tell operators
+  // which tab they're about to drive — matters most in multi-agent sessions
+  // where `new-window` has parked the agent on a non-obvious tab.
+  let activeTabTitle = '';
+  let activeTabUrl = '';
+  if (ctx.activePageId) {
+    const entries = await Promise.all(
+      pages.map(async (p) => [await pageTargetId(p), p] as const),
+    );
+    const active = entries.find(([id]) => id === ctx.activePageId)?.[1];
+    if (active) {
+      activeTabTitle = await active.title().catch(() => '');
+      activeTabUrl = active.url();
+    }
+  }
   return {
     pid: process.pid,
     uptimeMs: Date.now() - ctx.startedAt,
@@ -306,6 +410,9 @@ register('status', async (ctx) => {
     tabCount: pages.length,
     targetCount: targets.length,
     extensionCount: extIds.size,
+    activeTabId: ctx.activePageId ?? null,
+    activeTabTitle,
+    activeTabUrl,
   };
 });
 
@@ -465,7 +572,10 @@ register('eval', async (ctx, args, opts) => {
   const js = String(args[0] ?? '');
   if (!js) throw new Error('Usage: eval ');
   const page = await activePage(ctx);
-  const result = await page.evaluate(js);
+  // Navigation in flight when eval lands will destroy the execution
+  // context mid-call. Wait for the next load state once and retry before
+  // giving up — matches what a human would do manually.
+  const result = await evalWithNavRetry(page, js);
   // --max-bytes caps the stringified result so an accidental
   // `document.body.innerText` on a heavy page can't blow out the
   // LLM operator's context window. Measured in UTF-8 bytes, not
@@ -572,7 +682,11 @@ register('text', async (ctx, _args, opts) => {
   if (selector) {
     text = await page.locator(selector).first().innerText();
   } else {
-    text = await page.evaluate(() => document.body.innerText);
+    // Same nav-in-flight retry as `eval` — heavy pages with still-running
+    // XHR-driven navigation will trip `Execution context was destroyed`
+    // the first time around; waiting for the next load event and
+    // retrying once rescues the operator from a spurious failure.
+    text = (await evalWithNavRetry(page, 'document.body.innerText')) as string;
   }
   if (skip > 0 || length !== null) {
     const end = length !== null ? skip + length : undefined;
@@ -679,6 +793,9 @@ register('snapshot', async (ctx, _args, opts) => {
     depth: opts.depth === undefined ? undefined : Number(opts.depth),
     selector: opts.selector as string | undefined,
     cursorInteractive: Boolean(opts.cursorInteractive),
+    // Default-on dialog scoping; callers opt out with --no-dialog-scope,
+    // which the arg parser surfaces as `no-dialog-scope: true`.
+    dialogScope: !(opts['no-dialog-scope'] || opts.noDialogScope),
   });
 
   ctx.refs = result.refs;
@@ -779,18 +896,39 @@ register('fill', async (ctx, args) => {
   if (!target) throw new Error('Usage: fill <@ref|selector> ');
   const page = await activePage(ctx);
   const loc = resolveRef(ctx, target, page);
-  // React-safe path: set the value via the native setter and dispatch
-  // an 'input' event, so React's synthetic-event bookkeeping updates
-  // its internal state (plain page.fill() triggers the controlled-input
-  // "input value mismatch" bug on some code).
+  // Framework-safe path. React, Angular, and Material each intercept
+  // `value` assignment in a way plain `locator.fill()` doesn't reach:
+  //   - React: tracks the value on a hidden internal property; the
+  //     native setter bypasses React's wrapper so a subsequent
+  //     'input' event refreshes its synthetic-event bookkeeping.
+  //   - Angular: binds via `(input)`/`(change)` but most validators only
+  //     run on 'blur' — we dispatch one at the end.
+  //   - Material: often wraps the real inside a host component
+  //     with `contenteditable` spans; falls through to the
+  //     textContent path when there's no native value setter.
   await loc.evaluate((el, v) => {
-    const e = el as HTMLInputElement | HTMLTextAreaElement;
-    const proto = Object.getPrototypeOf(e);
+    const e = el as HTMLElement;
+    // contenteditable path — Material's mat-chip / rich editors land here.
+    if (e.getAttribute('contenteditable') === 'true') {
+      e.focus();
+      e.textContent = v;
+      e.dispatchEvent(new InputEvent('input', { bubbles: true, inputType: 'insertText', data: v }));
+      e.dispatchEvent(new Event('change', { bubbles: true }));
+      e.dispatchEvent(new FocusEvent('blur', { bubbles: true }));
+      return;
+    }
+    const input = e as HTMLInputElement | HTMLTextAreaElement;
+    const proto = Object.getPrototypeOf(input);
     const setter = Object.getOwnPropertyDescriptor(proto, 'value')?.set;
-    if (setter) setter.call(e, v);
-    else (e as any).value = v;
-    e.dispatchEvent(new Event('input', { bubbles: true }));
-    e.dispatchEvent(new Event('change', { bubbles: true }));
+    input.focus();
+    if (setter) setter.call(input, v);
+    else (input as unknown as { value: string }).value = v;
+    input.dispatchEvent(new Event('input', { bubbles: true }));
+    input.dispatchEvent(new Event('change', { bubbles: true }));
+    // Blur triggers Angular's `FormControl.markAsTouched` and most
+    // pristine/dirty-based validators. Most sites no-op if the focus
+    // never moved, so dispatching an explicit blur is safe.
+    input.dispatchEvent(new FocusEvent('blur', { bubbles: true }));
   }, value);
   return { ok: true };
 });
diff --git a/src/snapshot.ts b/src/snapshot.ts
index f7385cc..acf5cb4 100644
--- a/src/snapshot.ts
+++ b/src/snapshot.ts
@@ -29,6 +29,14 @@ export interface SnapshotOptions {
   depth?: number;
   selector?: string;
   cursorInteractive?: boolean;
+  /**
+   * When true (default) and no explicit --selector was passed, auto-scope
+   * the snapshot to an open modal dialog if one is visible. Outer app is
+   * usually `aria-hidden="true"` while a modal is up, which means walking
+   * from `body` yields an empty-ish tree and every captured ref lives on
+   * a hidden ancestor. Pass `--no-dialog-scope` to force body.
+   */
+  dialogScope?: boolean;
 }
 
 export interface SnapshotResult {
@@ -68,10 +76,23 @@ export async function snapshot(
   target: Page | Frame,
   opts: SnapshotOptions = {},
 ): Promise {
-  const rootLocator = opts.selector ? target.locator(opts.selector) : target.locator('body');
+  let rootLocator = opts.selector ? target.locator(opts.selector) : target.locator('body');
   if (opts.selector) {
     const count = await rootLocator.count();
     if (count === 0) throw new Error(`Selector not found: ${opts.selector}`);
+  } else if (opts.dialogScope !== false) {
+    // Dialog-aware walker — if a modal is open, walk from it instead of
+    // from `body`. Covers `[role=dialog]`, `[role=alertdialog]`, the
+    // native ``, and ad-hoc `[aria-modal=true]` scrims
+    // (Radix, Headless UI, Material). `.last()` picks the top-most
+    // modal if a stack is open. The `:visible` pseudo filters out
+    // detached / display:none dialogs that some frameworks leave in
+    // the DOM between openings.
+    const modalSel = '[role=dialog]:visible, [role=alertdialog]:visible, dialog[open]:visible, [aria-modal="true"]:visible';
+    const modal = target.locator(modalSel).last();
+    if ((await modal.count()) > 0) {
+      rootLocator = modal;
+    }
   }
 
   const ariaText = await rootLocator.ariaSnapshot();
diff --git a/test/smoke.ts b/test/smoke.ts
index 7e598e9..ae8d9d0 100644
--- a/test/smoke.ts
+++ b/test/smoke.ts
@@ -90,8 +90,11 @@ const c = (name: string, fn: () => Promise) => checks.push({ name, fn });
 
 c('attach is idempotent (first call attaches)', async () => {
   const r = await run(['attach']);
+  // POSIX convention: fresh attach is silent on success (since TOK-07).
+  // The idempotent re-attach still prints `already attached` because
+  // that's informational — you asked to attach, we didn't.
   assert(
-    /attached/.test(r.stdout) || /already attached/.test(r.stdout),
+    r.stdout.trim() === '' || /attached/.test(r.stdout) || /already attached/.test(r.stdout),
     `unexpected attach output: ${r.stdout}`,
   );
 });
@@ -104,7 +107,17 @@ c('attach is idempotent (second call reuses)', async () => {
 c('status --json has expected shape', async () => {
   const r = await run(['status', '--json']);
   const s = parseJson>(r.stdout);
-  for (const key of ['pid', 'port', 'browserKind', 'tabCount', 'targetCount', 'extensionCount']) {
+  for (const key of [
+    'pid',
+    'port',
+    'browserKind',
+    'tabCount',
+    'targetCount',
+    'extensionCount',
+    'activeTabId',
+    'activeTabTitle',
+    'activeTabUrl',
+  ]) {
     assert(key in s, `status missing ${key}`);
   }
 });
@@ -335,6 +348,21 @@ c('chain executes multiple steps', async () => {
   assert(results.every((s) => s.ok), `chain had failures: ${JSON.stringify(results)}`);
 });
 
+c('batch runs a step sequence in one round-trip', async () => {
+  const steps = JSON.stringify([
+    { cmd: 'goto', args: ['https://example.com'] },
+    { cmd: 'wait', args: ['200'] },
+    { cmd: 'text' },
+  ]);
+  const r = await run(['batch', steps]);
+  const results = parseJson>(r.stdout);
+  assert(results.length === 3, `expected 3 results, got ${results.length}`);
+  assert(results.every((s) => s.ok), `batch had failures: ${JSON.stringify(results)}`);
+  const textStep = results[2];
+  assert(typeof textStep.data === 'string' && (textStep.data as string).toLowerCase().includes('example'),
+    `expected example.com text, got ${JSON.stringify(textStep.data).slice(0, 100)}`);
+});
+
 c('record + replay round-trips', async () => {
   const name = `smoke-rec-${Date.now()}`;
   await run(['record', 'start', name]);
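
Review note: the rule that decides when `batch` re-snapshots can be looked at in isolation. The sketch below is an illustration only, not the daemon code. `Step`, `isRef`, and `expandPlan` are hypothetical names introduced here, and instead of running handlers it only computes the order of verbs that would effectively execute, assuming the same "any string starting with `@e`" check that `usesRef` applies to a step's args and opts.

```typescript
// Hypothetical illustration types and helpers; not part of the ghax codebase.
type Step = { cmd: string; args?: unknown[]; opts?: Record<string, unknown> };

// Same prefix test the daemon's `usesRef` applies to each arg / opt value.
const isRef = (v: unknown): boolean => typeof v === 'string' && v.startsWith('@e');

function usesRef(step: Step): boolean {
  return (step.args ?? []).some(isRef) || Object.values(step.opts ?? {}).some(isRef);
}

// Expand a plan into the effective verb order: a `snapshot` is slotted in
// before every step that mentions a ref, unless auto-snapshot is disabled.
function expandPlan(steps: Step[], autoSnapshot = true): string[] {
  const order: string[] = [];
  for (const step of steps) {
    if (autoSnapshot && usesRef(step)) order.push('snapshot');
    order.push(step.cmd);
  }
  return order;
}

const plan: Step[] = [
  { cmd: 'goto', args: ['https://example.com'] },
  { cmd: 'click', args: ['@e7'] },
  { cmd: 'fill', args: ['@e9', 'hello'] },
  { cmd: 'text' },
];

console.log(expandPlan(plan).join(' -> '));
// prints "goto -> snapshot -> click -> snapshot -> fill -> text"
```

One consequence of the prefix check, if the sketch matches the handler's behavior: a plain value that happens to start with `@e` (for example a fill value such as `@everyone`) would also trigger a snapshot. That costs an extra snapshot but should not change which element a step targets.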