Skip to content
Merged
6 changes: 6 additions & 0 deletions .task-memory.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
{
"planning_dir": "docs/planning",
"task_prefix": "TASK",
"min_engagements_to_block": 3,
"session_state_max_age_hours": 24
}
22 changes: 22 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,28 @@ Format inspired by [Keep a Changelog](https://keepachangelog.com/en/1.1.0/).
RPC just to discard them locally. `console` also accepts `errors:
true` via RPC opts so the daemon drops non-error levels before
serialising; the Rust CLI uses this on the hot path.
- **Bucket A payload-reduction sprint** — six flags and one new verb
to cut context cost for LLM operators driving ghax (sourced from
the 2026-04-20 jnremache field report):
- `screenshot --full-page` — kebab-case alias for the v0.1
`--fullPage`, matching the rest of the CLI convention. Both
forms accepted.
- `tabs --filter <regex> --fields <csv>` — server-side regex
filter (case-insensitive, matched against url + title) and
field projection (id, title, url, active). Cuts ~200 bytes per
google-product tab.
- `eval --max-bytes <N>` — caps the stringified eval result at
N utf-8 bytes. On trip, returns `{value, truncated: true,
originalBytes}`; when under cap the shape is unchanged.
- `text --selector <sel> --length <N> --skip <M>` — scoped,
paged page-text dumps. Replaces hand-rolled
`document.body.innerText.substring(...)`.
- `upload <@ref|selector> <path>[,<path>…]` — first-class file
upload verb wrapping Playwright's `locator.setInputFiles`.
Comma-separated paths trigger multi-file mode.
- `snapshot --compact` now suppresses the cursor-interactive
pass when paired with `-i`. Explicit `-C` still forces it on.
Large SPAs shrink measurably in compact mode.

### Fixed
- **Invariant enforcement**: `ctx.refs` is now cleared when the active
Expand Down
39 changes: 39 additions & 0 deletions CLAUDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -233,3 +233,42 @@ but "why is it this way" documentation.
If a test fails that you can't explain, check the "Daemon restart
required" invariant above. It's the single most common source of
confusion.

## Task Memory Integration

This project uses task-memory for context-preserving task management.

**Planning Location:** docs/planning/tasks.md
**Notes Location:** docs/planning/notes/TASK-XXX.md (auto-created on SessionStart for every in-progress task)

### Session Start Protocol

At the start of EVERY session:
1. The SessionStart hook auto-displays current task + notes summary.
2. If you see "⚠️ CONTEXT GAP DETECTED", recreate findings from the operations log BEFORE coding.
3. For full verification, run `/task-status` — it computes a Context Health Score (0-5).

### Task vs. Question Triage

- **TASK** (implement, fix, build, refactor, migrate): Create task in docs/planning/tasks.md FIRST, then work.
- **QUESTION** (what, how, why, explain): Answer directly — no task needed.
- **AMBIGUOUS** ("help me with X", "I'm stuck"): Ask one clarifying question before deciding.

### Context Preservation Protocol

The hook handles these automatically — you don't need to remember:
- Logs every WebFetch/WebSearch with response snippet to `**Visual Operations Log**`
- Auto-creates `docs/planning/notes/TASK-XXX.md` skeleton on SessionStart for every in-progress task
- Saves pre-compaction snapshot and appends ops log to notes file at PreCompact
- Blocks Stop if research ops ≥ 2 and notes file is empty/skeleton

What YOU must still do:
- Fill in Patterns / Gotchas / Decisions in the notes file (hook creates, you synthesize)
- Check off subtasks as you complete them
- Run self-critique before marking Status: done

### Commit Convention

Reference the task ID in every commit: `feat: description (TASK-XXX)`

**Skills:** `/tm-init` (setup) | `/task-memory` (full workflow) | `/task-status` (5-question + health score)
16 changes: 11 additions & 5 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -190,25 +190,31 @@ attach [--port N] [--browser edge|chrome|chromium|brave|arc] [--launch]
status [--json]
detach
restart
tabs
tabs [--filter <regex>] [--fields <csv>]
# --filter: case-insensitive regex on url+title
# --fields: id,title,url,active — project only these
tab <id> [--quiet] # --quiet = don't bringToFront (agent mode)
find <url-substring> # list matching tabs (pipe into `tab`)
new-window [url] # new background window, same profile
goto <url>
back | forward | reload
eval <js>
eval <js> [--max-bytes N] # --max-bytes caps result size to N utf-8 bytes
try [<js>] [--css <rules>] [--selector <sel>] [--measure <expr>] [--shot <path>]
text
text [--selector <sel>] [--length N] [--skip M]
# scoped, paged page-text dumps
html [<selector>]
screenshot [<@ref|selector>] [--path p] [--fullPage]
screenshot [<@ref|selector>] [--path p] [--full-page]
snapshot [-i] [-c] [-d N] [-s <sel>] [-C] [-a] [-o <path>]
# --compact (-c) also suppresses cursor-interactive
# when combined with -i; use -C to force it on
click <@ref|selector>
fill <@ref|selector> <value>
upload <@ref|selector> <path>[,<path>…] # wraps setInputFiles
press <key>
type <text>
wait <selector|ms|--networkidle|--load>
viewport <WxH>
responsive [prefix] [--fullPage]
responsive [prefix] [--full-page]
diff <url1> <url2>
is <visible|hidden|enabled|disabled|checked|editable> <@ref|selector>
xpath <expression> [--limit N] # list matching elements with text + box
Expand Down
9 changes: 9 additions & 0 deletions crates/cli/src/dispatch.rs
Original file line number Diff line number Diff line change
Expand Up @@ -93,6 +93,15 @@ fn dispatch_inner(cfg: &Config, verb: &str, rest: &[String]) -> Result<i32> {
simple(cfg, "fill", parsed)
}

"upload" => {
let parsed = args::parse(rest);
if parsed.positional.len() < 2 {
eprintln!("Usage: ghax upload <@ref|selector> <path>[,<path>…]");
return Ok(EXIT_USAGE);
}
simple(cfg, "upload", parsed)
}

"is" => {
let parsed = args::parse(rest);
let port = state::require_daemon(cfg)?;
Expand Down
3 changes: 2 additions & 1 deletion crates/cli/src/help.rs
Original file line number Diff line number Diff line change
Expand Up @@ -26,12 +26,13 @@ Tab:
try [<js>] [--css <rules>] [--selector <sel>] [--measure <expr>] [--shot <path>]
text
html [<selector>]
screenshot [<@ref|selector>] [--path <p>] [--fullPage]
screenshot [<@ref|selector>] [--path <p>] [--full-page]

Snapshot & interact:
snapshot [-i] [-c] [-d <N>] [-s <sel>] [-C] [-a] [-o <path>]
click <@ref|selector>
fill <@ref|selector> <value>
upload <@ref|selector> <path>[,<path>…] # wraps setInputFiles
press <key>
type <text>
wait <selector|ms|--networkidle|--load>
Expand Down
5 changes: 5 additions & 0 deletions docs/planning/archive.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
# Archived Tasks

Tasks that have been completed and archived.

---
57 changes: 57 additions & 0 deletions docs/planning/sprint-bucket-a.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,57 @@
# Sprint: Bucket A — payload reduction + first-class upload

## Goal

Ship the six "high-ROI, same theme as item 10" items from the
2026-04-20 jnremache field report (see `plan.md` follow-up sprint
section, Bucket A). Every item cuts payload size sent to an LLM
operator, or removes a papercut the operator hand-rolled in
JavaScript during the field session.

All six are narrow, backend-facing changes: new daemon options +
thin CLI wiring. Each is verifiable with one smoke assertion.

## Tasks

1. [x] `screenshot --full-page` kebab alias (GHAX-FR-06). Currently
only `--fullPage` works; the kebab form is the convention
everywhere else in the CLI. Add the alias in the daemon's
`screenshot` handler. Trivial.
2. [x] `tabs --filter <regex> --fields <csv>` (TOK-04). Server-side
regex filter on URL + title, field projection on the returned
objects. Cuts ~200 bytes per google-product tab when filtering.
3. [x] `eval --max-bytes <N>` (TOK-02). Server-side truncation on
the stringified result. Protects LLM operators from accidental
context blow-outs. Returns `{value, truncated: true, originalBytes}`
when it trips.
4. [x] `text --selector <sel> --length <N> --skip <M>` (TOK-10).
Scoped, paged page-text dumps. Replaces hand-rolled
`document.body.innerText.substring(...)`.
5. [x] `upload <@ref|selector> <path>` (JNR-07). First-class file
upload verb wrapping Playwright's `locator.setInputFiles`. Used
5x in the field session via a hand-written shim.
6. [x] `snapshot --compact` suppresses cursor-interactive pass
(TOK-01). Today `--compact` only drops noise nodes from the
ARIA tree; the cursor-interactive section still runs whenever
`-i` is set and dominates the output on heavy SPAs. Gate the
cursor pass on `!opts.compact` so `-i --compact` gives the
interactive tree without the cursor bloat.

## Acceptance criteria

- Every new flag has a smoke check in `test/smoke.ts`.
- `npm run typecheck`, `npm run build`, `cargo build --release`,
and `npm run test:smoke` all green against the Rust binary
(`GHAX_BIN=$PWD/target/release/ghax npm run test:smoke`).
- `CHANGELOG.md` under `[Unreleased]` lists all six items.
- `README.md` command surface mentions each new flag.
- No new runtime deps (zero — all implementations are Playwright
features already in the dep tree).

## Deferred

(Populated during the run if items slip scope.)

## Queued decisions

(Empty at plan time.)
32 changes: 32 additions & 0 deletions docs/planning/tasks.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,32 @@
# Task Board

<!-- Config: Last Task ID: 6 -->

## ⚙️ Configuration

**Columns**: 📝 To Do (todo) | 🚀 In Progress (in-progress) | 👀 In Review (in-review) | ✅ Done (done)

**Categories**: Frontend, Backend, Design, DevOps, Tests, Documentation

**Users**: @user (User)

**Priorities**: 🔴 Critical | 🟠 High | 🟡 Medium | 🟢 Low

**Tags**: #bug #feature #ui #backend #urgent #refactor #docs #test

---

## 📝 To Do

## 🚀 In Progress

## 👀 In Review

## ✅ Done

- **TASK-001**: `screenshot --full-page` kebab alias ✓
- **TASK-002**: `tabs --filter --fields` ✓
- **TASK-003**: `eval --max-bytes` truncation ✓
- **TASK-004**: `text --selector --length --skip` ✓
- **TASK-005**: `upload` verb ✓
- **TASK-006**: `snapshot --compact` suppresses cursor pass ✓
97 changes: 90 additions & 7 deletions src/daemon.ts
Original file line number Diff line number Diff line change
Expand Up @@ -309,14 +309,39 @@ register('status', async (ctx) => {
};
});

register('tabs', async (ctx) => {
register('tabs', async (ctx, _args, opts) => {
const pages = await allPages(ctx);
return Promise.all(
const filterStr = (opts.filter as string | undefined) ?? null;
let filterRe: RegExp | null = null;
if (filterStr) {
try {
filterRe = new RegExp(filterStr, 'i');
} catch (err: any) {
throw new Error(`tabs --filter: invalid regex: ${err?.message || filterStr}`);
}
}
// --fields accepts a csv list of keys to keep. Valid keys: id, title,
// url, active. Invalid keys are ignored silently so a typo can't kill
// the whole command mid-session. Omitted → return every field.
const fieldsArg = (opts.fields as string | undefined) ?? null;
const fields: Set<string> | null = fieldsArg
? new Set(fieldsArg.split(',').map((s) => s.trim()).filter(Boolean))
: null;
const all = await Promise.all(
pages.map(async (p) => {
const [id, title] = await Promise.all([pageTargetId(p), p.title().catch(() => '')]);
return { id, title, url: p.url(), active: id === ctx.activePageId };
}),
);
const matched = filterRe
? all.filter((t) => filterRe!.test(t.url) || filterRe!.test(t.title))
: all;
if (!fields) return matched;
return matched.map((t) => {
const out: Record<string, unknown> = {};
for (const k of fields) if (k in t) out[k] = (t as any)[k];
return out;
});
});

register('tab', async (ctx, args, opts) => {
Expand Down Expand Up @@ -436,11 +461,29 @@ register('reload', async (ctx) => {
return { url: page.url() };
});

register('eval', async (ctx, args) => {
register('eval', async (ctx, args, opts) => {
const js = String(args[0] ?? '');
if (!js) throw new Error('Usage: eval <js>');
const page = await activePage(ctx);
const result = await page.evaluate(js);
// --max-bytes caps the stringified result so an accidental
// `document.body.innerText` on a heavy page can't blow out the
// LLM operator's context window. Measured in UTF-8 bytes, not
// characters. When it trips we wrap the response so the caller
// can see what happened; when it doesn't trip we return the
// value unchanged (zero shape change for scripts that already
// expect the raw value).
const maxBytesRaw = opts['max-bytes'] ?? opts.maxBytes;
const maxBytes = maxBytesRaw !== undefined ? Number(maxBytesRaw) : null;
if (maxBytes !== null && Number.isFinite(maxBytes) && maxBytes > 0) {
const serialized = typeof result === 'string' ? result : JSON.stringify(result) ?? '';
const bytes = Buffer.byteLength(serialized, 'utf8');
if (bytes > maxBytes) {
// Slice in bytes — Buffer handles multi-byte UTF-8 correctly.
const truncated = Buffer.from(serialized, 'utf8').subarray(0, maxBytes).toString('utf8');
return { value: truncated, truncated: true, originalBytes: bytes };
}
}
return result;
});

Expand Down Expand Up @@ -513,9 +556,28 @@ register('try', async (ctx, args, opts) => {
return { value, ...(shot ? { shot } : {}) };
});

register('text', async (ctx) => {
register('text', async (ctx, _args, opts) => {
const page = await activePage(ctx);
const text = await page.evaluate(() => document.body.innerText);
const selector = (opts.selector as string | undefined) ?? null;
// --skip/--length paginate the returned string. The daemon still
// pulls full innerText — the win is on the wire, which is where
// the operator's context budget lives. Pagination uses code-unit
// offsets to match JavaScript's substring semantics; if/when a
// field report complains about emoji-splitting we'll switch to
// grapheme segmentation.
const skip = opts.skip !== undefined ? Math.max(0, Number(opts.skip)) : 0;
const lengthRaw = opts.length !== undefined ? Number(opts.length) : null;
const length = lengthRaw !== null && Number.isFinite(lengthRaw) && lengthRaw > 0 ? lengthRaw : null;
let text: string;
if (selector) {
text = await page.locator(selector).first().innerText();
} else {
text = await page.evaluate(() => document.body.innerText);
}
if (skip > 0 || length !== null) {
const end = length !== null ? skip + length : undefined;
text = text.slice(skip, end);
}
return text;
});

Expand All @@ -530,10 +592,14 @@ register('screenshot', async (ctx, args, opts) => {
const page = await activePage(ctx);
const outPath = (opts.path as string) || `/tmp/ghax-shot-${Date.now()}.png`;
const target = args[0] ? String(args[0]) : null;
// Accept both `--fullPage` (v0.1 camelCase) and `--full-page` (kebab,
// matches every other CLI flag). Kebab is the preferred form going
// forward; camelCase stays for back-compat with live scripts.
const fullPage = Boolean(opts.fullPage || opts['full-page']);
if (target) {
await resolveRef(ctx, target, page).screenshot({ path: outPath });
} else {
await page.screenshot({ path: outPath, fullPage: Boolean(opts.fullPage) });
await page.screenshot({ path: outPath, fullPage });
}
return { path: outPath };
});
Expand Down Expand Up @@ -737,6 +803,23 @@ register('press', async (ctx, args) => {
return { ok: true };
});

// ─── upload — first-class file upload via setInputFiles ────────
//
// Wraps Playwright's locator.setInputFiles so operators don't have to
// hand-roll the DOM.setFileInputFiles CDP call every time. Accepts a
// single path or a comma-separated list for multi-file inputs.
// Paths are resolved relative to the daemon's cwd (captured at attach).
register('upload', async (ctx, args) => {
const target = String(args[0] ?? '');
const pathArg = String(args[1] ?? '');
if (!target || !pathArg) throw new Error('Usage: upload <@ref|selector> <path>[,<path>…]');
const page = await activePage(ctx);
const loc = resolveRef(ctx, target, page);
const paths = pathArg.split(',').map((p) => p.trim()).filter(Boolean);
await loc.setInputFiles(paths.length === 1 ? paths[0] : paths);
return { ok: true, count: paths.length };
});

register('type', async (ctx, args) => {
const text = String(args[0] ?? '');
const page = await activePage(ctx);
Expand Down Expand Up @@ -999,7 +1082,7 @@ register('responsive', async (ctx, args, opts) => {
// Let layout settle — some CSS grid + responsive components need a paint.
await page.waitForTimeout(200);
const outPath = `${prefix}-${preset.name}.png`;
await page.screenshot({ path: outPath, fullPage: Boolean(opts.fullPage) });
await page.screenshot({ path: outPath, fullPage: Boolean(opts.fullPage || opts['full-page']) });
results.push({ ...preset, path: outPath });
}
} finally {
Expand Down
Loading
Loading