diff --git a/AGENTS.md b/AGENTS.md
index 797fc51..fc40374 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -13,7 +13,7 @@ This file is for implementation operators. See `skills/webster-lp-audit/SKILL.md

 Two active workstreams:

-- **Production Webster** — Nicolette's weekly landing-page improvement council runs on `main` via `prompts/second-wbs-session.md`. This is live for her business; do not break it.
+- **Production Webster** — Nicolette's weekly landing-page improvement council runs on `main`. Operator surface: `/webster-weekly-council` (skill at `skills/webster-weekly-council/SKILL.md`) or the single-page runbook at `prompts/second-wbs-session.md`. Both produce identical artifacts; the prompt is the locked source-of-truth runbook. This is live for her business; do not break it.
 - **Hackathon expansion** — Single-substrate Richer Health LP demo with a simulation runner producing timelapse assets. Deadline **2026-04-28**. Working branch: `dev/`. See `context/VISION.md` for canonical north-star.

 ## First actions every session
@@ -126,12 +126,13 @@ Use `TaskCreate` / `TaskUpdate` for multi-step work within a single session. Tas

 ## Skill invocation (Claude Code)

-Webster ships two runtime-critic skills:
+Webster ships these skills:

-- `skills/webster-lp-audit/SKILL.md` — shared council run flow (referenced by production critics)
-- `skills/webster-onboarding/SKILL.md` — end-user onboarding flow (universal, demo placeholder)
+- `skills/webster-weekly-council/SKILL.md` — operator surface for the weekly run. Library skill: SKILL.md index + on-demand phase references + helper scripts. Slash-command form: `/webster-weekly-council`. Equivalent single-page runbook at `prompts/second-wbs-session.md`.
+- `skills/webster-lp-audit/SKILL.md` — shared council run discipline (referenced by production critics)
+- `skills/webster-browser-audit/SKILL.md` — headless browser audit for visual review

-If your work modifies either skill, test with a sample invocation before committing.
+If your work modifies any skill, test with a sample invocation before committing. The weekly-council skill must stay artifact-equivalent with `prompts/second-wbs-session.md` — when in doubt, fix the skill, never the prompt.

 ## Parallel stream etiquette
diff --git a/README.md b/README.md
index d38dced..72a719e 100644
--- a/README.md
+++ b/README.md
@@ -79,7 +79,7 @@ bun scripts/critic-genealogy.ts --fixtures scripts/__tests__/fixtures/genealogy

 **30-second pitch:** Webster is an autonomous landing-page improvement council. Nine Claude Managed Agents plan, audit, monitor, synthesize, and package one weekly redesign proposal; the standout demo is Critic Genealogy, where Opus 4.7 detects an unowned audit gap and registers a new specialist at runtime.

-**Live-run evidence:** the full operator path is [`prompts/second-wbs-session.md`](prompts/second-wbs-session.md), registration IDs live in `environments/webster-council-env.id` and `context/*/id.txt`, and run artifacts are written under `history//` when the weekly prompt is executed.
+**Live-run evidence:** the operator surface is the [`/webster-weekly-council`](skills/webster-weekly-council/SKILL.md) skill (library: SKILL.md index + on-demand phase references + helper scripts); the full single-page runbook lives at [`prompts/second-wbs-session.md`](prompts/second-wbs-session.md). Registration IDs live in `environments/webster-council-env.id` and `context/*/id.txt`. Run artifacts are written under `history//` when the weekly run executes.

 **Demo arc artifacts:** the hackathon timelapse animates an 11-week simulation council run. Per-week deliverables live under [`demo-output/landing-page/`](demo-output/landing-page/) (`w00..w10`): desktop/mobile/tablet screenshots, heatmap JSON+SVG, synthetic analytics, and the visual reviewer's markdown verdict. Anthropic Managed Agents memory-store provisioning is captured at [`assets/memory-stores-screenshots/`](assets/memory-stores-screenshots/).
 The rendered timelapse is hosted externally (link in the submission form); reproduce locally with `bun skills/webster-video/scripts/hydrate-demo-assets.ts && cd video && npx hyperframes render -q high --strict`.
@@ -96,9 +96,10 @@ webster/
 │ └── simulation/ 9 LP-sim specs (1:1 mirror) that drive the timelapse demo
 ├── context/ architecture, features, quality gates, per-critic findings dirs
 ├── environments/ webster-council-env.json (single Anthropic environment)
-├── prompts/ first-wbs-session.md (bootstrap), second-wbs-session.md (weekly run)
+├── prompts/ first-wbs-session.md (bootstrap), second-wbs-session.md (weekly run runbook)
 ├── scripts/ validate-agents, validate-findings, critic-genealogy
-├── skills/ webster-lp-audit (shared critic discipline), webster-onboarding
+├── skills/ webster-weekly-council (operator surface for the weekly run),
+│ webster-lp-audit (shared critic discipline)
 ├── .github/workflows/ CI: type + lint + format + schema + findings + markdown + tests
 ├── .husky/ pre-commit runs the same gates locally
 └── AGENTS.md operator guide for in-repo work
@@ -106,7 +107,7 @@ webster/

 ## The weekly flow

-The live council runner is a bash-in-markdown prompt: [`prompts/second-wbs-session.md`](prompts/second-wbs-session.md). It:
+The live council runner is a Claude Code library skill: [`/webster-weekly-council`](skills/webster-weekly-council/SKILL.md) — slim SKILL.md index, on-demand phase references under `references/`, and reusable helper scripts under `scripts/`. The single-page bash-in-markdown runbook at [`prompts/second-wbs-session.md`](prompts/second-wbs-session.md) presents the same flow on one readable page. Both produce identical artifacts. The flow:

 1. Seeds 10 weeks of mock analytics on first run (monitor needs baselines to diff).
 2. Prepares a shared `council/YYYY-MM-DD` branch.
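Step 1 above only works on every run because seeding is idempotent (the skill's phase table marks phase 1 that way). The following minimal Python sketch illustrates that property. It is not the repo's seeder, which lives in `references/seed-history.md` and Step 1 of the prompt. Everything specific in it is an assumption: seeded weeks step back 7 days from the run date, each week is written to `history/<date>/analytics.json` (the run-artifact layout listed in `context/ARCHITECTURE.md`), and the metric names and ranges are invented for illustration.

```python
import json
import random
import tempfile
from datetime import date, timedelta
from pathlib import Path


def seed_mock_history(history_dir: Path, week_date: date, weeks: int = 10) -> list[Path]:
    """Seed mock analytics for the `weeks` weeks before `week_date`.

    Hypothetical layout: history/<YYYY-MM-DD>/analytics.json per week.
    Idempotent: a week whose file already exists is left untouched, so a
    re-run never overwrites real or previously seeded data. Returns only
    the files written by this call.
    """
    written: list[Path] = []
    for i in range(weeks, 0, -1):  # oldest week first
        day = week_date - timedelta(weeks=i)
        target = history_dir / day.isoformat() / "analytics.json"
        if target.exists():
            continue  # already seeded (or a real run landed here): skip
        # Seed the RNG from the date so a reseed after deletion reproduces the same numbers.
        rng = random.Random(day.toordinal())
        visits = rng.randint(800, 1200)  # invented range, illustration only
        payload = {
            "week": day.isoformat(),
            "mock": True,  # hypothetical flag marking synthetic data
            "visits": visits,
            "conversions": rng.randint(visits // 50, visits // 20),
        }
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(payload, indent=2) + "\n")
        written.append(target)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_date = date(2026, 4, 23)
        first = seed_mock_history(Path(tmp), run_date)
        second = seed_mock_history(Path(tmp), run_date)
        print(len(first), len(second))  # 10 0
```

The first call writes ten weekly files; the second finds them all present and writes nothing. Skipping existing files instead of overwriting them is what makes re-running the whole flow after a mid-run failure safe.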
@@ -169,11 +170,19 @@ Registers the single environment + 9 production agents against the Anthropic API

 ### Weekly council run

+In Claude Code (primary):
+
+```text
+/webster-weekly-council
+```
+
+Or as a single-page prompt (fallback):
+
 ```bash
 wbs @prompts/second-wbs-session.md
 ```

-Runs the full planner + fan-out + redesigner + draft PR described above.
+Both run the full planner + fan-out + redesigner + draft PR described above. The skill loads phase references on demand, which keeps per-turn context small; the prompt is one readable file.

 ### Spawn a genealogy critic manually

diff --git a/context/ARCHITECTURE.md b/context/ARCHITECTURE.md
index 2d9be14..0493ab8 100644
--- a/context/ARCHITECTURE.md
+++ b/context/ARCHITECTURE.md
@@ -2,7 +2,7 @@

 > Mirrors [[webster-architecture]] in vault. Canonical source is this file for in-repo operators; vault file for cross-session memory.
 >
-> **Submission state**: Layers 1–4 + Layer 7 shipped. Layer 5 (`site/` fork + analytics pixel + `scripts/seed-mock-history.ts`) is scoped out for submission — the mock seeder is inlined in `prompts/second-wbs-session.md` Step 1 instead of a separate script, and the redesigner emits `proposal.md` instead of `proposal.diff`. Layer 6 (video) is blocked on Richie's voice record. See `context/FEATURES.md` for per-row status.
+> **Submission state**: Layers 1–4 + Layer 7 shipped. Layer 5 (`site/` fork + analytics pixel + `scripts/seed-mock-history.ts`) is scoped out for submission — the mock seeder lives in phase 1 of the weekly-council skill (`skills/webster-weekly-council/references/seed-history.md`) and, equivalently, in Step 1 of `prompts/second-wbs-session.md`; the redesigner emits `proposal.md` instead of `proposal.diff`. Layer 6 (video) is blocked on Richie's voice record. See `context/FEATURES.md` for per-row status.
## System Overview @@ -48,8 +48,9 @@ ### Layer 1: Routine + Orchestrator -- `routines/weekly-lp-improve.yaml` — cut from submission; weekly trigger is manual `wbs @prompts/second-wbs-session.md` -- `prompts/second-wbs-session.md` — bash-in-markdown orchestrator (replaces the planned `webster/orchestrator.ts`), reads state, fans out, runs genealogy, opens PR +- `routines/weekly-lp-improve.yaml` — cut from submission; weekly trigger is manual `/webster-weekly-council` (Claude Code skill) or `wbs @prompts/second-wbs-session.md` (single-page runbook) +- `skills/webster-weekly-council/SKILL.md` — library skill: slim index + on-demand phase references + helper scripts; the operator surface for the weekly run +- `prompts/second-wbs-session.md` — bash-in-markdown orchestrator (replaces the planned `webster/orchestrator.ts`), reads state, fans out, runs genealogy, opens PR; the immutable single-page runbook the skill mirrors - Shared agent skill `skills/webster-lp-audit/SKILL.md` — universal e2e flow: _read context → critique → write findings → exit_ - Per-critic context: `context/critics/{name}/findings.md` - Run artifacts: `history/YYYY-MM-DD/{analytics.json, council-output/, synthesis.md, proposal.md, decision.json}` @@ -67,13 +68,13 @@ Environment `environments/webster-council-env.json`: Agent specs (JSON, not YAML — matches `POST /v1/agents` schema): -- `agents/webster-monitor.json` — Haiku 4.5 -- `agents/brand-voice-critic.json` — Sonnet 4.6 -- `agents/fh-compliance-critic.json` — Sonnet 4.6 -- `agents/seo-critic.json` — Sonnet 4.6 -- `agents/conversion-critic.json` — Sonnet 4.6 -- `agents/copy-critic.json` — Sonnet 4.6 -- `agents/webster-redesigner.json` — Opus 4.7 +- `agents/production/webster-monitor.json` — Haiku 4.5 +- `agents/production/brand-voice-critic.json` — Sonnet 4.6 +- `agents/production/fh-compliance-critic.json` — Sonnet 4.6 +- `agents/production/seo-critic.json` — Sonnet 4.6 +- `agents/production/conversion-critic.json` — Sonnet 4.6 +- 
`agents/production/copy-critic.json` — Sonnet 4.6 +- `agents/production/webster-redesigner.json` — Opus 4.7 Each spec has: `name`, `model`, `system` (multi-line string with escaped \n), `tools: [{type: agent_toolset_20260401}]`, `metadata`. **No `callable_agents`** (research preview). diff --git a/context/FEATURES.md b/context/FEATURES.md index 104feb5..be12628 100644 --- a/context/FEATURES.md +++ b/context/FEATURES.md @@ -18,7 +18,7 @@ - **Cut**: 7 (out of submission scope; rationale inline) - **Todo**: 7 (1 submission form; all remaining implementation rows shipped or non-implementation blocked/cut) -Hero feature (Critic Genealogy) shipped with live Opus 4.7 validation. All 7 Managed Agents registered. Council fan-out + redesigner + PR automation scripted in `prompts/second-wbs-session.md`. CI green, 29 tests pass. Two scope reassignments below (critic-flow skill renamed; orchestrator moved from TS to bash-in-markdown prompt) — both ship equivalent functionality. +Hero feature (Critic Genealogy) shipped with live Opus 4.7 validation. All 7 Managed Agents registered. Council fan-out + redesigner + PR automation scripted in `prompts/second-wbs-session.md` (single-page runbook) and exposed as `/webster-weekly-council` library skill at `skills/webster-weekly-council/SKILL.md`. CI green, 29 tests pass. Two scope reassignments below (critic-flow skill renamed; orchestrator moved from TS to bash-in-markdown prompt + library skill) — both ship equivalent functionality. ## Stream allocation @@ -28,22 +28,22 @@ See `AGENTS.md` for stream → operator mapping. ## Layer 1: Routine + Orchestrator (Stream 1 — Claude Code Opus 4.7) -| # | Status | Feature | Hours | -| --- | ------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- | -| 1 | cut | `routines/weekly-lp-improve.yaml` — Claude Code Routine with weekly cron. 
Submission uses manual `wbs @prompts/...` | 2 | -| 2 | done | Orchestrator — shipped as `prompts/second-wbs-session.md` (bash-in-markdown, not `.ts`). Functionally equivalent | 4 | -| 3 | done | Shared critic skill — shipped as `skills/webster-lp-audit/SKILL.md` (renamed from `critic-flow`) | 2 | -| 4 | done | Per-critic context pattern: `context/critics/{name}/findings.md` (5 critics + monitor seeded) | 1 | -| 5 | done | Run-artifact pattern: `history/YYYY-MM-DD/` — live `history/2026-04-23/` artifacts include analytics, proposal, decision, operator decision, and genealogy logs | 2 | -| 6 | done | Branch + PR automation via `gh pr create` — wired in Step 6 of `second-wbs-session.md` | 2 | +| # | Status | Feature | Hours | +| --- | ------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- | +| 1 | cut | `routines/weekly-lp-improve.yaml` — Claude Code Routine with weekly cron. Submission uses manual `wbs @prompts/...` | 2 | +| 2 | done | Orchestrator — shipped as `prompts/second-wbs-session.md` (bash-in-markdown, not `.ts`) + `/webster-weekly-council` library skill (SKILL.md index + 9 phase references + 2 helper scripts). 
Functionally equivalent | 4 | +| 3 | done | Shared critic skill — shipped as `skills/webster-lp-audit/SKILL.md` (renamed from `critic-flow`) | 2 | +| 4 | done | Per-critic context pattern: `context/critics/{name}/findings.md` (5 critics + monitor seeded) | 1 | +| 5 | done | Run-artifact pattern: `history/YYYY-MM-DD/` — live `history/2026-04-23/` artifacts include analytics, proposal, decision, operator decision, and genealogy logs | 2 | +| 6 | done | Branch + PR automation via `gh pr create` — wired in Step 6 of `second-wbs-session.md` | 2 | ## Layer 2: Managed Agent Critics (Stream 2 — Codex heartbeat) | # | Status | Feature | Hours | | --- | ------ | -------------------------------------------------------------------------------------------------------------------- | ----- | -| 7 | done | `agents/webster-monitor.json` (Haiku 4.5) — analytics anomaly detection | 1 | +| 7 | done | `agents/production/webster-monitor.json` (Haiku 4.5) — analytics anomaly detection | 1 | | 8 | done | 5 specialist critic specs: seo, brand-voice, fh-compliance, conversion, copy (all Sonnet 4.6) — all schema-valid | 4 | -| 9 | done | `agents/webster-redesigner.json` (Opus 4.7) — synthesis + proposal generation | 1 | +| 9 | done | `agents/production/webster-redesigner.json` (Opus 4.7) — synthesis + proposal generation | 1 | | 10 | done | GitHub MCP integration — URL-based, vault-bound (`vault_ids`), no tokens in `user.message` | 3 | | 11 | done | Environment config — `environments/webster-council-env.json` + `.id` registered | 2 | | 12 | done | Parallel fan-out via orchestrator → 6 parallel `/v1/sessions` calls (not `callable_agents`; that's research-preview) | 2 | @@ -132,7 +132,7 @@ Key design calls (locked session 4 Phase 7, see ADR-0002): | # | Status | Feature | Hours | | --- | ------ | 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- | -| 41a | done | **Visual-reviewer agent spec** — `agents/webster-visual-reviewer.json` (Opus 4.7 tier). Inputs: preview URL, `history//proposal.md`, BEFORE URL. Outputs: `history//visual-review.md` with findings + embedded screenshot refs. | 1 | +| 41a | done | **Visual-reviewer agent spec** — `agents/production/webster-visual-reviewer.json` (Opus 4.7 tier). Inputs: preview URL, `history//proposal.md`, BEFORE URL. Outputs: `history//visual-review.md` with findings + embedded screenshot refs. | 1 | | 41b | done | **Browser-audit skill** — `skills/webster-browser-audit/SKILL.md` wraps `scripts/browser-audit.ts` for Playwright-headless when available, with fallback artifacts when unavailable. Capabilities: 3-breakpoint screenshot (375/768/1440), accessibility-tree text extraction, interaction recording, and console log capture. | 3 | | 41c | done | **Proposal-intent verifier** — `scripts/proposal-intent-verifier.ts` reads each issue in `proposal.md` and verifies visible phrase presence in rendered accessibility text (not source grep). Catches content drops like session-4 "No more patient churn" regression; layout overflow is covered by browser-audit summaries. | 2 | | 41d | done | **#39 integration pattern** — apply worker now runs the visual-reviewer gate after #39c and before #39d PR emission. 
It retries up to 3 iterations, records `visual_review` in `apply-log.json`, and forces draft/partial PR metadata on CRITICAL visual regressions. | 1 | @@ -160,17 +160,17 @@ Added session 4 Phase 7. Autoresearch is **input to the next council run**, not Session 4 Phase 7 locked 9 architectural questions (Q1–Q9) — all resolved in `context/DOMAIN-MODEL.md`. Key locks: Q1 Managed Agent + orchestrator-owned memory (ADR-0001), Q2 explore-broadly cold-start + unified `history/memory.jsonl`, Q3 autonomous p<0.01 rollback, Q4 reward+gates 7-outcome matrix (ADR-0002), Q5 planner-requests-new-critic via L3 genealogy (additive-only), Q5.1 four-layer genealogy governance, Q6 skip-is-terminal + structured skip rows, Q7 Pair Alpha (SaaS + local service) substrate pair, Q8 per-experiment baselines + commit trailers (ADR-0002), Q9 4-week demo arc. -| # | Status | Feature | Hours | -| --- | ------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- | -| 50 | done | **`agents/webster-planner.json`** (Opus 4.7 Managed Agent, Q1 ADR-0001) — registered via `POST /v1/agents`, invoked per-run via `/v1/sessions` + events + poll (pattern verified in `scripts/critic-genealogy.ts:440-556`). Reads marshaled memory context, outputs `plan.md` with `{classification, next_action, direction_hint, new_critic_request?, rationale}`. `next_action ∈ {promote_and_experiment, hold_baseline, revert_and_retry, explore_broadly}`. 
_Landed on `forge/task-feat-planner-agent-spec-v5` — PR #3 merged (2026-04-24)._ | 2 | -| 51 | done | **Memory substrate schema + append helper** (Q2) — `history/memory.jsonl` event log: `{ts, week, actor, event, refs{}, insight}` where `event ∈ {promote, rollback, skip, regression, gap-detected, verdict-ready}`. Append-only. Helper in orchestrator never touched by agents — orchestrator owns all I/O (ADR-0001). _Landed on `forge/task-feat-memory-substrate` — all 4 stories (MemoryEvent types, appendEvent, tailN+filter, unit tests)._ | 2 | -| 52 | done | **Orchestrator memory marshaling + planner invocation** (Q1, Q2) — new step in `prompts/second-wbs-session.md`: before critics, orchestrator reads `memory.jsonl` tail (last N events) + last 2 weeks' `verdict.json` + `monitor` anomaly report, concatenates to planner's user-message text (step 3 of the 5-step Managed Agent flow), polls until idle, extracts output, writes `history//plan.md`, appends one `verdict-ready` event row to `memory.jsonl`. _Landed on `forge/task-feat-orch-memory-planner-v2` — PR #6 merged (2026-04-24)._ | 3 | -| 53 | done | **Plan → council integration (additive-only)** (Q5) — critics + monitor + redesigner now receive `plan.md` body in initial `user.message` context with explicit additive-only/sovereignty language. Planner `new_critic_request` is extracted to `tmp/planner-new-critic-request-.json` and passed into `scripts/critic-genealogy.ts --planner-request` as additive evidence, without bypassing dedup/cap/evidence gates. | 3 | -| 54 | done | **Cold-start explore-broadly mode** (Q2) — planner context now emits `direction_hint="broad exploration, baseline-only analytics"` when memory/verdict/monitor inputs are empty, and `appendColdStartOriginEvent()` writes the origin event row. | 2 | -| 55 | done | **Genealogy governance layers 2–4** (Q5.1) — layer 1 is prompt-only (rubric in planner + redesigner instructions: "request only if existing critics cannot cover the concern"). 
Layer 2: orchestrator-side dedup — reject new-critic spec if ≥60% scope overlap with existing critic (embedding cosine). Layer 3: quarterly cap — max 3 new critics / 13 weeks, soft-override by operator. Layer 4: retire-on-idle — critic with 0 findings-promoted in 8 weeks is archived. _Landed on `forge/task-feat-genealogy-gov-v1` — PR #8 merged (2026-04-24)._ | 3 | -| 56 | done | **Skip-contract plumbing** (Q6) — apply-worker, critic-rerun gate, and visual-review gate now emit canonical structured skip rows to `history//skips.jsonl` and append skip events to `history/memory.jsonl` with reasons `{apply-fail, critic-veto, visual-veto}`. Skip is terminal and feeds next-week planner. | 2 | -| 57 | done | **`scripts/seed-demo-arc.ts`** (Q9) — 4-week primary-substrate mock: 9 experiments + 1 genealogy spawn in W4. Hits 6/7 Q4 outcome lanes (fast-track, fallback, gate-win, archive-gate-fail, auto-rollback, hold). Idempotent; writes to `history/demo-arc/` without touching live history. _Landed on `forge/task-feat-seed-demo-arc-w3w4-v5` — all 4 stories (W1/W2/W3/W4 + genealogy) done, PR #5 merged (2026-04-24)._ | 3 | -| 58 | done | **`scripts/seed-secondary-substrates.ts`** (Q7) — Pair Alpha mock: SaaS (B2B) + local service (B2C) synthetic single-file HTMLs + 2-cycle mock runs each (onboard + 2 weeks of experiments). Proves generalization beyond the primary substrate. Demo-safe (no e-commerce — private hold-out per operator decision). 
_Landed on `forge/task-feat-seed-pair-alpha-v1` — PR #7 merged (2026-04-24)._ | 3 | +| # | Status | Feature | Hours | +| --- | ------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----- | +| 50 | done | **`agents/production/webster-planner.json`** (Opus 4.7 Managed Agent, Q1 ADR-0001) — registered via `POST /v1/agents`, invoked per-run via `/v1/sessions` + events + poll (pattern verified in `scripts/critic-genealogy.ts:440-556`). Reads marshaled memory context, outputs `plan.md` with `{classification, next_action, direction_hint, new_critic_request?, rationale}`. `next_action ∈ {promote_and_experiment, hold_baseline, revert_and_retry, explore_broadly}`. _Landed on `forge/task-feat-planner-agent-spec-v5` — PR #3 merged (2026-04-24)._ | 2 | +| 51 | done | **Memory substrate schema + append helper** (Q2) — `history/memory.jsonl` event log: `{ts, week, actor, event, refs{}, insight}` where `event ∈ {promote, rollback, skip, regression, gap-detected, verdict-ready}`. Append-only. Helper in orchestrator never touched by agents — orchestrator owns all I/O (ADR-0001). 
_Landed on `forge/task-feat-memory-substrate` — all 4 stories (MemoryEvent types, appendEvent, tailN+filter, unit tests)._ | 2 | +| 52 | done | **Orchestrator memory marshaling + planner invocation** (Q1, Q2) — new step in `prompts/second-wbs-session.md`: before critics, orchestrator reads `memory.jsonl` tail (last N events) + last 2 weeks' `verdict.json` + `monitor` anomaly report, concatenates to planner's user-message text (step 3 of the 5-step Managed Agent flow), polls until idle, extracts output, writes `history//plan.md`, appends one `verdict-ready` event row to `memory.jsonl`. _Landed on `forge/task-feat-orch-memory-planner-v2` — PR #6 merged (2026-04-24)._ | 3 | +| 53 | done | **Plan → council integration (additive-only)** (Q5) — critics + monitor + redesigner now receive `plan.md` body in initial `user.message` context with explicit additive-only/sovereignty language. Planner `new_critic_request` is extracted to `tmp/planner-new-critic-request-.json` and passed into `scripts/critic-genealogy.ts --planner-request` as additive evidence, without bypassing dedup/cap/evidence gates. | 3 | +| 54 | done | **Cold-start explore-broadly mode** (Q2) — planner context now emits `direction_hint="broad exploration, baseline-only analytics"` when memory/verdict/monitor inputs are empty, and `appendColdStartOriginEvent()` writes the origin event row. | 2 | +| 55 | done | **Genealogy governance layers 2–4** (Q5.1) — layer 1 is prompt-only (rubric in planner + redesigner instructions: "request only if existing critics cannot cover the concern"). Layer 2: orchestrator-side dedup — reject new-critic spec if ≥60% scope overlap with existing critic (embedding cosine). Layer 3: quarterly cap — max 3 new critics / 13 weeks, soft-override by operator. Layer 4: retire-on-idle — critic with 0 findings-promoted in 8 weeks is archived. 
_Landed on `forge/task-feat-genealogy-gov-v1` — PR #8 merged (2026-04-24)._ | 3 | +| 56 | done | **Skip-contract plumbing** (Q6) — apply-worker, critic-rerun gate, and visual-review gate now emit canonical structured skip rows to `history//skips.jsonl` and append skip events to `history/memory.jsonl` with reasons `{apply-fail, critic-veto, visual-veto}`. Skip is terminal and feeds next-week planner. | 2 | +| 57 | done | **`scripts/seed-demo-arc.ts`** (Q9) — 4-week primary-substrate mock: 9 experiments + 1 genealogy spawn in W4. Hits 6/7 Q4 outcome lanes (fast-track, fallback, gate-win, archive-gate-fail, auto-rollback, hold). Idempotent; writes to `history/demo-arc/` without touching live history. _Landed on `forge/task-feat-seed-demo-arc-w3w4-v5` — all 4 stories (W1/W2/W3/W4 + genealogy) done, PR #5 merged (2026-04-24)._ | 3 | +| 58 | done | **`scripts/seed-secondary-substrates.ts`** (Q7) — Pair Alpha mock: SaaS (B2B) + local service (B2C) synthetic single-file HTMLs + 2-cycle mock runs each (onboard + 2 weeks of experiments). Proves generalization beyond the primary substrate. Demo-safe (no e-commerce — private hold-out per operator decision). _Landed on `forge/task-feat-seed-pair-alpha-v1` — PR #7 merged (2026-04-24)._ | 3 | ## Totals (historical — initial plan) diff --git a/skills/webster-weekly-council/SKILL.md b/skills/webster-weekly-council/SKILL.md new file mode 100644 index 0000000..b016e3e --- /dev/null +++ b/skills/webster-weekly-council/SKILL.md @@ -0,0 +1,66 @@ +--- +name: webster-weekly-council +description: Run one full Webster weekly landing-page council pass — pre-flight, planner, 6 parallel managed-agent critics, genealogy gap-detection, redesigner synthesis, and draft PR. Use when the operator runs /webster-weekly-council or asks for the weekly Webster council run. Replaces `wbs @prompts/second-wbs-session.md` as the primary operator path; the prompt remains the readable single-page fallback runbook. 
+---
+
+# Webster Weekly Council (Library)
+
+One full council pass: planner → 6 parallel sessions (monitor + 5 critics) → genealogy → redesigner → draft PR. **30–50 min wall-clock. ~$0.16–0.25 in API tokens per run.**
+
+> **Override default Operating Loop.** This is a council-run session. Execute the phases below end-to-end. Do NOT call `forge isolation list`, do NOT scan `FEATURES.md` for `todo` rows, do NOT enter a feature-implementation loop. The phases are sequential; do not skip ahead.
+
+## Phase index — load on demand
+
+Read `references/preflight.md` first. After that, load only the reference for the phase you're executing. Do not load all references up front.
+
+| # | Phase | Wall-clock | Reference |
+| --- | ---------------------------------------------------- | ---------- | ----------------------------- |
+| 0 | Pre-flight + session constants | <1 min | `references/preflight.md` |
+| 1 | Seed 10-week mock analytics history (idempotent) | 2 min | `references/seed-history.md` |
+| 2 | Prepare shared `council/` branch | 1 min | `references/branch.md` |
+| 3 | Run planner (Opus 4.7) → `plan.md` + memory event | 3–5 min | `references/planner.md` |
+| 4 | Fan-out 6 parallel managed-agent sessions | 15–20 min | `references/fan-out.md` |
+| 5 | Verify findings (gate before redesigner) | 2 min | `references/verify.md` |
+| 5.5 | Critic Genealogy gap-detection (fail-open) | 0–10 min | `references/genealogy.md` |
+| 6 | Redesigner session → `proposal.md` + `decision.json` | 5–10 min | `references/redesign.md` |
+| 7+8 | Open draft PR + write checkpoint | 1 min | `references/publish.md` |
+| — | If a step fails | — | `references/failure-modes.md` |
+
+## Always-true invariants
+
+These hold across every phase. Do not break them.
+
+1. **`ANTHROPIC_API_KEY` is fetched from macOS keychain into a scoped bash variable.** Never read it from (or export it to) the operator's shell env; a key set there would bill `claude -p` and any Forge call against API credits instead of the Max sub.
+2. **Registration artifacts must exist before fan-out.** `environments/webster-council-env.id`, `context/monitor/id.txt`, `context/redesigner/id.txt`, and the 5 critic id files. If missing, run `prompts/first-wbs-session.md` first.
+3. **All writes go to `council/$WEEK_DATE`.** Local working tree and remote committers (critics + redesigner via GitHub MCP) all target the same branch. The orchestrator creates the branch locally (and pushes it on a fresh run); critics' system prompts handle remote create-or-skip.
+4. **Phase 3 (planner) is fail-closed.** If the planner errors, halt before fan-out: the monitor, critics, and redesigner all receive `plan.md` in their initial context.
+5. **Phase 5.5 (genealogy) is fail-open.** Errors are logged and the run continues. If genealogy fails, the redesigner runs against the 5 original critics' findings.
+6. **Phase 5 (verify) is the redesigner gate.** If fewer than 3 critics produced non-stub findings, abort before spending redesigner tokens.
+7. **Findings are sovereign over the planner's `direction_hint`.** Critics report what they find regardless of the plan; the redesigner uses the hint as a weighting input, but CRITICAL/HIGH critic evidence overrides it.
+8. **Don't merge the PR.** It opens as a draft. Human review = approval. The operator's job ends at PR-open + checkpoint.
+
+## Helper scripts ("unlocked tools")
+
+These are reusable across phases. Each is a self-contained CLI; pass args from the bash blocks in the phase references.
+
+- **`scripts/run-agent-session.sh`** — create a managed-agent session, send a `user.message`, poll until idle/completed/timeout. Used 7× across phases 4 and 6 (monitor + 5 critics + redesigner). Exit code: `0` ok, `1` failed, `2` timeout.
+- **`scripts/extract-planner-json.py`** — pull the JSON block out of `history//plan.md` for the optional `new_critic_request` extraction in phase 3.
+
+Both scripts are skill-local under `skills/webster-weekly-council/scripts/`. Repo-level helpers (`scripts/planner-invoke.ts`, `scripts/planner-context.ts`, `scripts/critic-genealogy.ts`, `scripts/validate-findings.ts`) are unchanged and are still called from the reference bash blocks.
+
+## Production invariant — `prompts/second-wbs-session.md`
+
+The 662-line bash-in-markdown prompt is the immutable, source-of-truth production runbook; this skill is the operator surface. Both produce the same artifacts.
+
+- The prompt is locked by `scripts/__tests__/sim-council.test.ts:75` — `git diff prompts/second-wbs-session.md` must be empty.
+- Nicolette's live council on `main` runs the prompt via `wbs @prompts/second-wbs-session.md`.
+- If the skill drifts from the prompt, **fix the skill** — never the prompt.
+
+## Quick start
+
+1. Load `references/preflight.md` → run pre-flight + session constants.
+2. Load each phase reference in order (1 → 2 → 3 → 4 → 5 → 5.5 → 6 → 7+8).
+3. If any phase fails, load `references/failure-modes.md` and follow the named recovery.
+4. Final state: draft PR open, checkpoint committed under `.claude/checkpoints/`.
+
+If a phase genuinely cannot be expressed as a skill instruction, surface `[STUCK]` per `AGENTS.md` and stop. Visible struggle > invisible corner-cutting.
diff --git a/skills/webster-weekly-council/references/branch.md b/skills/webster-weekly-council/references/branch.md
new file mode 100644
index 0000000..2cb582d
--- /dev/null
+++ b/skills/webster-weekly-council/references/branch.md
@@ -0,0 +1,25 @@
+# Phase 2 — Prepare shared `council/` branch
+
+**~1 min.**
+
+All 6 parallel workers + the redesigner commit to `$BRANCH` via GitHub MCP. The critics' system prompts already create-or-skip the branch via `create_branch` MCP (422-on-exists is treated as success), so this local setup mainly keeps your working tree consistent with the remote branch; on a fresh run, the push simply gets the branch onto origin before the critics start committing.
+
+```bash
+git checkout main
+git pull
+git fetch origin
+if git ls-remote --heads origin "$BRANCH" | grep -q "$BRANCH"; then
+  echo "Branch $BRANCH already exists on origin — reusing"
+  git checkout -B "$BRANCH" "origin/$BRANCH"
+else
+  git checkout -B "$BRANCH" main
+  git push -u origin "$BRANCH"
+fi
+```
+
+## Why both cases are valid
+
+- **Branch already exists**: a prior attempt this week (failed mid-run, or genealogy/redesigner re-run). Reuse so the new orchestrator-side commits land alongside the existing critic findings.
+- **Branch missing**: fresh weekly run. Create from `main` and push so MCP-side critic commits don't race local creation.
+
+The local working tree must match `origin/$BRANCH` before phase 3; otherwise the planner commit in phase 3 would diverge from the branch the critics will read.
diff --git a/skills/webster-weekly-council/references/failure-modes.md b/skills/webster-weekly-council/references/failure-modes.md
new file mode 100644
index 0000000..27fcc24
--- /dev/null
+++ b/skills/webster-weekly-council/references/failure-modes.md
@@ -0,0 +1,29 @@
+# Failure modes — recovery guide
+
+The skill is idempotent end-to-end. Any phase can be re-run after fixing the named cause.
+
+| Failure | Recovery |
+| ---------------------------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **Pre-flight abort** (key/keychain/registration artifacts) | Fix the named prerequisite. Re-run the whole skill — preflight is idempotent and every later phase checks for its own outputs before running. |
+| **Planner fails (phase 3)** | Run halts before fan-out. Read `tmp/logs/planner.log`, fix the named context/API/JSON problem, re-run from phase 3. Do not continue without `history/$WEEK_DATE/plan.md`. |
+| **One critic times out (phase 4)** | Re-run only that critic: `bash skills/webster-weekly-council/scripts/run-agent-session.sh