9 changes: 0 additions & 9 deletions .gitignore
@@ -71,10 +71,6 @@ skills/website-to-hyperframes
.agents/
.claude/skills/

# HyperFrames render artifacts (regenerable from compositions)
video/snapshots/
video/renders/

# Rendered timelapse mp4 — hosted externally for the hackathon submission
demo-output/videos/

@@ -83,11 +79,6 @@ demo-output/videos/
/plan.md
/research.md

# Claude Design polish handoff bundles — committed per-slot only after review
skills/webster-video/polish-slots/**/handoff/
skills/webster-video/polish-slots/handoff-shared/
skills/webster-video/polish-slots.zip

# Internal tracking docs — preserved in ~/Vault/Projects/webster/internal-tracking/
context/EXPANSION-TASKS.md
context/E2E-IMPLEMENTATION-TRACKER.md
5 changes: 3 additions & 2 deletions AGENTS.md
@@ -14,7 +14,7 @@ This file is for implementation operators. See `skills/webster-lp-audit/SKILL.md
Two active workstreams:

- **Production Webster** — Nicolette's weekly landing-page improvement council runs on `main`. Operator surface: `/webster-weekly-council` (skill at `skills/webster-weekly-council/SKILL.md`) or the single-page runbook at `prompts/second-wbs-session.md`. Both produce identical artifacts; the prompt is the locked source-of-truth runbook. This is live for her business; do not break it.
- **Hackathon expansion** — Single-substrate Richer Health LP demo with a simulation runner producing timelapse assets. Deadline **2026-04-28**. Working branch: `dev/`. See `context/VISION.md` for canonical north-star.
- **Single-substrate Richer Health LP demo** with a simulation runner producing 11-week timelapse assets under `demo-output/landing-page/`. See `context/VISION.md` for canonical north-star.

## First actions every session

@@ -129,8 +129,9 @@ Use `TaskCreate` / `TaskUpdate` for multi-step work within a single session. Tas
Webster ships these skills:

- `skills/webster-weekly-council/SKILL.md` — operator surface for the weekly run. Library skill: SKILL.md index + on-demand phase references + helper scripts. Slash-command form: `/webster-weekly-council`. Equivalent single-page runbook at `prompts/second-wbs-session.md`.
- `skills/webster-onboarding/SKILL.md` — first-time setup for a new operator (brand context capture, key checklist, repo scaffold, agent + memory-store provisioning, first council)
- `skills/webster-lp-audit/SKILL.md` — shared council run discipline (referenced by production critics)
- `skills/webster-browser-audit/SKILL.md` — headless browser audit for visual review
- `skills/webster-browser-audit/SKILL.md` — headless browser audit capability for visual review

If your work modifies any skill, test with a sample invocation before committing. The weekly-council skill must stay artifact-equivalent with `prompts/second-wbs-session.md` — when in doubt, fix the skill, never the prompt.

24 changes: 19 additions & 5 deletions README.md
@@ -6,7 +6,7 @@

## The one-line pitch

Small businesses pay marketing agencies $2K–$20K/month for landing-page optimization that arrives in 4–6 week cycles. Webster runs the audit + proposal loop for ~$0.60/month in Opus 4.7 tokens and hands the operator a reviewable draft PR each week. The win is cycle time (minutes vs weeks) and the baseline cost of the analytical loop — a human still reviews the PR before it ships.
A council of 9 Claude Managed Agents audits a landing page once a week, synthesizes findings across SEO, brand-voice, compliance, conversion, copy, and rendered-layout lenses, and hands the operator a reviewable draft PR. The win is cycle time — the analytical loop runs in tens of minutes instead of multi-week agency rounds — and a runtime mechanism (Critic Genealogy) where Opus 4.7 detects an unowned audit gap and registers a brand-new specialist agent against the live API mid-run. A human still reviews the PR before it ships.

## The hero moment — Critic Genealogy

@@ -81,12 +81,24 @@ bun scripts/critic-genealogy.ts --fixtures scripts/__tests__/fixtures/genealogy

**Live-run evidence:** the operator surface is the [`/webster-weekly-council`](skills/webster-weekly-council/SKILL.md) skill (library: SKILL.md index + on-demand phase references + helper scripts); the full single-page runbook lives at [`prompts/second-wbs-session.md`](prompts/second-wbs-session.md). Registration IDs live in `environments/webster-council-env.id` and `context/*/id.txt`. Run artifacts are written under `history/<week>/` when the weekly run executes.

**Demo arc artifacts:** the hackathon timelapse animates an 11-week simulation council run. Per-week deliverables live under [`demo-output/landing-page/`](demo-output/landing-page/) (`w00..w10`): desktop/mobile/tablet screenshots, heatmap JSON+SVG, synthetic analytics, and the visual reviewer's markdown verdict. Anthropic Managed Agents memory-store provisioning is captured at [`assets/memory-stores-screenshots/`](assets/memory-stores-screenshots/). The rendered timelapse is hosted externally (link in the submission form); reproduce locally with `bun skills/webster-video/scripts/hydrate-demo-assets.ts && cd video && npx hyperframes render -q high --strict`.
**Demo arc artifacts:** an 11-week simulation council run, week-by-week, browsable as files. Start at [`demo-output/landing-page/INDEX.md`](demo-output/landing-page/INDEX.md) for the narrated walk-through. Each week directory under [`demo-output/landing-page/w00..w10/`](demo-output/landing-page/) contains desktop/mobile/tablet screenshots, heatmap JSON+SVG, synthetic analytics, and the visual reviewer's markdown verdict. Anthropic Managed Agents memory-store provisioning is captured at [`assets/memory-stores-screenshots/`](assets/memory-stores-screenshots/). The render pipeline that turns these per-week assets into a timelapse video is submission tooling and lives outside the public repo.

**Hero code:** [`scripts/critic-genealogy.ts`](scripts/critic-genealogy.ts) is the runtime specialist-spawn path; [`scripts/__tests__/critic-genealogy.test.ts`](scripts/__tests__/critic-genealogy.test.ts) and [`scripts/__tests__/fixtures/genealogy`](scripts/__tests__/fixtures/genealogy) are the fixture proof.

**Validate locally:** run `bun install` once, then `bun run validate` for type-check, zero-warning lint, format, agent schemas, findings format, markdown, and tests.

## 5-minute judge tour

If you're evaluating this submission and have five minutes:

1. **Read the 30-second pitch + hero moment above** (you're here) — that's the architecture and the novel-mechanic claim in one screen.
2. **Open [`demo-output/landing-page/INDEX.md`](demo-output/landing-page/INDEX.md)** — narrated walk through the 11-week LP timelapse. One paragraph per week, links to that week's screenshots + heatmap + visual-reviewer verdict.
3. **Click into one week's `visual-review.md`** (e.g. [`w04/visual-review.md`](demo-output/landing-page/w04/visual-review.md) for the largest beat, [`w10/visual-review.md`](demo-output/landing-page/w10/visual-review.md) for the terminal polish) — that's what the council actually wrote about its own changes.
4. **Read [`scripts/critic-genealogy.ts`](scripts/critic-genealogy.ts)** — the hero file. Two tools (`report_no_gap` / `report_gap`), Opus 4.7 picks one, then drafts a JSON spec, registers it via `POST /v1/agents`, and invokes it via `POST /v1/sessions` — all at runtime.
5. **Optional, if a terminal is handy:** `bun install && bun scripts/critic-genealogy.ts --fixtures scripts/__tests__/fixtures/genealogy --dry-run`. Live Opus 4.7 call against the committed fixture findings, ~15s wall clock, prints the new critic spec it would have registered.
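The branch described in step 4 can be sketched as a small planning function. This is a minimal illustration, not the contents of `scripts/critic-genealogy.ts`: the `GenealogyDecision` and `PlannedCall` types, the `planGenealogyCalls` name, and the `agent` key in the session body are all assumptions made for the example. Only the two tool names, the two endpoints, and the dry-run behaviour come from the text above.

```typescript
// Hypothetical shapes: only the tool names and endpoints are taken from the README.
type GenealogyDecision =
  | { tool: "report_no_gap"; reason: string }
  | { tool: "report_gap"; spec: { name: string; model: string; system: string } };

interface PlannedCall {
  method: "POST";
  path: "/v1/agents" | "/v1/sessions";
  body: Record<string, unknown>;
}

// Map the model's tool choice to the ordered API calls the orchestrator would make.
function planGenealogyCalls(decision: GenealogyDecision, dryRun: boolean): PlannedCall[] {
  if (decision.tool === "report_no_gap") {
    return []; // no unowned gap, so nothing is registered
  }
  if (dryRun) {
    return []; // a dry run only prints the drafted spec
  }
  return [
    // Register the drafted spec first.
    { method: "POST", path: "/v1/agents", body: { ...decision.spec } },
    // "agent" is a placeholder key; the real session payload may differ.
    { method: "POST", path: "/v1/sessions", body: { agent: decision.spec.name } },
  ];
}
```

Keeping the decision as a discriminated union means the compiler rejects any path that reads `spec` without first checking which tool was picked.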

[`agents/production/`](agents/production/) holds the 9 pre-registered specs; [`agents/simulation/`](agents/simulation/) holds the 1:1 simulation mirror used for the timelapse run. [`prompts/second-wbs-session.md`](prompts/second-wbs-session.md) is the production weekly orchestrator (locked); [`skills/webster-weekly-council/SKILL.md`](skills/webster-weekly-council/SKILL.md) is the same flow as a Claude Code skill.

## What's in the repo

```text
@@ -99,7 +111,9 @@ webster/
├── prompts/ first-wbs-session.md (bootstrap), second-wbs-session.md (weekly run runbook)
├── scripts/ validate-agents, validate-findings, critic-genealogy
├── skills/ webster-weekly-council (operator surface for the weekly run),
│ webster-lp-audit (shared critic discipline)
│ webster-onboarding (first-time setup for a new operator),
│ webster-lp-audit (shared critic discipline),
│ webster-browser-audit (Playwright-headless audit capability)
├── .github/workflows/ CI: type + lint + format + schema + findings + markdown + tests
├── .husky/ pre-commit runs the same gates locally
└── AGENTS.md operator guide for in-repo work
```
@@ -117,7 +131,7 @@ The live council runner is a Claude Code library skill: [`/webster-weekly-counci
6. Runs the redesigner — commits `history/YYYY-MM-DD/proposal.md` + `decision.json`.
7. Opens a draft PR.

Expected wall-clock: 30–50 min. Expected API cost: ~$0.16–0.25 per run.
Wall-clock per run is in the tens of minutes; the bulk of that is the parallel critic fan-out, not orchestration overhead.

**Submission note**: all 9 agent specs are registered against the live Anthropic API (IDs in `environments/webster-council-env.id` + `context/*/id.txt`), the genealogy hero is live-validated (~$0.03 Opus 4.7 dry-run documented above), and the full orchestration prompt is committed. The end-to-end fan-out that produces `history/YYYY-MM-DD/` artifacts is the operator-triggered weekly run, so `history/` is empty at submission time by design. The loop has been exercised component by component.

@@ -131,7 +145,7 @@ bun run validate

Chains: `tsc --noEmit` → `eslint --max-warnings 0` → `prettier --check` → agent+environment schema validation → findings format validation → markdownlint → `bun test`. Every gate is blocking. Pre-commit hook enforces the same set. CI enforces the same set on push + PR. See [`context/QUALITY-GATES.md`](context/QUALITY-GATES.md).
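The "every gate is blocking" behaviour amounts to a fail-fast chain. The sketch below illustrates that idea only; it is not how `bun run validate` is wired. `runGates`, `GateRunner`, and `ChainResult` are hypothetical names, and the gate strings are labels taken from the sentence above rather than exact commands.

```typescript
// Gate labels in the order the chain lists them (labels, not exact commands).
const GATES: readonly string[] = [
  "tsc --noEmit",
  "eslint --max-warnings 0",
  "prettier --check",
  "agent+environment schema validation",
  "findings format validation",
  "markdownlint",
  "bun test",
];

// Hypothetical runner signature: returns true when the gate passes.
type GateRunner = (gate: string) => boolean;

interface ChainResult {
  passed: string[];
  failedAt: string | null;
}

// Run gates in order and stop at the first failure, so later gates never
// execute once an earlier one has blocked.
function runGates(gates: readonly string[], run: GateRunner): ChainResult {
  const passed: string[] = [];
  for (const gate of gates) {
    if (!run(gate)) {
      return { passed, failedAt: gate };
    }
    passed.push(gate);
  }
  return { passed, failedAt: null };
}
```

Injecting the runner keeps the ordering logic testable without spawning real processes.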

Current state: 175 tests passing, 0 lint warnings, 0 type errors, 18 JSON specs valid, 6 findings files valid.
Current state: 29 test files green via `bun run validate`, 0 lint warnings, 0 type errors, 18 JSON specs valid, 6 findings files valid.

## Prize-lane alignment

52 changes: 30 additions & 22 deletions context/ARCHITECTURE.md
@@ -2,7 +2,7 @@

> Mirrors [[webster-architecture]] in vault. Canonical source is this file for in-repo operators; vault file for cross-session memory.
>
> **Submission state**: Layers 1–4 + Layer 7 shipped. Layer 5 (`site/` fork + analytics pixel + `scripts/seed-mock-history.ts`) is scoped out for submission — the mock seeder is in phase 1 of the weekly-council skill (`skills/webster-weekly-council/references/seed-history.md`) and equivalently in `prompts/second-wbs-session.md` Step 1, and the redesigner emits `proposal.md` instead of `proposal.diff`. Layer 6 (video) is blocked on Richie's voice record. See `context/FEATURES.md` for per-row status.
> **Shipped state**: 9 production Managed Agents, mirrored 1:1 by 9 `webster-lp-sim-*` simulation specs. Full council loop runs end-to-end — planner → fan-out → redesigner → visual review — with critic genealogy as the runtime specialist-spawn beat. The redesigner emits `proposal.md` (PR body) rather than `proposal.diff`; a real `site/` fork that lets the council emit a one-click diff is roadmap, not pending. See `context/FEATURES.md` for the full inventory.

## System Overview

@@ -13,8 +13,12 @@
│ Claude Code Session (orchestrator — Opus 4.7) │
│ ├─ reads site/ + history/ + context/critics/*/findings.md │
│ │ │
│ ├─ fan-out: POST /v1/sessions for each of 6 pre-registered │
│ │ Managed Agents (parallel), then send user.message event │
│ ├─ planner session (Opus 4.7) │
│ │ ├─ marshals memory + verdicts + monitor anomalies │
│ │ └─ writes plan.md with direction_hint for the week │
│ │ │
│ ├─ fan-out: POST /v1/sessions for 6 pre-registered Managed │
│ │ Agents (parallel), then send user.message event │
│ │ ├─ monitor (Haiku 4.5) — detects analytics anomalies │
│ │ ├─ 5 specialist critics (Sonnet 4.6) │
│ │ │ ├─ SEO, brand-voice, FH-compliance, │
@@ -24,7 +28,9 @@
│ ├─ redesigner session (Opus 4.7) │
│ │ ├─ orchestrator gathers committed findings │
│ │ ├─ passes them as input text to redesigner session │
│ │ └─ redesigner outputs proposal.diff + decision.json │
│ │ └─ redesigner outputs proposal.md + decision.json │
│ │ │
│ ├─ visual-reviewer (Opus 4.7) — post-redesign visual audit │
│ │ │
│ ├─ Critic Genealogy (runtime creation, public beta) │
│ │ ├─ detects pattern no existing critic owns │
@@ -55,7 +61,7 @@
- Per-critic context: `context/critics/{name}/findings.md`
- Run artifacts: `history/YYYY-MM-DD/{analytics.json, council-output/, synthesis.md, proposal.md, decision.json}`

### Layer 2: Managed Agent Critics (7 pre-registered)
### Layer 2: Pre-registered Managed Agents (9 production, mirrored 1:1 by 9 simulation)

**Environment is a separate resource** (`POST /v1/environments`), registered once per workspace and referenced by ID in every session. There is NO in-agent `environment:` or `resources:` field.

@@ -66,17 +72,23 @@ Environment `environments/webster-council-env.json`:
- Networking: `limited` with `allowed_hosts: [api.github.com, github.com, raw.githubusercontent.com, api.anthropic.com]`, `allow_mcp_servers: true`, `allow_package_managers: true`
- No GitHub-repo mount primitive exists — the agent `git clone`s at session start via bash using a `GITHUB_TOKEN` passed in the first user.message

Agent specs (JSON, not YAML — matches `POST /v1/agents` schema):
Production specs (JSON, not YAML — matches `POST /v1/agents` schema):

| Spec | Model | Role |
| ------------------------------------------------ | ---------- | -------------------------- |
| `agents/production/webster-planner.json` | Opus 4.7 | orchestrator (pre-fan-out) |
| `agents/production/brand-voice-critic.json` | Sonnet 4.6 | critic |
| `agents/production/conversion-critic.json` | Sonnet 4.6 | critic |
| `agents/production/copy-critic.json` | Sonnet 4.6 | critic |
| `agents/production/fh-compliance-critic.json` | Sonnet 4.6 | critic |
| `agents/production/seo-critic.json` | Sonnet 4.6 | critic |
| `agents/production/webster-visual-reviewer.json` | Opus 4.7 | critic (post-redesign) |
| `agents/production/webster-monitor.json` | Haiku 4.5 | monitor |
| `agents/production/webster-redesigner.json` | Opus 4.7 | redesigner |

- `agents/production/webster-monitor.json` — Haiku 4.5
- `agents/production/brand-voice-critic.json` — Sonnet 4.6
- `agents/production/fh-compliance-critic.json` — Sonnet 4.6
- `agents/production/seo-critic.json` — Sonnet 4.6
- `agents/production/conversion-critic.json` — Sonnet 4.6
- `agents/production/copy-critic.json` — Sonnet 4.6
- `agents/production/webster-redesigner.json` — Opus 4.7
Simulation set at `agents/simulation/webster-lp-sim-*` mirrors the production roster 1:1 — same models, same role distribution, no extra surface for judges to evaluate. Sim agents are additive, never touching production. **No `callable_agents`** (research preview) on either set.

Each spec has: `name`, `model`, `system` (multi-line string with escaped \n), `tools: [{type: agent_toolset_20260401}]`, `metadata`. **No `callable_agents`** (research preview).
Each spec has: `name`, `model`, `system` (multi-line string with escaped `\n`), `tools: [{type: agent_toolset_20260401}]`, `metadata`.
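The field list above can be read as a shape check. The following is a minimal sketch under stated assumptions: `specProblems` is a hypothetical helper, not the repo's `validate-agents` script, and it checks only the five fields named here plus the no-`callable_agents` rule, not the full `POST /v1/agents` schema.

```typescript
// Hypothetical helper: returns a list of human-readable problems, empty when
// the spec has the five documented fields and no callable_agents key.
function specProblems(spec: Record<string, unknown>): string[] {
  const problems: string[] = [];

  for (const key of ["name", "model", "system"]) {
    if (typeof spec[key] !== "string") {
      problems.push(`${key} must be a string`);
    }
  }

  // tools must contain at least one entry of type agent_toolset_20260401.
  const tools = spec["tools"];
  const hasToolset =
    Array.isArray(tools) &&
    tools.some(
      (t: unknown) =>
        typeof t === "object" &&
        t !== null &&
        (t as { type?: unknown }).type === "agent_toolset_20260401",
    );
  if (!hasToolset) {
    problems.push("tools must include agent_toolset_20260401");
  }

  const metadata = spec["metadata"];
  if (typeof metadata !== "object" || metadata === null) {
    problems.push("metadata must be an object");
  }

  if ("callable_agents" in spec) {
    problems.push("callable_agents is research preview and must not be set");
  }

  return problems;
}
```

Returning every problem, rather than throwing on the first, lets one validation pass report everything wrong with a spec.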

### Layer 3: Critic Genealogy (novel mechanic)

@@ -235,11 +247,7 @@ Production/sim agents should receive the same evidence order, especially prior h

### Layer 6: Meta Video

- Remotion template + 5 comps (title, council viz, TAM+10wk morph, Genealogy diagram, end-card)
- Opus-authored narration script (`video/script.md`)
- Voice: Richie's own, Sat AM record
- Final assembly in Descript or CapCut, 3-min clean cut
- End-card: commit hashes for Claude-authored assets
Submission tooling, not part of the product. The HyperFrames render pipeline that turns the per-week LP simulation assets into a timelapse video lives outside the public repo. Per-week deliverables stay committed under `demo-output/landing-page/w00..w10/` as judge evidence: desktop/mobile/tablet screenshots, heatmap JSON+SVG, synthetic analytics, visual-review verdicts. See `demo-output/landing-page/INDEX.md` for the narrated walk-through.

### Layer 7: Polish

@@ -253,18 +261,18 @@ Production/sim agents should receive the same evidence order, especially prior h
1. **Agents are registered from the orchestrator session.** `POST /v1/agents` from Claude Code (orchestrator), never from inside a Managed Agent's own loop. Both pre-registered critics AND runtime-created Genealogy critics are registered this way.
2. **Environments are separate resources.** `POST /v1/environments` once per workspace; referenced by `environment_id` in every session.
3. **No `callable_agents`.** Agent-to-agent invocation is research preview. Orchestrator fans out via parallel `/v1/sessions` calls.
4. **State lives in git.** Critics commit findings from inside their sessions. No managed memory stores (also research preview).
4. **State is hybrid.** Authoritative state lives in git — critics commit findings from inside their sessions, run artifacts land under `history/`. Six Anthropic Managed Memory Stores (registered IDs in `context/memory-stores.json`) hold cross-session priors for council, planner, redesigner, genealogy, conversion-critic, and visual-reviewer; git remains the auditable source of truth.
5. **Credentials**: orchestrator holds `ANTHROPIC_API_KEY` + `GITHUB_TOKEN`. Sessions receive `GITHUB_TOKEN` in the first user.message so they can `git clone` + push. Cloudflare creds are onboarding-only.
6. **Skill is universal.** Same markdown, Claude Code + claude.ai.
7. **Zero fabricated stats.** Mock analytics framed as POC priors.
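Rules 2, 3, and 5 together describe what each fan-out call carries: one session per agent, a shared `environment_id`, and the `GITHUB_TOKEN` in the first `user.message`. The sketch below builds those payloads without sending them. `planFanOut`, `SessionPlan`, the `agent_id` key, and the message text layout are assumptions for illustration; only `/v1/sessions`, `environment_id`, and `user.message` appear in the rules above.

```typescript
// Hypothetical payload shapes; the real request bodies may differ.
interface SessionPlan {
  create: {
    method: "POST";
    path: "/v1/sessions";
    body: { agent_id: string; environment_id: string };
  };
  firstMessage: { type: "user.message"; text: string };
}

// Build one session request per agent. Every request shares the same
// environment, and the token travels in the first user message so the
// session can clone and push.
function planFanOut(
  agentIds: readonly string[],
  environmentId: string,
  githubToken: string,
  brief: string,
): SessionPlan[] {
  return agentIds.map(
    (agentId): SessionPlan => ({
      create: {
        method: "POST",
        path: "/v1/sessions",
        body: { agent_id: agentId, environment_id: environmentId },
      },
      firstMessage: {
        type: "user.message",
        text: `GITHUB_TOKEN=${githubToken}\n${brief}`,
      },
    }),
  );
}
```

Because the plans are independent of each other, an orchestrator could dispatch them concurrently (for example with `Promise.all` over a sender function) rather than relying on agent-to-agent calls.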

## Dependencies

- Anthropic Managed Agents API, beta header `managed-agents-2026-04-01` (public beta — verified live 2026-04-23)
- (Research preview, NOT required for public beta path: `callable_agents`, memory stores, outcomes — request at <https://claude.com/form/claude-managed-agents>)
- Anthropic Managed Memory Stores (public beta) — six stores per substrate, IDs at `context/memory-stores.json`
- (Research preview, NOT required for public beta path: `callable_agents`, outcomes — request at <https://claude.com/form/claude-managed-agents>)
- Claude Code (Routines, `/v1/claude_code/routines/{id}/fire`)
- Claude Design (user-facing, bundle `.zip`)
- Cloudflare Workers + Static Assets + Workers Builds
- GitHub (MCP + webhooks)
- Astro 6 + `@astrojs/cloudflare`
- Remotion (video)