diff --git a/TODOS.md b/TODOS.md
new file mode 100644
index 00000000..df651ba4
--- /dev/null
+++ b/TODOS.md
@@ -0,0 +1,62 @@
+# TODOS
+
+Items discovered during Grove v2 CEO review (2026-03-20).
+
+## P2 — Post-v2 cleanup
+
+### Remove TmuxRuntime fallback
+- **What:** Remove `TmuxRuntime` adapter once acpx is proven stable in production.
+- **Why:** TmuxRuntime exists only as a safety net during acpx integration. Once acpx handles all supported agents reliably, the tmux codepath is dead weight.
+- **Effort:** S | **Depends on:** Phase 2 (AgentRuntime) shipped + acpx stability proven (~1 month)
+
+### Separate EventBus semaphore from store semaphore
+- **What:** Nexus per-store semaphore (20 concurrent ops) may be shared with EventBus. EventBus should have its own semaphore to avoid reducing DAG write throughput.
+- **Why:** Under load (10+ agents), EventBus publishes compete with DAG writes for the same semaphore slots.
+- **Effort:** S | **Depends on:** Phase 3 (EventBus routing) shipped
+
+### Git worktree pooling for swarm-scale
+- **What:** Pre-create a pool of git worktrees and assign them to agents on spawn, rather than creating worktrees on-demand.
+- **Why:** `git worktree add` takes 200-500ms. At 50+ agents, sequential creation is 10-25 seconds. Pooling amortizes this.
+- **Effort:** M | **Depends on:** Swarm runtime (prior CEO plan)
+
+### Provider.ts capability flag cleanup
+- **What:** The 11 optional capability interfaces + type guards in `provider.ts` are an anti-pattern. The `AgentRuntime` interface provides a path to simplify: spawn/send/close replace ad-hoc claim/heartbeat/workspace methods.
+- **Why:** New features currently require adding a new interface + type guard + conditional logic everywhere. Fragile and hard to trace.
+- **Effort:** M | **Depends on:** Phase 2 (AgentRuntime) shipped
+
+### Incremental stop condition evaluation for swarm-scale
+- **What:** `evaluateStopConditions()` in lifecycle.ts scans all contributions (O(n)). At 1000+ contributions, this adds 50-100ms per contribute. Maintain running state (current best score, contribution count, last improvement round) and update incrementally to O(1).
+- **Why:** Swarm-scale (50+ agents) produces contributions rapidly. O(n) evaluation becomes a bottleneck.
+- **Effort:** M | **Depends on:** Swarm runtime (prior CEO plan)
+
+## P3 — Edge cases and polish
+
+### Contract re-evaluation on mid-session change
+- **What:** When GROVE.md changes mid-session (cherry-pick #6), optionally re-evaluate existing contributions against the new contract. Currently deferred — only detection + notification + diff ships in v2.
+- **Why:** Semantically tricky: a previously accepted contribution might now violate the new contract. Need a policy for handling this (flag, reject, ignore).
+- **Effort:** M | **Depends on:** Cherry-pick #6 (contract watcher) shipped
+
+### Empty summary validation
+- **What:** `grove_contribute` with an empty string summary passes schema validation. Add minimum length check (e.g., 10 chars).
+- **Why:** Empty summaries provide no value in the contribution DAG and make the TUI feed unreadable.
+- **Effort:** S | **Depends on:** Phase 1 (enforcement pipeline)
+
+### Score tie policy
+- **What:** Define outcome when a new contribution has the same score as the frontier best. Currently undefined — could be "unchanged", "tied", or treated as "improved" (same threshold met).
+- **Why:** Tie-breaking affects frontier ranking and outcome derivation. Need a consistent policy.
+- **Effort:** S | **Depends on:** Phase 1 (outcome derivation)
+
+### Create DESIGN.md
+- **What:** Document Grove's design system: spacing scale, typography, component patterns, voice/tone. Currently only `theme.ts` exists with color tokens.
+- **Why:** Web dashboard (12-month roadmap) needs a design system reference to avoid diverging from the TUI's established aesthetic.
+- **Effort:** M (via /design-consultation) | **Depends on:** Nothing (can be done anytime)
+
+### Raise dimmed color to #777 for WCAG AA contrast
+- **What:** Change `theme.dimmed` from `#666666` to `#777777` in `theme.ts`. Current value fails WCAG AA contrast (3.9:1 vs required 4.5:1).
+- **Why:** Accessibility compliance. Minimal visual change — dimmed still looks dimmed.
+- **Effort:** S (1 line change) | **Depends on:** Nothing
+
+### Contract watcher debounce tuning
+- **What:** The contract file watcher (cherry-pick #6) needs a debounce interval to handle rapid saves (e.g., editor auto-save). Start with 1s, tune based on usage.
+- **Why:** Without debounce, rapid GROVE.md edits trigger multiple diff/notification cycles.
+- **Effort:** S | **Depends on:** Cherry-pick #6 shipped
diff --git a/docs/designs/grove-v2-architecture.md b/docs/designs/grove-v2-architecture.md
new file mode 100644
index 00000000..c2a4747f
--- /dev/null
+++ b/docs/designs/grove-v2-architecture.md
@@ -0,0 +1,51 @@
+---
+status: ACTIVE
+---
+# CEO Plan: Grove v2 — Contract Enforcement, Event-Driven Routing, acpx Agent Lifecycle
+
+Generated by /plan-ceo-review on 2026-03-20
+Branch: worktree-ticklish-coalescing-stonebraker | Mode: SELECTIVE EXPANSION
+Repo: windoliver/grove
+
+## Vision
+
+### 10x Check
+The 10x version of Grove v2 is not just "enforced contracts + better agent spawning" — it's a **self-healing coordination runtime** where the system automatically detects contract violations, suggests fixes to agents, re-routes work when agents crash, and evolves contracts mid-session based on observed patterns. The enforcement pipeline becomes an intelligent system, not just a validator.
+
+At 10x, Grove becomes the "Kubernetes for AI agents" — you declare the desired state (contract), deploy agents (topology), and the runtime reconciles reality to match. Agents crash? Auto-respawn. Gates too strict? System suggests relaxation. Stop condition met? Graceful shutdown with audit trail.
+
+### Delight Opportunities Surfaced
+1. Structured rejection feedback (agent self-corrects on first try)
+2. Contract validation CLI (catch errors at authoring time)
+3. Dry-run mode for contribute (preview before commit)
+4. Enforcement pipeline audit log (black box recorder)
+5. Auto-reconnect on crash (resilient lifecycle)
+6. Contract diff on mid-session update (live-tuning experiments)
+
+## Scope Decisions
+
+| # | Proposal | Effort | Decision | Reasoning |
+|---|----------|--------|----------|-----------|
+| 1 | Structured Rejection Feedback | S | ACCEPTED | Enforcement without feedback = frustrating enforcement. Agents need to self-correct. |
+| 2 | Contract Validation CLI | S | ACCEPTED | Authoring-time errors >> runtime errors. Type checker for coordination protocol. |
+| 3 | Dry-Run Mode for Contribute | S | ACCEPTED | Preview before commit. Agents can evaluate locally before contributing. |
+| 4 | Enforcement Pipeline Audit Log | M | ACCEPTED | Observability is not optional. Black box recorder for trustworthy experiments. |
+| 5 | Auto-Reconnect on Crash | M | ACCEPTED | Crashes are inevitable with 5+ agents. Resilient lifecycle is a differentiator. |
+| 6 | Contract Diff on Mid-Session Update | M | ACCEPTED | Live-tuning experiments without restart. Researcher workflow improvement. |
+
+## Accepted Scope (added to base v2 plan)
+- Structured rejection errors in enforcement pipeline (all validation steps return typed error objects)
+- `grove contract validate` CLI command
+- `grove_contribute --dry-run` parameter on MCP tool
+- Audit log for every enforcement pipeline run (stored as metadata or system contributions)
+- AgentRuntime auto-reconnect with circuit-breaker (max 3 retries, exponential backoff)
+- Contract change detection + diff + agent notification mid-session
+
+## Deferred to TODOS.md
+- (none — all proposals accepted)
+
+## Skipped
+- (none)
+
+## Relationship to Prior Plans
+- **Swarm Runtime** (2026-03-19, PROMOTED): The swarm design depends on v2's foundation. AgentRuntime interface enables swarm-scale spawning. Enforcement pipeline ensures swarm contributions are valid. Event routing enables swarm convergence signals.