
feat: expose grove_eval on MCP, fix topology edge preservation, fix example YAML #242

Merged
windoliver merged 8 commits into main from worktree-quirky-moseying-whisper
Apr 12, 2026

@windoliver (Owner)

Closes #230.

Summary

  • grove_eval MCP tool: agents can now run the eval harness via MCP instead of shelling out to the CLI. The tool spawns evalCommand via sh -c with the GROVE_TARGET_CID env var set, streams output line by line, parses GROVE_SCORE metric=value lines, and returns structured scores plus exit metadata. The timeout is configurable (default 5 min) and output is capped at 16 MB. Scoped via eval?: boolean in McpPresetConfig.

  • TopologyRouter preserves edge types: edgeMap changed from Map<string, string[]> to Map<string, RoleEdge[]>. Dedup now uses a Set<string> keyed by "target:edgeType" (O(1) lookup instead of O(n) Array.includes). targetsFor() returns readonly RoleEdge[]. route() still deduplicates by target for event publishing. Note: in this version edge types are informational only and do not change routing behavior; the structural change unblocks future behavioral routing.

  • Example YAML fixed + CI guard: examples/real-autoresearch/grove.md had 6 schema violations (4 from the issue plus 2 more: auto_evaluate/accept_if in outcome_policy and an invalid enforcement block). All are fixed. A new glob-based test in src/core/examples.test.ts parses every examples/**/grove.md against parseGroveContract, so new examples are covered automatically with no manual registration.
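The grove_eval bullet above says the tool parses GROVE_SCORE metric=value lines out of the eval command's output. A minimal sketch of that parsing step follows, assuming one score per line in the form "GROVE_SCORE <metric>=<number>" with a space after the prefix. The names parseScoreLine and parseScores and the exact regex are hypothetical, not the code in src/core/operations/eval.ts.

```typescript
// Hypothetical sketch: extract a metric/value pair from one line of eval output.
// Assumes the format "GROVE_SCORE <metric>=<number>"; other lines are ignored.
const SCORE_LINE = /^GROVE_SCORE\s+([A-Za-z0-9_.-]+)=(\S+)$/;

function parseScoreLine(line: string): [string, number] | undefined {
  const match = SCORE_LINE.exec(line.trim());
  if (!match) return undefined;
  const value = Number(match[2]);
  // Reject non-numeric or infinite values such as "GROVE_SCORE acc=high".
  return Number.isFinite(value) ? [match[1], value] : undefined;
}

// Collect every score in a block of output; a later line for the same metric wins.
function parseScores(output: string): Record<string, number> {
  const scores: Record<string, number> = {};
  for (const line of output.split(/\r?\n/)) {
    const parsed = parseScoreLine(line);
    if (parsed) scores[parsed[0]] = parsed[1];
  }
  return scores;
}
```

Keeping the per-line parser separate from stream handling makes it easy to unit-test against fixed strings.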

Files changed

| File | Change |
| --- | --- |
| src/core/operations/eval.ts | New: eval operation with spawn, streaming, timeout |
| src/mcp/tools/eval.ts | New: MCP wrapper for grove_eval |
| src/mcp/tools/eval.test.ts | New: 6 test cases |
| src/core/examples.test.ts | New: glob CI guard for all example grove.md files |
| src/core/topology-router.ts | RoleEdge[] edgeMap, Set dedup, updated targetsFor |
| src/core/operations/contribute.ts | Extract .target from RoleEdge in routing |
| src/mcp/server.ts | Add eval flag + registerEvalTools |
| src/mcp/server.test.ts | Add grove_eval to allTools, allDisabled, eval:false test |
| src/mcp/server.integration.test.ts | Update tool count 37→38 |
| src/core/event-bus.test.ts | Updated targetsFor assertions + 3 new edge-type tests |
| src/core/operations/plan.test.ts | Updated mock return type |
| src/core/operations/index.ts | Export eval types + function |
| examples/real-autoresearch/grove.md | Fix all schema violations |

Test plan

  • bun test src/core/event-bus.test.ts — 15 pass (topology router)
  • bun test src/core/examples.test.ts — 1 pass (example YAML parse guard)
  • bun test src/mcp/tools/eval.test.ts — 6 pass (eval operation)
  • bun test src/mcp/server.test.ts — 15 pass (preset scoping incl. eval:false)
  • bun test src/mcp/server.integration.test.ts — 10 pass (38 tools)
  • bun tsc --noEmit — passes

…ervation, fix example YAML drift

Implements 3 protocol gaps from issue #230:

1. grove_eval MCP tool (Issue 1):
   - Add src/core/operations/eval.ts: spawns evalCommand via sh -c with
     GROVE_TARGET_CID env var, streams stdout/stderr line-by-line, parses
     GROVE_SCORE metric=value lines, enforces configurable timeout (default
     5 min) and 16 MB output cap
   - Add src/mcp/tools/eval.ts: thin MCP wrapper with targetCid (required),
     evalCommand (required), timeoutMs (optional) inputs
   - Add eval?: boolean flag to McpPresetConfig (default true, backwards compat)
   - 6 test cases in eval.test.ts covering success, env var, validation errors,
     timeout, and non-zero exit code

2. TopologyRouter preserves EdgeType (Issue 2):
   - edgeMap now Map<string, RoleEdge[]> instead of Map<string, string[]>
   - Deduplication by (target, edgeType) pair using Set (O(1) vs O(n) Array.includes)
   - targetsFor() returns readonly RoleEdge[] — callers get full edge metadata
   - route() still deduplicates by target for event publishing (one event per role)
   - Update contribute.ts and plan.test.ts mock for new return type
   - 4 new/updated tests: targetsFor shape, multi-edge-type preservation,
     route() single-event guarantee, exact-duplicate dedup

3. Example YAML fixed + CI guard (Issue 3 + 4A):
   - Fix real-autoresearch/grove.md: no_improvement_rounds → max_rounds_without_improvement,
     threshold → value, wall_clock_budget → budget.max_wall_clock_seconds, add
     deliberation_limit.max_rounds; also fix auto_evaluate/accept_if →
     outcome_policy.auto_accept.metric_improves, remove invalid enforcement block
   - Add src/core/examples.test.ts: glob-based CI guard that parses all
     examples/**/grove.md against parseGroveContract — auto-covers new examples
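
Item 2 above changes each edgeMap value to RoleEdge[], filters exact duplicates with a Set keyed by "target:edgeType", and keeps route() at one event per target. A minimal sketch of that idea, assuming RoleEdge has target and edgeType fields and using only the two edge types named in this PR (delegates, feeds). EdgeMapSketch and routeTargets are hypothetical names, not the actual TopologyRouter API.

```typescript
// Hypothetical edge types; the real EdgeType union may contain more members.
type EdgeType = "delegates" | "feeds";

interface RoleEdge {
  readonly target: string;
  readonly edgeType: EdgeType;
}

class EdgeMapSketch {
  private readonly edgeMap = new Map<string, RoleEdge[]>();
  // Per-source set of "target:edgeType" keys for O(1) duplicate checks.
  private readonly seen = new Map<string, Set<string>>();

  addEdge(source: string, edge: RoleEdge): void {
    const key = `${edge.target}:${edge.edgeType}`;
    let keys = this.seen.get(source);
    if (keys === undefined) {
      keys = new Set<string>();
      this.seen.set(source, keys);
    }
    if (keys.has(key)) return; // exact duplicate: same target and edge type
    keys.add(key);
    const edges = this.edgeMap.get(source) ?? [];
    edges.push(edge);
    this.edgeMap.set(source, edges);
  }

  // Full edge metadata, including several edge types to the same target.
  targetsFor(source: string): readonly RoleEdge[] {
    return this.edgeMap.get(source) ?? [];
  }

  // Distinct targets only, so each downstream role receives one event.
  routeTargets(source: string): string[] {
    return Array.from(new Set(this.targetsFor(source).map((e) => e.target)));
  }
}
```

Keying the Set on the (target, edgeType) pair keeps "coder via delegates" and "coder via feeds" as separate edges, while routeTargets collapses them back to one recipient; the same target-level collapse is what prevents duplicate pending handoffs.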

Wraps the CommandPalette in a position="absolute" box with zIndex=10
so it renders on top of PanelManager without affecting the flex layout.
Previously it rendered inline in the flex column, causing garbled overlap.

Also replaces the invalid background={true} prop with backgroundColor={theme.headerBg}.

…in palette

- Add bottom={2} to the absolute palette box so backgroundColor fills the
  entire panel area instead of just the palette item height
- Show edge targets in the spawn item detail, e.g. [0/1 → reviewer], [0/1 → coder],
  so the graph routing is visible without opening another panel

…doff targets

grove_eval executes sh -c with caller-supplied input, so exposing it on
the HTTP transport without mandatory auth is an RCE path. Enabling it on
HTTP now requires both AUTH_TOKEN to be set and GROVE_MCP_EVAL_ENABLED=true.

Deduplicate routedTo targets before creating handoffs: a role with multiple
edge types (delegates + feeds) to the same downstream role produced duplicate
pending handoffs. A Set now deduplicates targets so one handoff is created per
(source, target) pair.

…process group on timeout

grove_eval was opt-out (enabled by default) on both stdio and HTTP, so any
connected agent could run arbitrary shell commands. It is now disabled by
default on all transports; enable it with GROVE_MCP_EVAL_ENABLED=true (HTTP
also requires AUTH_TOKEN).
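
Taken together, the two security commits above make grove_eval opt-in: GROVE_MCP_EVAL_ENABLED=true is required on every transport, and HTTP additionally requires AUTH_TOKEN. A minimal sketch of that decision as a pure function, assuming the flag must be the literal string "true" and that an empty AUTH_TOKEN counts as unset. The name isEvalEnabled, the Transport union, and passing env as a plain record are hypothetical, not the server.ts code.

```typescript
type Transport = "stdio" | "http";

// Hypothetical helper: decides whether grove_eval should be registered.
function isEvalEnabled(
  transport: Transport,
  env: Record<string, string | undefined>,
): boolean {
  // Opt-in only: anything other than the literal "true" keeps the tool off.
  if (env.GROVE_MCP_EVAL_ENABLED !== "true") return false;
  // On HTTP, also require a non-empty auth token.
  if (transport === "http" && !env.AUTH_TOKEN) return false;
  return true;
}
```

Passing env explicitly instead of reading process.env inside the function keeps each transport/flag combination trivial to test.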

The eval subprocess now uses detached: true, so on timeout SIGTERM/SIGKILL is
sent to the whole process group (-pid), not just the sh wrapper. This prevents
orphaned child processes from outliving the timeout.
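
The detached: true change relies on a POSIX detail: a detached child becomes the leader of a new process group, and process.kill with a negative pid signals every process in that group. A minimal Node.js sketch of the pattern, POSIX only. runWithTimeout, graceMs, and the RunResult shape are hypothetical names, and the sketch omits the output streaming and 16 MB cap that the real eval operation also handles.

```typescript
import { spawn } from "node:child_process";

interface RunResult {
  exitCode: number | null;
  signal: NodeJS.Signals | null;
  timedOut: boolean;
}

// Hypothetical sketch: run a shell command in its own process group and kill
// the whole group (not just the sh wrapper) if it exceeds timeoutMs.
function runWithTimeout(command: string, timeoutMs: number, graceMs = 2000): Promise<RunResult> {
  return new Promise((resolve, reject) => {
    // detached: true makes sh the leader of a new process group, so -pid
    // addresses sh and everything it spawned.
    const child = spawn("sh", ["-c", command], { detached: true, stdio: "ignore" });
    let timedOut = false;
    let killTimer: ReturnType<typeof setTimeout> | undefined;

    const killGroup = (signal: NodeJS.Signals): void => {
      try {
        if (child.pid !== undefined) process.kill(-child.pid, signal);
      } catch {
        // ESRCH: the group has already exited.
      }
    };

    const timer = setTimeout(() => {
      timedOut = true;
      killGroup("SIGTERM");
      // Escalate if the group ignores SIGTERM.
      killTimer = setTimeout(() => killGroup("SIGKILL"), graceMs);
    }, timeoutMs);

    child.on("error", (err) => {
      clearTimeout(timer);
      reject(err);
    });
    child.on("close", (exitCode, signal) => {
      clearTimeout(timer);
      if (killTimer !== undefined) clearTimeout(killTimer);
      // After a timeout, sweep any group members that outlived the sh wrapper.
      if (timedOut) killGroup("SIGKILL");
      resolve({ exitCode, signal, timedOut });
    });
  });
}
```

Signalling only child.pid would stop sh but could leave a grandchild (for example a long-running trainer started by the eval script) running after the timeout fires.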

eval.ts: makeLineHandler now returns {handler, flush}; the close handler calls
flush() on both stdout and stderr, so a final GROVE_SCORE line without a trailing
newline is parsed instead of being silently dropped.
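
The bug fixed above is the classic trailing-fragment problem in line splitting: bytes after the last newline sit in the buffer until more data arrives, and if the stream closes first they are lost. An illustrative version of a line handler with an explicit flush follows; the onLine callback and string chunks are assumptions, and this is not the actual makeLineHandler from eval.ts.

```typescript
// Hypothetical sketch of a line splitter that returns {handler, flush}.
function makeLineHandler(onLine: (line: string) => void): {
  handler: (chunk: string) => void;
  flush: () => void;
} {
  let buffer = "";
  // Called for each data chunk; emits every complete line it contains.
  const handler = (chunk: string): void => {
    buffer += chunk;
    let newline = buffer.indexOf("\n");
    while (newline !== -1) {
      onLine(buffer.slice(0, newline).replace(/\r$/, ""));
      buffer = buffer.slice(newline + 1);
      newline = buffer.indexOf("\n");
    }
  };
  // Called on stream close; emits a final line that had no trailing "\n".
  const flush = (): void => {
    if (buffer.length > 0) {
      onLine(buffer);
      buffer = "";
    }
  };
  return { handler, flush };
}
```

In use, handler would be attached to each stream's data events and flush called from the close handler, once per stream.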

server.ts: flip the preset?.eval guard from != false to === true so grove_eval
is disabled by default even when createMcpServer is called without a preset
(the embedder case). Update the stale JSDoc comment and the integration test to
pass { eval: true } for the all-tools coverage test.
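
The guard flip matters because of how an absent preset evaluates: with no preset, preset?.eval is undefined, and undefined != false is true, so the old guard registered grove_eval by default. A small illustration of both guards; PresetSketch, oldGuard, and newGuard are hypothetical names standing in for McpPresetConfig and the check in server.ts.

```typescript
interface PresetSketch {
  eval?: boolean;
}

// Old guard: only an explicit `eval: false` disabled the tool.
const oldGuard = (preset?: PresetSketch): boolean => preset?.eval != false;

// New guard: only an explicit `eval: true` enables the tool.
const newGuard = (preset?: PresetSketch): boolean => preset?.eval === true;
```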

windoliver force-pushed the worktree-quirky-moseying-whisper branch from ac25f55 to 331549b on April 12, 2026 07:14

Add biome-ignore suppressions for pre-existing noEmptyBlockStatements and
noNonNullAssertion warnings in files touched by this branch. Apply the
organizeImports auto-fix to eval.ts, index.ts, and examples.test.ts.

grove_eval is now disabled by default (preset?.eval === true required).
Remove grove_eval from the default allTools list, update test descriptions,
and add an explicit eval:true test.

windoliver merged commit f9f6789 into main on Apr 12, 2026
1 check passed

Linked issue: protocol: expose evalOperation on MCP + topology edge types unused + example YAML drift
