feat: expose grove_eval on MCP, fix topology edge preservation, fix example YAML #242
Merged
windoliver merged 8 commits into main on Apr 12, 2026
…ervation, fix example YAML drift

Implements 3 protocol gaps from issue #230:

1. grove_eval MCP tool (Issue 1):
   - Add src/core/operations/eval.ts: spawns evalCommand via sh -c with GROVE_TARGET_CID env var, streams stdout/stderr line-by-line, parses GROVE_SCORE metric=value lines, enforces configurable timeout (default 5 min) and 16 MB output cap
   - Add src/mcp/tools/eval.ts: thin MCP wrapper with targetCid (required), evalCommand (required), timeoutMs (optional) inputs
   - Add eval?: boolean flag to McpPresetConfig (default true, backwards compat)
   - 6 test cases in eval.test.ts covering success, env var, validation errors, timeout, and non-zero exit code

2. TopologyRouter preserves EdgeType (Issue 2):
   - edgeMap now Map<string, RoleEdge[]> instead of Map<string, string[]>
   - Deduplication by (target, edgeType) pair using Set (O(1) vs O(n) Array.includes)
   - targetsFor() returns readonly RoleEdge[], so callers get full edge metadata
   - route() still deduplicates by target for event publishing (one event per role)
   - Update contribute.ts and plan.test.ts mock for new return type
   - 4 new/updated tests: targetsFor shape, multi-edge-type preservation, route() single-event guarantee, exact-duplicate dedup

3. Example YAML fixed + CI guard (Issue 3 + 4A):
   - Fix real-autoresearch/grove.md: no_improvement_rounds → max_rounds_without_improvement, threshold → value, wall_clock_budget → budget.max_wall_clock_seconds, add deliberation_limit.max_rounds; also fix auto_evaluate/accept_if → outcome_policy.auto_accept.metric_improves, remove invalid enforcement block
   - Add src/core/examples.test.ts: glob-based CI guard that parses all examples/**/grove.md against parseGroveContract, so new examples are covered automatically
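The TopologyRouter change in item 2 can be pictured with a small sketch. This is not the code from src/core/topology-router.ts: the EdgeIndex class, its add() and routeTargets() methods, and the two-field RoleEdge shape are hypothetical names chosen for illustration. Only the Map<string, RoleEdge[]> storage, the "target:edgeType" dedup key, and the per-target dedup for routing come from the commit message.

```typescript
// Hypothetical shape; the real RoleEdge may carry more fields.
interface RoleEdge {
  readonly target: string;
  readonly edgeType: string;
}

// Illustrative stand-in for the router's edge storage.
class EdgeIndex {
  private readonly edgeMap = new Map<string, RoleEdge[]>();
  // Per-source set of "target:edgeType" keys, giving O(1) duplicate checks.
  private readonly seen = new Map<string, Set<string>>();

  add(source: string, edge: RoleEdge): void {
    const key = `${edge.target}:${edge.edgeType}`;
    let keys = this.seen.get(source);
    if (keys === undefined) {
      keys = new Set<string>();
      this.seen.set(source, keys);
    }
    if (keys.has(key)) return; // exact (target, edgeType) duplicate
    keys.add(key);
    const edges = this.edgeMap.get(source) ?? [];
    edges.push(edge);
    this.edgeMap.set(source, edges);
  }

  // Full edge metadata, one entry per distinct (target, edgeType) pair.
  targetsFor(source: string): readonly RoleEdge[] {
    return this.edgeMap.get(source) ?? [];
  }

  // Distinct targets only, so at most one event is published per role.
  routeTargets(source: string): string[] {
    return Array.from(new Set(this.targetsFor(source).map((e) => e.target)));
  }
}

const index = new EdgeIndex();
index.add("coder", { target: "reviewer", edgeType: "delegates" });
index.add("coder", { target: "reviewer", edgeType: "feeds" });
index.add("coder", { target: "reviewer", edgeType: "feeds" }); // ignored
```

Here targetsFor("coder") keeps both the delegates and feeds edges, while routeTargets("coder") collapses them to a single reviewer entry.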
Wraps the CommandPalette in a position="absolute" box with zIndex=10
so it renders on top of PanelManager without affecting the flex layout.
Previously rendered inline in the flex column causing garbled overlap.
Also fixes invalid background={true} prop → backgroundColor={theme.headerBg}.
…in palette
- Add bottom={2} to absolute palette box so backgroundColor fills the
entire panel area instead of just the palette item height
- Show edge targets in spawn item detail: [0/1 → reviewer], [0/1 → coder]
so the graph routing is visible without opening another panel
…doff targets

grove_eval executes sh -c with caller-supplied input. Exposing this on the HTTP transport without mandatory auth is an RCE path. Now requires both AUTH_TOKEN set and GROVE_MCP_EVAL_ENABLED=true to enable on HTTP.

Deduplicate routedTo targets before handoff creation: a role with multiple edge types (delegates + feeds) to the same downstream produced duplicate pending handoffs. Use Set dedup so one handoff is created per (source, target).
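A minimal sketch of the handoff dedup described in this commit, assuming routedTo is a list of RoleEdge values. The handoffTargets helper and the pending-handoff object shape are hypothetical and only illustrate the Set-based approach:

```typescript
// Hypothetical shape; the real RoleEdge may carry more fields.
interface RoleEdge {
  readonly target: string;
  readonly edgeType: string;
}

// Collapse edges to distinct targets; a Set keeps first-seen order.
function handoffTargets(routedTo: readonly RoleEdge[]): string[] {
  return Array.from(new Set(routedTo.map((edge) => edge.target)));
}

const routedTo: RoleEdge[] = [
  { target: "reviewer", edgeType: "delegates" },
  { target: "reviewer", edgeType: "feeds" }, // same downstream, second edge type
  { target: "tester", edgeType: "feeds" },
];

// One pending handoff per (source, target) instead of one per edge.
const pending = handoffTargets(routedTo).map((target) => ({
  source: "coder",
  target,
}));
```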
…process group on timeout

grove_eval was opt-out (default enabled) on both stdio and HTTP. Any connected agent could run arbitrary shell. Now disabled on all transports; enable with GROVE_MCP_EVAL_ENABLED=true (HTTP also requires AUTH_TOKEN).

Eval subprocess now uses detached:true so timeout sends SIGTERM/SIGKILL to the full process group (-pid), not just the sh wrapper. Prevents orphan children from outliving the timeout.
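The gating rule in this commit can be read as a small predicate. The sketch below is an assumption about its shape, not the project's code: isEvalAllowed, Transport, and Env are hypothetical names, and comparing against the exact string "true" and requiring a non-empty AUTH_TOKEN are guesses at the details.

```typescript
type Transport = "stdio" | "http";
type Env = Readonly<Record<string, string | undefined>>;

// Hypothetical helper: eval is off unless explicitly enabled, and HTTP
// additionally needs an auth token to be configured.
function isEvalAllowed(transport: Transport, env: Env): boolean {
  if (env["GROVE_MCP_EVAL_ENABLED"] !== "true") return false; // off by default
  if (transport === "http") {
    const token = env["AUTH_TOKEN"];
    return token !== undefined && token.length > 0;
  }
  return true;
}
```

In a server this would typically be called with process.env; passing the environment in as a parameter keeps the rule easy to test.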
eval.ts: makeLineHandler now returns {handler, flush}; close handler calls
flush() on both stdout/stderr so a final GROVE_SCORE line without a trailing
newline is parsed instead of silently dropped.
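To make the flush fix concrete, here is a minimal sketch of a line handler that buffers partial chunks and emits the trailing fragment on flush(). The {handler, flush} return shape follows the commit message; the parseScoreLine helper, the regular expression, and the exact GROVE_SCORE line grammar ("GROVE_SCORE metric=value" with a numeric value) are assumptions for illustration.

```typescript
type Scores = Record<string, number>;

// Assumed line shape: "GROVE_SCORE <metric>=<number>".
function parseScoreLine(line: string, scores: Scores): void {
  const match = /^GROVE_SCORE\s+([^=\s]+)=(\S+)\s*$/.exec(line);
  if (match === null) return;
  const [, metric, raw] = match;
  if (metric === undefined || raw === undefined) return;
  const value = Number(raw);
  if (Number.isFinite(value)) scores[metric] = value; // skip non-numeric values
}

function makeLineHandler(onLine: (line: string) => void) {
  let buffer = "";
  return {
    // Feed a chunk of output; emits every complete line seen so far.
    handler(chunk: string): void {
      buffer += chunk;
      let newline = buffer.indexOf("\n");
      while (newline !== -1) {
        onLine(buffer.slice(0, newline));
        buffer = buffer.slice(newline + 1);
        newline = buffer.indexOf("\n");
      }
    },
    // Call once the stream closes so a final unterminated line is not lost.
    flush(): void {
      if (buffer.length > 0) {
        onLine(buffer);
        buffer = "";
      }
    },
  };
}

const scores: Scores = {};
const { handler, flush } = makeLineHandler((line) => parseScoreLine(line, scores));
handler("building...\nGROVE_SCORE accu"); // line split across chunks
handler("racy=0.91\nGROVE_SCORE loss=0.25"); // last line has no trailing newline
flush();
```

Without the flush() call, the loss line would stay in the buffer and never reach the parser.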
server.ts: flip preset?.eval guard from != false to === true so grove_eval
is disabled by default even when createMcpServer is called with no preset
(embedder case). Update stale JSDoc comment + integration test to pass
{ eval: true } for the all-tools coverage test.
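The difference between the two guards shows up when no preset is passed. A minimal sketch, assuming McpPresetConfig has an optional eval flag as described earlier; the evalOptOut and evalOptIn function names are hypothetical:

```typescript
interface McpPresetConfig {
  eval?: boolean;
}

// Old guard: anything other than an explicit false enables the tool.
// (For boolean | undefined, !== behaves the same as the != in the commit.)
const evalOptOut = (preset?: McpPresetConfig): boolean => preset?.eval !== false;

// New guard: only an explicit true enables the tool.
const evalOptIn = (preset?: McpPresetConfig): boolean => preset?.eval === true;
```

With no preset, preset?.eval is undefined, so the old guard registers grove_eval and the new one does not.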
ac25f55 to 331549b
Add biome-ignore suppressions for pre-existing noEmptyBlockStatements and noNonNullAssertion warnings in files touched by this branch. Auto-fix organizeImports in eval.ts, index.ts, and examples.test.ts.
grove_eval is now disabled by default (preset?.eval === true required). Remove grove_eval from the default allTools list, update test descriptions, and add an explicit eval:true test.
Closes #230.
Summary
- grove_eval MCP tool: Agents can now run the eval harness via MCP instead of shelling out to the CLI. Spawns evalCommand via sh -c, sets the GROVE_TARGET_CID env var, streams output line-by-line, parses GROVE_SCORE metric=value lines, and returns structured scores + exit metadata. Configurable timeout (default 5 min) + 16 MB output cap. Scoped via eval?: boolean in McpPresetConfig.
- TopologyRouter preserves edge types: edgeMap changed from Map<string, string[]> to Map<string, RoleEdge[]>. Dedup now uses Set<string> keyed by "target:edgeType" (O(1) vs O(n) Array.includes). targetsFor() returns readonly RoleEdge[]. route() still deduplicates by target for event publishing. Note: edge type behavioral semantics remain flat/informational in this version; the structural change unblocks future behavioral routing.
- Example YAML fixed + CI guard: examples/real-autoresearch/grove.md had 6 schema violations (4 from the issue + 2 additional: auto_evaluate/accept_if in outcome_policy, invalid enforcement block). All fixed. New glob-based test in src/core/examples.test.ts parses all examples/**/grove.md against parseGroveContract, so new examples are covered automatically with no manual registration.

Files changed
- src/core/operations/eval.ts
- src/mcp/tools/eval.ts
- src/mcp/tools/eval.test.ts
- src/core/examples.test.ts
- src/core/topology-router.ts
- src/core/operations/contribute.ts (.target from RoleEdge in routing)
- src/mcp/server.ts
- src/mcp/server.test.ts
- src/mcp/server.integration.test.ts
- src/core/event-bus.test.ts
- src/core/operations/plan.test.ts
- src/core/operations/index.ts
- examples/real-autoresearch/grove.md

Test plan
- bun test src/core/event-bus.test.ts: 15 pass (topology router)
- bun test src/core/examples.test.ts: 1 pass (example YAML parse guard)
- bun test src/mcp/tools/eval.test.ts: 6 pass (eval operation)
- bun test src/mcp/server.test.ts: 15 pass (preset scoping incl. eval:false)
- bun test src/mcp/server.integration.test.ts: 10 pass (38 tools)
- bun tsc --noEmit: passes