MCP tools: force structured input — artifacts, scores, relations #169

@windoliver

Problem

Agents call grove_contribute with only summary text and skip all structured fields (artifacts, scores, relations). The frontier ranks by recency only. The DAG is flat — no edges.

Root Cause

grove_contribute accepts ALL kinds (work, review, discussion, etc.) with everything optional. LLMs only fill required fields. A flat JSON Schema required list can't express "required if kind=review" (conditional keywords such as if/then exist, but tool-calling clients do not reliably honor them). So agents always take the lazy path: grove_contribute(kind="review", summary="LGTM") — skipping targetCid and scores.

grove_review exists as sugar that forces targetCid, but agents bypass it by calling grove_contribute(kind="review") directly.

You can't have a generic tool alongside specific tools. Agents will always pick the one with fewer required fields.
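To make the failure mode concrete, here is a minimal sketch of a generic input type and a flat required-field check. The `ContributeInput` shape and `acceptsGeneric` are hypothetical stand-ins, not the actual grove schema; the sketch assumes only `kind` and `summary` are required.

```typescript
// Hypothetical shape of the generic grove_contribute input (an assumption,
// not the real schema): every structured field is optional.
type Score = { value: number; direction: "maximize" | "minimize" };

interface ContributeInput {
  kind: "work" | "review" | "discussion" | "reproduction";
  summary: string;
  targetCid?: string;
  artifacts?: Record<string, string>;
  scores?: Record<string, Score>;
}

// Mirrors a flat "required" list: it can check that kind and summary are
// present, but it has no way to say "review implies targetCid and scores".
function acceptsGeneric(input: ContributeInput): boolean {
  return input.kind.length > 0 && input.summary.length > 0;
}

// The lazy call from this issue passes the check with no structured data.
const lazyReview: ContributeInput = { kind: "review", summary: "LGTM" };
console.log(acceptsGeneric(lazyReview)); // true
```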

Fix: Replace grove_contribute with per-kind tools

Remove grove_contribute entirely. Each kind gets its own tool with strict required fields:

| Tool | Required | Replaces |
| --- | --- | --- |
| grove_submit_work | summary, artifacts, agent | grove_contribute(kind=work) |
| grove_submit_review | targetCid, summary, scores, agent | grove_review + grove_contribute(kind=review) |
| grove_discuss | summary, agent | grove_contribute(kind=discussion) + grove_send_message |
| grove_reproduce | targetCid, result, summary, agent | grove_contribute(kind=reproduction) |
| grove_done | summary, agent | grove_contribute(context.done=true) |

No generic escape hatch. No bypass. 5 tools instead of 5 overlapping ones.
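The table above can be read as data. The following is a rough sketch, not the proposed implementation: `REQUIRED_FIELDS` copies the table, while `missingFields` and its notion of "empty" (blank string, empty object) are assumptions made for illustration.

```typescript
type ToolName =
  | "grove_submit_work"
  | "grove_submit_review"
  | "grove_discuss"
  | "grove_reproduce"
  | "grove_done";

// Required fields per tool, copied from the table in this issue.
const REQUIRED_FIELDS: Record<ToolName, readonly string[]> = {
  grove_submit_work: ["summary", "artifacts", "agent"],
  grove_submit_review: ["targetCid", "summary", "scores", "agent"],
  grove_discuss: ["summary", "agent"],
  grove_reproduce: ["targetCid", "result", "summary", "agent"],
  grove_done: ["summary", "agent"],
};

// Hypothetical helper: names of required fields that are absent or empty.
function missingFields(tool: ToolName, input: Record<string, unknown>): string[] {
  return REQUIRED_FIELDS[tool].filter((field) => {
    const value = input[field];
    if (value === undefined || value === null) return true;
    if (typeof value === "string") return value.trim() === "";
    if (typeof value === "object") return Object.keys(value as object).length === 0;
    return false;
  });
}

// The lazy review from the Root Cause section now fails on two fields.
console.log(missingFields("grove_submit_review", { summary: "LGTM", agent: "reviewer" }));
// [ 'targetCid', 'scores' ]
```

Because each tool has its own fixed list, no field needs to be conditional on kind.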

Server-side validation

Each tool returns isError: true with an actionable message if a required field is missing or invalid:

VALIDATION_ERROR: Reviews must include at least one score.
Example: scores: { "correctness": { value: 0.9, direction: "maximize" } }
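A minimal sketch of how such an error result could be built. The `{ content, isError }` result shape follows the MCP tool-result convention; `validationError`, `validateReviewScores`, and the second error message are hypothetical names and wording, not code from the repository.

```typescript
type Score = { value: number; direction: "maximize" | "minimize" };

// MCP-style tool result: text content plus an error flag.
interface ToolResult {
  content: { type: "text"; text: string }[];
  isError?: boolean;
}

const SCORE_EXAMPLE = 'scores: { "correctness": { value: 0.9, direction: "maximize" } }';

// Hypothetical helper: formats the message the way this issue shows it.
function validationError(message: string, example: string): ToolResult {
  return {
    isError: true,
    content: [{ type: "text", text: `VALIDATION_ERROR: ${message}\nExample: ${example}` }],
  };
}

// Hypothetical check for grove_submit_review; returns null when scores are valid.
function validateReviewScores(scores: Record<string, Score> | undefined): ToolResult | null {
  if (!scores || Object.keys(scores).length === 0) {
    return validationError("Reviews must include at least one score.", SCORE_EXAMPLE);
  }
  for (const name of Object.keys(scores)) {
    const score = scores[name];
    if (typeof score.value !== "number" || !isFinite(score.value)) {
      return validationError(`Score "${name}" needs a numeric value.`, SCORE_EXAMPLE);
    }
  }
  return null;
}
```

Returning the example inside the error text gives the agent something to copy on its retry.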

Zod descriptions with examples

artifacts: z.record(z.string(), z.string())
  .describe("File artifacts. Map path to content. " +
    "Example: {\"hello.txt\": \"Hello World\"}. " +
    "Without artifacts, reviewers cannot see your files.")

Also merge

| Merge | Into | Why |
| --- | --- | --- |
| grove_send_message | grove_discuss | Both create discussion contributions. Use tags for @mentions. |
| grove_bounty_claim | grove_claim | bounty_claim calls claim internally. Agents just call claim. |
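One possible reading of "use tags for @mentions", as a sketch only. The tag format (keeping the leading `@`), the `tags` field on the grove_discuss input, and both function names are assumptions; the issue does not specify them.

```typescript
// Hypothetical helper: collect unique @mentions from a message body.
function mentionTags(text: string): string[] {
  const matches: string[] = text.match(/@[A-Za-z0-9_-]+/g) || [];
  // Keep the first occurrence of each mention, preserving order.
  return matches.filter((m, i) => matches.indexOf(m) === i);
}

// Hypothetical mapping from an old grove_send_message call to a
// grove_discuss input: the message becomes the summary, mentions become tags.
function toDiscussInput(agent: string, message: string) {
  return { agent, summary: message, tags: mentionTags(message) };
}

console.log(mentionTags("@reviewer please re-check, cc @coder and @reviewer"));
// [ '@reviewer', '@coder' ]
```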

Also remove

| Remove | Why |
| --- | --- |
| grove_contribute | Generic escape hatch. Agents bypass structured fields. Replaced by per-kind tools above. |
| grove_review | Replaced by grove_submit_review with scores REQUIRED (not optional). |

E2E Testing Guide

How to test (MUST follow — no workarounds, no manual injection)

cd /Users/tafeng/grove/.claude/worktrees/snazzy-humming-thacker
bun run src/cli/main.ts tui
  1. j → "New session" → Enter
  2. Enter → select "review-loop"
  3. Enter → confirm name
  4. Wait for setup (~10s if Nexus running, ~60s cold start)
  5. Enter → continue past Agent Detect
  6. Type create hello.txt → Enter → Enter (confirm spawn)
  7. Wait — agents take 30-60s. Do NOT inject contributions manually.
  8. Watch feed for:
    • ▒ work — coder created file and called grove tool
    • ▷ review — reviewer responded
    • Loop continues until reviewer approves
    • grove_done signals completion
  9. Tab → check DAG, Detail, Frontier, Claims panels
  10. Ctrl+B → back to simple view
  11. q q → quit

What to verify

  • Feed shows contributions from BOTH coder and reviewer (not just coder)
  • New session feed is empty at start (no old contributions from previous sessions)
  • Contributions have artifacts (file paths/content) — not just summary text
  • Frontier shows real scores — not just "recency"
  • DAG shows edges between contributions — not a flat list
  • Reviewer calls grove_done after approving — loop ends cleanly (no infinite ping-pong)
  • Agent log files in .grove/agent-logs/ show grove_submit_work (completed) and grove_submit_review (completed)
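Several of these checks could also be expressed as assertions over the stored contributions. The sketch below assumes a simplified `Contribution` record and the agent names `coder` and `reviewer`; the real row shape in .grove/grove.db is not shown in this issue, so treat every field name here as hypothetical.

```typescript
// Hypothetical, simplified view of a stored contribution.
interface Contribution {
  cid: string;
  kind: string;
  agent: string;
  artifacts: Record<string, string>;
  scores: Record<string, { value: number }>;
  targetCid?: string;
}

// Returns a list of failed checks; an empty list means the session looks healthy.
function checkSession(contributions: Contribution[]): string[] {
  const failures: string[] = [];
  const hasAgent = (name: string) => contributions.some((c) => c.agent === name);
  if (!hasAgent("coder") || !hasAgent("reviewer")) {
    failures.push("feed is missing coder or reviewer contributions");
  }
  const work = contributions.filter((c) => c.kind === "work");
  if (work.some((c) => Object.keys(c.artifacts).length === 0)) {
    failures.push("work contribution without artifacts");
  }
  const reviews = contributions.filter((c) => c.kind === "review");
  if (reviews.some((c) => Object.keys(c.scores).length === 0)) {
    failures.push("review without scores");
  }
  // A DAG edge exists only if the review points at a known contribution.
  const known = (cid: string | undefined) => contributions.some((c) => c.cid === cid);
  if (reviews.some((c) => !known(c.targetCid))) {
    failures.push("review without an edge to a known contribution");
  }
  return failures;
}
```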

tmux testing (for CI or automated verification)

tmux new-session -d -s grove-tui -x 120 -y 40
tmux send-keys -t grove-tui "cd $(pwd) && TERM=xterm-256color bun run src/cli/main.ts tui" Enter
sleep 10
# Navigate through screens
tmux send-keys -t grove-tui j Enter  # New session
sleep 2
tmux send-keys -t grove-tui Enter    # review-loop
sleep 1  
tmux send-keys -t grove-tui Enter    # confirm name
sleep 15                               # wait for setup
tmux send-keys -t grove-tui Enter    # past agent detect
sleep 1
tmux send-keys -t grove-tui "create hello.txt" Enter  # goal
sleep 1
tmux send-keys -t grove-tui Enter    # confirm spawn

# Poll every 15s, up to 20 times (the TUI itself polls every 30s; agents take 30-60s)
for i in $(seq 1 20); do
  sleep 15
  FEED=$(tmux capture-pane -t grove-tui -p | grep -E "▒|▷|◇")
  COUNT=$(echo "$FEED" | grep -cE "▒|▷|◇")
  echo "=== $i: $COUNT contributions ==="
  [ -n "$FEED" ] && echo "$FEED"
  # Stop once the reviewer log mentions grove_done
  DONE=$(cat .grove/agent-logs/reviewer-*.log 2>/dev/null | grep -cE "grove_done|DONE")
  [ "$DONE" -gt 0 ] && echo "SESSION COMPLETE" && break
done

Known architecture decisions

  • Local SQLite for data — MCP and HTTP server both read/write .grove/grove.db. NOT Nexus VFS (rate limited).
  • Nexus for IPC only — SSE push between agents via NexusWsBridge. Not for contribution storage.
  • Shared Nexus instance — All worktrees share ~/.grove/nexus-data (same Docker stack). API key read from ~/.grove/nexus-data/.state.json.
  • 30s polling baseline: usePolledData polls at a 30s interval. EventBus (when SSE connected) triggers instant refresh(). No "safety net" polling inside the event-driven hook.
  • codex MCP "Unsupported" is normal — stdio transport always shows this. Tools work fine.
  • Single grove MCP name — registered via codex mcp add grove. No per-spawn unique names (avoids 93+ stale entries).
  • findGroveDir prefers CWD — checks $CWD/.grove/ before walking up parent dirs.
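The polling decision in the list above can be sketched without React. This is an illustration under assumptions: the `EventBus` class and `startPolledData` are simplified stand-ins, not the real usePolledData hook or bus interface. The point it shows is one baseline timer plus event-triggered refreshes, with no second timer on the event path.

```typescript
type Listener = () => void;

// Minimal stand-in for the EventBus; the real interface is not shown in this issue.
class EventBus {
  private listeners: Listener[] = [];
  on(listener: Listener): () => void {
    this.listeners.push(listener);
    return () => {
      this.listeners = this.listeners.filter((l) => l !== listener);
    };
  }
  emit(): void {
    this.listeners.slice().forEach((l) => l());
  }
}

// Baseline poller: refresh on start, every intervalMs, and on each bus event.
// Returns a stop function that clears the timer and unsubscribes.
function startPolledData(refresh: () => void, bus: EventBus, intervalMs = 30000): () => void {
  refresh(); // initial load
  const timer = setInterval(refresh, intervalMs); // the only timer
  const off = bus.on(refresh); // event path: instant refresh, no extra polling
  return () => {
    clearInterval(timer);
    off();
  };
}
```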

Common pitfalls (from debugging this session)

  1. Don't inject contributions manually (bun -e). The whole point is agents call grove tools themselves.
  2. Don't unset GROVE_NEXUS_URL to bypass Nexus. Fix the key lookup instead.
  3. Stale server on port 4515 — if server was spawned by a previous TUI run, it has old env vars. Kill it: lsof -ti:4515 | xargs kill -9
  4. 93 stale codex MCP entries — clean with: codex mcp list | grep grove- | awk '{print $1}' | while read n; do codex mcp remove "$n"; done
  5. Multiple Nexus Docker stacks — check docker ps --filter name=nexus. Should be ONE stack. Kill extras: docker compose -p nexus-HASH down

Repro Steps (current broken state)

  1. bun run src/cli/main.ts tui
  2. New session → review-loop → "create hello.txt" → spawn agents
  3. Watch: coder sends grove_contribute(kind="work", summary="...") — no artifacts
  4. Watch: reviewer sends grove_contribute(kind="review", summary="LGTM") — no targetCid, no scores
  5. Tab → Frontier: only "recency" metric
  6. Tab → DAG: flat list, no edges

Expected After Fix

  • Coder calls grove_submit_work(summary, artifacts={...}) — reviewer can check out files
  • Reviewer calls grove_submit_review(targetCid, summary, scores={...}) — DAG shows edges, frontier ranks by scores
  • No way to bypass required fields — tools enforce structure

Files to Change

  • src/mcp/tools/contributions.ts — remove grove_contribute, grove_review. Add grove_submit_work, grove_submit_review.
  • src/mcp/schemas.ts — update schemas with required fields
  • src/mcp/tools/messaging.ts — merge send_message into discuss
  • src/mcp/tools/done.ts — keep grove_done as-is (already has required summary + agent)
  • src/mcp/server.ts — update tool registration
  • src/tui/spawn-manager.ts — update CLAUDE.md/CODEX.md agent instructions to reference new tools
  • src/core/acpx-runtime.ts — update system-reminder to reference new tools
  • src/core/operations/contribute.ts — keep contributeOperation as internal (tools call it, agents don't)
