fix(workspace): enforce isolation policy, surface degraded mode in TUI #249

Open
windoliver wants to merge 5 commits into main from worktree-temporal-whistling-sundae

Fixes #208.

Summary

  • Workspace provisioning failures in SessionOrchestrator.spawnAgent() and SpawnManager.spawn() were silently swallowed — agents ran in the shared project root with no error surfaced
  • Replaced blanket catch with policy-aware WorkspaceIsolationPolicy (strict | allow-fallback) and WorkspaceMode tagged union (isolated_worktree | fallback_workspace | bootstrap_failed)
  • TUI spawn-progress screen now shows a [shared workspace] / [no config] badge + inline ⚠ <reason> for degraded agents

Changes

Core

  • workspace-provisioner: fully async via execFileAsync (no event-loop blocking, no shell injection); parallel cleanup with Promise.allSettled; WorkspaceMode + WorkspaceIsolationPolicy exported from core/index
  • session-orchestrator: new provisionAgentWorkspace() helper with exhaustive policy-aware control flow; workspaceMode on AgentSessionInfo; default policy strict
  • session-service: threads workspaceIsolationPolicy through to orchestrator
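A minimal sketch of the policy-aware control flow described above. Only the variant names (isolated_worktree, fallback_workspace, bootstrap_failed) and the policy names (strict, allow-fallback) come from this PR; the field names, the `ProvisionOutcome` type, the function name, and the error text are assumptions made for illustration, and the sketch does not model what triggers bootstrap_failed.

```typescript
// Policy and mode names are taken from the PR description; field shapes are hypothetical.
type WorkspaceIsolationPolicy = "strict" | "allow-fallback";

type WorkspaceMode =
  | { kind: "isolated_worktree"; path: string }
  | { kind: "fallback_workspace"; path: string; reason: string }
  | { kind: "bootstrap_failed"; reason: string };

// Hypothetical result of one provisioning attempt.
type ProvisionOutcome =
  | { ok: true; path: string }
  | { ok: false; reason: string };

// Turn a provisioning outcome into a WorkspaceMode, or throw under the strict policy.
function resolveWorkspaceMode(
  outcome: ProvisionOutcome,
  projectRoot: string,
  policy: WorkspaceIsolationPolicy,
): WorkspaceMode {
  if (outcome.ok) {
    return { kind: "isolated_worktree", path: outcome.path };
  }
  switch (policy) {
    case "strict":
      // Fail loudly instead of silently running in the shared project root.
      throw new Error(`Workspace provisioning failed: ${outcome.reason}`);
    case "allow-fallback":
      // Degrade to the shared root, but keep the reason so the UI can surface it.
      return { kind: "fallback_workspace", path: projectRoot, reason: outcome.reason };
    default: {
      // Exhaustiveness check: adding a policy without handling it is a compile error.
      const unreachable: never = policy;
      throw new Error(`Unknown policy: ${String(unreachable)}`);
    }
  }
}
```

The `never` assignment in the default branch is one common way to get the "exhaustive" property mentioned above: the compiler rejects the switch if a new policy value is added without a matching case.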

TUI

  • spawn-manager: replaces inline execSync git worktree add with provisionWorkspace(); setIsolationPolicy(); workspaceMode on SpawnResult; default allow-fallback (backward compat)
  • spawn-progress: [shared workspace] / [no config] badge + ⚠ reason line for degraded agents
  • screen-manager: threads workspaceMode from SpawnResult into AgentSpawnState
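As a rough illustration of the badge logic, the sketch below maps a workspace mode to the status lines shown on the spawn-progress screen. The pairing of fallback_workspace with [shared workspace] matches the E2E scenarios in this PR; the pairing of bootstrap_failed with [no config], the `AgentSpawnView` shape, and the function name are assumptions.

```typescript
type WorkspaceModeKind = "isolated_worktree" | "fallback_workspace" | "bootstrap_failed";

// Hypothetical view model for one agent row on the spawn-progress screen.
interface AgentSpawnView {
  mode: WorkspaceModeKind;
  reason?: string;
}

// Pick the badge for a mode; an isolated worktree (happy path) gets no badge.
function badgeFor(mode: WorkspaceModeKind): string {
  switch (mode) {
    case "fallback_workspace":
      return "[shared workspace]";
    case "bootstrap_failed":
      return "[no config]"; // assumed pairing, not confirmed by the PR text
    case "isolated_worktree":
      return "";
  }
}

// Build the on-screen lines: a status line, plus a warning line for degraded agents.
function renderSpawnStatus(agent: AgentSpawnView): string[] {
  const badge = badgeFor(agent.mode);
  const lines = [badge ? `● started ${badge}` : "● started"];
  if (badge && agent.reason) {
    lines.push(`⚠ ${agent.reason}`);
  }
  return lines;
}
```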

Tests

  • 7 new workspace-bootstrap unit tests
  • 5 new policy enforcement tests in session-orchestrator
  • 2 new policy enforcement tests in spawn-manager
  • All existing spawn-success tests retrofitted with workspaceMode assertions
  • 3 tmux capture-pane E2E tests verifying on-screen badge visibility:
    • allow-fallback + non-git dir → ● started [shared workspace] + ⚠ Command failed: git worktree add ...
    • strict + non-git dir → ✗ failed: Workspace provisioning failed for '...'
    • allow-fallback + real git repo → ● started (no badge — happy path)

Full suite: 4898 pass, 0 fail.

Test plan

  • Run bun test — 4898 pass
  • Run bun test tests/tui/workspace-isolation-e2e.test.ts — 3 pass (requires tmux)
  • Start TUI against a non-git directory with default policy — agents show [shared workspace] badge
  • Start TUI against a real grove with git — agents show plain ● started

Fixes #208. Workspace provisioning failures in
SessionOrchestrator and SpawnManager were silently swallowed, causing
agents to run in a shared project root with no operator visibility.

Changes:
- workspace-provisioner: fully async via execFileAsync (no event-loop
  block, no shell injection); parallel cleanup with Promise.allSettled
- WorkspaceMode tagged union (isolated_worktree | fallback_workspace |
  bootstrap_failed) + WorkspaceIsolationPolicy (strict | allow-fallback)
  exported from core/index
- session-orchestrator: provisionAgentWorkspace() helper with exhaustive
  policy-aware control flow; workspaceMode on AgentSessionInfo; default
  policy strict
- spawn-manager: provisionWorkspace() replaces inline execSync git
  worktree; setIsolationPolicy(); workspaceMode on SpawnResult; default
  policy allow-fallback (TUI backward compat)
- session-service: threads workspaceIsolationPolicy through to orchestrator
- spawn-progress: [shared workspace] / [no config] badge + ⚠ reason line
  for degraded agents; screen-manager threads workspaceMode from SpawnResult
- Tests: workspace-bootstrap unit tests (7), policy enforcement tests in
  orchestrator (5) and spawn-manager (2), retrofitted all existing spawn
  tests with workspaceMode assertions
- E2E: tmux capture-pane test (3 scenarios) verifies badge visible on
  screen for fallback_workspace, hard failure for strict, clean for
  isolated_worktree

delegates/feeds/escalates edges now cause the target role's worktree to
branch off the source role's grove branch instead of HEAD.

  coder → reviewer (delegates)
    coder  worktree: grove/<sessionId>/coder   (base: HEAD)
    reviewer worktree: grove/<sessionId>/reviewer (base: grove/<sessionId>/coder)

This gives reviewers a real git branch relationship — they can run
`git merge grove/<sessionId>/coder` to get the coder's latest commits
without touching the main checkout.

Other edge types (reports, requests, feedback) remain independent (HEAD).
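
The edge-to-base-branch rule above can be sketched as follows. The edge type names, the `grove/<sessionId>/<role>` branch pattern, and the WORKSPACE_BRANCH_EDGES name come from this PR; the data shapes, the helper names, and the "first matching edge wins" tie-break are assumptions, not the actual resolveRoleWorkspaceStrategies() implementation.

```typescript
type EdgeType = "delegates" | "feeds" | "escalates" | "reports" | "requests" | "feedback";

// Hypothetical shapes for a topology edge and a resolved per-role strategy.
interface TopologyEdge {
  from: string;
  to: string;
  type: EdgeType;
}

interface WorkspaceStrategy {
  branch: string;
  baseBranch: string;
}

// Edge types that make the target branch off the source role's branch.
const WORKSPACE_BRANCH_EDGES: ReadonlySet<EdgeType> = new Set<EdgeType>([
  "delegates",
  "feeds",
  "escalates",
]);

function groveBranch(sessionId: string, role: string): string {
  return `grove/${sessionId}/${role}`;
}

function resolveStrategiesSketch(
  roles: readonly string[],
  edges: readonly TopologyEdge[],
  sessionId: string,
): Map<string, WorkspaceStrategy> {
  const strategies = new Map<string, WorkspaceStrategy>();
  // Every role starts independent, based on HEAD.
  for (const role of roles) {
    strategies.set(role, { branch: groveBranch(sessionId, role), baseBranch: "HEAD" });
  }
  for (const edge of edges) {
    if (!WORKSPACE_BRANCH_EDGES.has(edge.type)) continue; // reports/requests/feedback stay on HEAD
    const target = strategies.get(edge.to);
    if (target === undefined || !strategies.has(edge.from)) continue; // skip unknown roles
    // Assumption: if several branch edges point at one role, the first one wins.
    if (target.baseBranch === "HEAD") {
      target.baseBranch = groveBranch(sessionId, edge.from);
    }
  }
  return strategies;
}
```

For a coder → reviewer delegates edge with session ID `s1`, this yields a reviewer strategy based on `grove/s1/coder`, while a reviewer → coder feedback edge leaves the coder on HEAD.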

Changes:
- topology: WORKSPACE_BRANCH_EDGES constant, resolveRoleWorkspaceStrategies(),
  topologicalSortRoles() — Kahn's algorithm ensures source branches exist
  before dependents try to base their worktrees on them
- session-orchestrator: provision workspaces in topological order then spawn
  in parallel; passes resolved baseBranch to provisionAgentWorkspace()
- grove-md-builder: annotates edge_type lines with workspace behaviour comment
  so GROVE.md is self-documenting
- sqlite: worktree_strategy_json column on sessions table (migration + DDL);
  stores resolved strategies at creation time for operator visibility
- session: worktreeStrategies field on Session type
- tests: 13 new unit tests for resolveRoleWorkspaceStrategies and
  topologicalSortRoles covering all 6 edge types + chain ordering

SpawnManager now uses resolveRoleWorkspaceStrategies() to pick the correct
baseBranch per role based on topology edge types. ScreenManager passes the
resolved topology to SpawnManager before spawning begins.

Also fixes the branch-naming mismatch: spawn() now uses wsSessionId
(stable session-level ID) for provisionWorkspace so the branch computed
by resolveRoleWorkspaceStrategies matches the branch actually created for
the source role.

Validated via 3 tmux capture-pane scenarios:
  1. delegates + real git → both isolated_worktree, no badge
  2. feedback  + real git → both isolated_worktree, independent
  3. delegates + no git   → both fallback_workspace, [shared workspace] badge

The TUI screen-manager fired all role spawns concurrently, racing
coder and reviewer worktree provisioning. With delegates edges the
reviewer needs coder's git branch to exist first — concurrent spawns
caused reviewer to fail with "git worktree add ... branch not found"
and fall back to shared workspace.
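
The ordering constraint behind this race (a source role's branch must exist before a dependent role bases its worktree on it) is a topological-sort problem. The sketch below shows Kahn's algorithm, which the commit notes name as the approach used by topologicalSortRoles(); the function name, the edge shape, and the choice to throw on a cycle are assumptions, and only branch-creating edges are expected as input.

```typescript
// Hypothetical edge shape: `to` bases its worktree on `from`'s branch.
interface BranchEdge {
  from: string;
  to: string;
}

// Kahn's algorithm: repeatedly emit roles that have no unprocessed sources.
function topoSortRolesSketch(roles: readonly string[], edges: readonly BranchEdge[]): string[] {
  const indegree = new Map<string, number>();
  const dependents = new Map<string, string[]>();
  for (const role of roles) {
    indegree.set(role, 0);
    dependents.set(role, []);
  }
  for (const { from, to } of edges) {
    const list = dependents.get(from);
    const count = indegree.get(to);
    if (list === undefined || count === undefined) continue; // ignore edges to unknown roles
    list.push(to);
    indegree.set(to, count + 1);
  }

  // Start with roles that depend on nobody, in their declared order.
  const queue = roles.filter((role) => indegree.get(role) === 0);
  const order: string[] = [];
  while (queue.length > 0) {
    const role = queue.shift() as string;
    order.push(role);
    for (const dependent of dependents.get(role) ?? []) {
      const remaining = (indegree.get(dependent) ?? 0) - 1;
      indegree.set(dependent, remaining);
      if (remaining === 0) queue.push(dependent);
    }
  }

  // Any role left unprocessed is part of a cycle (assumed here to be an error).
  if (order.length !== roles.length) {
    throw new Error("Cycle detected among workspace branch edges");
  }
  return order;
}
```

Spawning roles in the returned order, one after another, means the coder's branch exists by the time the reviewer's `git worktree add` runs.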

Fix: import topologicalSortRoles and spawn sequentially — source roles
complete before dependents start. Validated via real TUI E2E:
  1. grove up → New Session → review-loop → goal → launch
  2. Spawn-progress shows: coder spawning → coder started → reviewer spawning → reviewer started
  3. git worktree list confirms both worktrees on correct branches
  4. Reviewer branch based on coder branch (delegates edge)

SessionOrchestrator hardcoded mcpServePath as join(projectRoot, "src/mcp/serve.ts")
which only works when the project IS the grove repo. For any other project
(e.g. /tmp/foo), agents got a non-existent MCP server path and couldn't
call grove_submit_work, grove_submit_review, or grove_done.

Fix: extract resolveMcpServePath() into a shared utility that derives the
path from process.argv[1] (the running grove CLI entry point), climbing
3 levels to find the grove installation directory. It falls back through
dist/mcp/serve.js → src/mcp/serve.ts → import.meta.url variants.
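
A minimal sketch of that lookup, under several assumptions: paths are POSIX-style, the existence check is injected so the logic stays pure, and the import.meta.url fallbacks are left out. The function names and the example entry-point path are hypothetical; real code would more likely use node:path and a filesystem check.

```typescript
// Drop the last `levels` segments of an absolute POSIX path.
function climb(path: string, levels: number): string {
  const parts = path.split("/").filter((part) => part.length > 0);
  const kept = parts.slice(0, Math.max(0, parts.length - levels));
  return "/" + kept.join("/");
}

// Try the built artifact first, then the TypeScript source, relative to the
// installation directory derived from the CLI entry point.
function resolveMcpServePathSketch(
  entryPoint: string,
  exists: (path: string) => boolean,
): string | undefined {
  const installDir = climb(entryPoint, 3).replace(/\/$/, "");
  const candidates = [
    `${installDir}/dist/mcp/serve.js`,
    `${installDir}/src/mcp/serve.ts`,
  ];
  return candidates.find((candidate) => exists(candidate));
}
```

With a hypothetical entry point of `/opt/grove/dist/cli/main.js`, climbing 3 levels gives `/opt/grove`, so the result no longer depends on the project the agents run in (for example /tmp/foo).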

DRYs up SpawnManager which had the same logic inline in two places.

Validated E2E: real claude agent via acpx in /tmp test project
successfully called grove_submit_work — contribution CID stored in
session_contributions with the correct session link.

Development

Successfully merging this pull request may close these issues.

fix: do not silently fall back to shared workspace when agent worktree/bootstrap fails
