Add AgentInvoker seam and e2e tests for all init templates by mattpocock · Pull Request #460 · mattpocock/sandcastle

mattpocock · 2026-04-24T16:18:34Z

Implements PRD #447: end-to-end tests for init templates via an AgentInvoker seam.

Introduces three new deep modules (AgentInvoker Tag, PromptPreprocessor Tag, LocalSandboxFactory) and an internal test-support entry point (runForTest). Then adds e2e tests for each init template — blank, simple-loop, sequential-reviewer, parallel-planner, parallel-planner-with-review — parameterised across all valid (agent, backlog-manager) combinations.
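To make the seam concrete, here is a minimal sketch of the idea without Effect: the orchestration code depends on an invoker interface, and tests swap in an implementation that records each call instead of launching an agent. Everything here is an assumption for illustration (`Invocation`, `makeRecordingInvoker`, `runLoop`, and the `COMPLETE` marker are hypothetical names, and the code is synchronous for brevity); the PR implements the seam as an Effect Context.Tag.

```typescript
// Simplified, synchronous stand-in for the AgentInvoker seam.
// All names are illustrative, not the actual sandcastle API.

interface Invocation {
  promptFile: string;
  promptArgs: Record<string, string>;
  provider: string;
  model: string;
  branchStrategy: "head" | "merge-to-head" | "branch";
}

interface AgentInvoker {
  // Returns the agent's stdout for one iteration.
  invoke(invocation: Invocation): string;
}

// Hypothetical completion marker; the real signal format is not shown in this PR.
const COMPLETION_SIGNAL = "COMPLETE";

// Test double: records every invocation and immediately reports completion.
function makeRecordingInvoker(): { invoker: AgentInvoker; calls: Invocation[] } {
  const calls: Invocation[] = [];
  const invoker: AgentInvoker = {
    invoke(invocation) {
      calls.push(invocation);
      return COMPLETION_SIGNAL;
    },
  };
  return { invoker, calls };
}

// Toy orchestrator loop: iterate until the agent signals completion
// or maxIterations is reached. Returns the number of iterations used.
function runLoop(
  invoker: AgentInvoker,
  invocation: Invocation,
  maxIterations: number,
): number {
  for (let i = 1; i <= maxIterations; i++) {
    const stdout = invoker.invoke(invocation);
    if (stdout.includes(COMPLETION_SIGNAL)) return i;
  }
  return maxIterations;
}
```

Because the recording invoker reports completion on its first call, a loop configured for several iterations still produces a single recorded invocation, which is the kind of call-count assertion the e2e tests rely on.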

Tests scaffold each template into a tmp dir, execute the generated main.mts with @ai-hero/sandcastle aliased to the internal testing entry via Vitest module aliasing, and assert recorded agent invocations against expected prompt file, prompt arguments, agent provider, model, branch strategy, and call count. No Docker, no real agent, no network.
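The aliasing step might look roughly like the following config fragment. This is a sketch rather than the PR's actual vitest.config.ts: the `src/testSupport.ts` path comes from the files-changed list in this PR, while the surrounding structure is an assumption based on Vitest's standard `resolve.alias` option.

```typescript
// vitest.config.ts (sketch; the real config in this PR may differ)
import { fileURLToPath } from "node:url";
import { defineConfig } from "vitest/config";

export default defineConfig({
  resolve: {
    alias: {
      // Route the published package name to the internal test-support entry,
      // so a generated main.mts that imports "@ai-hero/sandcastle" runs unchanged.
      "@ai-hero/sandcastle": fileURLToPath(
        new URL("./src/testSupport.ts", import.meta.url),
      ),
    },
  },
});
```

With an alias like this in place, the scaffolded entry point needs no edits: its import of the package name resolves to the recording implementation whenever it is loaded through Vitest.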

Closes #447, #448, #449, #450, #451, #452.

Ref: PRD #447 (init-template end-to-end tests via AgentInvoker seam)

Key decisions:
- AgentInvoker extracted as Effect Context.Tag from Orchestrator's inline invokeAgent function. Production layer preserves exact existing behaviour.
- PromptPreprocessor lifted behind a Context.Tag. Production layer delegates to existing preprocessPrompt function unchanged.
- LocalSandboxFactory provisions a tmp dir with git init, isolated GIT_CONFIG_GLOBAL, and initial commit. Honours head, merge-to-head, and branch strategies via real git worktrees. No Docker.
- Internal testSupport.ts module exports runForTest (re-exported as run), recording agent-invoker layer, identity prompt-preprocessor layer, and recorder accessor. Not a published subpath.
- Vitest resolve.alias maps @ai-hero/sandcastle to testSupport.ts so generated main.mts files run unchanged via dynamic import.
- Recording invoker extracts model from provider.buildPrintCommand() to avoid requiring a model field on the AgentProvider interface.
- CONTEXT.md updated with "agent invoker" term under Execution.

Files changed:
- src/AgentInvoker.ts (new): Tag + production layer
- src/PromptPreprocessorTag.ts (new): Tag + production layer
- src/LocalSandboxFactory.ts (new): test SandboxFactory layer
- src/testSupport.ts (new): internal test-support module
- src/initTemplateE2e.test.ts (new): e2e test for blank template
- src/Orchestrator.ts: uses AgentInvoker + PromptPreprocessor Tags
- src/Orchestrator.test.ts: provides production layers in test stack
- src/run.ts: provides production layers
- src/createSandbox.ts: provides production layers
- src/createWorktree.ts: provides production layers
- vitest.config.ts: resolve aliases for module aliasing
- CONTEXT.md: agent invoker term

No blockers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
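The model-extraction decision can be illustrated with a small sketch: rather than reading a `model` field, the recorder inspects the command string the provider would run. The provider shape, the `buildPrintCommand` parameter, and the example command strings are assumptions for illustration; only the function names `extractModelFromProvider` and `buildPrintCommand` appear in this PR. The pattern also accepts the short `-m` form.

```typescript
// Hypothetical provider shape; the real AgentProvider interface is not shown here.
interface AgentProviderLike {
  name: string;
  buildPrintCommand(prompt: string): string;
}

// Matches "--model <value>", "--model=<value>", or the short form "-m <value>".
// Limitation of this sketch: quoted values would be captured with their quotes.
const MODEL_FLAG = /(?:^|\s)(?:--model|-m)(?:=|\s+)(\S+)/;

// Builds a throwaway command and pulls the model name out of its flags.
function extractModelFromProvider(
  provider: AgentProviderLike,
): string | undefined {
  const command = provider.buildPrintCommand("placeholder prompt");
  return MODEL_FLAG.exec(command)?.[1];
}

// Illustrative providers with made-up command lines.
const longFlagProvider: AgentProviderLike = {
  name: "long-flag",
  buildPrintCommand: () => "example-cli --model example-large --print",
};

const shortFlagProvider: AgentProviderLike = {
  name: "short-flag",
  buildPrintCommand: () => "example-cli exec -m example-small",
};
```

The trade-off is that the recorder is coupled to the flag spelling each provider uses, in exchange for keeping the provider interface free of a field that only tests would need.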

- Remove unused NodeFileSystem import from testSupport.ts
- Remove orphaned JSDoc comment detached from makeRecordingAgentInvokerLayer
- Replace inline import() type with proper top-level import for SandboxInfo
- Remove unused branch from destructuring in LocalSandboxFactory Use callback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

vercel · 2026-04-24T16:18:39Z

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment

Project	Deployment	Actions	Updated (UTC)
sandcastle	Ignored		Apr 24, 2026 7:53pm

Ref: PRD #447 (init-template end-to-end tests via AgentInvoker seam)

Key decisions:
- Parameterised across all 8 (agent × backlog-manager) combinations using describe.each: claude-code, pi, codex, opencode × github-issues, beads.
- Each case scaffolds simple-loop, dynamically imports main.mts via the testSupport alias, and asserts recorded invocations.
- Asserts iterate-until-COMPLETE wiring: only 1 invocation because the recording invoker emits the completion signal on iteration 1.
- Asserts recorded prompt contains backlog-manager shell expressions (gh issue list / gh issue close for github-issues; bd ready / bd close for beads) pre-expansion.
- Asserts agent provider, model (via defaultModel), branchStrategy (merge-to-head), and maxIterations (3) match the template.

Bug fixes required to make the test pass:
- Changed {{TASK_ID}} to $TASK_ID in backlog-manager CLOSE_TASK_COMMAND and VIEW_TASK_COMMAND. The double-curly syntax clashed with substitutePromptArgs, causing PromptError at runtime for templates that don't pass TASK_ID as a promptArg (simple-loop, sequential-reviewer, parallel-planner merge phase).
- Extended extractModelFromProvider regex to also match -m (short flag) used by codex's buildPrintCommand.
- Seeded package.json in LocalSandboxFactory so onSandboxReady hooks (npm install) don't fail in the bare test repo.

Files changed:
- src/initTemplateE2e.test.ts: added simple-loop describe block
- src/InitService.ts: {{TASK_ID}} → $TASK_ID in template args
- src/testSupport.ts: extractModelFromProvider handles -m flag
- src/LocalSandboxFactory.ts: seed package.json in test repo

No blockers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
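The {{TASK_ID}} clash is easier to see with a toy substitution step. The sketch below assumes a strict substituter that throws on any `{{KEY}}` without a matching argument; `substituteArgs` is a hypothetical stand-in, not the real substitutePromptArgs, and the two command strings are illustrative rather than copied from InitService.ts.

```typescript
// Simplified stand-in for a strict {{KEY}} substitution step.
// Throws when the prompt references a key that was not supplied.
function substituteArgs(
  prompt: string,
  args: Record<string, string>,
): string {
  return prompt.replace(/\{\{(\w+)\}\}/g, (_match: string, key: string) => {
    const value = args[key];
    if (value === undefined) {
      throw new Error(`Missing prompt argument: ${key}`);
    }
    return value;
  });
}

// Illustrative command text before and after the fix.
const closeCommandBefore = "gh issue close {{TASK_ID}}";
const closeCommandAfter = "gh issue close $TASK_ID";

// A template that passes no TASK_ID promptArg (as simple-loop does)
// fails on the double-curly form but passes the shell-style form through.
function tryPrepare(command: string): string {
  try {
    return substituteArgs(command, {});
  } catch (error) {
    return `error: ${(error as Error).message}`;
  }
}
```

In this sketch `tryPrepare(closeCommandBefore)` yields an error string while `tryPrepare(closeCommandAfter)` returns the command untouched, which mirrors why switching to `$TASK_ID` unblocked templates that never supply that argument.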

- Scope `branch` to its if/else branches instead of leaking it to the acquire tuple where no consumer reads it
- Remove empty head-mode if branch (no-op comment)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ref: PRD #447 (init-template end-to-end tests via AgentInvoker seam)

Key decisions:
- Parameterised across all 8 (agent × backlog-manager) combinations using describe.each: claude-code, pi, codex, opencode × github-issues, beads.
- Each case scaffolds sequential-reviewer, patches MAX_ITERATIONS to 1 so only one implement→review cycle runs, then dynamically imports main.mts.
- Asserts exactly 2 recorded invocations in correct order: implement first (merge-to-head, maxIterations 100, name "implementer"), then review (branch strategy, maxIterations 1, name "reviewer").
- Asserts implement prompt matches scaffolded implement-prompt.md content and contains backlog-manager shell expressions (gh issue list / bd ready).
- Asserts review prompt matches scaffolded review-prompt.md with {{BRANCH}} substituted to "main". Asserts promptArgs: { BRANCH: "main" }.
- Asserts agent provider and model (via defaultModel) match scaffold-time choice for both invocations.

Infrastructure changes to support the test:
- LocalSandboxFactory: when branch strategy requests a branch that is already the current branch, skip worktree creation (avoids git error when the sequential-reviewer's review phase targets "main").
- testSupport runForTest: inject a synthetic commit when result.commits is empty so templates that guard on commits.length (sequential-reviewer's "skip review if no commits") can proceed past the guard.

Files changed:
- src/initTemplateE2e.test.ts: added sequential-reviewer describe block
- src/LocalSandboxFactory.ts: handle branch-is-current-branch case
- src/testSupport.ts: synthetic commit injection in runForTest

No blockers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
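The synthetic-commit workaround can be sketched as a small post-processing step on the run result. The `RunResult` and `CommitInfo` shapes, the placeholder SHA, and the `shouldReview` guard are all assumptions for illustration; the PR only states that a synthetic commit is injected when `result.commits` is empty.

```typescript
// Hypothetical result shapes; the real types in sandcastle may differ.
interface CommitInfo {
  sha: string;
}

interface RunResult {
  stdout: string;
  commits: CommitInfo[];
}

// Placeholder commit used only so commit-count guards can pass in tests.
const SYNTHETIC_COMMIT: CommitInfo = { sha: "0000000" };

// Leaves results with real commits untouched; otherwise adds one fake commit.
function withSyntheticCommit(result: RunResult): RunResult {
  return result.commits.length > 0
    ? result
    : { ...result, commits: [SYNTHETIC_COMMIT] };
}

// Toy version of a template guard such as "skip review if no commits".
function shouldReview(result: RunResult): boolean {
  return result.commits.length > 0;
}
```

Since a recording invoker never writes to the repository, a guard like `shouldReview` would otherwise always skip the review phase, and the second invocation the test expects would never be recorded.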

…ition

- Hoist agents, backlogManagers, combinations, and shellExpressionsByBm to the parent describe scope since they are identical across simple-loop and sequential-reviewer template tests
- Invert condition in LocalSandboxFactory branch strategy to eliminate empty if-branch and reduce nesting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
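For reference, the hoisted fixtures could be built along these lines. The agent and backlog-manager names and the shell expressions are taken from the commit messages in this PR; the exact object shape handed to describe.each is an assumption.

```typescript
// Agent and backlog-manager names as listed in the commit messages.
const agents = ["claude-code", "pi", "codex", "opencode"] as const;
const backlogManagers = ["github-issues", "beads"] as const;

type Agent = (typeof agents)[number];
type BacklogManager = (typeof backlogManagers)[number];

// Cross product: 4 agents × 2 backlog managers = 8 cases.
const combinations: Array<{ agent: Agent; backlogManager: BacklogManager }> =
  agents.flatMap((agent) =>
    backlogManagers.map((backlogManager) => ({ agent, backlogManager })),
  );

// Shell expressions each backlog manager is expected to leave in the prompts.
const shellExpressionsByBm: Record<BacklogManager, string[]> = {
  "github-issues": ["gh issue list", "gh issue close"],
  beads: ["bd ready", "bd close"],
};
```

Defining these once at the parent scope means every template's describe block iterates the same eight cases, so adding an agent or backlog manager extends all template tests together.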

Ref: PRD #447 (init-template end-to-end tests via AgentInvoker seam)

Key decisions:
- Parameterised across all 8 (agent × backlog-manager) combinations using describe.each: claude-code, pi, codex, opencode × github-issues, beads.
- Each case scaffolds parallel-planner, patches MAX_ITERATIONS to 1 so only one plan→execute→merge cycle runs, then dynamically imports main.mts.
- Configures planner response via setStdoutByRunName to return a valid <plan> JSON with one issue, exercising all three phases.
- Asserts exactly 3 recorded invocations in correct order: plan first (head strategy, maxIterations 1, name "planner"), then implement (branch strategy, maxIterations 100, name "implementer"), then merge (head strategy, maxIterations 1, name "merger").
- Asserts plan prompt matches scaffolded plan-prompt.md content.
- Asserts implement prompt matches scaffolded implement-prompt.md with {{TASK_ID}}, {{ISSUE_TITLE}}, {{BRANCH}} substituted from plan data. Asserts promptArgs: { TASK_ID, ISSUE_TITLE, BRANCH }.
- Asserts merge prompt matches scaffolded merge-prompt.md with {{BRANCHES}} and {{ISSUES}} substituted. Asserts promptArgs match.
- Asserts all backlog-manager shell expressions (gh issue list/close, bd ready/close) appear across the three prompt files pre-expansion.
- Asserts agent provider and model (via defaultModel) match scaffold-time choice for all three invocations.

Infrastructure changes to support the test:
- testSupport: added setStdoutByRunName for per-runName response overrides in the recording agent invoker. The planner phase requires a valid <plan> response; without this, the template throws on missing plan tag.

Files changed:
- src/initTemplateE2e.test.ts: added parallel-planner describe block
- src/testSupport.ts: setStdoutByRunName + invoker override logic

No blockers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
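A rough sketch of the per-run-name override and why the planner needs it follows. Only the name `setStdoutByRunName` comes from this PR; its signature, the default stdout, the `<plan>` JSON shape (an `issues` array with `id`, `title`, `branch`), and `parsePlan` are assumptions made for illustration.

```typescript
// Hypothetical plan issue shape; the real <plan> JSON schema is not shown in this PR.
interface PlanIssue {
  id: string;
  title: string;
  branch: string;
}

// Default stdout for runs without an override (hypothetical completion marker).
const DEFAULT_STDOUT = "COMPLETE";

const stdoutByRunName = new Map<string, string>();

// Registers a canned stdout for every run with the given name.
function setStdoutByRunName(runName: string, stdout: string): void {
  stdoutByRunName.set(runName, stdout);
}

// What the recording invoker would return for a run with this name.
function stdoutFor(runName: string): string {
  return stdoutByRunName.get(runName) ?? DEFAULT_STDOUT;
}

// Toy version of the template's plan parsing: throws when the tag is missing.
function parsePlan(stdout: string): PlanIssue[] {
  const match = /<plan>([\s\S]*?)<\/plan>/.exec(stdout);
  if (!match || match[1] === undefined) {
    throw new Error("Missing <plan> tag in planner output");
  }
  const parsed = JSON.parse(match[1]) as { issues: PlanIssue[] };
  return parsed.issues;
}
```

A test would register a planner response before importing main.mts, for example `setStdoutByRunName("planner", "<plan>…</plan>")` with one issue inside, so the plan phase yields work for the implement and merge phases while every other run keeps the default output.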

Ref: PRD #447 (init-template end-to-end tests via AgentInvoker seam)

Key decisions:
- Parameterised across all 8 (agent × backlog-manager) combinations using describe.each: claude-code, pi, codex, opencode × github-issues, beads.
- Each case scaffolds parallel-planner-with-review, patches MAX_ITERATIONS to 1 so only one plan→implement→review→merge cycle runs, then dynamically imports main.mts.
- Configures planner response via setStdoutByRunName to return a valid <plan> JSON with one issue, exercising all four phases.
- Asserts exactly 4 recorded invocations in correct order: plan first (head strategy, maxIterations 1, name "planner"), then implement (branch strategy, maxIterations 100, name "implementer"), then review (branch strategy, maxIterations 1, name "reviewer"), then merge (head strategy, maxIterations 1, name "merger").
- Asserts plan prompt matches scaffolded plan-prompt.md content.
- Asserts implement prompt matches scaffolded implement-prompt.md with {{TASK_ID}}, {{ISSUE_TITLE}}, {{BRANCH}} substituted from plan data.
- Asserts review prompt matches scaffolded review-prompt.md with {{BRANCH}} substituted from plan issue branch.
- Asserts merge prompt matches scaffolded merge-prompt.md with {{BRANCHES}} and {{ISSUES}} substituted.
- Asserts promptArgs per phase match expected values.
- Asserts all backlog-manager shell expressions (gh issue list/close, bd ready/close) appear across the four prompt files pre-expansion.
- Asserts agent provider and model (via defaultModel) match scaffold-time choice for all four invocations.
- Hoisted planIssue and planResponse constants to parent scope, shared with parallel-planner tests.

Infrastructure changes to support the test:
- testSupport: added createSandboxForTest that delegates sandbox.run() calls to runForTest so templates using createSandbox() record invocations through the same recording agent invoker. The production createSandbox uses ProductionAgentInvokerLayer which bypasses recording; the test version ensures all invocations are captured.

Files changed:
- src/initTemplateE2e.test.ts: added parallel-planner-with-review describe block, hoisted plan constants
- src/testSupport.ts: createSandboxForTest + re-export as createSandbox

No blockers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
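The delegation described for createSandboxForTest might be pictured like this. The `Sandbox`, `RunOptions`, and `RunResult` shapes are assumptions, and `runForTest` here is a toy recorder standing in for the real function of the same name; the point is only that `sandbox.run()` and a direct run share one recording path.

```typescript
// Hypothetical shapes; the real run options and sandbox interface are richer.
interface RunOptions {
  name: string;
  prompt: string;
}

interface RunResult {
  stdout: string;
}

interface Sandbox {
  run(options: RunOptions): RunResult;
}

// Shared recorder that the test inspects after main.mts has run.
const recordedRuns: RunOptions[] = [];

// Toy stand-in for runForTest: records the call and returns canned output.
function runForTest(options: RunOptions): RunResult {
  recordedRuns.push(options);
  return { stdout: "COMPLETE" };
}

// Test replacement for createSandbox: every sandbox.run() is forwarded to
// runForTest, so it lands in the same recorder as direct run() calls.
function createSandboxForTest(): Sandbox {
  return {
    run: (options) => runForTest(options),
  };
}
```

Re-exporting the test version under the production name lets templates that call createSandbox() stay unmodified while their invocations still show up in the recorded list, in call order.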

- Remove verbose per-assertion comments (e.g. "Agent provider matches the --agent choice") from simple-loop, sequential-reviewer, and parallel-planner tests to match the compact assertion style already used in the parallel-planner-with-review test
- Remove stray blank line in parallel-planner describe block
- Keep section headers (Phase 1/2/3) for navigation in long tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mattpocock and others added 2 commits April 24, 2026 14:55

mattpocock and others added 2 commits April 24, 2026 18:52

mattpocock changed the base branch from implement/template-prompt-fixes to main April 24, 2026 18:58

mattpocock and others added 5 commits April 24, 2026 19:13

mattpocock marked this pull request as ready for review April 24, 2026 19:53

mattpocock wants to merge 9 commits into main from implement/init-template-e2e-tests
