Add AgentInvoker seam and e2e tests for all init templates#460

Open
mattpocock wants to merge 9 commits into main from implement/init-template-e2e-tests

Conversation

@mattpocock
Owner

Implements PRD #447: end-to-end tests for init templates via an AgentInvoker seam.

Introduces three new deep modules (AgentInvoker Tag, PromptPreprocessor Tag, LocalSandboxFactory) and an internal test-support entry point (runForTest). Then adds e2e tests for each init template — blank, simple-loop, sequential-reviewer, parallel-planner, parallel-planner-with-review — parameterised across all valid (agent, backlog-manager) combinations.

Tests scaffold each template into a tmp dir, execute the generated main.mts with @ai-hero/sandcastle aliased to the internal testing entry via Vitest module aliasing, and assert recorded agent invocations against expected prompt file, prompt arguments, agent provider, model, branch strategy, and call count. No Docker, no real agent, no network.
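
A minimal sketch of the module aliasing described above, assuming the alias is declared under `resolve.alias` in `vitest.config.ts` and points at `src/testSupport.ts`. The exact form used in this PR may differ (for example, an array of alias entries or extra aliases for subpaths).

```typescript
// vitest.config.ts (illustrative sketch, not the PR's actual file)
import { fileURLToPath } from "node:url";
import { defineConfig } from "vitest/config";

export default defineConfig({
  resolve: {
    alias: {
      // Generated main.mts files import "@ai-hero/sandcastle"; under test that
      // specifier resolves to the internal test-support entry instead.
      "@ai-hero/sandcastle": fileURLToPath(
        new URL("./src/testSupport.ts", import.meta.url),
      ),
    },
  },
});
```

With an alias like this, the scaffolded `main.mts` needs no edits: its import statement stays the same and only the resolution target changes.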

Closes #447, #448, #449, #450, #451, #452.

Closes #448
Closes #449
Closes #450
Closes #451
Closes #452

mattpocock and others added 2 commits April 24, 2026 14:55
Ref: PRD #447 — init-template end-to-end tests via AgentInvoker seam

Key decisions:
- AgentInvoker extracted as Effect Context.Tag from Orchestrator's inline
  invokeAgent function. Production layer preserves exact existing behaviour.
- PromptPreprocessor lifted behind a Context.Tag. Production layer delegates
  to existing preprocessPrompt function unchanged.
- LocalSandboxFactory provisions a tmp dir with git init, isolated
  GIT_CONFIG_GLOBAL, and initial commit. Honours head, merge-to-head, and
  branch strategies via real git worktrees. No Docker.
- Internal testSupport.ts module exports runForTest (re-exported as run),
  recording agent-invoker layer, identity prompt-preprocessor layer, and
  recorder accessor. Not a published subpath.
- Vitest resolve.alias maps @ai-hero/sandcastle to testSupport.ts so
  generated main.mts files run unchanged via dynamic import.
- Recording invoker extracts model from provider.buildPrintCommand() to
  avoid requiring a model field on the AgentProvider interface.
- CONTEXT.md updated with "agent invoker" term under Execution.
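
To make the seam concrete, here is a simplified sketch of a recording invoker. It uses plain object injection rather than an Effect Context.Tag, and every type and name in it (`AgentProviderSketch`, `makeRecordingInvoker`, `extractModel`) is hypothetical. It also assumes `buildPrintCommand` returns a shell command string. The short `-m` form is accepted because a later commit in this PR extends the regex for codex.

```typescript
// Hypothetical shapes for illustration only; the real interfaces live in the PR.
interface AgentProviderSketch {
  name: string;
  // Assumption: returns the shell command used to run the agent non-interactively.
  buildPrintCommand(prompt: string): string;
}

interface RecordedInvocation {
  provider: string;
  model: string | undefined;
  prompt: string;
}

interface AgentInvokerSketch {
  invoke(provider: AgentProviderSketch, prompt: string): { stdout: string };
}

// Read the model from the command line so the provider needs no `model` field.
// Quoted model values are not handled in this sketch.
function extractModel(command: string): string | undefined {
  const match = /(?:^|\s)(?:--model|-m)[=\s]+(\S+)/.exec(command);
  return match?.[1];
}

// Build an invoker that never runs an agent and only records what it was asked to do.
function makeRecordingInvoker(): {
  invoker: AgentInvokerSketch;
  recorded: RecordedInvocation[];
} {
  const recorded: RecordedInvocation[] = [];
  const invoker: AgentInvokerSketch = {
    invoke(provider, prompt) {
      recorded.push({
        provider: provider.name,
        model: extractModel(provider.buildPrintCommand(prompt)),
        prompt,
      });
      return { stdout: "" };
    },
  };
  return { invoker, recorded };
}

// Usage with a made-up provider and model name.
const fakeCodex: AgentProviderSketch = {
  name: "codex",
  buildPrintCommand: (prompt) =>
    `codex exec -m example-model ${JSON.stringify(prompt)}`,
};
const { invoker, recorded } = makeRecordingInvoker();
invoker.invoke(fakeCodex, "do the thing");
```

Tests can then assert on `recorded` instead of on side effects of a real agent process.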

Files changed:
- src/AgentInvoker.ts (new) — Tag + production layer
- src/PromptPreprocessorTag.ts (new) — Tag + production layer
- src/LocalSandboxFactory.ts (new) — test SandboxFactory layer
- src/testSupport.ts (new) — internal test-support module
- src/initTemplateE2e.test.ts (new) — e2e test for blank template
- src/Orchestrator.ts — uses AgentInvoker + PromptPreprocessor Tags
- src/Orchestrator.test.ts — provides production layers in test stack
- src/run.ts — provides production layers
- src/createSandbox.ts — provides production layers
- src/createWorktree.ts — provides production layers
- vitest.config.ts — resolve aliases for module aliasing
- CONTEXT.md — agent invoker term

No blockers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove unused NodeFileSystem import from testSupport.ts
- Remove orphaned JSDoc comment detached from makeRecordingAgentInvokerLayer
- Replace inline import() type with proper top-level import for SandboxInfo
- Remove unused branch from destructuring in LocalSandboxFactory Use callback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@vercel

vercel Bot commented Apr 24, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

1 Skipped Deployment
Project Deployment Actions Updated (UTC)
sandcastle Ignored Ignored Apr 24, 2026 7:53pm

mattpocock and others added 2 commits April 24, 2026 18:52
Ref: PRD #447 — init-template end-to-end tests via AgentInvoker seam

Key decisions:
- Parameterised across all 8 (agent × backlog-manager) combinations using
  describe.each: claude-code, pi, codex, opencode × github-issues, beads.
- Each case scaffolds simple-loop, dynamically imports main.mts via the
  testSupport alias, and asserts recorded invocations.
- Asserts iterate-until-COMPLETE wiring: only 1 invocation because the
  recording invoker emits the completion signal on iteration 1.
- Asserts recorded prompt contains backlog-manager shell expressions
  (gh issue list / gh issue close for github-issues; bd ready / bd close
  for beads) pre-expansion.
- Asserts agent provider, model (via defaultModel), branchStrategy
  (merge-to-head), and maxIterations (3) match the template.
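
The single-invocation assertion follows from how an iterate-until-complete loop behaves. The sketch below is a hypothetical stand-in for that wiring: `COMPLETION_SIGNAL` and `runUntilComplete` are illustrative names, and the real signal string is not shown in this PR.

```typescript
// Hypothetical completion marker; the real signal used by sandcastle may differ.
const COMPLETION_SIGNAL = "COMPLETE";

type Invoke = (prompt: string, iteration: number) => string;

// Re-invoke the agent until its output contains the completion signal or the
// iteration budget runs out. Returns the number of invocations made.
function runUntilComplete(
  invoke: Invoke,
  prompt: string,
  maxIterations: number,
): number {
  for (let i = 1; i <= maxIterations; i++) {
    const stdout = invoke(prompt, i);
    if (stdout.includes(COMPLETION_SIGNAL)) return i;
  }
  return maxIterations;
}

// A recording invoker that signals completion immediately, as in the test above.
const calls: number[] = [];
const recordingInvoke: Invoke = (_prompt, iteration) => {
  calls.push(iteration);
  return COMPLETION_SIGNAL;
};
const used = runUntilComplete(recordingInvoke, "work the backlog", 3);
```

Because the stub completes on the first pass, a budget of 3 still yields exactly one recorded call.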

Bug fixes required to make the test pass:
- Changed {{TASK_ID}} to $TASK_ID in backlog-manager CLOSE_TASK_COMMAND
  and VIEW_TASK_COMMAND — the double-curly syntax clashed with
  substitutePromptArgs, causing PromptError at runtime for templates
  that don't pass TASK_ID as a promptArg (simple-loop, sequential-reviewer,
  parallel-planner merge phase).
- Extended extractModelFromProvider regex to also match -m (short flag)
  used by codex's buildPrintCommand.
- Seeded package.json in LocalSandboxFactory so onSandboxReady hooks
  (npm install) don't fail in the bare test repo.
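
The `{{TASK_ID}}` clash can be reproduced with a small stand-in for the substitution step. `substitutePromptArgsSketch` below is hypothetical and only assumes what the commit message states: unresolved double-curly placeholders cause an error. The command strings are illustrative, not copied from the templates.

```typescript
// Replace every {{KEY}} with args[KEY]; throw when a key has no value.
function substitutePromptArgsSketch(
  prompt: string,
  args: Record<string, string>,
): string {
  return prompt.replace(/\{\{(\w+)\}\}/g, (_whole: string, key: string) => {
    const value = args[key];
    if (value === undefined) {
      throw new Error(`Missing prompt argument: ${key}`);
    }
    return value;
  });
}

// Illustrative command text before and after the fix.
const before = "gh issue close {{TASK_ID}}";
const after = "gh issue close $TASK_ID";

// Before: a template that passes no TASK_ID promptArg fails at substitution time.
let threw = false;
try {
  substitutePromptArgsSketch(before, {});
} catch {
  threw = true;
}

// After: the shell-style variable is not a placeholder, so it passes through.
const passedThrough = substitutePromptArgsSketch(after, {});
```

The shell-variable form leaves resolution to the agent's shell, so the prompt no longer depends on every template supplying `TASK_ID`.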

Files changed:
- src/initTemplateE2e.test.ts — added simple-loop describe block
- src/InitService.ts — {{TASK_ID}} → $TASK_ID in template args
- src/testSupport.ts — extractModelFromProvider handles -m flag
- src/LocalSandboxFactory.ts — seed package.json in test repo

No blockers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Scope `branch` to its if/else branches instead of leaking it to the
  acquire tuple where no consumer reads it
- Remove empty head-mode if branch (no-op comment)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mattpocock mattpocock changed the base branch from implement/template-prompt-fixes to main April 24, 2026 18:58
mattpocock and others added 5 commits April 24, 2026 19:13
Ref: PRD #447 — init-template end-to-end tests via AgentInvoker seam

Key decisions:
- Parameterised across all 8 (agent × backlog-manager) combinations using
  describe.each: claude-code, pi, codex, opencode × github-issues, beads.
- Each case scaffolds sequential-reviewer, patches MAX_ITERATIONS to 1 so
  only one implement→review cycle runs, then dynamically imports main.mts.
- Asserts exactly 2 recorded invocations in correct order: implement first
  (merge-to-head, maxIterations 100, name "implementer"), then review
  (branch strategy, maxIterations 1, name "reviewer").
- Asserts implement prompt matches scaffolded implement-prompt.md content
  and contains backlog-manager shell expressions (gh issue list / bd ready).
- Asserts review prompt matches scaffolded review-prompt.md with {{BRANCH}}
  substituted to "main". Asserts promptArgs: { BRANCH: "main" }.
- Asserts agent provider and model (via defaultModel) match scaffold-time
  choice for both invocations.

Infrastructure changes to support the test:
- LocalSandboxFactory: when branch strategy requests a branch that is
  already the current branch, skip worktree creation (avoids git error
  when the sequential-reviewer's review phase targets "main").
- testSupport runForTest: inject a synthetic commit when result.commits
  is empty so templates that guard on commits.length (sequential-reviewer's
  "skip review if no commits") can proceed past the guard.

Files changed:
- src/initTemplateE2e.test.ts — added sequential-reviewer describe block
- src/LocalSandboxFactory.ts — handle branch-is-current-branch case
- src/testSupport.ts — synthetic commit injection in runForTest
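
The two infrastructure changes in this commit can be sketched as small pure functions. All names and shapes here (`RunResultSketch`, `withSyntheticCommit`, `needsWorktree`) are hypothetical, and the placeholder SHA is made up.

```typescript
// Hypothetical result shape; the real one has more fields.
interface CommitSketch {
  sha: string;
}
interface RunResultSketch {
  commits: CommitSketch[];
  stdout: string;
}

// If the stubbed agent produced no commits, add a placeholder so templates that
// guard on commits.length continue into their next phase.
function withSyntheticCommit(result: RunResultSketch): RunResultSketch {
  if (result.commits.length > 0) return result;
  return { ...result, commits: [{ sha: "0".repeat(40) }] };
}

// Only create a worktree when the requested branch is not already checked out.
function needsWorktree(requestedBranch: string, currentBranch: string): boolean {
  return requestedBranch !== currentBranch;
}

const empty: RunResultSketch = { commits: [], stdout: "" };
const patched = withSyntheticCommit(empty);
const reviewOnMain = needsWorktree("main", "main");
```

Real results that already contain commits are returned untouched, so the injection only affects runs where the stubbed agent did nothing.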

No blockers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ition

- Hoist agents, backlogManagers, combinations, and shellExpressionsByBm
  to the parent describe scope since they are identical across simple-loop
  and sequential-reviewer template tests
- Invert condition in LocalSandboxFactory branch strategy to eliminate
  empty if-branch and reduce nesting
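
As a rough picture of the hoisted constants, the sketch below builds the 8 combinations from the agent and backlog-manager names listed in the commit messages. The object shape of each combination is an assumption, and in the real test file the array would be handed to Vitest's `describe.each`.

```typescript
const agents = ["claude-code", "pi", "codex", "opencode"] as const;
const backlogManagers = ["github-issues", "beads"] as const;

type BacklogManager = (typeof backlogManagers)[number];

// Cartesian product: 4 agents x 2 backlog managers = 8 cases.
const combinations = agents.flatMap((agent) =>
  backlogManagers.map((backlogManager) => ({ agent, backlogManager })),
);

// Shell expressions each backlog manager is expected to contribute to the prompts.
const shellExpressionsByBm: Record<BacklogManager, readonly string[]> = {
  "github-issues": ["gh issue list", "gh issue close"],
  beads: ["bd ready", "bd close"],
};
```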

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ref: PRD #447 — init-template end-to-end tests via AgentInvoker seam

Key decisions:
- Parameterised across all 8 (agent × backlog-manager) combinations using
  describe.each: claude-code, pi, codex, opencode × github-issues, beads.
- Each case scaffolds parallel-planner, patches MAX_ITERATIONS to 1 so
  only one plan→execute→merge cycle runs, then dynamically imports main.mts.
- Configures planner response via setStdoutByRunName to return a valid
  <plan> JSON with one issue, exercising all three phases.
- Asserts exactly 3 recorded invocations in correct order: plan first
  (head strategy, maxIterations 1, name "planner"), then implement
  (branch strategy, maxIterations 100, name "implementer"), then merge
  (head strategy, maxIterations 1, name "merger").
- Asserts plan prompt matches scaffolded plan-prompt.md content.
- Asserts implement prompt matches scaffolded implement-prompt.md with
  {{TASK_ID}}, {{ISSUE_TITLE}}, {{BRANCH}} substituted from plan data.
  Asserts promptArgs: { TASK_ID, ISSUE_TITLE, BRANCH }.
- Asserts merge prompt matches scaffolded merge-prompt.md with
  {{BRANCHES}} and {{ISSUES}} substituted. Asserts promptArgs match.
- Asserts all backlog-manager shell expressions (gh issue list/close,
  bd ready/close) appear across the three prompt files pre-expansion.
- Asserts agent provider and model (via defaultModel) match scaffold-time
  choice for all three invocations.

Infrastructure changes to support the test:
- testSupport: added setStdoutByRunName for per-runName response overrides
  in the recording agent invoker. The planner phase requires a valid <plan>
  response; without this, the template throws on missing plan tag.
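
A minimal sketch of per-run-name overrides, assuming a simple map lookup with a fallback. The function below borrows the name `setStdoutByRunName` from the commit message, but its signature, the `stdoutFor` and `parsePlan` helpers, and the `{ issues: [...] }` plan shape are all assumptions made for illustration.

```typescript
// Overrides keyed by run name; runs without an override get the fallback stdout.
const stdoutByRunName = new Map<string, string>();

function setStdoutByRunName(runName: string, stdout: string): void {
  stdoutByRunName.set(runName, stdout);
}

function stdoutFor(runName: string, fallback: string): string {
  return stdoutByRunName.get(runName) ?? fallback;
}

// Hypothetical plan shape; the real JSON schema is not shown in this PR.
interface PlanIssueSketch {
  id: string;
  title: string;
  branch: string;
}

// Mimics a template that throws when the planner output has no <plan> tag.
function parsePlan(stdout: string): PlanIssueSketch[] {
  const match = /<plan>([\s\S]*?)<\/plan>/.exec(stdout);
  if (!match) throw new Error("No <plan> tag in planner output");
  const parsed = JSON.parse(match[1] ?? "") as { issues: PlanIssueSketch[] };
  return parsed.issues;
}

const planJson = JSON.stringify({
  issues: [{ id: "1", title: "Example issue", branch: "example-branch" }],
});
setStdoutByRunName("planner", `<plan>${planJson}</plan>`);

const issues = parsePlan(stdoutFor("planner", ""));
const implementerStdout = stdoutFor("implementer", "default output");
```

Only the planner run gets a scripted response here; every other run name falls back to the default, which keeps the override surface small.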

Files changed:
- src/initTemplateE2e.test.ts — added parallel-planner describe block
- src/testSupport.ts — setStdoutByRunName + invoker override logic

No blockers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ref: PRD #447 — init-template end-to-end tests via AgentInvoker seam

Key decisions:
- Parameterised across all 8 (agent × backlog-manager) combinations using
  describe.each: claude-code, pi, codex, opencode × github-issues, beads.
- Each case scaffolds parallel-planner-with-review, patches MAX_ITERATIONS
  to 1 so only one plan→implement→review→merge cycle runs, then dynamically
  imports main.mts.
- Configures planner response via setStdoutByRunName to return a valid
  <plan> JSON with one issue, exercising all four phases.
- Asserts exactly 4 recorded invocations in correct order: plan first
  (head strategy, maxIterations 1, name "planner"), then implement
  (branch strategy, maxIterations 100, name "implementer"), then review
  (branch strategy, maxIterations 1, name "reviewer"), then merge
  (head strategy, maxIterations 1, name "merger").
- Asserts plan prompt matches scaffolded plan-prompt.md content.
- Asserts implement prompt matches scaffolded implement-prompt.md with
  {{TASK_ID}}, {{ISSUE_TITLE}}, {{BRANCH}} substituted from plan data.
- Asserts review prompt matches scaffolded review-prompt.md with
  {{BRANCH}} substituted from plan issue branch.
- Asserts merge prompt matches scaffolded merge-prompt.md with
  {{BRANCHES}} and {{ISSUES}} substituted.
- Asserts promptArgs per phase match expected values.
- Asserts all backlog-manager shell expressions (gh issue list/close,
  bd ready/close) appear across the four prompt files pre-expansion.
- Asserts agent provider and model (via defaultModel) match scaffold-time
  choice for all four invocations.
- Hoisted planIssue and planResponse constants to parent scope, shared
  with parallel-planner tests.

Infrastructure changes to support the test:
- testSupport: added createSandboxForTest that delegates sandbox.run()
  calls to runForTest so templates using createSandbox() record invocations
  through the same recording agent invoker. The production createSandbox
  uses ProductionAgentInvokerLayer which bypasses recording; the test
  version ensures all invocations are captured.
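
The delegation can be pictured as a sandbox whose `run` simply forwards to whichever run function it was built with. This is a synchronous, hypothetical sketch (`createSandboxSketch`, `RunFn`, and the option shape are invented here); the real API is likely asynchronous and richer.

```typescript
interface RunOptionsSketch {
  name: string;
  prompt: string;
}
interface SandboxRunResult {
  stdout: string;
}
interface SandboxSketch {
  run(options: RunOptionsSketch): SandboxRunResult;
}
type RunFn = (options: RunOptionsSketch) => SandboxRunResult;

// The sandbox forwards every run() call to the injected run function.
function createSandboxSketch(run: RunFn): SandboxSketch {
  return { run: (options) => run(options) };
}

const recordedNames: string[] = [];

// Stand-in for the test path: every run is captured by the recorder.
const recordingRun: RunFn = (options) => {
  recordedNames.push(options.name);
  return { stdout: "" };
};

// Stand-in for the production path: nothing reaches the recorder.
const productionRun: RunFn = () => ({ stdout: "" });

createSandboxSketch(productionRun).run({ name: "reviewer", prompt: "review" });
const afterProduction = recordedNames.length;

createSandboxSketch(recordingRun).run({ name: "reviewer", prompt: "review" });
```

Routing `sandbox.run()` through the same recorder is what lets the four-phase template report all four invocations.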

Files changed:
- src/initTemplateE2e.test.ts — added parallel-planner-with-review describe
  block, hoisted plan constants
- src/testSupport.ts — createSandboxForTest + re-export as createSandbox

No blockers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Remove verbose per-assertion comments (e.g. "Agent provider matches
  the --agent choice") from simple-loop, sequential-reviewer, and
  parallel-planner tests to match the compact assertion style already
  used in the parallel-planner-with-review test
- Remove stray blank line in parallel-planner describe block
- Keep section headers (Phase 1/2/3) for navigation in long tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mattpocock mattpocock marked this pull request as ready for review April 24, 2026 19:53