Add AgentInvoker seam and e2e tests for all init templates #460
Open
mattpocock wants to merge 9 commits into main
Ref: PRD #447 (init-template end-to-end tests via AgentInvoker seam)

Key decisions:
- AgentInvoker extracted as Effect Context.Tag from Orchestrator's inline invokeAgent function. Production layer preserves exact existing behaviour.
- PromptPreprocessor lifted behind a Context.Tag. Production layer delegates to existing preprocessPrompt function unchanged.
- LocalSandboxFactory provisions a tmp dir with git init, isolated GIT_CONFIG_GLOBAL, and initial commit. Honours head, merge-to-head, and branch strategies via real git worktrees. No Docker.
- Internal testSupport.ts module exports runForTest (re-exported as run), recording agent-invoker layer, identity prompt-preprocessor layer, and recorder accessor. Not a published subpath.
- Vitest resolve.alias maps @ai-hero/sandcastle to testSupport.ts so generated main.mts files run unchanged via dynamic import.
- Recording invoker extracts model from provider.buildPrintCommand() to avoid requiring a model field on the AgentProvider interface.
- CONTEXT.md updated with "agent invoker" term under Execution.

Files changed:
- src/AgentInvoker.ts (new): Tag + production layer
- src/PromptPreprocessorTag.ts (new): Tag + production layer
- src/LocalSandboxFactory.ts (new): test SandboxFactory layer
- src/testSupport.ts (new): internal test-support module
- src/initTemplateE2e.test.ts (new): e2e test for blank template
- src/Orchestrator.ts: uses AgentInvoker + PromptPreprocessor Tags
- src/Orchestrator.test.ts: provides production layers in test stack
- src/run.ts: provides production layers
- src/createSandbox.ts: provides production layers
- src/createWorktree.ts: provides production layers
- vitest.config.ts: resolve aliases for module aliasing
- CONTEXT.md: agent invoker term

No blockers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
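The seam described in this commit can be sketched without Effect as a plain interface with a production and a recording implementation. This is an illustration under assumptions, not the PR's code: the real AgentInvoker is an Effect Context.Tag provided through layers, and the shapes `InvokeRequest`, `InvokeResult`, `RecordedInvocation` plus the helpers `makeRecordingAgentInvoker` and `extractModelFromCommand` are hypothetical. The model regex accepts both `--model` and the short `-m` flag, mirroring the later fix for codex.

```typescript
import { exec } from "node:child_process";
import { promisify } from "node:util";

const execAsync = promisify(exec);

// Hypothetical shapes; the PR's real AgentProvider and request types may differ.
export interface AgentProvider {
  readonly name: string;
  buildPrintCommand(prompt: string): string;
}

export interface InvokeRequest {
  readonly provider: AgentProvider;
  readonly prompt: string;
  readonly runName: string;
}

export interface InvokeResult {
  readonly stdout: string;
}

// The seam: orchestration code depends on this interface, not on a concrete agent.
export interface AgentInvoker {
  invoke(request: InvokeRequest): Promise<InvokeResult>;
}

// Production: run the provider's print command and return its stdout.
export const productionAgentInvoker: AgentInvoker = {
  async invoke({ provider, prompt }) {
    const { stdout } = await execAsync(provider.buildPrintCommand(prompt));
    return { stdout };
  },
};

// Pull the model out of a print command: accepts `--model x`, `--model=x`, or `-m x`.
export function extractModelFromCommand(command: string): string | undefined {
  const match = /(?:^|\s)(?:--model|-m)(?:=|\s+)(\S+)/.exec(command);
  return match?.[1]?.replace(/^["']|["']$/g, "");
}

export interface RecordedInvocation {
  readonly provider: string;
  readonly model: string | undefined;
  readonly runName: string;
  readonly prompt: string;
}

// Test double: record every call and reply with the completion signal, so an
// iterate-until-complete loop stops after its first iteration.
export function makeRecordingAgentInvoker(completionSignal: string) {
  const calls: RecordedInvocation[] = [];
  const invoker: AgentInvoker = {
    async invoke({ provider, prompt, runName }) {
      calls.push({
        provider: provider.name,
        model: extractModelFromCommand(provider.buildPrintCommand(prompt)),
        runName,
        prompt,
      });
      return { stdout: completionSignal };
    },
  };
  return { invoker, calls };
}
```

Deriving the model from the print command keeps the recorder working against any provider without widening the AgentProvider interface, which is the trade-off the commit names.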
- Remove unused NodeFileSystem import from testSupport.ts
- Remove orphaned JSDoc comment detached from makeRecordingAgentInvokerLayer
- Replace inline import() type with proper top-level import for SandboxInfo
- Remove unused branch from destructuring in LocalSandboxFactory

Use callback

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Vercel for GitHub: 1 Skipped Deployment
Ref: PRD #447 (init-template end-to-end tests via AgentInvoker seam)

Key decisions:
- Parameterised across all 8 (agent × backlog-manager) combinations using describe.each: claude-code, pi, codex, opencode × github-issues, beads.
- Each case scaffolds simple-loop, dynamically imports main.mts via the testSupport alias, and asserts recorded invocations.
- Asserts iterate-until-COMPLETE wiring: only 1 invocation, because the recording invoker emits the completion signal on iteration 1.
- Asserts recorded prompt contains backlog-manager shell expressions (gh issue list / gh issue close for github-issues; bd ready / bd close for beads) pre-expansion.
- Asserts agent provider, model (via defaultModel), branchStrategy (merge-to-head), and maxIterations (3) match the template.

Bug fixes required to make the test pass:
- Changed {{TASK_ID}} to $TASK_ID in backlog-manager CLOSE_TASK_COMMAND and VIEW_TASK_COMMAND. The double-curly syntax clashed with substitutePromptArgs, causing PromptError at runtime for templates that don't pass TASK_ID as a promptArg (simple-loop, sequential-reviewer, parallel-planner merge phase).
- Extended extractModelFromProvider regex to also match -m (short flag), used by codex's buildPrintCommand.
- Seeded package.json in LocalSandboxFactory so onSandboxReady hooks (npm install) don't fail in the bare test repo.

Files changed:
- src/initTemplateE2e.test.ts: added simple-loop describe block
- src/InitService.ts: {{TASK_ID}} → $TASK_ID in template args
- src/testSupport.ts: extractModelFromProvider handles -m flag
- src/LocalSandboxFactory.ts: seed package.json in test repo

No blockers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
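The `{{TASK_ID}}` clash can be illustrated with a hypothetical stand-in for substitutePromptArgs, assuming it throws on any `{{KEY}}` placeholder that has no matching prompt argument. The function, the uppercase-key pattern, and the local `PromptError` class below are sketches, not the library's implementation.

```typescript
// Local stand-in for the library's error type.
class PromptError extends Error {}

// Hypothetical substitutePromptArgs: replace each {{KEY}} with args[KEY] and
// fail loudly when a placeholder has no matching argument.
function substitutePromptArgs(template: string, args: Record<string, string>): string {
  return template.replace(/\{\{\s*([A-Z0-9_]+)\s*\}\}/g, (_match: string, key: string) => {
    const value = args[key];
    if (value === undefined) {
      throw new PromptError(`No prompt argument provided for {{${key}}}`);
    }
    return value;
  });
}

// A backlog-manager command embedded in a template that passes no TASK_ID promptArg.
const before = "Close it with `gh issue close {{TASK_ID}}`.";
const after = "Close it with `gh issue close $TASK_ID`.";
```

Because `$TASK_ID` is not double-curly syntax, it passes through substitution unchanged, while `{{TASK_ID}}` trips the missing-argument check.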
- Scope `branch` to its if/else branches instead of leaking it to the acquire tuple, where no consumer reads it
- Remove empty head-mode if branch (no-op comment)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ref: PRD #447 (init-template end-to-end tests via AgentInvoker seam)

Key decisions:
- Parameterised across all 8 (agent × backlog-manager) combinations using describe.each: claude-code, pi, codex, opencode × github-issues, beads.
- Each case scaffolds sequential-reviewer, patches MAX_ITERATIONS to 1 so only one implement→review cycle runs, then dynamically imports main.mts.
- Asserts exactly 2 recorded invocations in correct order: implement first (merge-to-head, maxIterations 100, name "implementer"), then review (branch strategy, maxIterations 1, name "reviewer").
- Asserts implement prompt matches scaffolded implement-prompt.md content and contains backlog-manager shell expressions (gh issue list / bd ready).
- Asserts review prompt matches scaffolded review-prompt.md with {{BRANCH}} substituted with "main". Asserts promptArgs: { BRANCH: "main" }.
- Asserts agent provider and model (via defaultModel) match scaffold-time choice for both invocations.

Infrastructure changes to support the test:
- LocalSandboxFactory: when branch strategy requests a branch that is already the current branch, skip worktree creation (avoids a git error when the sequential-reviewer's review phase targets "main").
- testSupport runForTest: inject a synthetic commit when result.commits is empty, so templates that guard on commits.length (sequential-reviewer's "skip review if no commits") can proceed past the guard.

Files changed:
- src/initTemplateE2e.test.ts: added sequential-reviewer describe block
- src/LocalSandboxFactory.ts: handle branch-is-current-branch case
- src/testSupport.ts: synthetic commit injection in runForTest

No blockers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
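The synthetic-commit workaround amounts to a small post-processing step on the run result. The sketch below assumes a `RunResult` with a `commits` array; `withSyntheticCommit`, `shouldReview`, and the all-zero placeholder sha are hypothetical names and values, not the PR's code.

```typescript
interface Commit {
  readonly sha: string;
}

interface RunResult {
  readonly iterations: number;
  readonly commits: readonly Commit[];
}

// Placeholder sha (an assumption). The recording invoker runs no real agent,
// so runs under test would otherwise report no commits.
const SYNTHETIC_COMMIT: Commit = { sha: "0".repeat(40) };

// Leave results with real commits alone; pad empty ones with one synthetic commit.
function withSyntheticCommit(result: RunResult): RunResult {
  return result.commits.length > 0
    ? result
    : { ...result, commits: [SYNTHETIC_COMMIT] };
}

// Simplified version of sequential-reviewer's "skip review if no commits" guard.
function shouldReview(result: RunResult): boolean {
  return result.commits.length > 0;
}
```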
…ition

- Hoist agents, backlogManagers, combinations, and shellExpressionsByBm to the parent describe scope, since they are identical across simple-loop and sequential-reviewer template tests
- Invert condition in LocalSandboxFactory branch strategy to eliminate empty if-branch and reduce nesting

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ref: PRD #447 (init-template end-to-end tests via AgentInvoker seam)

Key decisions:
- Parameterised across all 8 (agent × backlog-manager) combinations using describe.each: claude-code, pi, codex, opencode × github-issues, beads.
- Each case scaffolds parallel-planner, patches MAX_ITERATIONS to 1 so only one plan→execute→merge cycle runs, then dynamically imports main.mts.
- Configures planner response via setStdoutByRunName to return a valid `<plan>` JSON with one issue, exercising all three phases.
- Asserts exactly 3 recorded invocations in correct order: plan first (head strategy, maxIterations 1, name "planner"), then implement (branch strategy, maxIterations 100, name "implementer"), then merge (head strategy, maxIterations 1, name "merger").
- Asserts plan prompt matches scaffolded plan-prompt.md content.
- Asserts implement prompt matches scaffolded implement-prompt.md with {{TASK_ID}}, {{ISSUE_TITLE}}, {{BRANCH}} substituted from plan data. Asserts promptArgs: { TASK_ID, ISSUE_TITLE, BRANCH }.
- Asserts merge prompt matches scaffolded merge-prompt.md with {{BRANCHES}} and {{ISSUES}} substituted. Asserts promptArgs match.
- Asserts all backlog-manager shell expressions (gh issue list/close, bd ready/close) appear across the three prompt files pre-expansion.
- Asserts agent provider and model (via defaultModel) match scaffold-time choice for all three invocations.

Infrastructure changes to support the test:
- testSupport: added setStdoutByRunName for per-runName response overrides in the recording agent invoker. The planner phase requires a valid `<plan>` response; without this, the template throws on a missing plan tag.

Files changed:
- src/initTemplateE2e.test.ts: added parallel-planner describe block
- src/testSupport.ts: setStdoutByRunName + invoker override logic

No blockers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
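A per-runName override can be layered onto a recording invoker as a map lookup that falls back to a default response. This synchronous sketch is hypothetical: only the name `setStdoutByRunName` comes from the commit, and the `<plan>` payload shape (`issues` with `id`, `title`, `branch`) and the `parsePlan` helper are assumptions about the template, not its actual schema or code.

```typescript
interface InvokeRequest {
  readonly runName: string;
  readonly prompt: string;
}

interface PlanIssue {
  readonly id: string;
  readonly title: string;
  readonly branch: string;
}

interface Plan {
  readonly issues: readonly PlanIssue[];
}

// Recording invoker with per-runName response overrides (synchronous for brevity).
function makeRecordingInvoker(defaultStdout: string) {
  const calls: InvokeRequest[] = [];
  const stdoutByRunName = new Map<string, string>();
  return {
    calls,
    // Register a canned response for one named run, e.g. the "planner" phase.
    setStdoutByRunName(runName: string, stdout: string): void {
      stdoutByRunName.set(runName, stdout);
    },
    invoke(request: InvokeRequest): { stdout: string } {
      calls.push(request);
      // Fall back to the default response when no override is registered.
      return { stdout: stdoutByRunName.get(request.runName) ?? defaultStdout };
    },
  };
}

// Hypothetical planner output: one issue wrapped in a <plan> tag.
const planResponse = `<plan>${JSON.stringify({
  issues: [{ id: "1", title: "Add login", branch: "issue-1-add-login" }],
})}</plan>`;

// The kind of parsing step that fails when the planner's stdout has no <plan> tag.
function parsePlan(stdout: string): Plan {
  const body = /<plan>([\s\S]*?)<\/plan>/.exec(stdout)?.[1];
  if (body === undefined) {
    throw new Error("Planner output is missing a <plan> tag");
  }
  return JSON.parse(body) as Plan;
}
```

Keying overrides by run name keeps the default response for every phase except the ones a test explicitly configures.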
Ref: PRD #447 (init-template end-to-end tests via AgentInvoker seam)

Key decisions:
- Parameterised across all 8 (agent × backlog-manager) combinations using describe.each: claude-code, pi, codex, opencode × github-issues, beads.
- Each case scaffolds parallel-planner-with-review, patches MAX_ITERATIONS to 1 so only one plan→implement→review→merge cycle runs, then dynamically imports main.mts.
- Configures planner response via setStdoutByRunName to return a valid `<plan>` JSON with one issue, exercising all four phases.
- Asserts exactly 4 recorded invocations in correct order: plan first (head strategy, maxIterations 1, name "planner"), then implement (branch strategy, maxIterations 100, name "implementer"), then review (branch strategy, maxIterations 1, name "reviewer"), then merge (head strategy, maxIterations 1, name "merger").
- Asserts plan prompt matches scaffolded plan-prompt.md content.
- Asserts implement prompt matches scaffolded implement-prompt.md with {{TASK_ID}}, {{ISSUE_TITLE}}, {{BRANCH}} substituted from plan data.
- Asserts review prompt matches scaffolded review-prompt.md with {{BRANCH}} substituted from plan issue branch.
- Asserts merge prompt matches scaffolded merge-prompt.md with {{BRANCHES}} and {{ISSUES}} substituted.
- Asserts promptArgs per phase match expected values.
- Asserts all backlog-manager shell expressions (gh issue list/close, bd ready/close) appear across the four prompt files pre-expansion.
- Asserts agent provider and model (via defaultModel) match scaffold-time choice for all four invocations.
- Hoisted planIssue and planResponse constants to parent scope, shared with parallel-planner tests.

Infrastructure changes to support the test:
- testSupport: added createSandboxForTest, which delegates sandbox.run() calls to runForTest so templates using createSandbox() record invocations through the same recording agent invoker. The production createSandbox uses ProductionAgentInvokerLayer, which bypasses recording; the test version ensures all invocations are captured.

Files changed:
- src/initTemplateE2e.test.ts: added parallel-planner-with-review describe block, hoisted plan constants
- src/testSupport.ts: createSandboxForTest + re-export as createSandbox

No blockers.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
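The createSandboxForTest idea is plain delegation: the test version of createSandbox hands back a sandbox whose run() calls runForTest, so phases launched from a sandbox land in the same recorder as top-level runs. Everything below (`makeCreateSandboxForTest`, the option and result shapes, and forwarding the sandbox's branch) is a hypothetical sketch, not the PR's signatures.

```typescript
interface SandboxRunOptions {
  readonly name: string;
  readonly prompt: string;
}

interface RunResult {
  readonly commits: readonly { readonly sha: string }[];
}

// Stand-in for runForTest: the single entry point that records invocations.
type RunForTest = (
  options: SandboxRunOptions & { readonly branch?: string },
) => Promise<RunResult>;

interface Sandbox {
  run(options: SandboxRunOptions): Promise<RunResult>;
  close(): Promise<void>;
}

// Build a test-only createSandbox whose sandboxes forward every run() to
// runForTest, instead of a production invoker that would bypass the recorder.
function makeCreateSandboxForTest(runForTest: RunForTest) {
  return async function createSandbox(config: { readonly branch?: string }): Promise<Sandbox> {
    return {
      run: (options: SandboxRunOptions) => runForTest({ ...options, branch: config.branch }),
      close: async () => {},
    };
  };
}
```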
- Remove verbose per-assertion comments (e.g. "Agent provider matches the --agent choice") from simple-loop, sequential-reviewer, and parallel-planner tests to match the compact assertion style already used in the parallel-planner-with-review test
- Remove stray blank line in parallel-planner describe block
- Keep section headers (Phase 1/2/3) for navigation in long tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Implements PRD #447: end-to-end tests for init templates via an AgentInvoker seam.
Introduces three new deep modules (AgentInvoker Tag, PromptPreprocessor Tag, LocalSandboxFactory) and an internal test-support entry point (`runForTest`). Then adds e2e tests for each init template (blank, simple-loop, sequential-reviewer, parallel-planner, parallel-planner-with-review), parameterised across all valid (agent, backlog-manager) combinations.

Tests scaffold each template into a tmp dir, execute the generated `main.mts` with `@ai-hero/sandcastle` aliased to the internal testing entry via Vitest module aliasing, and assert recorded agent invocations against the expected prompt file, prompt arguments, agent provider, model, branch strategy, and call count. No Docker, no real agent, no network.

Closes #447, #448, #449, #450, #451, #452.
Closes #448
Closes #449
Closes #450
Closes #451
Closes #452
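The Vitest module aliasing described above can be expressed in vitest.config.ts along these lines. This is a sketch assuming the test-support module sits at src/testSupport.ts (the path listed in the first commit); the PR's actual config may define further aliases or test options.

```typescript
import { fileURLToPath } from "node:url";
import { defineConfig } from "vitest/config";

export default defineConfig({
  resolve: {
    alias: {
      // Generated main.mts files import "@ai-hero/sandcastle"; under Vitest that
      // specifier resolves to the internal test-support module instead, so the
      // scaffolded code runs unchanged against the recording agent invoker.
      "@ai-hero/sandcastle": fileURLToPath(
        new URL("./src/testSupport.ts", import.meta.url),
      ),
    },
  },
});
```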