refactor(orchestration): migrate prompt builders to file-based templates#459
refactor(orchestration): migrate prompt builders to file-based templates#459
Conversation
Addresses two items flagged by PR #459 final review: 1. lib/prompts/README.md — conventions, syntax, gotchas, and local-testing recipe for template authors. Previously there was no doc for the stated goal of "non-engineers can edit prompts." 2. tests/prompts/templates.test.ts — 8 structural assertions covering: - no surviving {{...}} placeholders in any rendered template - role dispatch (teacher → LEAD TEACHER, student → not) - scene-type action stripping (quiz scene has no spotlight/laser) - director prompt output spec mentions next_agent These replace the removed byte-equal snapshot suite at much lower maintenance cost — they assert behaviors the refactor must preserve, not exact bytes.
dd2617b to
45de0f8
Compare
Agent Code Review ProcessThis PR was developed and reviewed using superpowers:subagent-driven-development. 20 subagent dispatches total — 7 implementers, 5 spec-compliance reviewers, 5 code-quality reviewers, 3 follow-up fix implementers, 2 final PR-level reviews. Every task ran the cycle: implementer → spec review → code-quality review. Substantive findings and how they were addressed
Final state
|
Move file-based prompt loader from lib/generation/prompts/ to project-level lib/prompts/ so it can be shared by orchestration and PBL prompts in subsequent commits. - Templates and snippets moved alongside the loader - getPromptsDir() now points at lib/prompts/ - Generation pipeline imports updated to @/lib/prompts (outline-generator, scene-generator, search-query-builder, scene-outlines-stream route) - Add tests/prompts/loader.test.ts as sanity gate Pure file move — no behavior change.
Snapshot tests cover variant matrix: - buildStructuredPrompt: role × scene-type × peer/ledger/discussion/profile - convertMessagesToOpenAI: mixed message kinds - summarizeConversation: truncation - buildDirectorPrompt: Q&A vs discussion, ledger, profile - buildPBLSystemPrompt: default config These lock current output so the body refactor in subsequent commits can be verified byte-equal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Split convertMessagesToOpenAI test: same-agent (existing) + cross-agent (new). The original test passed currentAgentId matching the message's agentId, so the cross-agent role-conversion branch was never exercised. - Add 'teacher / quiz scene' snapshot to lock the spotlight/laser strip path in getEffectiveActions. The previous variant-7 test (whiteboard-open) only triggered the mutual-exclusion warning, not the strip — renamed it to reflect what it actually tests. - Add assistant-role variant (was uncovered: ROLE_GUIDELINES, buildLengthGuidelines, buildWhiteboardGuidelines all branch on role).
Move state-context, virtual whiteboard ledger replay, peer context, message converter, and conversation summary out of the 890-line prompt-builder.ts into focused modules under lib/orchestration/summarizers/. prompt-builder.ts now only owns prompt assembly. director-graph imports updated to the new locations. Snapshot tests pass byte-equal — no behavior change.
The buildStructuredPrompt template literal is now lib/prompts/templates/agent-system/system.md, assembled by a thin variable-pass through the shared prompt loader. Per-variant ternaries (format example, ordering, spotlight examples, mutual-exclusion note) are kept as module-level constants in prompt-builder.ts. ROLE_GUIDELINES, buildLengthGuidelines, and buildWhiteboardGuidelines stay too (they may further migrate to template partials in Phase 2). Also: PROMPT_IDS gains a 'satisfies Record<string, PromptId>' clause to prevent constant/union drift as more IDs are added. Snapshot tests (12 prompt-builder + 4 director + 1 pbl) pass byte-equal.
- Rename agent-system placeholders snake_case → camelCase to match the convention already used by slide-actions/user.md and other generation templates. Settles convention before Task 5 lands two more templates that would otherwise inherit drift. - Extract '## Speech Guidelines (CRITICAL)' into lib/prompts/snippets/speech-guidelines.md for reuse by director and PBL prompts in Task 5. Snapshot tests pass byte-equal — placeholder name changes are invisible in rendered output, and snippet inclusion happens at load time.
director-prompt.ts and pbl-system-prompt.ts now use the shared template loader. Bodies collapse to thin variable assembly. Three orchestration-tier prompts now share one infrastructure: agent-system, director, pbl-design. Snapshot tests pass byte-equal.
…mmary to types module These two interfaces are imported by 6+ modules including summarizers/ — having them live in director-prompt.ts created an awkward upward import direction (summarizers reaching back to a sibling prompt builder for types). Move them to lib/orchestration/types.ts and update all callers. director-prompt.ts now imports them too. Pure type-location refactor — snapshots pass byte-equal.
Per review feedback: byte-equal snapshots were the wrong invariant for this refactor. The goal was a stable substrate for editing prompts, not a frozen contract. Snapshots would have generated maintenance churn (~100-200 line diffs per intentional Phase 2 prompt tweak) without providing value the eval doesn't already. Test posture going forward: - tests/prompts/loader.test.ts (3 tests) — bounds the loader infrastructure (template load, snippet inclusion, variable interpolation, missing-id behavior) - pnpm eval:whiteboard — bounds end-to-end agent loop, template composition, and integration with chat/director/state-manager Removed: 1879 lines of .snap + 320 lines of test scaffolding + fixtures. Net PR diff drops by ~2200 lines.
Addresses two items flagged by PR #459 final review: 1. lib/prompts/README.md — conventions, syntax, gotchas, and local-testing recipe for template authors. Previously there was no doc for the stated goal of "non-engineers can edit prompts." 2. tests/prompts/templates.test.ts — 8 structural assertions covering: - no surviving {{...}} placeholders in any rendered template - role dispatch (teacher → LEAD TEACHER, student → not) - scene-type action stripping (quiz scene has no spotlight/laser) - director prompt output spec mentions next_agent These replace the removed byte-equal snapshot suite at much lower maintenance cost — they assert behaviors the refactor must preserve, not exact bytes.
PR #461 (interactive mode clean) added 7 new templates under lib/generation/prompts/templates/ (the pre-refactor location). Rebase onto origin/main landed them at the old path; git's rename detection didn't forward them through the directory move in db56c40. Manually move them to lib/prompts/templates/ to match the new convention, and fix scene-generator.ts's relative import ('./prompts/types' → '@/lib/prompts/types') since the dir no longer exists as a sibling. No behavior change — templates read from PROMPT_IDS values via getPromptsDir() which already points at lib/prompts/.
45de0f8 to
0274e2c
Compare
cosarah
left a comment
There was a problem hiding this comment.
Behavior preservation looks good to me for the three migrated prompts — summarizer extraction, types.ts move, and template naming convention are all improvements. Two things worth doing before merge, plus one real bug that's low-impact today but worth fixing.
Important
-
Structural tests cover only about half the conditional surface.
tests/prompts/templates.test.tsasserts role dispatch and scene-type stripping but skips the branches Phase 2 is most likely to touch:- Director discussion-context branch —
lib/orchestration/director-prompt.ts:51-60 - Peer context section —
lib/orchestration/prompt-builder.ts:145(largest single conditional block in agent-system) - Assistant role — teacher and student are tested;
buildLengthGuidelines/buildWhiteboardGuidelinesassistant branches are silent - Language directive — fixture always sets
languageDirective: 'zh-CN'; the null path isn't exercised - PBL
{{issueCount}}appears three times in the template; no assertion confirms all three are filled
These are small additions, not a re-snapshot.
- Director discussion-context branch —
-
README doesn't tell editors what is still in TS. The PR body correctly notes viewport bounds (1000×562), role-specific length targets, and
ROLE_GUIDELINESstill live inprompt-builder.ts. A contributor readinglib/prompts/README.mdexpecting a pure-markdown workflow won't know to go there. A short "Still in TS" section would prevent a real friction point.
Minor
-
Snippet typos ship silently.
lib/prompts/loader.ts:44returns the literal string `{{snippet:${id}}}` on a missing snippet and only logs a warn. TheUNRESOLVED_PLACEHOLDERregex attests/prompts/templates.test.ts:88is/\{\{\w[\w-]*\}\}/, which does not match `{{snippet:foo}}` — the:breaks it. A typo like `{{snippet:speach-guidelines}}` would reach the LLM with no test failure. Either broaden the regex (e.g./\{\{(\w[\w-]*|snippet:[\w-]+)\}\}/) or haveloadSnippetthrow on missing files. -
interpolateVariablesregex is narrower thanprocessSnippets.loader.ts:102uses\{\{(\w+)\}\}— a kebab-case placeholder like `{{next-agent}}` would silently pass through. The README mandates camelCase so it's not a bug today; worth either a comment above the regex or a lint test that scans everytemplates/*/{system,user}.mdfor non-camelCase placeholders. -
lib/orchestration/summarizers/peer-context.tshas a single consumer. Not a blocker; flag if the one-caller situation persists after Phase 2 — at that point folding it back intoprompt-builder.tsremoves indirection.
Verdict
Approve with nits. The testing and README suggestions would substantially improve Phase 2's downstream safety; the snippet-regex fix is real but low-impact. Nothing here is structural.
- loadSnippet now throws on missing file instead of silently returning
a literal {{snippet:id}} string. A typo like {{snippet:speach-guidelines}}
now fails at load time instead of reaching the LLM.
- README gains a "Still in TypeScript" section listing the role-conditional
content that still lives in prompt-builder.ts (ROLE_GUIDELINES,
buildLengthGuidelines, buildWhiteboardGuidelines) so contributors
expecting a pure-markdown workflow know where to look.
- Expand tests/prompts/templates.test.ts to cover the conditional branches
Phase 2 is most likely to touch (9 new assertions):
- assistant role dispatch
- peer-context section toggles on agentResponses presence
- language constraint toggles on stage.languageDirective presence
- director Q&A vs discussion mode branching
- pbl-design {{issueCount}} substituted at all 3 occurrences
- placeholder-naming-convention lint scans every template for
non-camelCase placeholders (slide-content grandfathered)
- Comment above interpolateVariables regex documents why kebab-case
placeholders pass through silently (the lint test now catches them).
- New test in loader.test.ts locks the throw-on-missing-snippet behavior.
|
Thanks @cosarah. All 4 handled in Brief on why each slipped:
Leaving the |
Summary
Phase 1 of a two-phase refactor of the orchestration prompts subsystem.
Migrates three orchestration-tier prompt builders (
buildStructuredPrompt,buildDirectorPrompt,buildPBLSystemPrompt) from inline TS template literals to file-based markdown templates, sharing the loader infrastructure already used by the generation pipeline.Unblocks much of Phase 2 content optimization via markdown edits (LaTeX few-shot examples, speech guidelines, snippet additions). A subset — viewport bounds (1000×562), role-specific length targets, and role guidelines — still lives as TS template literals in
prompt-builder.tsand will migrate in a follow-up extraction pass.What changed
lib/generation/prompts/→lib/prompts/(project-level loader, used by both generation + orchestration)lib/orchestration/prompt-builder.ts890 → 314 lines (thin assembler)lib/orchestration/director-prompt.ts290 → 271 lines (thin assembler)lib/pbl/pbl-system-prompt.ts93 → 30 lines (thin assembler)lib/orchestration/summarizers/— 5 focused moduleslib/orchestration/types.ts— sharedWhiteboardActionRecord+AgentTurnSummarylib/prompts/templates/{agent-system,director,pbl-design}/system.mdlib/prompts/snippets/speech-guidelines.mdlib/prompts/README.md— conventions, syntax, gotchas, local-testing recipe for template authorsPROMPT_IDSgainsas const satisfies Record<string, PromptId>type guardTest posture
tests/prompts/loader.test.ts— bounds the loader infrastructure (3 tests)tests/prompts/templates.test.ts— bounds behavior-level invariants (8 tests): no unresolved{{...}}placeholders in any rendered prompt, role dispatch (teacher → LEAD TEACHER, student → not), scene-type action stripping (quiz scene strips spotlight/laser), director output-spec shapepnpm eval:whiteboard— bounds end-to-end composition. 1-scenario smoke run on this branch passed (agent loop ran 3 turns, 2 screenshots captured, scores within sampling noise of pre-refactor baseline)An earlier iteration of this branch added ~1900 lines of byte-equal snapshot tests as a refactor safety net. Those were removed after review surfaced that byte-equality was the wrong invariant for a refactor whose goal is enabling Phase 2 content changes — every intentional Phase 2 tweak would have produced large snapshot diffs for no benefit beyond what the eval and structural assertions already cover.
Phase 2 deferrals (deliberate)
ROLE_GUIDELINES,buildLengthGuidelines,buildWhiteboardGuidelinesremain as TS template literals inprompt-builder.ts— role-conditional prose; snippet extraction deferred until eval-driven changes show what needs the treatment.lib/prompts/templates/slide-content/{system,user}.mdstill usessnake_caseplaceholders (pre-existing). Normalize when generation pipeline gets a similar pass. The new README flags this.Test plan
npx tsc --noEmitcleanpnpm lintzero errorspnpm check(Prettier) cleanpnpm test tests/prompts— 10/10 passNote: 3 unrelated failures in
tests/server/provider-config.test.tsare environment-config tests that don't sandboxprocess.env; they fail in any setup with.env.localcontaining provider keys (verified same 3 fail on a freshmaincheckout). Not a regression from this branch.🤖 Generated with Claude Code