feat: prompt cache optimization — structure context for cache hits #703
Description
Context
Claude Code splits system prompts into static (globally cacheable) and dynamic (session-scoped) content using a boundary marker. Static content that hits the prompt cache is billed at roughly a 90% discount, so the system prompt portion costs about a tenth of its uncached price.
Our agents rebuild the entire prompt from scratch each execution, mixing static and dynamic content. This means zero prompt cache benefit across agents or across turns within a conversation.
From Claude Code Source
// prompts.ts
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = '__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__'
// Static sections (before boundary) — cached globally:
// - Identity, rules, tool usage guidelines, tone, output style
// Dynamic sections (after boundary) — per-session:
// - Git status, CLAUDE.md, memory, environment info
// utils/api.ts — splitSysPromptPrefix()
// Assigns cache_control: { type: 'ephemeral', scope: 'global' } to static blocks
How Our Context Flows Today
agent-runner.ts builds prompt:
"You are {agent} from squad {squad}."
+ taskDirective (dynamic)
+ SYSTEM.md (static — same for all agents)
+ squadContext (mixed — company.md is static, state.md is dynamic)
+ cognitionContext (dynamic)
All of this goes into the -- prompt argument to Claude, which becomes the first user message. Claude Code's own system prompt (from CLAUDE.md files) IS cached, but our agent context is NOT.
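To make the ordering problem concrete, here is a minimal sketch, assuming the cache can only reuse an identical leading prefix. `buildPromptToday` and `sharedPrefixLength` are hypothetical stand-ins for illustration, not the real agent-runner.ts code, and the agent, squad, and directive values are made up.

```typescript
// Hypothetical stand-in for the assembly order listed above (not the real agent-runner.ts).
interface PromptParts {
  agent: string;
  squad: string;
  taskDirective: string; // dynamic: changes every run
  systemMd: string;      // static: same for all agents
}

function buildPromptToday(p: PromptParts): string {
  return [
    `You are ${p.agent} from squad ${p.squad}.`,
    p.taskDirective,
    p.systemMd,
  ].join("\n");
}

// Number of leading characters two prompts have in common.
// Under a prefix-matched cache, this is the most that could ever be reused.
function sharedPrefixLength(a: string, b: string): number {
  const limit = Math.min(a.length, b.length);
  let i = 0;
  while (i < limit && a[i] === b[i]) i++;
  return i;
}

const systemMd = "SYSTEM protocol text ".repeat(50);
const runA = buildPromptToday({ agent: "scout", squad: "growth", taskDirective: "Review open PRs", systemMd });
const runB = buildPromptToday({ agent: "scout", squad: "growth", taskDirective: "Triage new issues", systemMd });

// The shared prefix stops at the task directive, before SYSTEM.md begins.
console.log(sharedPrefixLength(runA, runB), "of", runA.length, "characters shared");
```

Because the dynamic task directive comes before SYSTEM.md, the two runs diverge right after the identity line, even though most of the prompt text is identical.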
Proposed Architecture
Split our context into two injection points:
1. Static context → Project CLAUDE.md (cached by Claude Code)
Move SYSTEM.md and company.md into a project-level .claude/CLAUDE.md that Claude Code loads and caches automatically:
# .claude/CLAUDE.md (in agents repo)
<!-- This content is cached by Claude Code's prompt cache system -->
## System Protocol
{contents of SYSTEM.md}
## Company
{contents of company.md}
2. Dynamic context → -- prompt argument (per-execution)
Keep per-agent, per-run context in the prompt:
"You are {agent} from squad {squad}."
+ goals.md, state.md, feedback.md, priorities.md
+ taskDirective
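The two injection points might be sketched roughly as follows. Both function names and the `DynamicContext` shape are hypothetical; the headings mirror the CLAUDE.md template above, and the ordering of the dynamic parts follows the list in this section.

```typescript
// Static half: rendered once into .claude/CLAUDE.md, identical for every agent.
function renderStaticClaudeMd(systemMd: string, companyMd: string): string {
  return [
    "## System Protocol",
    systemMd.trim(),
    "",
    "## Company",
    companyMd.trim(),
    "",
  ].join("\n");
}

// Dynamic half: rebuilt on every execution and passed as the prompt.
// Hypothetical shape, for illustration only.
interface DynamicContext {
  agent: string;
  squad: string;
  squadFiles: string[]; // contents of goals.md, state.md, feedback.md, priorities.md
  taskDirective: string;
}

function buildDynamicPrompt(ctx: DynamicContext): string {
  return [
    `You are ${ctx.agent} from squad ${ctx.squad}.`,
    ...ctx.squadFiles,
    ctx.taskDirective,
  ].join("\n\n");
}
```

Nothing agent-specific reaches `renderStaticClaudeMd`, which is what keeps its output byte-identical across executions.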
Expected savings
SYSTEM.md (~2K tokens) + company.md (~500 tokens) = ~2.5K tokens cached across ALL agent executions. With cached tokens priced at a 90% discount, each cache hit saves the cost of roughly 2.25K input tokens, which adds up at scale.
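A back-of-envelope version of that estimate, as a sketch only. It assumes the 90% discount quoted above applies to every cached read and ignores any cost of writing the cache entry and any cache expiry; the execution count is an illustrative number, not measured data.

```typescript
// Assumption from the text: a cached read costs 10% of the normal input price.
const CACHED_READ_PERCENT = 10;

// Input cost avoided, expressed in "full-price token equivalents".
function tokenEquivalentsSaved(staticTokens: number, cacheHits: number): number {
  return (staticTokens * cacheHits * (100 - CACHED_READ_PERCENT)) / 100;
}

const staticTokens = 2000 + 500; // SYSTEM.md + company.md, per the estimate above

console.log(tokenEquivalentsSaved(staticTokens, 1));    // one cache hit
console.log(tokenEquivalentsSaved(staticTokens, 1000)); // hypothetical 1,000 executions
```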
Implementation
- Generate .claude/CLAUDE.md from SYSTEM.md + company.md during squads init or as a build step
- Or: restructure so SYSTEM.md IS the project CLAUDE.md (simplest)
- Strip static content from gatherSquadContext() so it's not double-injected
- Verify Claude Code is loading the project CLAUDE.md (check claudemd.ts discovery paths)
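For the "strip static content" step, one possible shape is a filter over the gathered context. The `ContextSection` type and `stripStaticSections` are hypothetical; the real gatherSquadContext() may structure its output differently.

```typescript
// Hypothetical shape for one piece of gathered context.
interface ContextSection {
  source: string;  // file the content came from, e.g. "state.md"
  content: string;
}

// Files that would move into .claude/CLAUDE.md under this proposal.
const STATIC_SOURCES = new Set(["SYSTEM.md", "company.md"]);

// Drop anything that is now loaded via CLAUDE.md so it is not injected twice.
function stripStaticSections(sections: ContextSection[]): ContextSection[] {
  return sections.filter((s) => !STATIC_SOURCES.has(s.source));
}

const gathered: ContextSection[] = [
  { source: "SYSTEM.md", content: "protocol" },
  { source: "company.md", content: "company facts" },
  { source: "state.md", content: "current state" },
];

console.log(stripStaticSections(gathered).map((s) => s.source));
```

Keeping the static file names in one set means the generator for CLAUDE.md and the filter can share a single source of truth.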
Files to change
- src/lib/run-context.ts: remove SYSTEM.md and company.md from prompt injection
- .agents/SYSTEM.md → .claude/CLAUDE.md migration (or symlink)
- src/lib/agent-runner.ts: stop passing systemContext separately
Risks
- Claude Code's CLAUDE.md loading has specific rules (frontmatter, @include, etc.). Need to validate compatibility.
- Project CLAUDE.md is shared across all agents — can't include agent-specific instructions there. This is the right constraint (static = shared).
- Existing users with custom .claude/CLAUDE.md would get our content injected. Need to merge, not overwrite.
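One way to merge rather than overwrite is to fence the generated content between marker comments and only ever rewrite that region. This is a sketch: the marker strings and `mergeManagedBlock` are hypothetical, and whether extra HTML comments interact with Claude Code's CLAUDE.md loading still needs to be validated.

```typescript
// Hypothetical markers delimiting the region we own inside the user's CLAUDE.md.
const BEGIN = "<!-- squads:begin -->";
const END = "<!-- squads:end -->";

function mergeManagedBlock(existing: string, generated: string): string {
  const block = `${BEGIN}\n${generated.trim()}\n${END}`;
  const start = existing.indexOf(BEGIN);
  const end = existing.indexOf(END);

  if (start !== -1 && end > start) {
    // A managed region already exists: replace it, leave user content untouched.
    return existing.slice(0, start) + block + existing.slice(end + END.length);
  }
  if (existing.trim() === "") {
    // No user content at all: the file is just our block.
    return block + "\n";
  }
  // First run against a user-authored file: append below their content.
  return `${existing.trimEnd()}\n\n${block}\n`;
}
```

Running the merge repeatedly is idempotent for the same generated content, so it is safe to call on every squads init or build.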
Open questions
- Does Claude Code's --print mode respect project CLAUDE.md? Need to verify.
- Can we use @include directives to keep SYSTEM.md and company.md as separate files but loaded via CLAUDE.md?