feat: prompt cache optimization — structure context for cache hits #703

@kokevidaurre

Context

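Claude Code splits system prompts into static (globally cacheable) and dynamic (session-scoped) content using a boundary marker. Static content that hits the prompt cache is billed at the cache-read rate, which is roughly a 90% cost saving on the system prompt portion (the tokens are still sent; they are just cheaper to process).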

Our agents rebuild the entire prompt from scratch each execution, mixing static and dynamic content. This means zero prompt cache benefit across agents or across turns within a conversation.

From Claude Code Source

// prompts.ts
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = '__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__'

// Static sections (before boundary) — cached globally:
// - Identity, rules, tool usage guidelines, tone, output style
// Dynamic sections (after boundary) — per-session:
// - Git status, CLAUDE.md, memory, environment info

// utils/api.ts — splitSysPromptPrefix()
// Assigns cache_control: { type: 'ephemeral', scope: 'global' } to static blocks
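
To make the boundary idea concrete, here is a minimal sketch of splitting a prompt on the marker and tagging only the static part as cacheable. It is not Claude Code's actual `splitSysPromptPrefix()`: the function name `splitOnBoundary` and the `TextBlock` type are hypothetical, and the `scope: 'global'` field mentioned above is left out because this sketch only assumes the `{ type: 'ephemeral' }` form of `cache_control`.

```typescript
// Hypothetical sketch, not Claude Code's implementation.
const SYSTEM_PROMPT_DYNAMIC_BOUNDARY = '__SYSTEM_PROMPT_DYNAMIC_BOUNDARY__'

interface TextBlock {
  type: 'text'
  text: string
  // Present only on blocks that should be cached.
  cache_control?: { type: 'ephemeral' }
}

function splitOnBoundary(systemPrompt: string): TextBlock[] {
  const idx = systemPrompt.indexOf(SYSTEM_PROMPT_DYNAMIC_BOUNDARY)
  if (idx === -1) {
    // No marker: treat the whole prompt as dynamic (uncached).
    return [{ type: 'text', text: systemPrompt }]
  }
  const staticPart = systemPrompt.slice(0, idx).trim()
  const dynamicPart = systemPrompt
    .slice(idx + SYSTEM_PROMPT_DYNAMIC_BOUNDARY.length)
    .trim()

  const blocks: TextBlock[] = []
  if (staticPart) {
    // Everything before the marker is identical across sessions, so it is cacheable.
    blocks.push({ type: 'text', text: staticPart, cache_control: { type: 'ephemeral' } })
  }
  if (dynamicPart) {
    // Everything after the marker changes per session and is sent uncached.
    blocks.push({ type: 'text', text: dynamicPart })
  }
  return blocks
}
```

The key property is ordering: the cacheable block has to come first and be byte-identical between requests, since prompt caching matches on an exact prefix.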

How Our Context Flows Today

agent-runner.ts builds prompt:
  "You are {agent} from squad {squad}."
  + taskDirective (dynamic)
  + SYSTEM.md (static — same for all agents)
  + squadContext (mixed — company.md is static, state.md is dynamic)
  + cognitionContext (dynamic)

All of this goes into the -- prompt argument to Claude, which becomes the first user message. Claude Code's own system prompt (from CLAUDE.md files) IS cached, but our agent context is NOT.

Proposed Architecture

Split our context into two injection points:

1. Static context → Project CLAUDE.md (cached by Claude Code)

Move SYSTEM.md and company.md into a project-level .claude/CLAUDE.md that Claude Code loads and caches automatically:

# .claude/CLAUDE.md (in agents repo)
<!-- This content is cached by Claude Code's prompt cache system -->

## System Protocol
{contents of SYSTEM.md}

## Company
{contents of company.md}

2. Dynamic context → -- prompt argument (per-execution)

Keep per-agent, per-run context in the prompt:

"You are {agent} from squad {squad}."
+ goals.md, state.md, feedback.md, priorities.md
+ taskDirective

Expected savings

SYSTEM.md (~2K tokens) + company.md (~500 tokens) = ~2.5K tokens cached across ALL agent executions. With cache reads priced at roughly a 90% discount, that is about 2.25K input-token-equivalents saved per execution, which adds up at scale.
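
The arithmetic behind that estimate can be sketched as follows. The token counts are the rough figures above; the 0.1 cache-read multiplier is an assumption standing in for the ~90% discount, and the function name is hypothetical. The sketch ignores any extra cost of the first cache write and assumes every execution is a cache hit, so it is an upper bound.

```typescript
// Back-of-envelope estimate; all inputs are assumptions, not measurements.
function cachedInputSavings(
  staticTokens: number,
  executions: number,
  cacheReadMultiplier = 0.1, // assumed: cached reads cost ~10% of normal input
): number {
  // Savings expressed in "full-price input token" equivalents.
  return staticTokens * executions * (1 - cacheReadMultiplier)
}

const staticTokens = 2000 + 500 // SYSTEM.md + company.md, approximate
const perExecution = cachedInputSavings(staticTokens, 1)
const perThousandRuns = cachedInputSavings(staticTokens, 1000)
```

With these inputs, the saving is roughly 2,250 token-equivalents per execution, or about 2.25M per thousand executions, before accounting for cache misses.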

Implementation

  1. Generate .claude/CLAUDE.md from SYSTEM.md + company.md during squads init or as a build step
  2. Or: restructure so SYSTEM.md IS the project CLAUDE.md (simplest)
  3. Strip static content from gatherSquadContext() so it's not double-injected
  4. Verify Claude Code is loading the project CLAUDE.md (check claudemd.ts discovery paths)
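
For step 1, a minimal sketch of the generator could look like this. `renderClaudeMd` is a hypothetical helper; it takes file contents as strings and leaves reading SYSTEM.md and company.md and writing .claude/CLAUDE.md to the caller.

```typescript
// Hypothetical generator for the static CLAUDE.md body.
function renderClaudeMd(systemMd: string, companyMd: string): string {
  return [
    "<!-- This content is cached by Claude Code's prompt cache system -->",
    '',
    '## System Protocol',
    systemMd.trim(),
    '',
    '## Company',
    companyMd.trim(),
    '', // trailing newline
  ].join('\n')
}
```

The output deliberately contains no timestamps, run IDs, or other per-build values: the same inputs always produce the same bytes, which is what lets the content be served from cache across executions.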

Files to change

  • src/lib/run-context.ts — remove SYSTEM.md and company.md from prompt injection
  • .agents/SYSTEM.md → .claude/CLAUDE.md migration (or symlink)
  • src/lib/agent-runner.ts — stop passing systemContext separately

Risks

  • Claude Code's CLAUDE.md loading has specific rules (frontmatter, @include, etc.). Need to validate compatibility.
  • Project CLAUDE.md is shared across all agents — can't include agent-specific instructions there. This is the right constraint (static = shared).
  • Existing users with custom .claude/CLAUDE.md would get our content injected. Need to merge, not overwrite.
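
One way to handle the merge-not-overwrite risk is to keep our content inside a marked region and replace only that region on regeneration. The sketch below assumes this approach; the sentinel comments and `mergeManagedBlock` are hypothetical, not an existing convention in this repo or in Claude Code.

```typescript
// Hypothetical sentinels delimiting the generated region.
const BEGIN = '<!-- squads:begin (generated, do not edit) -->'
const END = '<!-- squads:end -->'

function mergeManagedBlock(existing: string, managed: string): string {
  const block = `${BEGIN}\n${managed.trim()}\n${END}`
  const start = existing.indexOf(BEGIN)
  const end = existing.indexOf(END)

  if (start !== -1 && end > start) {
    // A managed region already exists: replace it, keep user content around it.
    return existing.slice(0, start) + block + existing.slice(end + END.length)
  }
  if (existing.trim() === '') {
    // No existing file content: the managed block is the whole file.
    return block + '\n'
  }
  // User-authored file without a managed region: append ours after theirs.
  return existing.trimEnd() + '\n\n' + block + '\n'
}
```

Running the merge twice with the same input yields the same file, so it can be called on every `squads init` without accumulating duplicates. Whether the managed block should go before or after user content is an open design choice; placing it first would keep the shared prefix identical across projects with different user sections.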

Open questions

  • Does Claude Code's --print mode respect project CLAUDE.md? Need to verify.
  • Can we use @include directives to keep SYSTEM.md and company.md as separate files but loaded via CLAUDE.md?

Metadata

Labels: enhancement (New feature or request)
