fix(cache): delay history image pruning to preserve prompt cache prefix #3

Open
bcherny wants to merge 1 commit into main from cache/image-prune-lazy

@bcherny bcherny commented Mar 25, 2026

Problem

pruneProcessedHistoryImages replaces image blocks with [image data removed - already processed by model] text for every user/toolResult message before the last assistant turn, on every run.

  • Turn N: user sends image → provider caches the image bytes in the prompt prefix
  • Turn N+1: prune runs, replaces image bytes with text marker → request bytes diverge at that message → cache miss from there onward

This defeats prompt caching for any conversation that includes images.
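
To make the divergence concrete, here is a minimal sketch that models prefix caching as "the request is reusable up to the first message whose serialized form changed". The helper `firstDivergingIndex`, the message shapes, and the `BASE64_IMAGE_BYTES` placeholder are illustrative assumptions, not code from this PR, and a real provider's cache matching may work at a different granularity.

```typescript
// Hypothetical helper: index of the first message whose serialized form
// differs between two requests. Everything before it could be served from a
// prefix cache; everything from it onward has to be reprocessed.
function firstDivergingIndex(previous: unknown[], next: unknown[]): number {
  const limit = Math.min(previous.length, next.length);
  for (let i = 0; i < limit; i++) {
    if (JSON.stringify(previous[i]) !== JSON.stringify(next[i])) return i;
  }
  return limit;
}

// Turn N: the request contains the raw image payload (placeholder string).
const turnN = [
  { role: "user", content: [{ type: "image", data: "BASE64_IMAGE_BYTES" }] },
  { role: "assistant", content: [{ type: "text", text: "It is a cat." }] },
];

const followUp = { role: "user", content: [{ type: "text", text: "What breed?" }] };

// Turn N+1 with eager pruning: the image block has become a text marker.
const turnN1Eager = [
  {
    role: "user",
    content: [
      { type: "text", text: "[image data removed - already processed by model]" },
    ],
  },
  turnN[1],
  followUp,
];

// Turn N+1 with recent history left untouched.
const turnN1Lazy = [...turnN, followUp];

const eagerDivergence = firstDivergingIndex(turnN, turnN1Eager);
const lazyDivergence = firstDivergingIndex(turnN, turnN1Lazy);
```

With eager pruning the two requests differ at the very first message, so nothing from turn N can be reused. With history left untouched, the whole of turn N is a shared prefix and only the new user message is uncached.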

Fix

Only prune images older than 3 assistant turns (PRESERVE_RECENT_ASSISTANT_TURNS). Recent history stays byte-identical so the cached prefix survives across turns, while legacy sessions with persisted image payloads still get cleaned up once they age out.

The original purpose — per the doc comment — was "idempotent cleanup for legacy sessions", so aggressive per-turn pruning was never needed.
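
The approach can be sketched roughly as follows. This is a simplified illustration, not the code in the diff: the `Message` and `ContentBlock` shapes, the helper names `findPruneCutoff` and `pruneOldHistoryImages`, and the exact boundary (prune everything before the third-most-recent assistant message) are assumptions. Only `PRESERVE_RECENT_ASSISTANT_TURNS`, `PRUNED_HISTORY_IMAGE_MARKER`, and the marker text come from the description above.

```typescript
// Hypothetical, simplified message shapes; the real types may differ.
type ContentBlock =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string };

interface Message {
  role: "user" | "assistant" | "toolResult";
  content: ContentBlock[];
}

const PRUNED_HISTORY_IMAGE_MARKER =
  "[image data removed - already processed by model]";
const PRESERVE_RECENT_ASSISTANT_TURNS = 3;

// Index of the Nth-most-recent assistant message, or -1 if the conversation
// has fewer than N assistant messages (in which case nothing is old enough).
function findPruneCutoff(messages: Message[], preserveTurns: number): number {
  let seen = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    if (messages[i].role === "assistant") {
      seen++;
      if (seen === preserveTurns) return i;
    }
  }
  return -1;
}

// Replaces image blocks in user/toolResult messages that precede the cutoff.
// Mutates in place and returns true if anything was changed.
function pruneOldHistoryImages(
  messages: Message[],
  preserveTurns: number = PRESERVE_RECENT_ASSISTANT_TURNS,
): boolean {
  const cutoff = findPruneCutoff(messages, preserveTurns);
  if (cutoff < 0) return false;

  let changed = false;
  for (let i = 0; i < cutoff; i++) {
    const message = messages[i];
    if (message.role === "assistant") continue;
    for (let j = 0; j < message.content.length; j++) {
      if (message.content[j].type === "image") {
        message.content[j] = { type: "text", text: PRUNED_HISTORY_IMAGE_MARKER };
        changed = true;
      }
    }
  }
  return changed;
}
```

In this sketch, calling `pruneOldHistoryImages(messages, 1)` reproduces the previous behavior of pruning everything before the last assistant turn. Because an already-replaced block is plain text, a second call changes nothing, which keeps the cleanup idempotent.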

Verification

  • No downstream consumer depends on the PRUNED_HISTORY_IMAGE_MARKER appearing at turn N+1 (only referenced in this file + test)
  • Context-size overflow is handled by compaction (applyPiAutoCompactionGuard), not by this prune. Keeping ~3 turns of images is a small increase over the previous 0-1 turns
  • Legacy session migration (state-migrations.ts) still exists, so this cleanup remains load-bearing for migrated sessions

Tests

6 tests pass including two new cases:

  • keeps image blocks within the last 3 assistant turns to preserve prompt cache
  • prunes only old images while preserving recent ones

🤖 Generated with Claude Code

pruneProcessedHistoryImages was stripping image blocks from every
already-answered user turn on each run. Turn N sends image bytes → provider
caches the prefix. Turn N+1 replaces image with text marker → bytes diverge
at that message → cache miss from there onward.

Now only prune images older than 3 assistant turns. Recent history stays
byte-identical so the cached prefix survives, while legacy sessions with
persisted image payloads still get cleaned up.