
feat: end-to-end cache token tracking + multi-provider coverage #1

Draft

Simon-Free wants to merge 5 commits into main from pr11-cache-token-tracking

Conversation


Simon-Free (Owner) commented Apr 18, 2026

Summary

This PR adds end-to-end tracking of prompt-cache token usage, from the provider usage response, through AgentState, and into checkpoint snapshots. Two small new helpers in providers.py (_anthropic_cache_tokens, _openai_cached_read_tokens) give each provider family one obvious extraction point instead of three sprinkled getattr chains.
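Roughly, the helpers look like this (a minimal sketch: the attribute walks follow the provider schemas in the table below, not the PR's verbatim code):

```python
# Minimal sketch of the two helpers in providers.py. Attribute names
# follow the Anthropic / OpenAI usage schemas; missing or None fields
# are coerced to 0, matching the tests referenced below.

def _anthropic_cache_tokens(usage) -> tuple[int, int]:
    """Return (cache_read, cache_write) from an Anthropic usage object."""
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    write = getattr(usage, "cache_creation_input_tokens", 0) or 0
    return read, write

def _openai_cached_read_tokens(usage) -> int:
    """Return cached prompt tokens from an OpenAI-schema usage object.

    OpenAI exposes no separate cache-creation counter, so the write
    side stays 0 for this provider family.
    """
    details = getattr(usage, "prompt_tokens_details", None)
    return (getattr(details, "cached_tokens", 0) or 0) if details else 0
```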

Provider compatibility

| Provider | Cache read | Cache write | Mechanism |
| --- | --- | --- | --- |
| Anthropic (native stream_anthropic) | ✅ usage.cache_read_input_tokens | ✅ usage.cache_creation_input_tokens | Both fields live on final.usage; returned by Anthropic when the prompt-caching beta is active |
| OpenAI / OpenAI-compatible (stream_openai_compat; covers OpenAI, Gemini, Groq, xAI, any OpenAI-schema provider) | ✅ usage.prompt_tokens_details.cached_tokens | ❌ always 0 | OpenAI's schema has no separate "cache creation" counter; caching is implicit on their side |
| Ollama (stream_ollama) | ❌ always 0 | ❌ always 0 | No prompt caching in Ollama today |
| Any future provider (custom stream_xxx, downstream fork) | defaults to 0 | defaults to 0 | AssistantTurn defaults + getattr(event, 'cache_read_tokens', 0) in agent.run + getattr(state, 'total_cache_read_tokens', 0) in make_snapshot give a no-op fallback if the provider never sets the fields |
| Bedrock via litellm (downstream forks, e.g. bouzecode) | handled gracefully | handled gracefully | When the wrapper forwards Anthropic-shaped usage, _anthropic_cache_tokens catches the fields; when it reshapes to OpenAI, _openai_cached_read_tokens catches the read side; when neither, getattr defaults to 0, so there is no exception path |

Missing / None usage fields are coerced to 0 throughout; see TestAnthropicCacheExtraction::test_missing_fields_default_to_zero / test_none_fields_coerced_to_zero and the OpenAI equivalents.
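The no-op fallback described in the last two table rows, sketched as a hypothetical excerpt of the accumulation in agent.run (the surrounding turn loop is assumed):

```python
# Hypothetical excerpt of the per-turn accumulation in agent.run.
# A getattr default of 0 means a provider (or a pre-PR AssistantTurn)
# that never sets the cache fields simply contributes nothing.
def accumulate_cache_tokens(state, event) -> None:  # names illustrative
    state.total_cache_read_tokens += getattr(event, "cache_read_tokens", 0)
    state.total_cache_write_tokens += getattr(event, "cache_write_tokens", 0)
```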

Changes

| File | +/− | What |
| --- | --- | --- |
| providers.py | +31 | _anthropic_cache_tokens helper, _openai_cached_read_tokens helper, cache-write field on AssistantTurn, 2 call-sites in stream_anthropic, 2 in stream_openai_compat; stream_ollama now explicitly passes 0/0 |
| agent.py | +4 | AgentState.total_cache_read_tokens / total_cache_write_tokens; accumulated from assistant_turn on every turn via getattr(..., 0) so providers that don't set the fields still work |
| checkpoint/store.py | +2 | token_snapshot["cache_read"] / ["cache_write"] persisted via getattr(state, ..., 0) |
| tests/test_cache_tokens.py | rewritten (~170 lines) | 5 layers of coverage; see below |
| tests/test_checkpoint.py | −52 | Cache cases moved out to test_cache_tokens.py so checkpoint tests stay focused on snapshots |
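The checkpoint side, sketched under the assumption that make_snapshot assembles a token_snapshot dict; the wrapper function here is hypothetical, only the key names and getattr defaults are from this PR:

```python
def build_token_snapshot(state) -> dict:
    """Sketch of the cache keys persisted by make_snapshot in checkpoint/store.py.

    getattr with a default keeps snapshots valid for AgentState instances
    rehydrated from pre-PR session files that lack the new attributes.
    """
    return {
        "cache_read": getattr(state, "total_cache_read_tokens", 0),
        "cache_write": getattr(state, "total_cache_write_tokens", 0),
    }
```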

Test layers

  1. AssistantTurn / AgentState - constructor defaults, explicit values, accumulation across increments.
  2. Checkpoint persistence - real make_snapshot call against tmp_path, asserts token_snapshot keys.
  3. Provider extraction helpers - 3 cases each for Anthropic and OpenAI (populated / missing / None); Ollama shape check. The Anthropic cases are sketched after this list.
  4. E2E one-turn - agent.run drains a mocked providers.stream that emits an AssistantTurn with cache tokens; asserts state.total_cache_* and snapshot values.
  5. E2E multi-turn - two consecutive agent.run calls with distinct cache values; asserts running totals.
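For layer 3, the extraction cases might look roughly like this (SimpleNamespace stands in for the SDK usage objects; the import path is an assumption):

```python
from types import SimpleNamespace

from providers import _anthropic_cache_tokens  # import path assumed


class TestAnthropicCacheExtraction:
    def test_populated_fields(self):
        usage = SimpleNamespace(cache_read_input_tokens=120,
                                cache_creation_input_tokens=45)
        assert _anthropic_cache_tokens(usage) == (120, 45)

    def test_missing_fields_default_to_zero(self):
        # Older SDKs / Bedrock-via-litellm: attributes absent entirely.
        assert _anthropic_cache_tokens(SimpleNamespace()) == (0, 0)

    def test_none_fields_coerced_to_zero(self):
        # Anthropic occasionally emits JSON null for these fields.
        usage = SimpleNamespace(cache_read_input_tokens=None,
                                cache_creation_input_tokens=None)
        assert _anthropic_cache_tokens(usage) == (0, 0)
```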

Backwards compatibility

  • Adding fields with a default of 0 to AssistantTurn and AgentState is non-breaking for every caller that constructs them positionally or by keyword (sketched below).
  • make_snapshot uses getattr(state, "total_cache_read_tokens", 0) so old AgentState instances rehydrated from pre-PR session files still produce valid snapshots.
  • No config flag - the feature is free: if a provider never sets the cache fields, everything records as 0.
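In dataclass terms (the shape is assumed; only the two cache fields are from this PR):

```python
from dataclasses import dataclass

@dataclass
class AssistantTurn:
    # ...existing fields elided...
    cache_read_tokens: int = 0   # new; default keeps pre-PR call-sites working
    cache_write_tokens: int = 0  # stays 0 for the OpenAI family and Ollama

# A caller that predates this PR and never mentions the new fields
# still constructs a valid turn:
assert AssistantTurn().cache_read_tokens == 0
```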

Simon FREYBURGER and others added 5 commits April 18, 2026 09:52
…ider tests

Two new small helpers in providers.py give each provider family one
obvious extraction point instead of three sprinkled getattr chains:

- _anthropic_cache_tokens(usage) -> (read, write)
  Reads cache_read_input_tokens / cache_creation_input_tokens. Returns
  (0, 0) if the fields are missing (older SDKs, Bedrock-via-litellm,
  non-cached calls) or None (Anthropic occasionally emits JSON null).
- _openai_cached_read_tokens(usage) -> int
  Walks usage.prompt_tokens_details.cached_tokens. OpenAI's schema has
  no separate cache-creation counter (caching is implicit), so the
  write-side stays 0 for this entire provider family.

stream_anthropic, stream_openai_compat now call these helpers instead of
inlining the getattr dance. stream_ollama was already 0/0; behaviour
unchanged. Any new provider that builds an AssistantTurn without passing
cache_read_tokens / cache_write_tokens inherits the dataclass defaults
and agent.run's getattr(... , 0) fallbacks, so downstream totals and
snapshots stay consistent.

Tests (tests/test_cache_tokens.py, rewritten):
- AssistantTurn + AgentState defaults and accumulation.
- Checkpoint snapshot persists cache_read + cache_write via real
  make_snapshot against a tmp_path.
- TestAnthropicCacheExtraction (3 cases) + TestOpenAICacheExtraction
  (3 cases) covering populated / missing / None usage objects.
- Ollama shape check (no-cache path).
- test_agent_run_propagates_cache_tokens_from_mocked_stream: one turn
  through agent.run with a scripted stream; asserts state totals AND the
  produced snapshot.
- test_agent_run_accumulates_cache_across_multi_turn: two consecutive
  runs with distinct cache values; asserts running totals.

Cleanup:
- The three duplicate cache-token cases previously appended to
  tests/test_checkpoint.py are removed; test_cache_tokens.py is the
  single home for this feature now.
- Fix the stale make_snapshot(state, session_id, prompt) call in
  test_cache_tokens that survived from the earlier signature mismatch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Simon-Free changed the title from "feat: end-to-end cache token tracking" to "feat: end-to-end cache token tracking + multi-provider coverage" on Apr 20, 2026