feat: add prompt caching interceptor for 6 LLM providers (#3)
Add comprehensive prompt caching interceptor with support for:

- Anthropic: cache_control blocks with 5min/1h TTL
- OpenAI: prompt_cache_key + prompt_cache_retention body params
- xAI (Grok): x-grok-conv-id header for cache routing
- Fireworks: x-session-affinity + x-prompt-cache-isolation-key headers
- AWS Bedrock: cachePoint objects in system/messages/tools
- Azure OpenAI: uses OpenAI interceptor (same API)

Features:

- Cache-Control: no-cache header to disable per-request
- Provider detection via model name patterns
- Cache usage tracking in ResponseMetadata.Custom
- Org/tenant ID namespacing for multi-tenant isolation
- Auto-derived cache keys from static content prefix

Includes extensive test coverage and documentation updates.
📝 Walkthrough

Adds a PromptCachingInterceptor and related types/functions for provider-specific prompt caching (Anthropic, OpenAI, xAI, Fireworks, Bedrock); augments response extraction to emit standardized cache usage metadata; adds OrgID to meta context; updates docs and README; includes extensive tests and provider extractor/parser changes.
🚥 Pre-merge checks: ✅ Passed checks (1 passed)
Actionable comments posted: 2
🧹 Nitpick comments (2)
providers/bedrock/extractor.go (1)
59-65: Make Bedrock cache usage emission resilient to detail-only responses.

Consider including `len(bedrockResp.Usage.CacheDetails) > 0` in the guard so `cache_usage` is still emitted if totals are omitted but details are present.

Suggested robustness tweak:

```diff
-	if bedrockResp.Usage.CacheReadInputTokens > 0 || bedrockResp.Usage.CacheWriteInputTokens > 0 {
+	if bedrockResp.Usage.CacheReadInputTokens > 0 ||
+		bedrockResp.Usage.CacheWriteInputTokens > 0 ||
+		len(bedrockResp.Usage.CacheDetails) > 0 {
 		meta.Custom["cache_usage"] = llmproxy.CacheUsage{
 			CachedTokens:     bedrockResp.Usage.CacheReadInputTokens,
 			CacheWriteTokens: bedrockResp.Usage.CacheWriteInputTokens,
 			CacheDetails:     extractCacheDetails(bedrockResp.Usage.CacheDetails),
 		}
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `providers/bedrock/extractor.go` around lines 59-65: The guard that sets meta.Custom["cache_usage"] currently only checks CacheReadInputTokens and CacheWriteInputTokens on bedrockResp.Usage; change it to also emit cache_usage when bedrockResp.Usage.CacheDetails is non-empty (e.g., if CacheReadInputTokens and CacheWriteInputTokens are zero but len(bedrockResp.Usage.CacheDetails) > 0). Update the conditional around the llmproxy.CacheUsage construction (the block creating llmproxy.CacheUsage and calling extractCacheDetails) to check for either totals > 0 OR non-empty CacheDetails, and handle a nil CacheDetails safely before calling extractCacheDetails.

interceptors/promptcaching.go (1)
158-168: Consider handling JSON parse errors more gracefully for Fireworks cached tokens.

If the `fireworks-cached-prompt-tokens` header contains a malformed value, the error is silently ignored. While this is defensive, logging might help with debugging cache issues in production.

💡 Optional: Add debug logging for parse failures

```diff
 	if cached := resp.Header.Get("fireworks-cached-prompt-tokens"); cached != "" {
 		if respMeta.Custom == nil {
 			respMeta.Custom = make(map[string]any)
 		}
 		var cachedTokens int
-		if err := json.Unmarshal([]byte(cached), &cachedTokens); err == nil && cachedTokens > 0 {
+		if err := json.Unmarshal([]byte(cached), &cachedTokens); err != nil {
+			// Log parse error if logging is available
+		} else if cachedTokens > 0 {
 			respMeta.Custom["cache_usage"] = llmproxy.CacheUsage{
 				CachedTokens: cachedTokens,
 			}
 		}
 	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@interceptors/promptcaching.go` around lines 158 - 168, The JSON unmarshal of the "fireworks-cached-prompt-tokens" header in the resp header handling silently ignores parse errors; update the block around resp.Header.Get("fireworks-cached-prompt-tokens") / respMeta.Custom to log a debug/warn when json.Unmarshal fails (include the header value and the error) so malformed values are visible in logs; use the project's existing logger (or the standard log package) and keep the behavior of not crashing—only add a diagnostic log entry alongside the current defensive flow that leaves respMeta unchanged when parsing fails; reference the resp.Header.Get call, json.Unmarshal, respMeta.Custom and llmproxy.CacheUsage to locate the code to change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@providers/openai_compatible/extractor.go`:
- Around line 102-118: PromptTokensDetails currently declares fields that don't
exist in OpenAI's Chat Completions API; update the struct by removing
ImageTokens, ReasoningTokens, AcceptedPredictionTokens, and
RejectedPredictionTokens so PromptTokensDetails only contains CachedTokens and
AudioTokens, leaving the other token fields only in CompletionTokensDetails
(which already holds ReasoningTokens, AcceptedPredictionTokens,
RejectedPredictionTokens) to match the API spec.
In `@README.md`:
- Line 59: Update the "Prompt Caching" bullet (currently "Prompt Caching:
Anthropic and OpenAI prompt caching support") to reflect all supported providers
by changing the text to a more inclusive phrase that lists or groups Anthropic,
OpenAI, xAI, Fireworks, and Bedrock (e.g., "Prompt Caching: prompt caching
support for Anthropic, OpenAI, xAI, Fireworks, and Bedrock"), ensuring the
README entry matches the new PR documentation.
---
Nitpick comments:
In `@interceptors/promptcaching.go`:
- Around line 158-168: The JSON unmarshal of the
"fireworks-cached-prompt-tokens" header in the resp header handling silently
ignores parse errors; update the block around
resp.Header.Get("fireworks-cached-prompt-tokens") / respMeta.Custom to log a
debug/warn when json.Unmarshal fails (include the header value and the error) so
malformed values are visible in logs; use the project's existing logger (or the
standard log package) and keep the behavior of not crashing—only add a
diagnostic log entry alongside the current defensive flow that leaves respMeta
unchanged when parsing fails; reference the resp.Header.Get call,
json.Unmarshal, respMeta.Custom and llmproxy.CacheUsage to locate the code to
change.
In `@providers/bedrock/extractor.go`:
- Around line 59-65: The guard that sets meta.Custom["cache_usage"] currently
only checks CacheReadInputTokens and CacheWriteInputTokens on bedrockResp.Usage;
change it to also emit cache_usage when bedrockResp.Usage.CacheDetails is
non-empty (e.g., if CacheReadInputTokens and CacheWriteInputTokens are zero but
len(bedrockResp.Usage.CacheDetails) > 0). Update the conditional around
llmproxy.CacheUsage construction (the block creating llmproxy.CacheUsage and
calling extractCacheDetails) to check for either totals > 0 OR non-empty
CacheDetails and handle a nil CacheDetails safely before calling
extractCacheDetails.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: c24cecd6-aaec-479d-90d8-6c0da645787a
📒 Files selected for processing (12)
DESIGN.md, README.md, interceptor.go, interceptors/promptcaching.go, interceptors/promptcaching_test.go, metadata.go, providers/anthropic/extractor.go, providers/anthropic/parser_test.go, providers/bedrock/extractor.go, providers/bedrock/parser.go, providers/openai_compatible/extractor.go, providers/openai_compatible/parser_test.go
📜 Review details
🔇 Additional comments (22)
interceptor.go (1)
73-73: `MetaContextValue.OrgID` addition is clean and backward-compatible. This is a safe additive change and aligns with org-scoped caching needs.
providers/openai_compatible/parser_test.go (1)
497-559: Cache usage extractor tests are well-targeted. Good coverage of `cached_tokens > 0`, missing field, and explicit zero behavior.

providers/anthropic/parser_test.go (1)
158-202: Anthropic cache usage test additions look good. These tests correctly validate both populated and non-populated `meta.Custom["cache_usage"]` paths.

README.md (1)
105-121: Provider-specific interceptor examples are clear and practical. Nice addition; these examples make adoption much easier.
DESIGN.md (1)
122-140: Design documentation updates are comprehensive and aligned with implementation scope. The new cache metadata and prompt-caching sections are detailed and coherent.
Also applies to: 298-299, 437-649, 739-739
metadata.go (1)
59-83: `CacheUsage`/`CacheDetail` modeling is solid. Good unified schema for cross-provider cache accounting in response metadata.
providers/bedrock/parser.go (1)
108-115: Bedrock request model extension for `cachePoint` is correctly integrated. This is a clean additive change and fits Bedrock caching injection behavior.
Also applies to: 150-152, 170-172
providers/bedrock/extractor.go (1)
70-82: `cacheDetails` mapping and response usage extensions look good. The conversion into `llmproxy.CacheDetail` is straightforward and maintainable.

Also applies to: 117-129
providers/anthropic/extractor.go (2)
45-55: LGTM! Cache usage extraction is correctly implemented. The logic properly:

- Initializes `CacheUsage` with top-level cache token counts
- Conditionally populates the ephemeral token breakdown when `CacheCreation` is present
- Only writes to `meta.Custom` when meaningful cache activity occurred (tokens > 0)
102-114: Field names verified against Anthropic API documentation. The JSON tags for `CacheCreationInfo` fields (`ephemeral_5m_input_tokens` and `ephemeral_1h_input_tokens`) are correct and match the current Anthropic Claude API specification for prompt caching `cache_creation` response objects.

providers/openai_compatible/extractor.go (1)
49-53: LGTM! OpenAI cache usage extraction is correctly implemented. The conditional check properly handles the optional `PromptTokensDetails` and only populates `cache_usage` when cached tokens are present and greater than zero.

interceptors/promptcaching_test.go (4)
16-71: LGTM! Anthropic system string transformation test is well-structured. The test correctly validates:

- System string is converted to array format with a `cache_control` block
- The `cache_control` is placed directly on the content block
- Text content is preserved during transformation
266-306: LGTM! OpenAI cache usage callback test validates the result forwarding pattern. The test correctly verifies that `cache_usage` from response metadata is forwarded to the `onResult` callback when using `NewOpenAIPromptCachingWithResult`.
1401-1435: LGTM! Fireworks cache usage extraction test validates header parsing. The test correctly verifies that the `fireworks-cached-prompt-tokens` response header is parsed and forwarded via the `onResult` callback.
1793-1840: LGTM! Bedrock cache usage callback test validates the result forwarding pattern. The test correctly verifies that cache usage with `CacheDetails` is properly forwarded through the interceptor's `onResult` callback.

interceptors/promptcaching.go (7)
50-115: LGTM! Main intercept method has clean control flow. The method properly:

- Short-circuits when disabled or `Cache-Control: no-cache` is present
- Routes to provider-specific handlers
- Handles response metadata and `onResult` callback forwarding
217-304: LGTM! Anthropic cache_control injection handles all content formats correctly. The implementation properly:

- Converts string `system` to array format with `cache_control`
- Adds `cache_control` to the last block of array `system`
- Handles both string and array message `content` formats
- Only modifies the last eligible (user/assistant) message
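The string-to-array transformation the comment describes can be sketched as follows; this is an illustrative reconstruction of the behavior, not the interceptor's actual code:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// addAnthropicCacheControl mirrors the transformation reviewed above:
// a string "system" becomes an array of one text block carrying a
// cache_control marker; an array "system" gets the marker on its last block.
func addAnthropicCacheControl(body map[string]any, ttl string) {
	cc := map[string]any{"type": "ephemeral", "ttl": ttl}
	switch sys := body["system"].(type) {
	case string:
		body["system"] = []any{
			map[string]any{"type": "text", "text": sys, "cache_control": cc},
		}
	case []any:
		if len(sys) > 0 {
			if last, ok := sys[len(sys)-1].(map[string]any); ok {
				last["cache_control"] = cc
			}
		}
	}
}

func main() {
	body := map[string]any{"system": "You are a helpful assistant."}
	addAnthropicCacheControl(body, "5m")
	out, _ := json.Marshal(body)
	fmt.Println(string(out))
}
```

Mutating the decoded map in place keeps the original text intact while layering the cache marker on, which is why the tests can assert that content is preserved.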
550-570: LGTM! OrgID extraction has clear priority ordering. The extraction chain correctly prioritizes:

1. Custom `OrgIDExtractor` function
2. Context metadata (`MetaContextValue.OrgID`)
3. Request header (`X-Org-ID`)
4. Body metadata custom field (`org_id`)
5. Configured namespace (fallback)
582-615: LGTM! Cache key derivation from prefix is well-designed. The function correctly:

- Builds a deterministic hash from system prompt, tools, and prior messages
- Excludes the last message (typically the new user input)
- Returns an empty string for invalid/empty input
- Uses SHA-256 truncated to 128 bits, which provides sufficient uniqueness for cache keys
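The hash-the-stable-prefix idea can be sketched like this; the struct shapes are invented for illustration and do not match the reviewed function's exact signature:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

// deriveCacheKey hashes the static prefix (system + tools + all messages
// except the last) and truncates SHA-256 to 128 bits, as the review describes.
func deriveCacheKey(system string, tools []string, msgs []message) string {
	if len(msgs) == 0 && system == "" {
		return ""
	}
	prefix := msgs
	if len(prefix) > 0 {
		prefix = prefix[:len(prefix)-1] // exclude the newest user turn
	}
	payload, _ := json.Marshal(struct {
		System   string    `json:"system"`
		Tools    []string  `json:"tools"`
		Messages []message `json:"messages"`
	}{system, tools, prefix})
	sum := sha256.Sum256(payload)
	return hex.EncodeToString(sum[:16]) // 128-bit hex key
}

func main() {
	msgs := []message{{"user", "hello"}, {"assistant", "hi"}, {"user", "new question"}}
	fmt.Println(deriveCacheKey("You are terse.", nil, msgs))
}
```

Because only the prefix is hashed, successive turns in the same conversation map to the same key, which is exactly what routes them to the same cache entry.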
617-638: LGTM! Model detection helpers are comprehensive. The substring-based detection correctly identifies:

- OpenAI models, including the future `o4` series
- xAI Grok models
- Fireworks models (both direct and `accounts/` prefix)
- Bedrock models (Anthropic Claude, Amazon Nova/Titan)
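Substring-based routing of this kind is simple but order-sensitive: Bedrock model IDs like `anthropic.claude-*` must be matched before a bare `claude` check. An illustrative sketch (the pattern lists are assumptions, not the reviewed implementation):

```go
package main

import (
	"fmt"
	"strings"
)

// detectProvider routes a model name to a provider via substring patterns.
// Bedrock patterns are checked before the bare "claude" match so that
// "anthropic.claude-..." is not misclassified as direct Anthropic.
func detectProvider(model string) string {
	m := strings.ToLower(model)
	switch {
	case strings.Contains(m, "grok"):
		return "xai"
	case strings.Contains(m, "fireworks") || strings.HasPrefix(m, "accounts/"):
		return "fireworks"
	case strings.Contains(m, "anthropic.claude") ||
		strings.Contains(m, "amazon.nova") ||
		strings.Contains(m, "amazon.titan"):
		return "bedrock"
	case strings.HasPrefix(m, "gpt-") || strings.HasPrefix(m, "o1") ||
		strings.HasPrefix(m, "o3") || strings.HasPrefix(m, "o4"):
		return "openai"
	case strings.Contains(m, "claude"):
		return "anthropic"
	default:
		return ""
	}
}

func main() {
	for _, m := range []string{"grok-2-latest", "accounts/fireworks/models/llama-v3",
		"anthropic.claude-3-5-sonnet", "gpt-4o", "claude-sonnet-4"} {
		fmt.Println(m, "->", detectProvider(m))
	}
}
```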
787-798: LGTM! DefaultOrgIDExtractor provides a reusable extraction function. This exported function allows users to compose custom extractors while leveraging the default lookup chain. The priority order (context → header → meta.Custom) is consistent with the internal `extractOrgID` method.

399-412: Implementation correctly matches AWS Bedrock's cachePoint format. The code correctly appends `cachePoint` as a separate block in the system array, which aligns with AWS Bedrock's documented system prompt structure where the cache point is added as a distinct object following the text prompt.
- Remove incorrect fields from PromptTokensDetails (only cached_tokens and audio_tokens per OpenAI spec)
- Update README to list all prompt caching providers
- Include CacheDetails check in Bedrock cache_usage emission
- Add CacheKeyExtractor type for dynamic extraction from request context
- Add TraceIDCacheKeyExtractor to use OTEL trace ID as cache key
- Add NewXAIPromptCachingAuto, NewXAIPromptCachingWithTraceID constructors
- Add NewFireworksPromptCachingAuto, NewFireworksPromptCachingWithTraceID constructors
- Support X-Cache-Key header override for all providers
- Add resolveDynamicCacheKey with priority: header > extractor > CacheKeyFn > CacheKey
🧹 Nitpick comments (2)
interceptors/promptcaching_test.go (1)
1228-1262: Consider adding a test case for the `MetaContext.OrgID` extraction path.

The `TestDefaultOrgIDExtractor` test doesn't verify the `MetaContext.OrgID` extraction path (lines 898-900 in the implementation). While the "none" case passes because no sources are set, there's no explicit test that `MetaContext.OrgID` takes precedence when set.

🧪 Optional: Add test case for MetaContext precedence

```diff
 func TestDefaultOrgIDExtractor(t *testing.T) {
 	tests := []struct {
 		name     string
 		setup    func(*http.Request, *llmproxy.BodyMetadata)
+		ctxSetup func(context.Context) context.Context
 		expected string
 	}{
+		{
+			name:  "from meta context",
+			setup: func(*http.Request, *llmproxy.BodyMetadata) {},
+			ctxSetup: func(ctx context.Context) context.Context {
+				return llmproxy.WithMeta(ctx, llmproxy.MetaContextValue{OrgID: "org-context"})
+			},
+			expected: "org-context",
+		},
 		{
 			name: "from header",
 			setup: func(req *http.Request, _ *llmproxy.BodyMetadata) {
 				req.Header.Set(HeaderOrgID, "org-header")
 			},
+			ctxSetup: nil,
 			expected: "org-header",
 		},
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `interceptors/promptcaching_test.go` around lines 1228-1262: Add a new test case in TestDefaultOrgIDExtractor to cover the MetaContext.OrgID extraction path: set req.Context() to include a metadata context whose MetaContext.OrgID is populated (e.g., using llmproxy.WithMeta or whatever context helper you have) and assert that DefaultOrgIDExtractor(ctx, req, meta) returns that OrgID even when other sources are empty; reference DefaultOrgIDExtractor and MetaContext.OrgID so the new case validates the precedence of MetaContext over header/meta.Custom.

interceptors/promptcaching.go (1)
608-641: Consider logging or handling the JSON unmarshal error.

The error from `json.Unmarshal` at line 617 is silently ignored. While this degrades gracefully (producing an empty key when parsing fails), it could hide issues with malformed request bodies during debugging.

🔧 Optional: Add error logging for debugging

```diff
 func DeriveCacheKeyFromPrefix(meta llmproxy.BodyMetadata, rawBody []byte) string {
 	var body struct {
 		System   interface{} `json:"system"`
 		Messages []struct {
 			Role    string      `json:"role"`
 			Content interface{} `json:"content"`
 		} `json:"messages"`
 		Tools interface{} `json:"tools"`
 	}
-	json.Unmarshal(rawBody, &body)
+	if err := json.Unmarshal(rawBody, &body); err != nil {
+		// Parsing failed - return empty key to skip caching
+		return ""
+	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@interceptors/promptcaching.go` around lines 608 - 641, DeriveCacheKeyFromPrefix currently ignores the error returned by json.Unmarshal on rawBody; update the function to capture the error from json.Unmarshal and log or handle it (e.g., return empty key after logging) so malformed JSON doesn't silently fail. Specifically, in DeriveCacheKeyFromPrefix, check the error from json.Unmarshal(rawBody, &body), and use the existing logging facility (or a passed logger) to emit the error with context (include rawBody or truncated bytes if needed) before returning "", ensuring no panic and preserving the current graceful behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@interceptors/promptcaching_test.go`:
- Around line 1228-1262: Add a new test case in TestDefaultOrgIDExtractor to
cover the MetaContext.OrgID extraction path: set req.Context() to include a
metadata context whose MetaContext.OrgID is populated (e.g., using
llmproxy.WithMeta or whatever context helper you have) and assert that
DefaultOrgIDExtractor(ctx, req, meta) returns that OrgID even when other sources
are empty; reference DefaultOrgIDExtractor and MetaContext.OrgID so the new case
validates the precedence of MetaContext over header/meta.Custom.
In `@interceptors/promptcaching.go`:
- Around line 608-641: DeriveCacheKeyFromPrefix currently ignores the error
returned by json.Unmarshal on rawBody; update the function to capture the error
from json.Unmarshal and log or handle it (e.g., return empty key after logging)
so malformed JSON doesn't silently fail. Specifically, in
DeriveCacheKeyFromPrefix, check the error from json.Unmarshal(rawBody, &body),
and use the existing logging facility (or a passed logger) to emit the error
with context (include rawBody or truncated bytes if needed) before returning "",
ensuring no panic and preserving the current graceful behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 658e8c4a-6fc6-460e-bbf3-f0c7b45ce970
📒 Files selected for processing (2)
interceptors/promptcaching.go, interceptors/promptcaching_test.go
📜 Review details
🔇 Additional comments (19)
interceptors/promptcaching.go (10)
1-51: LGTM! Type definitions, constants, and config structures are well-organized. The function types (`CacheKeyFunc`, `CacheKeyExtractor`, `OrgIDExtractor`) provide good flexibility for different cache key derivation strategies.

53-118: LGTM! The main `Intercept` method has clean control flow with proper early returns for the disabled state and `Cache-Control: no-cache`. Provider routing is well-organized, and the body cloning pattern correctly manages the request body lifecycle.

120-144: LGTM! The xAI interceptor correctly handles the idempotency check for existing headers and properly resolves dynamic cache keys before setting the `x-grok-conv-id` header.

146-183: LGTM! The Fireworks interceptor correctly handles both isolation and session affinity headers with proper idempotency checks. The response header parsing for cached tokens is correctly implemented with proper nil-map initialization.

185-208: LGTM! The Bedrock interceptor follows the same clean pattern as other providers, delegating body modification to `checkBedrock` and handling the response callback consistently.

223-310: LGTM! The Anthropic body modification correctly handles both string and array formats for system prompts and message content. The backward iteration through messages ensures the cache control is placed on the last user/assistant message as expected by Anthropic's API.

393-475: LGTM! The Bedrock implementation correctly appends `cachePoint` as a separate content block (rather than embedding it), which aligns with the AWS Bedrock Converse API design where cache points are standalone block types in the content arrays.

643-664: LGTM! The model detection functions use reasonable string patterns for each provider. The approach is flexible enough to handle variations while being specific enough to avoid common false positives.

673-908: LGTM! The comprehensive set of constructors provides a clean API for various use cases. The consistent naming convention (`New<Provider>PromptCaching...`) makes the API discoverable, and the delegation pattern keeps implementation DRY.

910-921: No action needed. `TraceExtractor` and `TraceInfo` are properly defined in `interceptors/tracing.go` (lines 23 and 12 respectively) within the same package.

> Likely an incorrect or invalid review comment.

interceptors/promptcaching_test.go (9)
17-200: LGTM! Anthropic tests provide comprehensive coverage of system prompt handling (string vs array), message content modification, and TTL variants. The upstream server handlers correctly validate the transformed request bodies.

202-307: LGTM! OpenAI tests cover cache key insertion, retention settings, and the cache usage callback mechanism. The cache usage test correctly simulates the response metadata flow from upstream extractors.

309-508: LGTM! Excellent coverage of skip and bypass scenarios including model mismatch, existing cache markers, disabled state, error passthrough, and `Cache-Control: no-cache` handling. The idempotency tests correctly verify that cache markers aren't duplicated.

543-735: LGTM! Well-structured table-driven unit tests for helper functions. The `TestCheckOpenAI` test comprehensively covers combinations of cache key and retention settings.

900-1073: LGTM! xAI tests provide good coverage including header injection, model filtering, existing header preservation, and `Cache-Control: no-cache` handling.

1264-1489: LGTM! Fireworks tests comprehensively cover both request header injection and response header parsing for cache metrics. The cache usage test correctly validates that the interceptor parses the `fireworks-cached-prompt-tokens` response header.

1492-1791: LGTM! Bedrock tests provide excellent coverage across all three insertion points (system, messages, tools), TTL variants, and cache usage extraction. The tests correctly verify the appended `cachePoint` block structure.
1843-1983: LGTM! Trace ID and auto-derive tests comprehensively verify the dynamic cache key extraction mechanisms for both xAI and Fireworks. The hex encoding of trace IDs is correctly validated.
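The hex encoding these tests validate reduces to encoding the 16-byte trace ID. A dependency-free sketch; in the real extractor the ID would come from the active OTEL span (e.g. `trace.SpanContextFromContext(ctx).TraceID()`), while here the raw bytes are passed directly:

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// traceIDCacheKey hex-encodes a 16-byte trace ID, matching the encoding the
// tests above validate. All requests in the same trace share one cache key.
func traceIDCacheKey(traceID [16]byte) string {
	return hex.EncodeToString(traceID[:])
}

func main() {
	id := [16]byte{0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
		0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10}
	fmt.Println(traceIDCacheKey(id)) // 0102030405060708090a0b0c0d0e0f10
}
```

Using the trace ID as the cache key ties cache affinity to a request's trace, so every hop within one traced conversation routes to the same cache entry.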
1985-2047: LGTM! Header override tests correctly verify that the `X-Cache-Key` header takes precedence over constructor-provided keys for both xAI and Fireworks providers. This ensures per-request override capability works as expected.
Summary
Adds a comprehensive prompt caching interceptor supporting 6 LLM providers with provider-specific caching mechanisms:
- Anthropic: `cache_control` blocks with 5min (free) / 1h TTL options
- OpenAI: `prompt_cache_key` + `prompt_cache_retention` body parameters with auto-derived keys
- xAI (Grok): `x-grok-conv-id` HTTP header for cache routing
- Fireworks: `x-session-affinity` + `x-prompt-cache-isolation-key` HTTP headers
- AWS Bedrock: `cachePoint` objects in system/messages/tools with TTL support
- Azure OpenAI: uses the OpenAI interceptor (same `prompt_cache_key` API)

Key Features
- Cache usage tracking in `ResponseMetadata.Custom["cache_usage"]`

Files Changed
- interceptors/promptcaching.go
- interceptors/promptcaching_test.go
- metadata.go: `CacheUsage`, `CacheDetail` structs
- interceptor.go: `OrgID` field added to `MetaContextValue`
- providers/anthropic/extractor.go
- providers/bedrock/parser.go: `CachePoint` type
- providers/bedrock/extractor.go
- providers/openai_compatible/extractor.go
- DESIGN.md
- README.md

Test Coverage
All tests passing across all packages.