feat: add prompt caching interceptor for 6 LLM providers #3

Merged
jhaynie merged 3 commits into main from feature/prompt-caching-interceptor on Apr 13, 2026

Conversation

jhaynie (Member) commented Apr 13, 2026

Summary

Adds a comprehensive prompt caching interceptor supporting 6 LLM providers with provider-specific caching mechanisms:

  • Anthropic: cache_control blocks with 5min (free) / 1h TTL options
  • OpenAI: prompt_cache_key + prompt_cache_retention body parameters with auto-derived keys
  • xAI (Grok): x-grok-conv-id HTTP header for cache routing
  • Fireworks: x-session-affinity + x-prompt-cache-isolation-key HTTP headers
  • AWS Bedrock: cachePoint objects in system/messages/tools with TTL support
  • Azure OpenAI: Uses OpenAI interceptor (same prompt_cache_key API)

Key Features

  • Cache-Control: no-cache header disables caching per-request
  • Provider detection via model name patterns
  • Cache usage tracking in ResponseMetadata.Custom["cache_usage"]
  • Org/tenant ID namespacing for multi-tenant cache isolation
  • Auto-derived cache keys from static content prefix (OpenAI)
  • Skip on existing markers - lets users control caching explicitly
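
For orientation, a minimal Go usage sketch. The NewAnthropicPromptCaching constructor is hypothetical, inferred from the New<Provider>PromptCaching... naming convention noted in the review below; NewOpenAIPromptCachingWithResult is named in the tests, but its callback signature here is an assumption.

package main

import (
	"fmt"

	"github.com/agentuity/llmproxy"
	"github.com/agentuity/llmproxy/interceptors"
)

func main() {
	// Hypothetical constructor, per the New<Provider>PromptCaching... convention.
	anthropic := interceptors.NewAnthropicPromptCaching()

	// Forward cache usage from response metadata to a callback (signature assumed).
	openai := interceptors.NewOpenAIPromptCachingWithResult(func(usage llmproxy.CacheUsage) {
		fmt.Printf("cached=%d written=%d\n", usage.CachedTokens, usage.CacheWriteTokens)
	})

	// Both would then be registered in the proxy's interceptor chain.
	_, _ = anthropic, openai
}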

Files Changed

  • interceptors/promptcaching.go: Main implementation (~800 lines)
  • interceptors/promptcaching_test.go: Comprehensive tests (~1840 lines)
  • metadata.go: Added CacheUsage, CacheDetail structs
  • interceptor.go: Added OrgID field to MetaContextValue
  • providers/anthropic/extractor.go: Cache usage extraction
  • providers/bedrock/parser.go: CachePoint type
  • providers/bedrock/extractor.go: Cache usage extraction
  • providers/openai_compatible/extractor.go: Cache usage extraction
  • DESIGN.md: Full documentation
  • README.md: Usage examples

Test Coverage

All tests passing across all packages:

ok  	github.com/agentuity/llmproxy
ok  	github.com/agentuity/llmproxy/interceptors
ok  	github.com/agentuity/llmproxy/providers/anthropic
ok  	github.com/agentuity/llmproxy/providers/azure
ok  	github.com/agentuity/llmproxy/providers/bedrock
ok  	github.com/agentuity/llmproxy/providers/openai_compatible

Summary by CodeRabbit

  • New Features

    • Prompt caching added for Anthropic, OpenAI, xAI, Fireworks, and AWS Bedrock with provider-specific controls, retention options, cache key/namespace routing, and org-scoped behavior.
    • Cache usage metrics reporting added (per-provider token details) and new retention presets.
  • Documentation

    • README and docs updated with prompt caching examples and configuration guidance.
  • Tests

    • Comprehensive tests covering prompt caching behavior, header/body mutations, and usage reporting.
  • Other

    • Built-in interceptors count updated to eight.

Add comprehensive prompt caching interceptor with support for:
- Anthropic: cache_control blocks with 5min/1h TTL
- OpenAI: prompt_cache_key + prompt_cache_retention body params
- xAI (Grok): x-grok-conv-id header for cache routing
- Fireworks: x-session-affinity + x-prompt-cache-isolation-key headers
- AWS Bedrock: cachePoint objects in system/messages/tools
- Azure OpenAI: Uses OpenAI interceptor (same API)

Features:
- Cache-Control: no-cache header to disable per-request
- Provider detection via model name patterns
- Cache usage tracking in ResponseMetadata.Custom
- Org/tenant ID namespacing for multi-tenant isolation
- Auto-derived cache keys from static content prefix

Includes extensive test coverage and documentation updates.

coderabbitai Bot commented Apr 13, 2026

📝 Walkthrough

Adds a PromptCachingInterceptor and related types/functions for provider-specific prompt caching (Anthropic, OpenAI, xAI, Fireworks, Bedrock); augments response extraction to emit standardized cache usage metadata; adds OrgID to meta context; updates docs and README; includes extensive tests and provider extractor/parser changes.

Changes

  • Documentation (DESIGN.md, README.md): Incremented the built-in interceptor count to eight, added a comprehensive Prompt Caching section, and added README examples showing provider-specific prompt caching configurations and usage.
  • Core context & metadata (interceptor.go, metadata.go): Added OrgID string to MetaContextValue; introduced CacheUsage and CacheDetail types to represent provider cache token metrics.
  • Prompt caching interceptor (interceptors/promptcaching.go): New PromptCachingInterceptor with provider-specific request rewrites (Anthropic, OpenAI, xAI, Fireworks, Bedrock), Cache-Control no-cache bypass, model-family detection, cache-key/namespace/org scoping, retention constants, result callback support, helpers (DeriveCacheKeyFromPrefix, request cloning, org extraction), many constructors, and exported headers/constants/types.
  • Prompt caching tests (interceptors/promptcaching_test.go): Extensive unit tests exercising header/body mutations, provider-specific injection rules, model detection, cache-key derivation, org extraction, retention/TTL behavior, no-cache/disabled behavior, and result-callback extraction for all supported providers.
  • Anthropic provider (providers/anthropic/extractor.go, providers/anthropic/parser_test.go): Added CacheCreation and cache-related usage fields to response/usage models; the extractor now emits meta.Custom["cache_usage"] when relevant cache fields are present. Added tests for presence/absence of cache usage extraction.
  • Bedrock provider (providers/bedrock/parser.go, providers/bedrock/extractor.go): Added a CachePoint type and cachePoint support in ContentBlock/SystemBlock/Tool; extended ResponseUsage with cache read/write counts and CacheDetails; the extractor maps Bedrock cache metrics into meta.Custom["cache_usage"].
  • OpenAI-compatible provider (providers/openai_compatible/extractor.go, providers/openai_compatible/parser_test.go): Extended UsageInfo with PromptTokensDetails/CompletionTokensDetails (including cached_tokens); the extractor populates meta.Custom["cache_usage"] when cached token details are present. Added tests validating cached-token extraction.
  • Other files (DESIGN.md dir listings, interceptors/...): Updated directory listings to include the new interceptors/promptcaching.go and related Proxy module entries; updated the interceptor inventory text to eight built-ins.
🚥 Pre-merge checks: 1 passed
✅ Description Check: passed (check skipped because CodeRabbit's high-level summary is enabled)


coderabbitai Bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (2)
providers/bedrock/extractor.go (1)

59-65: Make Bedrock cache usage emission resilient to detail-only responses.

Consider including len(bedrockResp.Usage.CacheDetails) > 0 in the guard so cache_usage is still emitted if totals are omitted but details are present.

Suggested robustness tweak
-	if bedrockResp.Usage.CacheReadInputTokens > 0 || bedrockResp.Usage.CacheWriteInputTokens > 0 {
+	if bedrockResp.Usage.CacheReadInputTokens > 0 ||
+		bedrockResp.Usage.CacheWriteInputTokens > 0 ||
+		len(bedrockResp.Usage.CacheDetails) > 0 {
 		meta.Custom["cache_usage"] = llmproxy.CacheUsage{
 			CachedTokens:     bedrockResp.Usage.CacheReadInputTokens,
 			CacheWriteTokens: bedrockResp.Usage.CacheWriteInputTokens,
 			CacheDetails:     extractCacheDetails(bedrockResp.Usage.CacheDetails),
 		}
 	}
interceptors/promptcaching.go (1)

158-168: Consider handling JSON parse errors more gracefully for Fireworks cached tokens.

If the fireworks-cached-prompt-tokens header contains a malformed value, the error is silently ignored. While this is defensive, logging might help with debugging cache issues in production.

💡 Optional: Add debug logging for parse failures
 	if cached := resp.Header.Get("fireworks-cached-prompt-tokens"); cached != "" {
 		if respMeta.Custom == nil {
 			respMeta.Custom = make(map[string]any)
 		}
 		var cachedTokens int
-		if err := json.Unmarshal([]byte(cached), &cachedTokens); err == nil && cachedTokens > 0 {
+		if err := json.Unmarshal([]byte(cached), &cachedTokens); err != nil {
+			// Log parse error if logging is available
+		} else if cachedTokens > 0 {
 			respMeta.Custom["cache_usage"] = llmproxy.CacheUsage{
 				CachedTokens: cachedTokens,
 			}
 		}
 	}

Actionable inline comments (2):
In `@providers/openai_compatible/extractor.go`:
- Around line 102-118: PromptTokensDetails currently declares fields that don't
exist in OpenAI's Chat Completions API; update the struct by removing
ImageTokens, ReasoningTokens, AcceptedPredictionTokens, and
RejectedPredictionTokens so PromptTokensDetails only contains CachedTokens and
AudioTokens, leaving the other token fields only in CompletionTokensDetails
(which already holds ReasoningTokens, AcceptedPredictionTokens,
RejectedPredictionTokens) to match the API spec.

In `@README.md`:
- Line 59: Update the "Prompt Caching" bullet (currently "Prompt Caching:
Anthropic and OpenAI prompt caching support") to reflect all supported providers
by changing the text to a more inclusive phrase that lists or groups Anthropic,
OpenAI, xAI, Fireworks, and Bedrock (e.g., "Prompt Caching: prompt caching
support for Anthropic, OpenAI, xAI, Fireworks, and Bedrock"), ensuring the
README entry matches the new PR documentation.


📥 Commits

Reviewing files that changed from the base of the PR and between 0c8762d and 1c37600.

📒 Files selected for processing (12)
  • DESIGN.md
  • README.md
  • interceptor.go
  • interceptors/promptcaching.go
  • interceptors/promptcaching_test.go
  • metadata.go
  • providers/anthropic/extractor.go
  • providers/anthropic/parser_test.go
  • providers/bedrock/extractor.go
  • providers/bedrock/parser.go
  • providers/openai_compatible/extractor.go
  • providers/openai_compatible/parser_test.go
🔇 Additional comments (22)
interceptor.go (1)

73-73: MetaContextValue.OrgID addition is clean and backward-compatible.

This is a safe additive change and aligns with org-scoped caching needs.

providers/openai_compatible/parser_test.go (1)

497-559: Cache usage extractor tests are well-targeted.

Good coverage of cached_tokens > 0, missing field, and explicit zero behavior.

providers/anthropic/parser_test.go (1)

158-202: Anthropic cache usage test additions look good.

These tests correctly validate both populated and non-populated meta.Custom["cache_usage"] paths.

README.md (1)

105-121: Provider-specific interceptor examples are clear and practical.

Nice addition—these examples make adoption much easier.

DESIGN.md (1)

122-140: Design documentation updates are comprehensive and aligned with implementation scope.

The new cache metadata and prompt-caching sections are detailed and coherent.

Also applies to: 298-299, 437-649, 739

metadata.go (1)

59-83: CacheUsage / CacheDetail modeling is solid.

Good unified schema for cross-provider cache accounting in response metadata.
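
For concreteness, a sketch of the schema these comments refer to; the field names CachedTokens, CacheWriteTokens, and CacheDetails appear in the diffs above, while the JSON tags and the CacheDetail fields shown here are assumptions.

package llmproxy

// CacheUsage is the unified cross-provider cache accounting placed in
// ResponseMetadata.Custom["cache_usage"].
type CacheUsage struct {
	CachedTokens     int           `json:"cached_tokens"`
	CacheWriteTokens int           `json:"cache_write_tokens,omitempty"`
	CacheDetails     []CacheDetail `json:"cache_details,omitempty"`
}

// CacheDetail is assumed to carry a provider-specific breakdown,
// e.g. Anthropic's ephemeral 5m/1h token buckets.
type CacheDetail struct {
	Type   string `json:"type"`
	Tokens int    `json:"tokens"`
}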

providers/bedrock/parser.go (1)

108-115: Bedrock request model extension for cachePoint is correctly integrated.

This is a clean additive change and fits Bedrock caching injection behavior.

Also applies to: 150-152, 170-172

providers/bedrock/extractor.go (1)

70-82: cacheDetails mapping and response usage extensions look good.

The conversion into llmproxy.CacheDetail is straightforward and maintainable.

Also applies to: 117-129

providers/anthropic/extractor.go (2)

45-55: LGTM! Cache usage extraction is correctly implemented.

The logic properly:

  • Initializes CacheUsage with top-level cache token counts
  • Conditionally populates ephemeral token breakdown when CacheCreation is present
  • Only writes to meta.Custom when meaningful cache activity occurred (tokens > 0)

102-114: Field names verified against Anthropic API documentation.

The JSON tags for CacheCreationInfo fields (ephemeral_5m_input_tokens and ephemeral_1h_input_tokens) are correct and match the current Anthropic Claude API specification for prompt caching cache_creation response objects.

providers/openai_compatible/extractor.go (1)

49-53: LGTM! OpenAI cache usage extraction is correctly implemented.

The conditional check properly handles the optional PromptTokensDetails and only populates cache_usage when cached tokens are present and greater than zero.

interceptors/promptcaching_test.go (4)

16-71: LGTM! Anthropic system string transformation test is well-structured.

The test correctly validates:

  • System string is converted to array format with cache_control block
  • The cache_control is placed directly on the content block
  • Text content is preserved during transformation

266-306: LGTM! OpenAI cache usage callback test validates the result forwarding pattern.

The test correctly verifies that cache_usage from response metadata is forwarded to the onResult callback when using NewOpenAIPromptCachingWithResult.


1401-1435: LGTM! Fireworks cache usage extraction test validates header parsing.

The test correctly verifies that fireworks-cached-prompt-tokens response header is parsed and forwarded via the onResult callback.


1793-1840: LGTM! Bedrock cache usage callback test validates the result forwarding pattern.

The test correctly verifies that cache usage with CacheDetails is properly forwarded through the interceptor's onResult callback.

interceptors/promptcaching.go (7)

50-115: LGTM! Main intercept method has clean control flow.

The method properly:

  • Short-circuits when disabled or Cache-Control: no-cache is present
  • Routes to provider-specific handlers
  • Handles response metadata and onResult callback forwarding

217-304: LGTM! Anthropic cache_control injection handles all content formats correctly.

The implementation properly:

  • Converts string system to array format with cache_control
  • Adds cache_control to the last block of array system
  • Handles both string and array message content formats
  • Only modifies the last eligible (user/assistant) message
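
A minimal sketch of the string-to-array conversion described here, operating on a decoded JSON body; the helper name and map-based style are assumptions, while the {"type": "ephemeral", "ttl": ...} shape follows Anthropic's documented cache_control format.

package interceptors

// addSystemCacheControl (illustrative name) marks the system prompt for
// caching, converting the string form to Anthropic's array form when needed.
func addSystemCacheControl(body map[string]any, ttl string) {
	cc := map[string]any{"type": "ephemeral", "ttl": ttl}
	switch system := body["system"].(type) {
	case string:
		// String system prompt: convert to array form with cache_control.
		body["system"] = []any{
			map[string]any{"type": "text", "text": system, "cache_control": cc},
		}
	case []any:
		// Array form: attach cache_control to the last block.
		if len(system) > 0 {
			if last, ok := system[len(system)-1].(map[string]any); ok {
				last["cache_control"] = cc
			}
		}
	}
}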

550-570: LGTM! OrgID extraction has clear priority ordering.

The extraction chain correctly prioritizes:

  1. Custom OrgIDExtractor function
  2. Context metadata (MetaContextValue.OrgID)
  3. Request header (X-Org-ID)
  4. Body metadata custom field (org_id)
  5. Configured namespace (fallback)
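
A standalone sketch of that chain; PromptCachingConfig, MetaFromContext, and the BodyMetadata.Custom map are assumed names beyond what the review states.

package interceptors

import (
	"context"
	"net/http"

	"github.com/agentuity/llmproxy"
)

// PromptCachingConfig is a simplified stand-in for the interceptor's config.
type PromptCachingConfig struct {
	OrgIDExtractor func(context.Context, *http.Request, llmproxy.BodyMetadata) string
	Namespace      string
}

func resolveOrgID(ctx context.Context, req *http.Request, meta llmproxy.BodyMetadata, cfg PromptCachingConfig) string {
	// 1. Custom OrgIDExtractor wins when configured.
	if cfg.OrgIDExtractor != nil {
		if id := cfg.OrgIDExtractor(ctx, req, meta); id != "" {
			return id
		}
	}
	// 2. Context metadata (MetaContextValue.OrgID); MetaFromContext is assumed.
	if mv, ok := llmproxy.MetaFromContext(ctx); ok && mv.OrgID != "" {
		return mv.OrgID
	}
	// 3. Request header.
	if id := req.Header.Get("X-Org-ID"); id != "" {
		return id
	}
	// 4. Body metadata custom field (Custom assumed to be map[string]any).
	if id, ok := meta.Custom["org_id"].(string); ok && id != "" {
		return id
	}
	// 5. Configured namespace fallback.
	return cfg.Namespace
}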

582-615: LGTM! Cache key derivation from prefix is well-designed.

The function correctly:

  • Builds a deterministic hash from system prompt, tools, and prior messages
  • Excludes the last message (typically the new user input)
  • Returns empty string for invalid/empty input
  • Uses SHA-256 truncated to 128 bits which provides sufficient uniqueness for cache keys
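
A simplified, self-contained sketch of that derivation (the shipped helper is DeriveCacheKeyFromPrefix and parses the raw JSON body; the string-slice inputs here are a simplification).

package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// deriveKey hashes the static prefix (system prompt, tools, every message
// except the newest) and truncates SHA-256 to 128 bits.
func deriveKey(system string, tools []string, messages []string) string {
	if system == "" && len(tools) == 0 && len(messages) <= 1 {
		return "" // no static prefix to key on
	}
	h := sha256.New()
	h.Write([]byte(system))
	for _, t := range tools {
		h.Write([]byte(t))
	}
	if len(messages) > 1 {
		for _, m := range messages[:len(messages)-1] { // exclude the newest message
			h.Write([]byte(m))
		}
	}
	return hex.EncodeToString(h.Sum(nil)[:16]) // 16 bytes = 128 bits
}

func main() {
	fmt.Println(deriveKey("You are a helpful assistant.", nil, []string{"q1", "a1", "q2"}))
}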

617-638: LGTM! Model detection helpers are comprehensive.

The substring-based detection correctly identifies:

  • OpenAI models including future o4- series
  • xAI Grok models
  • Fireworks models (both direct and accounts/ prefix)
  • Bedrock models (Anthropic Claude, Amazon Nova/Titan)
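
An illustrative sketch of substring-based detection; the exact patterns in the shipped helpers are assumptions.

package interceptors

import "strings"

// Patterns below are illustrative, not the shipped ones.
func isXAIModel(model string) bool {
	return strings.Contains(strings.ToLower(model), "grok")
}

func isFireworksModel(model string) bool {
	m := strings.ToLower(model)
	// Covers both direct names and the "accounts/" prefix form.
	return strings.Contains(m, "fireworks") || strings.HasPrefix(m, "accounts/")
}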

787-798: LGTM! DefaultOrgIDExtractor provides a reusable extraction function.

This exported function allows users to compose custom extractors while leveraging the default lookup chain. The priority order (context → header → meta.Custom) is consistent with the internal extractOrgID method.


399-412: Implementation correctly matches AWS Bedrock's cachePoint format.

The code correctly appends cachePoint as a separate block in the system array, which aligns with AWS Bedrock's documented system prompt structure where the cache point is added as a distinct object following the text prompt.
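
Sketched as a body mutation, where the {"cachePoint": {"type": "default"}} block matches Bedrock's documented Converse format; the helper name is hypothetical.

package interceptors

// appendSystemCachePoint appends a standalone cachePoint block after the
// system text blocks, per the Bedrock Converse API.
func appendSystemCachePoint(system []any) []any {
	return append(system, map[string]any{
		"cachePoint": map[string]any{"type": "default"},
	})
}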

Resolved comment threads: providers/openai_compatible/extractor.go; README.md (outdated).
jhaynie added 2 commits April 12, 2026 21:22
- Remove incorrect fields from PromptTokensDetails (only cached_tokens and audio_tokens per OpenAI spec)
- Update README to list all prompt caching providers
- Include CacheDetails check in Bedrock cache_usage emission
- Add CacheKeyExtractor type for dynamic extraction from request context
- Add TraceIDCacheKeyExtractor to use OTEL trace ID as cache key
- Add NewXAIPromptCachingAuto, NewXAIPromptCachingWithTraceID constructors
- Add NewFireworksPromptCachingAuto, NewFireworksPromptCachingWithTraceID constructors
- Support X-Cache-Key header override for all providers
- Add resolveDynamicCacheKey with priority: header > extractor > CacheKeyFn > CacheKey
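
A sketch of that resolution order; field names and signatures beyond those in the bullets above are assumptions.

package interceptors

import (
	"context"
	"net/http"

	"github.com/agentuity/llmproxy"
)

// promptCacher is a simplified stand-in for the interceptor's cache-key fields.
type promptCacher struct {
	CacheKey          string
	CacheKeyFn        func(llmproxy.BodyMetadata) string
	CacheKeyExtractor func(context.Context, *http.Request, llmproxy.BodyMetadata) string
}

func (p *promptCacher) resolveDynamicCacheKey(req *http.Request, meta llmproxy.BodyMetadata) string {
	// 1. Per-request X-Cache-Key header override wins.
	if key := req.Header.Get("X-Cache-Key"); key != "" {
		return key
	}
	// 2. Dynamic extractor (e.g. TraceIDCacheKeyExtractor); signature assumed.
	if p.CacheKeyExtractor != nil {
		if key := p.CacheKeyExtractor(req.Context(), req, meta); key != "" {
			return key
		}
	}
	// 3. Static key function; signature assumed.
	if p.CacheKeyFn != nil {
		if key := p.CacheKeyFn(meta); key != "" {
			return key
		}
	}
	// 4. Fixed key from the constructor.
	return p.CacheKey
}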

coderabbitai Bot left a comment

🧹 Nitpick comments (2)
interceptors/promptcaching_test.go (1)

1228-1262: Consider adding test case for MetaContext.OrgID extraction path.

The TestDefaultOrgIDExtractor test doesn't verify the MetaContext.OrgID extraction path (lines 898-900 in the implementation). While the "none" case passes because no sources are set, there's no explicit test that MetaContext.OrgID takes precedence when set.

🧪 Optional: Add test case for MetaContext precedence
 func TestDefaultOrgIDExtractor(t *testing.T) {
 	tests := []struct {
 		name     string
 		setup    func(*http.Request, *llmproxy.BodyMetadata)
+		ctxSetup func(context.Context) context.Context
 		expected string
 	}{
+		{
+			name:  "from meta context",
+			setup: func(*http.Request, *llmproxy.BodyMetadata) {},
+			ctxSetup: func(ctx context.Context) context.Context {
+				return llmproxy.WithMeta(ctx, llmproxy.MetaContextValue{OrgID: "org-context"})
+			},
+			expected: "org-context",
+		},
 		{
 			name:     "from header",
 			setup:    func(req *http.Request, _ *llmproxy.BodyMetadata) { req.Header.Set(HeaderOrgID, "org-header") },
+			ctxSetup: nil,
 			expected: "org-header",
 		},
interceptors/promptcaching.go (1)

608-641: Consider logging or handling the JSON unmarshal error.

The error from json.Unmarshal at line 617 is silently ignored. While this degrades gracefully (producing an empty key when parsing fails), it could hide issues with malformed request bodies during debugging.

🔧 Optional: Add error logging for debugging
 func DeriveCacheKeyFromPrefix(meta llmproxy.BodyMetadata, rawBody []byte) string {
 	var body struct {
 		System   interface{} `json:"system"`
 		Messages []struct {
 			Role    string      `json:"role"`
 			Content interface{} `json:"content"`
 		} `json:"messages"`
 		Tools interface{} `json:"tools"`
 	}
-	json.Unmarshal(rawBody, &body)
+	if err := json.Unmarshal(rawBody, &body); err != nil {
+		// Parsing failed - return empty key to skip caching
+		return ""
+	}


📥 Commits

Reviewing files that changed from the base of the PR and between b58b178 and a4b5e88.

📒 Files selected for processing (2)
  • interceptors/promptcaching.go
  • interceptors/promptcaching_test.go
🔇 Additional comments (19)
interceptors/promptcaching.go (10)

1-51: LGTM!

Type definitions, constants, and config structures are well-organized. The function types (CacheKeyFunc, CacheKeyExtractor, OrgIDExtractor) provide good flexibility for different cache key derivation strategies.


53-118: LGTM!

The main Intercept method has clean control flow with proper early returns for disabled state and Cache-Control: no-cache. Provider routing is well-organized, and the body cloning pattern correctly manages the request body lifecycle.


120-144: LGTM!

The xAI interceptor correctly handles the idempotency check for existing headers and properly resolves dynamic cache keys before setting the x-grok-conv-id header.


146-183: LGTM!

The Fireworks interceptor correctly handles both isolation and session affinity headers with proper idempotency checks. The response header parsing for cached tokens is correctly implemented with proper nil-map initialization.


185-208: LGTM!

The Bedrock interceptor follows the same clean pattern as other providers, delegating body modification to checkBedrock and handling the response callback consistently.


223-310: LGTM!

The Anthropic body modification correctly handles both string and array formats for system prompts and message content. The backward iteration through messages ensures the cache control is placed on the last user/assistant message as expected by Anthropic's API.


393-475: LGTM!

The Bedrock implementation correctly appends cachePoint as a separate content block (rather than embedding it), which aligns with the AWS Bedrock Converse API design where cache points are standalone block types in the content arrays.


643-664: LGTM!

The model detection functions use reasonable string patterns for each provider. The approach is flexible enough to handle variations while being specific enough to avoid common false positives.


673-908: LGTM!

The comprehensive set of constructors provides a clean API for various use cases. The consistent naming convention (New<Provider>PromptCaching...) makes the API discoverable, and the delegation pattern keeps implementation DRY.


910-921: No action needed. TraceExtractor and TraceInfo are properly defined in interceptors/tracing.go (lines 23 and 12 respectively) within the same package.

> Likely an incorrect or invalid review comment.
interceptors/promptcaching_test.go (9)

17-200: LGTM!

Anthropic tests provide comprehensive coverage of system prompt handling (string vs array), message content modification, and TTL variants. The upstream server handlers correctly validate the transformed request bodies.


202-307: LGTM!

OpenAI tests cover cache key insertion, retention settings, and the cache usage callback mechanism. The cache usage test correctly simulates the response metadata flow from upstream extractors.


309-508: LGTM!

Excellent coverage of skip and bypass scenarios including model mismatch, existing cache markers, disabled state, error passthrough, and Cache-Control: no-cache handling. The idempotency tests correctly verify that cache markers aren't duplicated.


543-735: LGTM!

Well-structured table-driven unit tests for helper functions. The TestCheckOpenAI test comprehensively covers combinations of cache key and retention settings.


900-1073: LGTM!

xAI tests provide good coverage including header injection, model filtering, existing header preservation, and Cache-Control: no-cache handling.


1264-1489: LGTM!

Fireworks tests comprehensively cover both request header injection and response header parsing for cache metrics. The cache usage test correctly validates that the interceptor parses the fireworks-cached-prompt-tokens response header.


1492-1791: LGTM!

Bedrock tests provide excellent coverage across all three insertion points (system, messages, tools), TTL variants, and cache usage extraction. The tests correctly verify the appended cachePoint block structure.


1843-1983: LGTM!

Trace ID and auto-derive tests comprehensively verify the dynamic cache key extraction mechanisms for both xAI and Fireworks. The hex encoding of trace IDs is correctly validated.


1985-2047: LGTM!

Header override tests correctly verify that the X-Cache-Key header takes precedence over constructor-provided keys for both xAI and Fireworks providers. This ensures per-request override capability works as expected.

jhaynie merged commit 681061e into main on Apr 13, 2026
2 checks passed
jhaynie deleted the feature/prompt-caching-interceptor branch on April 13, 2026 at 02:34