feat: split billing for cached vs non-cached prompt tokens#4

Merged
jhaynie merged 3 commits into main from task/aigateway-integration on Apr 13, 2026

Conversation

jhaynie (Member) commented Apr 13, 2026

Summary

CalculateCost now correctly bills cached prompt tokens at the CacheRead rate instead of the full Input rate, reflecting the actual cost savings from prompt caching.

Changes

billing.go

  • CalculateCost accepts new *CacheUsage parameter
  • Cached tokens (OpenAI CachedTokens, Anthropic CacheReadInputTokens, Bedrock) billed at CacheRead rate
  • Non-cached prompt tokens billed at full Input rate
  • Falls back to full Input rate if no CacheRead pricing available
  • BillingResult now includes CachedTokens and CachedInputCost fields

interceptors/billing.go

  • Extracts cache_usage from ResponseMetadata.Custom (already set by OpenAI, Anthropic, Bedrock, Fireworks extractors)
  • Passes *CacheUsage to CalculateCost
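
The split described above can be sketched as follows. This is a minimal standalone mirror of the billing logic, not the actual billing.go: the type shapes and per-million-token rate convention are assumptions based on the PR description.

```go
package main

import "fmt"

// Hypothetical mirrors of the billing types; field names are assumed
// from the PR description, not the real llmproxy API.
type CacheUsage struct {
	CachedTokens         int // OpenAI-style field
	CacheReadInputTokens int // Anthropic-style field
}

type CostInfo struct {
	Input     float64 // $ per million prompt tokens
	Output    float64 // $ per million completion tokens
	CacheRead float64 // $ per million cached prompt tokens (0 if unknown)
}

type BillingResult struct {
	CachedTokens    int
	InputCost       float64
	CachedInputCost float64
	OutputCost      float64
	TotalCost       float64
}

func CalculateCost(ci CostInfo, promptTokens, completionTokens int, cu *CacheUsage) BillingResult {
	cached := 0
	if cu != nil {
		// Assumption: providers populate only one of these fields;
		// the clamp below guards against double counting.
		cached = cu.CachedTokens + cu.CacheReadInputTokens
	}
	if cached > promptTokens {
		cached = promptTokens // clamp to prompt tokens
	}
	nonCached := promptTokens - cached

	cacheRate := ci.CacheRead
	if cacheRate <= 0 {
		cacheRate = ci.Input // no cache pricing: fall back to full input rate
	}

	r := BillingResult{
		CachedTokens:    cached,
		InputCost:       float64(nonCached) * ci.Input / 1e6,
		CachedInputCost: float64(cached) * cacheRate / 1e6,
		OutputCost:      float64(completionTokens) * ci.Output / 1e6,
	}
	r.TotalCost = r.InputCost + r.CachedInputCost + r.OutputCost
	return r
}

func main() {
	// The gpt-4o example from this PR: 2000 prompt tokens, 1920 cached.
	r := CalculateCost(CostInfo{Input: 3.00, Output: 10.00, CacheRead: 1.50},
		2000, 0, &CacheUsage{CachedTokens: 1920})
	fmt.Printf("input=%.4f cached=%.4f total=%.4f\n",
		r.InputCost, r.CachedInputCost, r.TotalCost)
}
```

With these illustrative rates the non-cached 80 tokens bill at $3.00/M and the 1920 cached tokens at $1.50/M, reproducing the pricing example below.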

Pricing Example (gpt-4o, 2000 prompt tokens, 1920 cached)

             Before                      After
Input cost   2000 × $3.00/M = $0.0060   80 × $3.00/M = $0.0002
Cache cost   —                          1920 × $1.50/M = $0.0029
Total        $0.0060                    $0.0031 (48% savings)

Breaking Change

CalculateCost signature changed — callers must pass *CacheUsage (or nil for no caching):

// Before
result := llmproxy.CalculateCost(provider, model, costInfo, promptTokens, completionTokens)

// After
result := llmproxy.CalculateCost(provider, model, costInfo, promptTokens, completionTokens, cacheUsage)

Summary by CodeRabbit

  • New Features

    • Billing now reports cached token counts, separates cached vs non-cached prompt input costs, and includes cached input cost in total billing for clearer cost breakdowns.
  • Tests

    • Added tests covering cache-aware billing, clamping of cached tokens, provider cache-price fallbacks, and edge-case scenarios (nil/zero/mixed cache usage).

CalculateCost now accepts CacheUsage and bills cached tokens at the
CacheRead rate instead of the full Input rate. This correctly reflects
the cost savings from prompt caching (typically 50-90% cheaper).

Changes:
- BillingResult: added CachedTokens and CachedInputCost fields
- CalculateCost: accepts *CacheUsage, splits cached/non-cached pricing
- BillingInterceptor: extracts cache_usage from ResponseMetadata.Custom
  and passes it to CalculateCost
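
The interceptor-side extraction described in the commit message can be sketched like this. The ResponseMetadata and CacheUsage shapes here are assumptions inferred from the PR discussion (which notes providers store the value type, not a pointer), not the real llmproxy API.

```go
package main

import "fmt"

// Assumed shapes, modeled on the PR description.
type CacheUsage struct {
	CachedTokens int
}

type ResponseMetadata struct {
	Custom map[string]any
}

// extractCacheUsage pulls the optional cache_usage entry out of response
// metadata; a nil result means the caller bills at the full input rate.
func extractCacheUsage(m *ResponseMetadata) *CacheUsage {
	if m == nil || m.Custom == nil {
		return nil
	}
	// Providers store the value type, so assert against CacheUsage,
	// not *CacheUsage.
	if cu, ok := m.Custom["cache_usage"].(CacheUsage); ok {
		return &cu
	}
	return nil
}

func main() {
	meta := &ResponseMetadata{Custom: map[string]any{
		"cache_usage": CacheUsage{CachedTokens: 1920},
	}}
	fmt.Println(extractCacheUsage(meta).CachedTokens)
	fmt.Println(extractCacheUsage(nil) == nil)
}
```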

coderabbitai (bot) commented Apr 13, 2026

Caution

Review failed

The pull request is closed.


📥 Commits

Reviewing files that changed from the base of the PR and between 0db9e51 and b984fbe.

📒 Files selected for processing (1)
  • billing.go

📝 Walkthrough

Walkthrough

CalculateCost now accepts optional CacheUsage and tracks cached prompt tokens and cached input cost in BillingResult; cached tokens are clamped to prompt tokens, split from non-cached tokens, and billed using cache-read pricing when available. BillingInterceptor extracts cache usage from response metadata and forwards it. New unit tests verify behaviors.

Changes

Cohort / File(s) Summary
Core billing logic
billing.go
Updated BillingResult with CachedTokens and CachedInputCost. Changed CalculateCost(..., cacheUsage *CacheUsage) signature. Computes cachedTokens (sum of cache fields, clamped to prompt tokens), nonCachedTokens, recalculates InputCost for non-cached tokens, computes CachedInputCost using costInfo.CacheRead when present, and includes cached input cost in TotalCost. Token totals remain promptTokens + completionTokens.
Interceptor integration
interceptors/billing.go
BillingInterceptor.Intercept now attempts to read respMeta.Custom["cache_usage"] as an optional llmproxy.CacheUsage and passes it into CalculateCost, preserving existing billing control flow.
Unit tests
billing_test.go, interceptors/billing_test.go
Added comprehensive tests for CalculateCost with nil/zero/excess cache usage, provider-specific cache-price fallback, mixed cache fields (including Anthropic), and interceptor tests for custom cache_usage present, zero, or absent.
🚥 Pre-merge checks (1 passed)
  • Description Check — ✅ Passed (check skipped: CodeRabbit's high-level summary is enabled)



coderabbitai (bot) left a comment


🧹 Nitpick comments (1)
billing.go (1)

50-61: Cached token normalization assumes mutual exclusivity.

The addition cacheUsage.CachedTokens + cacheUsage.CacheReadInputTokens relies on each provider populating only one of these fields. While the clamping logic (lines 59-61) provides a safety net, if a provider or future API change populates both fields, cached tokens would be initially double-counted before clamping silently corrects it.

This is likely fine given current provider behaviors, but consider adding a brief comment noting this assumption for future maintainers.
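
To make the reviewer's point concrete, here is a sketch of the normalization with the suggested comment in place. The helper name and the behavior of summing both fields are inferred from the review comment, not copied from billing.go.

```go
package main

import "fmt"

// normalizeCached sums the provider cache fields and clamps the result.
// Assumption (per the review): providers populate only one of the two
// fields; the clamp against promptTokens is the safety net if a future
// provider ever sets both, in which case the sum would double count
// before the clamp silently corrects it.
func normalizeCached(cachedTokens, cacheReadInputTokens, promptTokens int) int {
	cached := cachedTokens + cacheReadInputTokens
	if cached > promptTokens {
		cached = promptTokens
	}
	return cached
}

func main() {
	fmt.Println(normalizeCached(1920, 0, 2000))    // OpenAI-style field only: 1920
	fmt.Println(normalizeCached(0, 1920, 2000))    // Anthropic-style field only: 1920
	fmt.Println(normalizeCached(1920, 1920, 2000)) // both set: clamped to 2000
}
```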

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@billing.go` around lines 50 - 61, The code currently sums
cacheUsage.CachedTokens and cacheUsage.CacheReadInputTokens into cachedTokens
which assumes those fields are mutually exclusive; add a brief explanatory
comment above that line (near where cachedTokens is computed) stating the
assumption that providers populate only one of CachedTokens or
CacheReadInputTokens and that we clamp against promptTokens as a safety net—this
helps future maintainers understand why the sum is used and that clamping
prevents overcounting if both fields are ever set.


📥 Commits

Reviewing files that changed from the base of the PR and between 0d87e7c and ed83477.

📒 Files selected for processing (2)
  • billing.go
  • interceptors/billing.go
🔇 Additional comments (5)
billing.go (4)

32-44: LGTM!

The BillingResult struct extensions are well-documented. The new CachedTokens and CachedInputCost fields clearly separate cached token accounting from non-cached, and the updated comments accurately reflect the new semantics.


68-76: LGTM!

The fallback logic for cache pricing is well-implemented. Using cacheRate <= 0 correctly handles both missing (zero-value) and invalid (negative) cache pricing by falling back to the full input rate, which is a sensible default that prevents undercharging.
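
The fallback the reviewer approves of can be isolated in a few lines. This is an illustrative sketch, not the code under review; the rates are the example gpt-4o prices from the PR description.

```go
package main

import "fmt"

// cacheRate returns the per-million rate to bill cached tokens at.
// A zero or negative CacheRead price is treated as "no cache pricing
// available" and falls back to the full input rate, so missing data
// never undercharges.
func cacheRate(cacheRead, input float64) float64 {
	if cacheRead <= 0 {
		return input
	}
	return cacheRead
}

func main() {
	fmt.Println(cacheRate(1.50, 3.00)) // cache pricing known: 1.5
	fmt.Println(cacheRate(0, 3.00))    // missing: falls back to 3
	fmt.Println(cacheRate(-1, 3.00))   // invalid: falls back to 3
}
```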


78-91: LGTM!

The total cost calculation correctly aggregates non-cached input, cached input, and output costs. The BillingResult is properly populated with all fields.


49-49: No action needed. The CacheUsage type is properly defined in metadata.go with all expected fields (CachedTokens, CacheReadInputTokens, etc.) and is accessible within the package. The function signature is correct.

interceptors/billing.go (1)

41-45: Type assertion is correct—no issue found.

All providers (openai_compatible, bedrock, anthropic, promptcaching) consistently store cache_usage in respMeta.Custom as a value type llmproxy.CacheUsage, never as a pointer. The code in billing.go correctly asserts against the value type cu.(llmproxy.CacheUsage), and no silent failure occurs.

> Likely an incorrect or invalid review comment.

jhaynie added 2 commits April 12, 2026 23:14
Tests cover:
- No cache usage (nil) — all tokens at full rate
- OpenAI cache hit (CachedTokens field)
- Anthropic cache hit (CacheReadInputTokens field)
- Cache usage present but zero tokens — treated as no caching
- Cached tokens exceeding prompt tokens — clamped
- No CacheRead price — falls back to full input rate
- All tokens cached — zero non-cached cost
- Zero tokens — zero cost
- Mixed provider cache fields (summed)
- Interceptor extracting cache_usage from ResponseMetadata.Custom
- Interceptor with nil/empty Custom map
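
The clamping cases in that list lend themselves to a table-driven test. This sketch exercises a stand-in helper mirroring the clamp described in the PR; it is not the actual billing_test.go.

```go
package main

import "fmt"

// clampCached mirrors the clamping rule from the PR: cached tokens can
// never exceed prompt tokens. The helper name is hypothetical.
func clampCached(cached, prompt int) int {
	if cached > prompt {
		return prompt
	}
	return cached
}

func main() {
	cases := []struct {
		name           string
		cached, prompt int
		want           int
	}{
		{"no caching", 0, 2000, 0},
		{"partial cache hit", 1920, 2000, 1920},
		{"cached exceeds prompt tokens", 2500, 2000, 2000},
		{"zero tokens", 0, 0, 0},
	}
	for _, c := range cases {
		got := clampCached(c.cached, c.prompt)
		fmt.Printf("%s: got=%d want=%d ok=%v\n", c.name, got, c.want, got == c.want)
	}
}
```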
@jhaynie jhaynie merged commit 12ae01a into main Apr 13, 2026
1 check passed
@jhaynie jhaynie deleted the task/aigateway-integration branch April 13, 2026 04:17
