feat: split billing for cached vs non-cached prompt tokens#4

Merged
jhaynie merged 3 commits into main from task/aigateway-integration on Apr 13, 2026

Conversation

jhaynie (Member) commented Apr 13, 2026

Summary

CalculateCost now correctly bills cached prompt tokens at the CacheRead rate instead of the full Input rate, reflecting the actual cost savings from prompt caching.

Changes

billing.go

  • CalculateCost accepts new *CacheUsage parameter
  • Cached tokens (OpenAI CachedTokens, Anthropic CacheReadInputTokens, Bedrock) billed at CacheRead rate
  • Non-cached prompt tokens billed at full Input rate
  • Falls back to full Input rate if no CacheRead pricing available
  • BillingResult now includes CachedTokens and CachedInputCost fields

interceptors/billing.go

  • Extracts cache_usage from ResponseMetadata.Custom (already set by OpenAI, Anthropic, Bedrock, Fireworks extractors)
  • Passes *CacheUsage to CalculateCost
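
The split described above can be sketched as follows. This is a minimal standalone mirror of the billing logic, not the actual billing.go: the type shapes and per-million-token rate convention are assumptions based on the PR description.

```go
package main

import "fmt"

// Hypothetical mirrors of the billing types; field names are assumed
// from the PR description, not the real llmproxy API.
type CacheUsage struct {
	CachedTokens         int // OpenAI-style field
	CacheReadInputTokens int // Anthropic-style field
}

type CostInfo struct {
	Input     float64 // $ per million prompt tokens
	Output    float64 // $ per million completion tokens
	CacheRead float64 // $ per million cached prompt tokens (0 if unknown)
}

type BillingResult struct {
	CachedTokens    int
	InputCost       float64
	CachedInputCost float64
	OutputCost      float64
	TotalCost       float64
}

func CalculateCost(ci CostInfo, promptTokens, completionTokens int, cu *CacheUsage) BillingResult {
	cached := 0
	if cu != nil {
		// Assumption: providers populate only one of these fields;
		// the clamp below guards against double counting.
		cached = cu.CachedTokens + cu.CacheReadInputTokens
	}
	if cached > promptTokens {
		cached = promptTokens // clamp to prompt tokens
	}
	nonCached := promptTokens - cached

	cacheRate := ci.CacheRead
	if cacheRate <= 0 {
		cacheRate = ci.Input // no cache pricing: fall back to full input rate
	}

	r := BillingResult{
		CachedTokens:    cached,
		InputCost:       float64(nonCached) * ci.Input / 1e6,
		CachedInputCost: float64(cached) * cacheRate / 1e6,
		OutputCost:      float64(completionTokens) * ci.Output / 1e6,
	}
	r.TotalCost = r.InputCost + r.CachedInputCost + r.OutputCost
	return r
}

func main() {
	// The gpt-4o example from this PR: 2000 prompt tokens, 1920 cached.
	r := CalculateCost(CostInfo{Input: 3.00, Output: 10.00, CacheRead: 1.50},
		2000, 0, &CacheUsage{CachedTokens: 1920})
	fmt.Printf("input=%.4f cached=%.4f total=%.4f\n",
		r.InputCost, r.CachedInputCost, r.TotalCost)
}
```

With these illustrative rates the non-cached 80 tokens bill at $3.00/M and the 1920 cached tokens at $1.50/M, reproducing the pricing example below.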

Pricing Example (gpt-4o, 2000 prompt tokens, 1920 cached)

             Before                      After
Input cost   2000 × $3.00/M = $0.0060   80 × $3.00/M = $0.0002
Cache cost   —                          1920 × $1.50/M = $0.0029
Total        $0.0060                    $0.0031 (48% savings)

Breaking Change

CalculateCost signature changed — callers must pass *CacheUsage (or nil for no caching):

// Before
result := llmproxy.CalculateCost(provider, model, costInfo, promptTokens, completionTokens)

// After
result := llmproxy.CalculateCost(provider, model, costInfo, promptTokens, completionTokens, cacheUsage)

Summary by CodeRabbit

  • New Features

    • Billing now reports cached token counts, separates cached vs non-cached prompt input costs, and includes cached input cost in total billing for clearer cost breakdowns.
  • Tests

    • Added tests covering cache-aware billing, clamping of cached tokens, provider cache-price fallbacks, and edge-case scenarios (nil/zero/mixed cache usage).

CalculateCost now accepts CacheUsage and bills cached tokens at the
CacheRead rate instead of the full Input rate. This correctly reflects
the cost savings from prompt caching (typically 50-90% cheaper).

Changes:
- BillingResult: added CachedTokens and CachedInputCost fields
- CalculateCost: accepts *CacheUsage, splits cached/non-cached pricing
- BillingInterceptor: extracts cache_usage from ResponseMetadata.Custom
  and passes it to CalculateCost
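
The interceptor-side extraction described in the commit message can be sketched like this. The ResponseMetadata and CacheUsage shapes here are assumptions inferred from the PR discussion (which notes providers store the value type, not a pointer), not the real llmproxy API.

```go
package main

import "fmt"

// Assumed shapes, modeled on the PR description.
type CacheUsage struct {
	CachedTokens int
}

type ResponseMetadata struct {
	Custom map[string]any
}

// extractCacheUsage pulls the optional cache_usage entry out of response
// metadata; a nil result means the caller bills at the full input rate.
func extractCacheUsage(m *ResponseMetadata) *CacheUsage {
	if m == nil || m.Custom == nil {
		return nil
	}
	// Providers store the value type, so assert against CacheUsage,
	// not *CacheUsage.
	if cu, ok := m.Custom["cache_usage"].(CacheUsage); ok {
		return &cu
	}
	return nil
}

func main() {
	meta := &ResponseMetadata{Custom: map[string]any{
		"cache_usage": CacheUsage{CachedTokens: 1920},
	}}
	fmt.Println(extractCacheUsage(meta).CachedTokens)
	fmt.Println(extractCacheUsage(nil) == nil)
}
```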

coderabbitai (bot) commented Apr 13, 2026

Caution

Review failed

The pull request is closed.


📥 Commits

Reviewing files that changed from the base of the PR and between 0db9e51 and b984fbe.

📒 Files selected for processing (1)
  • billing.go

📝 Walkthrough

Walkthrough

CalculateCost now accepts optional CacheUsage and tracks cached prompt tokens and cached input cost in BillingResult; cached tokens are clamped to prompt tokens, split from non-cached tokens, and billed using cache-read pricing when available. BillingInterceptor extracts cache usage from response metadata and forwards it. New unit tests verify behaviors.

Changes

Cohort / File(s) Summary
Core billing logic
billing.go
Updated BillingResult with CachedTokens and CachedInputCost. Changed CalculateCost(..., cacheUsage *CacheUsage) signature. Computes cachedTokens (sum of cache fields, clamped to prompt tokens), nonCachedTokens, recalculates InputCost for non-cached tokens, computes CachedInputCost using costInfo.CacheRead when present, and includes cached input cost in TotalCost. Token totals remain promptTokens + completionTokens.
Interceptor integration
interceptors/billing.go
BillingInterceptor.Intercept now attempts to read respMeta.Custom["cache_usage"] as an optional llmproxy.CacheUsage and passes it into CalculateCost, preserving existing billing control flow.
Unit tests
billing_test.go, interceptors/billing_test.go
Added comprehensive tests for CalculateCost with nil/zero/excess cache usage, provider-specific cache-price fallback, mixed cache fields (including Anthropic), and interceptor tests for custom cache_usage present, zero, or absent.
🚥 Pre-merge checks (1 passed)
  • Description Check — ✅ Passed (check skipped: CodeRabbit's high-level summary is enabled)



coderabbitai (bot) left a comment


🧹 Nitpick comments (1)
billing.go (1)

50-61: Cached token normalization assumes mutual exclusivity.

The addition cacheUsage.CachedTokens + cacheUsage.CacheReadInputTokens relies on each provider populating only one of these fields. While the clamping logic (lines 59-61) provides a safety net, if a provider or future API change populates both fields, cached tokens would be initially double-counted before clamping silently corrects it.

This is likely fine given current provider behaviors, but consider adding a brief comment noting this assumption for future maintainers.
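
To make the reviewer's point concrete, here is a sketch of the normalization with the suggested comment in place. The helper name and the behavior of summing both fields are inferred from the review comment, not copied from billing.go.

```go
package main

import "fmt"

// normalizeCached sums the provider cache fields and clamps the result.
// Assumption (per the review): providers populate only one of the two
// fields; the clamp against promptTokens is the safety net if a future
// provider ever sets both, in which case the sum would double count
// before the clamp silently corrects it.
func normalizeCached(cachedTokens, cacheReadInputTokens, promptTokens int) int {
	cached := cachedTokens + cacheReadInputTokens
	if cached > promptTokens {
		cached = promptTokens
	}
	return cached
}

func main() {
	fmt.Println(normalizeCached(1920, 0, 2000))    // OpenAI-style field only: 1920
	fmt.Println(normalizeCached(0, 1920, 2000))    // Anthropic-style field only: 1920
	fmt.Println(normalizeCached(1920, 1920, 2000)) // both set: clamped to 2000
}
```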

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@billing.go` around lines 50 - 61, The code currently sums
cacheUsage.CachedTokens and cacheUsage.CacheReadInputTokens into cachedTokens
which assumes those fields are mutually exclusive; add a brief explanatory
comment above that line (near where cachedTokens is computed) stating the
assumption that providers populate only one of CachedTokens or
CacheReadInputTokens and that we clamp against promptTokens as a safety net—this
helps future maintainers understand why the sum is used and that clamping
prevents overcounting if both fields are ever set.


📥 Commits

Reviewing files that changed from the base of the PR and between 0d87e7c and ed83477.

📒 Files selected for processing (2)
  • billing.go
  • interceptors/billing.go
🔇 Additional comments (5)
billing.go (4)

32-44: LGTM!

The BillingResult struct extensions are well-documented. The new CachedTokens and CachedInputCost fields clearly separate cached token accounting from non-cached, and the updated comments accurately reflect the new semantics.


68-76: LGTM!

The fallback logic for cache pricing is well-implemented. Using cacheRate <= 0 correctly handles both missing (zero-value) and invalid (negative) cache pricing by falling back to the full input rate, which is a sensible default that prevents undercharging.
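
The fallback the reviewer approves of can be isolated in a few lines. This is an illustrative sketch, not the code under review; the rates are the example gpt-4o prices from the PR description.

```go
package main

import "fmt"

// cacheRate returns the per-million rate to bill cached tokens at.
// A zero or negative CacheRead price is treated as "no cache pricing
// available" and falls back to the full input rate, so missing data
// never undercharges.
func cacheRate(cacheRead, input float64) float64 {
	if cacheRead <= 0 {
		return input
	}
	return cacheRead
}

func main() {
	fmt.Println(cacheRate(1.50, 3.00)) // cache pricing known: 1.5
	fmt.Println(cacheRate(0, 3.00))    // missing: falls back to 3
	fmt.Println(cacheRate(-1, 3.00))   // invalid: falls back to 3
}
```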


78-91: LGTM!

The total cost calculation correctly aggregates non-cached input, cached input, and output costs. The BillingResult is properly populated with all fields.


49-49: No action needed. The CacheUsage type is properly defined in metadata.go with all expected fields (CachedTokens, CacheReadInputTokens, etc.) and is accessible within the package. The function signature is correct.

interceptors/billing.go (1)

41-45: Type assertion is correct—no issue found.

All providers (openai_compatible, bedrock, anthropic, promptcaching) consistently store cache_usage in respMeta.Custom as a value type llmproxy.CacheUsage, never as a pointer. The code in billing.go correctly asserts against the value type cu.(llmproxy.CacheUsage), and no silent failure occurs.

> Likely an incorrect or invalid review comment.

jhaynie added 2 commits April 12, 2026 23:14
Tests cover:
- No cache usage (nil) — all tokens at full rate
- OpenAI cache hit (CachedTokens field)
- Anthropic cache hit (CacheReadInputTokens field)
- Cache usage present but zero tokens — treated as no caching
- Cached tokens exceeding prompt tokens — clamped
- No CacheRead price — falls back to full input rate
- All tokens cached — zero non-cached cost
- Zero tokens — zero cost
- Mixed provider cache fields (summed)
- Interceptor extracting cache_usage from ResponseMetadata.Custom
- Interceptor with nil/empty Custom map
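
The clamping cases in that list lend themselves to a table-driven test. This sketch exercises a stand-in helper mirroring the clamp described in the PR; it is not the actual billing_test.go.

```go
package main

import "fmt"

// clampCached mirrors the clamping rule from the PR: cached tokens can
// never exceed prompt tokens. The helper name is hypothetical.
func clampCached(cached, prompt int) int {
	if cached > prompt {
		return prompt
	}
	return cached
}

func main() {
	cases := []struct {
		name           string
		cached, prompt int
		want           int
	}{
		{"no caching", 0, 2000, 0},
		{"partial cache hit", 1920, 2000, 1920},
		{"cached exceeds prompt tokens", 2500, 2000, 2000},
		{"zero tokens", 0, 0, 0},
	}
	for _, c := range cases {
		got := clampCached(c.cached, c.prompt)
		fmt.Printf("%s: got=%d want=%d ok=%v\n", c.name, got, c.want, got == c.want)
	}
}
```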
@jhaynie jhaynie merged commit 12ae01a into main Apr 13, 2026
1 check passed
@jhaynie jhaynie deleted the task/aigateway-integration branch April 13, 2026 04:17
