bug: extractOpenAIUsage subtracts cached tokens from input_tokens, breaking downstream compaction #490
Description
extractOpenAIUsage in the OpenAI→Claude response translator (internal/translator/openai/claude/openai_claude_response.go:722-740) subtracts cached_tokens from prompt_tokens before reporting input_tokens to downstream clients.
This causes clients that rely on input_tokens for context window tracking (Claude Code, Factory Droid, etc.) to see near-zero values on cache hits, breaking compaction triggers.
Example
API returns: prompt_tokens=150000, cached_tokens=149900
Current behavior: input_tokens=100 (subtracted), cache_read_input_tokens=149900
Expected behavior: input_tokens=150000, cache_read_input_tokens=149900
Code
// internal/translator/openai/claude/openai_claude_response.go
func extractOpenAIUsage(usage gjson.Result) (int64, int64, int64) {
	inputTokens := usage.Get("prompt_tokens").Int()
	outputTokens := usage.Get("completion_tokens").Int()
	cachedTokens := usage.Get("prompt_tokens_details.cached_tokens").Int()
	// BUG: this subtraction causes downstream clients to see near-zero input_tokens
	if cachedTokens > 0 {
		if inputTokens >= cachedTokens {
			inputTokens -= cachedTokens
		} else {
			inputTokens = 0
		}
	}
	return inputTokens, outputTokens, cachedTokens
}
Fix
Remove the subtraction block. input_tokens should report the full prompt_tokens value. cache_read_input_tokens is already set separately for clients that need the breakdown.
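A minimal, stand-alone sketch of the proposed mapping. The real translator operates on a `gjson.Result`; this illustration uses the standard library's `encoding/json` instead, and the `usageFields`/`extractUsage` names are hypothetical, not from the codebase:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// usageFields mirrors the OpenAI usage object fields relevant to this bug.
type usageFields struct {
	PromptTokens        int64 `json:"prompt_tokens"`
	CompletionTokens    int64 `json:"completion_tokens"`
	PromptTokensDetails struct {
		CachedTokens int64 `json:"cached_tokens"`
	} `json:"prompt_tokens_details"`
}

// extractUsage applies the proposed fix: input_tokens carries the full
// prompt_tokens value, and cached tokens are reported separately rather
// than being subtracted out.
func extractUsage(raw []byte) (inputTokens, outputTokens, cachedTokens int64) {
	var u usageFields
	if err := json.Unmarshal(raw, &u); err != nil {
		return 0, 0, 0
	}
	return u.PromptTokens, u.CompletionTokens, u.PromptTokensDetails.CachedTokens
}

func main() {
	raw := []byte(`{"prompt_tokens":150000,"completion_tokens":512,` +
		`"prompt_tokens_details":{"cached_tokens":149900}}`)
	in, out, cached := extractUsage(raw)
	fmt.Println(in, out, cached) // 150000 512 149900
}
```

With this mapping, the example above reports `input_tokens=150000` and `cache_read_input_tokens=149900`, so clients that track context size from `input_tokens` see the true prompt size on cache hits.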
Impact
- Affects BYOK setups where Claude Code or similar clients talk to non-Claude upstreams (Codex, Copilot) through the proxy
- Claude Code's auto-compaction never fires because `usage.input_tokens` appears near-zero
- Context window grows unbounded until hitting the model's hard limit
- Related to CLIProxyAPI#2281 (token under-reporting when translating from Anthropic to OpenAI), the same class of bug in the Chat Completions path
Version
CLIProxyAPIPlus v6.9.10-1-plus (commit 516d22c)