Description
Hi,
There is a critical token counting bug in src/Ai/Infrastructure/Services/Anthropic/MessageService.php
that causes users to be overcharged for Anthropic models (claude-opus-4-6, claude-sonnet-4-6, etc.).
Note: CompletionService.php and CodeCompletionService.php follow the same pattern.
The Problem
In the streaming response loop (lines 98–155), the code accumulates tokens from both
message_start and message_delta events:
// Line 105-108
if ($type == 'message_start') {
    $inputTokensCount += $data->message->usage->input_tokens ?? 0;
    $outputTokensCount += $data->message->usage->output_tokens ?? 0;
    continue;
}
// Line 150-154
if ($type == 'message_delta') {
    $inputTokensCount += $data->usage->input_tokens ?? 0;
    $outputTokensCount += $data->usage->output_tokens ?? 0;
    continue;
}
Why This Is Wrong
According to Anthropic's streaming API behavior (verified with live API tests):
input_tokens: Reported with the same value in both message_start and message_delta.
Adding both results in exactly 2x the actual input token count.
output_tokens: message_start reports a small initial estimate (~2-4 tokens),
while message_delta reports the final total. Adding both results in slight overcounting.
Test Evidence (live API call)
CALL 1 (non-streaming): input=560, output=57
CALL 2 (streaming):
    message_start: input_tokens=637, output_tokens=2
    message_delta: input_tokens=637, output_tokens=127
Buggy calculation:
    Input: (560+560) + (637+637) = 2,394 (actual: 1,197 → 2.0x overcount)
    Output: (2+57) + (2+127) = 188 (actual: 184 → 1.02x overcount)
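The arithmetic above can be reproduced with a small standalone replay. This is only a sketch: the countTokens() helper and the event arrays are hypothetical and not part of the project, and CALL 1 is replayed as if it had been streamed with output_tokens=2 in message_start, which is what the buggy calculation above implies.

```php
<?php
// Hypothetical replay of the usage events quoted above; not the project's real code.
// Each event is [type, input_tokens, output_tokens].
$calls = [
    // Assumption: CALL 1 replayed as if streamed, with output_tokens=2 in message_start.
    [['message_start', 560, 2], ['message_delta', 560, 57]],
    [['message_start', 637, 2], ['message_delta', 637, 127]],
];

// Sums usage across all calls, either the way the current code does ($buggy = true)
// or the way the proposed fix does ($buggy = false).
function countTokens(array $calls, bool $buggy): array
{
    $input = 0;
    $output = 0;
    foreach ($calls as $events) {
        foreach ($events as [$type, $in, $out]) {
            if ($type === 'message_start') {
                $input += $in;
                if ($buggy) {
                    $output += $out; // initial estimate, counted by mistake
                }
            } elseif ($type === 'message_delta') {
                if ($buggy) {
                    $input += $in; // duplicate of message_start, counted by mistake
                }
                $output += $out;
            }
        }
    }
    return [$input, $output];
}

[$buggyIn, $buggyOut] = countTokens($calls, true);
[$fixedIn, $fixedOut] = countTokens($calls, false);
printf("buggy: input=%d output=%d\n", $buggyIn, $buggyOut); // buggy: input=2394 output=188
printf("fixed: input=%d output=%d\n", $fixedIn, $fixedOut); // fixed: input=1197 output=184
```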
Since input tokens typically represent 70-80% of the cost, users are being charged
approximately 1.7-1.8x the actual cost for Anthropic models.
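The 1.7-1.8x figure follows from weighting the two overcount factors by their share of the bill. A rough sketch, assuming input is billed at exactly 2.0x and output at 188/184 as in the test above; overchargeFactor() is a hypothetical helper, and the real multiplier depends on each request's input/output mix and pricing:

```php
<?php
// Hypothetical helper: overall cost multiplier for a given input share of the true cost.
function overchargeFactor(float $inputCostShare): float
{
    $inputFactor = 2.0;        // input tokens counted twice
    $outputFactor = 188 / 184; // output overcount from the test above (about 1.02)
    return $inputCostShare * $inputFactor
        + (1.0 - $inputCostShare) * $outputFactor;
}

printf("%.2f\n", overchargeFactor(0.70)); // 1.71
printf("%.2f\n", overchargeFactor(0.80)); // 1.80
```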
Fix
Only take input_tokens from message_start and output_tokens from message_delta:
if ($type == 'message_start') {
    $inputTokensCount += $data->message->usage->input_tokens ?? 0;
    // Do NOT add output_tokens here - message_start only has an initial estimate
    continue;
}
if ($type == 'message_delta') {
    // Do NOT add input_tokens here - it's a duplicate of message_start
    $outputTokensCount += $data->usage->output_tokens ?? 0;
    continue;
}
This also applies to each iteration of the tool call loop (while loop), since every
API call in the loop produces its own message_start + message_delta pair.
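One possible way to structure this inside the tool call loop is to keep per-call counters and fold them into the running totals once each call's stream ends. This is a sketch, not the project's code: fakeStream() and $pendingCalls are hypothetical stand-ins for the real SSE stream and loop condition. It also assumes that output_tokens in message_delta is the running total for that call, so the value is assigned rather than added; that goes slightly beyond the fix above, which uses += and is equivalent when each call emits a single message_delta.

```php
<?php
// Hypothetical stand-in for one streamed API call; the real code reads SSE events.
function fakeStream(int $input, int $output): Generator
{
    yield (object) [
        'type' => 'message_start',
        'message' => (object) ['usage' => (object) ['input_tokens' => $input, 'output_tokens' => 2]],
    ];
    yield (object) [
        'type' => 'message_delta',
        'usage' => (object) ['input_tokens' => $input, 'output_tokens' => $output],
    ];
}

$inputTokensCount = 0;
$outputTokensCount = 0;
$pendingCalls = [[560, 57], [637, 127]]; // token figures from the test above

while ($pendingCalls) { // stands in for the tool call loop
    [$in, $out] = array_shift($pendingCalls);
    $callInput = 0;
    $callOutput = 0;

    foreach (fakeStream($in, $out) as $data) {
        if ($data->type === 'message_start') {
            // Input tokens are taken once per call, from message_start only.
            $callInput = $data->message->usage->input_tokens ?? 0;
        } elseif ($data->type === 'message_delta') {
            // Assumption: this is the call's running total, so assign instead of adding.
            $callOutput = $data->usage->output_tokens ?? $callOutput;
        }
    }

    // Fold this call's usage into the totals for the whole conversation turn.
    $inputTokensCount += $callInput;
    $outputTokensCount += $callOutput;
}

printf("input=%d output=%d\n", $inputTokensCount, $outputTokensCount); // input=1197 output=184
```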
Best regards