Bug Report – Token Double-Counting in Anthropic MessageService (Streaming) #3

@xinpehr

Hi,
There is a critical token-counting bug in src/Ai/Infrastructure/Services/Anthropic/MessageService.php
that causes users to be overcharged for Anthropic models (claude-opus-4-6, claude-sonnet-4-6, etc.).
Note: CompletionService.php and CodeCompletionService.php follow the same pattern.

The Problem

In the streaming response loop (lines 98–155), the code accumulates tokens from both
message_start and message_delta events:
// Line 105-108
if ($type == 'message_start') {
    $inputTokensCount += $data->message->usage->input_tokens ?? 0;
    $outputTokensCount += $data->message->usage->output_tokens ?? 0;
    continue;
}
// Line 150-154
if ($type == 'message_delta') {
    $inputTokensCount += $data->usage->input_tokens ?? 0;
    $outputTokensCount += $data->usage->output_tokens ?? 0;
    continue;
}

Why This Is Wrong

According to Anthropic's streaming API behavior (verified with live API tests):

  1. input_tokens: Reported with the SAME value in BOTH message_start and message_delta.
    Adding both results in EXACTLY 2x the actual input token count.
  2. output_tokens: message_start reports a small initial estimate (~2-4 tokens),
    while message_delta reports the FINAL total. Adding both results in slight overcounting.

Test Evidence (live API call)

CALL 1 (non-streaming): input=560, output=57

CALL 2 (streaming):
  message_start: input_tokens=637, output_tokens=2
  message_delta:  input_tokens=637, output_tokens=127

Buggy calculation:
  Input:  (560+560) + (637+637) = 2,394   (actual: 1,197 → 2.0x overcount)
  Output: (2+57) + (2+127)      = 188     (actual: 184   → 1.02x overcount)
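
The arithmetic above can be reproduced with a small replay script. This is only a sketch: the countTokens function and the event tuples are hypothetical and are not taken from MessageService.php. Call 1 was non-streaming in the test, so it is modelled here as if it had been streamed with the same initial output estimate of 2, which is what the calculation above assumes.

```php
<?php
// Hypothetical replay of the usage values reported in the test evidence.
// Each call is a list of [event type, input_tokens, output_tokens].
$calls = [
    // Assumption: call 1 is treated as streamed, with an initial estimate of 2.
    [['message_start', 560, 2], ['message_delta', 560, 57]],
    [['message_start', 637, 2], ['message_delta', 637, 127]],
];

// Sums token usage over all calls, using either the buggy or the fixed rule.
function countTokens(array $calls, bool $buggy): array
{
    $input = 0;
    $output = 0;
    foreach ($calls as $events) {
        foreach ($events as [$type, $in, $out]) {
            if ($type === 'message_start') {
                $input += $in;
                if ($buggy) {
                    $output += $out; // adds the initial estimate on top of the final total
                }
            } elseif ($type === 'message_delta') {
                if ($buggy) {
                    $input += $in; // adds the same input count a second time
                }
                $output += $out;
            }
        }
    }
    return [$input, $output];
}

[$buggyIn, $buggyOut] = countTokens($calls, true);
[$fixedIn, $fixedOut] = countTokens($calls, false);
printf("buggy: input=%d output=%d\n", $buggyIn, $buggyOut);
printf("fixed: input=%d output=%d\n", $fixedIn, $fixedOut);
```

With these inputs the buggy rule gives 2,394 input and 188 output tokens, and the fixed rule gives 1,197 and 184, matching the figures above.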

Since input tokens typically represent 70-80% of the cost, users are being charged
approximately 1.7-1.8x the actual cost for Anthropic models.
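
The 1.7-1.8x figure follows from doubling only the input share of the bill. A minimal sketch, assuming the output overcount is small enough to ignore; overchargeMultiplier is a hypothetical helper, not something in the codebase:

```php
<?php
// $inputShare is the fraction of the true cost that comes from input tokens.
// $inputOvercount is how many times the input tokens are billed (2.0 for this bug).
function overchargeMultiplier(float $inputShare, float $inputOvercount = 2.0): float
{
    // Input portion is scaled by the overcount; the output portion is unchanged.
    return $inputShare * $inputOvercount + (1.0 - $inputShare);
}

foreach ([0.70, 0.80] as $share) {
    printf("input share %.0f%% -> %.2fx\n", $share * 100, overchargeMultiplier($share));
}
```

For an input share of 70% this gives about 1.70x, and for 80% about 1.80x.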

Fix

Only take input_tokens from message_start and output_tokens from message_delta:
if ($type == 'message_start') {
    $inputTokensCount += $data->message->usage->input_tokens ?? 0;
    // Do NOT add output_tokens here - message_start only has an initial estimate
    continue;
}
if ($type == 'message_delta') {
    // Do NOT add input_tokens here - it's a duplicate of message_start
    $outputTokensCount += $data->usage->output_tokens ?? 0;
    continue;
}
This also applies to each iteration of the tool call loop (while loop), since every
API call in the loop produces its own message_start + message_delta pair.
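
One possible way to structure this for the tool call loop is to track usage per API call and fold it into the running totals once that call's stream ends. The sketch below is an assumption-laden illustration, not the project's code: the CallUsage class, the $streams fixture, and the JSON payloads are hypothetical, and the outer foreach stands in for the while loop. It also assumes that output_tokens in message_delta is a running total for the call (the issue describes it as the final total), so the value is assigned rather than added in case more than one message_delta arrives.

```php
<?php
// Hypothetical helper: holds the usage of a single API call.
final class CallUsage
{
    public int $input = 0;
    public int $output = 0;

    public function record(object $data): void
    {
        $type = $data->type ?? null;
        if ($type === 'message_start') {
            // Input tokens are taken once, from message_start only.
            $this->input = $data->message->usage->input_tokens ?? 0;
        } elseif ($type === 'message_delta') {
            // Assigned, not added: assumed to be the total for this call so far.
            $this->output = $data->usage->output_tokens ?? 0;
        }
    }
}

// Simulated event payloads for two API calls, using the values from the test evidence.
$streams = [
    [
        '{"type":"message_start","message":{"usage":{"input_tokens":560,"output_tokens":2}}}',
        '{"type":"message_delta","usage":{"input_tokens":560,"output_tokens":57}}',
    ],
    [
        '{"type":"message_start","message":{"usage":{"input_tokens":637,"output_tokens":2}}}',
        '{"type":"message_delta","usage":{"input_tokens":637,"output_tokens":127}}',
    ],
];

$inputTokensCount = 0;
$outputTokensCount = 0;
foreach ($streams as $events) { // stands in for one iteration of the tool call loop
    $usage = new CallUsage();
    foreach ($events as $json) {
        $usage->record(json_decode($json));
    }
    // Fold this call's usage into the totals exactly once.
    $inputTokensCount += $usage->input;
    $outputTokensCount += $usage->output;
}
printf("input=%d output=%d\n", $inputTokensCount, $outputTokensCount);
```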
Best regards
