[BUG] [v0.0.7] cortex stats undercounts tokens — dedup fallback key uses constant total count instead of per-message index (stats_cmd.rs:497) #53387

@nightmare0329

Project

Repository: CortexLM/cortex
Version: v0.0.7
Component: cortex stats — session token counting (src/cortex-cli/src/stats_cmd.rs)

Description

parse_session_file() uses a HashSet<String> of deduplication keys to track which messages have already had their tokens counted. For messages that have no "id" field, the fallback key is:

// src/cortex-cli/src/stats_cmd.rs:493-498
let count_key = if !msg_id.is_empty() {
    msg_id.clone()
} else {
    // Fallback: use message index position (less reliable but better than nothing)
    format!("msg_{}", data.message_count)  // BUG: data.message_count is CONSTANT
};

The comment says "use message index position" but data.message_count is set once at line 469 (data.message_count = messages.len() as u64) and never changes inside the loop. Every message without an id gets the exact same key (e.g., "msg_3" for a 3-message session).

Result: Only the first id-less message gets its tokens counted; all subsequent messages without IDs are treated as duplicates and skipped. For sessions where no messages have id fields (which occurs with some LLM providers and older session formats), cortex stats severely undercounts totals, and it reports 0 output tokens whenever the first message carries no output tokens (for example, a user message).

CLI Output Demonstrating the Bug

The output below comes from a standalone Rust program that reproduces the dedup logic from parse_session_file():

=== Reproducing cortex stats token undercounting bug (stats_cmd.rs:497) ===

  Counted  [key=msg_3]  +input=500 +output=0
  SKIPPED  [key=msg_3]  input=0 output=1200 <-- treated as duplicate (BUG)
  SKIPPED  [key=msg_3]  input=0 output=800 <-- treated as duplicate (BUG)

--- cortex stats output ---
  Input Tokens:  500
  Output Tokens: 0

--- expected (correct) ---
  Input Tokens:  500
  Output Tokens: 2000

Bug: 2000 output tokens undercounted (only first message counted, rest treated as duplicates)
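
For readers who want to check the behavior without a cortex checkout, here is a minimal sketch of such a reproduction. It is not the program that produced the output above. The `Msg` struct, `sample_session()`, and `count_tokens_buggy()` are hypothetical names, and the use of `HashSet::insert` as the duplicate check is an assumption about how parse_session_file() consults its key set.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a session message; the real type in
/// stats_cmd.rs is not shown in this report.
struct Msg {
    id: &'static str,
    input_tokens: u64,
    output_tokens: u64,
}

/// Mirrors the reported logic: the fallback key is built from the total
/// message count, which is identical for every message in the session.
fn count_tokens_buggy(messages: &[Msg]) -> (u64, u64) {
    let message_count = messages.len() as u64; // set once, never updated
    let mut counted: HashSet<String> = HashSet::new();
    let (mut input, mut output) = (0u64, 0u64);

    for msg in messages {
        let count_key = if !msg.id.is_empty() {
            msg.id.to_string()
        } else {
            format!("msg_{}", message_count) // same key on every iteration
        };
        // insert() returns false when the key was already present,
        // so every id-less message after the first is skipped.
        if counted.insert(count_key) {
            input += msg.input_tokens;
            output += msg.output_tokens;
        }
    }
    (input, output)
}

/// The three id-less messages from the session file in this report.
fn sample_session() -> Vec<Msg> {
    vec![
        Msg { id: "", input_tokens: 500, output_tokens: 0 },
        Msg { id: "", input_tokens: 0, output_tokens: 1200 },
        Msg { id: "", input_tokens: 0, output_tokens: 800 },
    ]
}

fn main() {
    let (input, output) = count_tokens_buggy(&sample_session());
    println!("Input Tokens:  {}", input);
    println!("Output Tokens: {}", output);
}
```

With the sample session, all three messages map to the key "msg_3", so only the first message's 500 input tokens are added.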

Steps to Reproduce

  1. Create a session file at ~/.cortex/sessions/test-session.json without id fields on messages:
{
  "model": "claude-sonnet-4",
  "timestamp": "2025-04-01T00:00:00Z",
  "messages": [
    {"role": "user",      "content": "Hello", "usage": {"input_tokens": 500, "output_tokens": 0}},
    {"role": "assistant", "content": "Hi!",   "usage": {"input_tokens": 0,   "output_tokens": 1200}},
    {"role": "assistant", "content": "Done",  "usage": {"input_tokens": 0,   "output_tokens": 800}}
  ]
}
  2. Run cortex stats
  3. Observe Output Tokens: 0 instead of 2000

Expected Behavior

All messages' token counts should be included. The fallback key should use a per-message index (e.g., a loop counter i), not the constant total message count.

Actual Behavior

All messages without id share the key "msg_N" (where N = total message count). Only the first one's tokens are counted; the rest are silently skipped as apparent duplicates.

Fix

// Replace line 497:
format!("msg_{}", data.message_count)
// With a per-message loop index, e.g.:
format!("msg_{}", loop_index)  // where loop_index increments each iteration
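
One way to obtain a per-message index is `Iterator::enumerate`. The sketch below shows the idea on a self-contained example; it assumes the loop in parse_session_file() iterates over the messages in order, and `Msg`, `sample_session()`, and `count_tokens_fixed()` are hypothetical names rather than code from the repository.

```rust
use std::collections::HashSet;

/// Hypothetical stand-in for a session message; the real type in
/// stats_cmd.rs is not shown in this report.
struct Msg {
    id: &'static str,
    input_tokens: u64,
    output_tokens: u64,
}

/// Same dedup scheme as the report describes, but the fallback key uses
/// the position of the message in the session instead of the total count.
fn count_tokens_fixed(messages: &[Msg]) -> (u64, u64) {
    let mut counted: HashSet<String> = HashSet::new();
    let (mut input, mut output) = (0u64, 0u64);

    for (index, msg) in messages.iter().enumerate() {
        let count_key = if !msg.id.is_empty() {
            msg.id.to_string()
        } else {
            format!("msg_{}", index) // unique per message position
        };
        // Messages that repeat a real id are still counted only once.
        if counted.insert(count_key) {
            input += msg.input_tokens;
            output += msg.output_tokens;
        }
    }
    (input, output)
}

/// The three id-less messages from the session file in this report.
fn sample_session() -> Vec<Msg> {
    vec![
        Msg { id: "", input_tokens: 500, output_tokens: 0 },
        Msg { id: "", input_tokens: 0, output_tokens: 1200 },
        Msg { id: "", input_tokens: 0, output_tokens: 800 },
    ]
}

fn main() {
    let (input, output) = count_tokens_fixed(&sample_session());
    println!("Input Tokens:  {}", input);
    println!("Output Tokens: {}", output);
}
```

A fallback key such as "msg_0" could still collide with a message whose real id happens to be the same string; if that is a concern, a prefix that cannot appear in real ids would avoid it.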

Environment

  • OS: Linux
  • Version: v0.0.7
  • File: src/cortex-cli/src/stats_cmd.rs, line 497

Labels

  • invalid (This doesn't seem right)
