Project
Repository: CortexLM/cortex
Version: v0.0.7
Component: cortex stats — session token counting (src/cortex-cli/src/stats_cmd.rs)
Description
parse_session_file() tracks which messages have already had their tokens counted using a HashSet<String> deduplication key. For messages that have no "id" field, the fallback key is:
// src/cortex-cli/src/stats_cmd.rs:493-498
let count_key = if !msg_id.is_empty() {
msg_id.clone()
} else {
// Fallback: use message index position (less reliable but better than nothing)
format!("msg_{}", data.message_count) // BUG: data.message_count is CONSTANT
};
The comment says "use message index position" but data.message_count is set once at line 469 (data.message_count = messages.len() as u64) and never changes inside the loop. Every message without an id gets the exact same key (e.g., "msg_3" for a 3-message session).
Result: Only the first id-less message gets its tokens counted; all subsequent messages without IDs are treated as duplicates and skipped. For sessions in which no message has an id field (which occurs with some LLM providers and older session formats), only the first message is counted at all. Because that message is typically a user turn, cortex stats can report 0 output tokens, and all totals are severely undercounted.
CLI Output Demonstrating the Bug
A standalone Rust program that reproduces the exact dedup logic from parse_session_file() prints:
=== Reproducing cortex stats token undercounting bug (stats_cmd.rs:497) ===
Counted [key=msg_3] +input=500 +output=0
SKIPPED [key=msg_3] input=0 output=1200 <-- treated as duplicate (BUG)
SKIPPED [key=msg_3] input=0 output=800 <-- treated as duplicate (BUG)
--- cortex stats output ---
Input Tokens: 500
Output Tokens: 0
--- expected (correct) ---
Input Tokens: 500
Output Tokens: 2000
Bug: 2000 output tokens undercounted (only first message counted, rest treated as duplicates)
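The reproduction program itself is not included above. The following is a minimal sketch of the same dedup logic, assuming a simplified, hypothetical Message type with only the fields the key computation reads. It is not the code from stats_cmd.rs, but it replays the fallback key against a constant count and yields the same counted/skipped pattern.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for one parsed message: only the fields the dedup logic reads.
struct Message {
    id: String, // empty string when the JSON message has no "id" field
    input_tokens: u64,
    output_tokens: u64,
}

// Replays the buggy fallback-key logic; returns (input_tokens, output_tokens).
fn count_tokens_buggy(messages: &[Message]) -> (u64, u64) {
    // Like data.message_count: assigned once, before the loop, and never updated.
    let message_count = messages.len() as u64;
    let mut counted: HashSet<String> = HashSet::new();
    let (mut input, mut output) = (0u64, 0u64);
    for msg in messages {
        let count_key = if !msg.id.is_empty() {
            msg.id.clone()
        } else {
            format!("msg_{}", message_count) // identical for every id-less message
        };
        // HashSet::insert returns false when the key is already present.
        if counted.insert(count_key.clone()) {
            println!("Counted [key={}] +input={} +output={}", count_key, msg.input_tokens, msg.output_tokens);
            input += msg.input_tokens;
            output += msg.output_tokens;
        } else {
            println!("SKIPPED [key={}] input={} output={} <-- treated as duplicate (BUG)", count_key, msg.input_tokens, msg.output_tokens);
        }
    }
    (input, output)
}

fn main() {
    // Same token counts as the session file in "Steps to Reproduce".
    let messages = vec![
        Message { id: String::new(), input_tokens: 500, output_tokens: 0 },
        Message { id: String::new(), input_tokens: 0, output_tokens: 1200 },
        Message { id: String::new(), input_tokens: 0, output_tokens: 800 },
    ];
    let (input, output) = count_tokens_buggy(&messages);
    println!("Input Tokens: {}", input);
    println!("Output Tokens: {}", output);
}
```

Messages that do carry an id are unaffected, since each gets its own key; only the id-less path collapses to a single key.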
Steps to Reproduce
- Create a session file at ~/.cortex/sessions/test-session.json whose messages have no id fields:
{
"model": "claude-sonnet-4",
"timestamp": "2025-04-01T00:00:00Z",
"messages": [
{"role": "user", "content": "Hello", "usage": {"input_tokens": 500, "output_tokens": 0}},
{"role": "assistant", "content": "Hi!", "usage": {"input_tokens": 0, "output_tokens": 1200}},
{"role": "assistant", "content": "Done", "usage": {"input_tokens": 0, "output_tokens": 800}}
]
}
- Run cortex stats
- Observe that it reports Output Tokens: 0 instead of 2000
Expected Behavior
All messages' token counts should be included. The fallback key should use a per-message index (e.g., a loop counter i), not the constant total message count.
Actual Behavior
All messages without id share the key "msg_N" (where N = total message count). Only the first one's tokens are counted; the rest are silently skipped as apparent duplicates.
Fix
// Replace line 497:
format!("msg_{}", data.message_count)
// With a per-message loop index, e.g.:
format!("msg_{}", loop_index) // where loop_index increments each iteration
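One way to obtain such an index is to iterate with Iterator::enumerate(). The sketch below assumes the message loop can be written that way; the Message type and the count_key and count_tokens_fixed helpers are hypothetical and only illustrate the keying, not the actual structure of parse_session_file().

```rust
use std::collections::HashSet;

// Hypothetical minimal message type; the real struct in stats_cmd.rs may differ.
struct Message {
    id: String, // empty when the message has no "id" field
    input_tokens: u64,
    output_tokens: u64,
}

// Hypothetical helper: prefer the message id, else fall back to the message's own position.
fn count_key(msg_id: &str, index: usize) -> String {
    if !msg_id.is_empty() {
        msg_id.to_string()
    } else {
        format!("msg_{}", index)
    }
}

// Returns (input_tokens, output_tokens), counting each distinct key once.
fn count_tokens_fixed(messages: &[Message]) -> (u64, u64) {
    let mut counted: HashSet<String> = HashSet::new();
    let (mut input, mut output) = (0u64, 0u64);
    // enumerate() yields an index that increments on every iteration.
    for (index, msg) in messages.iter().enumerate() {
        if counted.insert(count_key(&msg.id, index)) {
            input += msg.input_tokens;
            output += msg.output_tokens;
        }
    }
    (input, output)
}

fn main() {
    let messages = vec![
        Message { id: String::new(), input_tokens: 500, output_tokens: 0 },
        Message { id: String::new(), input_tokens: 0, output_tokens: 1200 },
        Message { id: String::new(), input_tokens: 0, output_tokens: 800 },
    ];
    let (input, output) = count_tokens_fixed(&messages);
    println!("Input Tokens: {}", input);
    println!("Output Tokens: {}", output);
}
```

Messages that share a real id are still deduplicated. In this sketch, id-based and positional keys share one namespace, so a provider that emitted a literal id such as "msg_1" could collide with a fallback key; whether that matters depends on the id formats actually seen in session files.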
Environment
- OS: Linux
- Version: v0.0.7
- File: src/cortex-cli/src/stats_cmd.rs, line 497