fix(ccusage): improve deduplication to keep records with most tokens #826

jaried · 2026-01-27T06:40:44Z

Summary

This PR improves the deduplication logic in ccusage in two ways: it handles the edge case where an API proxy omits the requestId field, and it ensures that, when duplicates are found, the record with the highest token count is the one preserved.

Changes

1. Modified `createUniqueHash`

When requestId is missing, fall back to using only messageId for deduplication
Previously returned null when requestId was missing, causing records to bypass deduplication
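
A minimal sketch of the fallback described above, not the actual implementation. The trimmed-down `EntryIds` type, the `createUniqueHashSketch` name, and the `messageId:requestId` key format are assumptions made for illustration; the real function operates on the full `UsageData` type in `data-loader.ts`.

```typescript
// Hypothetical minimal shape; the real UsageData type has more fields.
type EntryIds = {
	message: { id?: string };
	requestId?: string;
};

// Sketch only: builds a deduplication key, or null when no key is possible.
function createUniqueHashSketch(data: EntryIds): string | null {
	const messageId = data.message.id;
	const requestId = data.requestId;

	// Without a message ID there is nothing stable to key on.
	if (messageId == null) {
		return null;
	}

	// Fallback: entries from proxies that drop requestId are keyed by messageId alone,
	// so they still take part in deduplication instead of bypassing it.
	if (requestId == null) {
		return messageId;
	}

	// Assumed key format when both identifiers are present.
	return `${messageId}:${requestId}`;
}
```

With this shape, two streamed copies of the same message that both lack a requestId produce the same key and can be compared against each other.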

2. Added `getTotalTokensFromUsageData`

New function to calculate total tokens using complete formula
Formula: input_tokens + output_tokens + cache_creation_input_tokens + cache_read_input_tokens

3. Updated deduplication logic

Changed from Set<string> to Map<string, {index, tokens}> for tracking processed entries
When duplicate is found, compare token counts and keep the record with more tokens
This prevents data loss from streaming intermediate states (which often have 0 or very few tokens)
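
The replacement policy above could look roughly like the following. This is a simplified sketch under stated assumptions: the `Entry` type, its `hash`, `tokens`, and `label` fields, and the `dedupeKeepingMostTokens` helper are hypothetical and stand in for the loaders' real data structures.

```typescript
// Hypothetical, simplified record; field names are assumptions for this sketch.
type Entry = { hash: string | null; tokens: number; label: string };

function dedupeKeepingMostTokens(entries: Entry[]): Entry[] {
	const kept: Entry[] = [];
	// hash -> position in `kept` plus the token total currently stored there
	const processed = new Map<string, { index: number; tokens: number }>();

	for (const entry of entries) {
		if (entry.hash == null) {
			// No usable key: keep the entry without deduplicating it.
			kept.push(entry);
			continue;
		}

		const existing = processed.get(entry.hash);
		if (existing == null) {
			// First time this key is seen: record where it lives and its token total.
			processed.set(entry.hash, { index: kept.length, tokens: entry.tokens });
			kept.push(entry);
		} else if (entry.tokens > existing.tokens) {
			// Duplicate with more tokens: replace in place so ordering is preserved.
			kept[existing.index] = entry;
			existing.tokens = entry.tokens;
		}
		// Otherwise the stored record has at least as many tokens; drop the duplicate.
	}

	return kept;
}
```

Storing the index alongside the token count is what lets a later, fuller record overwrite an earlier streaming snapshot without a second pass over the results.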

4. Updated functions

loadDailyUsageData
loadSessionData
loadSessionBlockData

Testing

Added unit tests for createUniqueHash (including fallback behavior)
Added unit tests for getTotalTokensFromUsageData
Added unit tests for shouldReplaceExisting
Updated existing deduplication tests to reflect new behavior

Summary by CodeRabbit

Bug Fixes
- Improved duplicate entry handling by retaining entries with higher token counts for more accurate data reporting.
- Enhanced resilience when processing records with incomplete or missing identifiers.


- Modified createUniqueHash to fall back to messageId when requestId is missing
- Added getTotalTokensFromUsageData function for complete token calculation
- Changed deduplication logic to keep records with highest token count
- Updated loadDailyUsageData, loadSessionData, and loadSessionBlockData
- Added comprehensive unit tests for new deduplication behavior

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

coderabbitai · 2026-01-27T06:41:04Z

Walkthrough

The deduplication logic in the data-loader.ts module is refactored to keep the entry with the higher token count instead of simply skipping duplicates. A new exported utility function, getTotalTokensFromUsageData, computes the total tokens of a usage entry. Deduplication tracking switches from a Set to a Map that stores each entry's token count for comparison.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Deduplication Logic Overhaul `apps/ccusage/src/data-loader.ts` | Replaced per-entry duplicate skipping with token-count–based replacement policy. Updated `createUniqueHash` to handle missing `messageId`/`requestId` with fallback logic. Added `getTotalTokensFromUsageData` utility to sum input, output, cache tokens. Converted deduplication tracking from Set to Map with `{ index, tokens }` structure. Modified daily/session/session-block loaders to replace existing entries when new duplicates have higher token counts. Updated tests to expect higher-token entries in dedup scenarios. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

PR #58: Modifies src/data-loader.ts deduplication logic including createUniqueHash and load functions; this PR evolves the Set-based chronological deduplication into token-count–based replacement policy.
PR #65: Introduces the isDuplicateEntry/markAsProcessed deduplication helpers that this PR replaces with token-count–based replacement policy and new data structures.

Suggested reviewers

ryoppippi
skylinesales

Poem

🐰✨ Tokens counted, not just skipped away,
Duplicates compared by their token display—
Keep the richer entry, let the weaker fade,
A smarter dedup dance our loader has made! 🎭

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately and concisely describes the main change: improving deduplication logic to retain records with the highest token counts. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |


coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@apps/ccusage/src/data-loader.ts`:
- Around line 529-543: The function getTotalTokensFromUsageData is exported but
not used externally; remove the export to make it module-private by changing its
declaration to a non-exported function (keep the name
getTotalTokensFromUsageData and signature using UsageData so callers inside this
file still work), then run a quick grep across the repo to confirm no external
imports reference it and update any internal callers if needed.

🧹 Nitpick comments (1)

apps/ccusage/src/data-loader.ts (1)

490-496: Avoid a test-only helper drifting from production logic.
shouldReplaceExisting is only exercised in the test block, while the dedup paths inline the comparison. Consider using this helper in those paths (or removing it) to keep a single source of truth.

coderabbitai · 2026-01-27T06:47:02Z

apps/ccusage/src/data-loader.ts

+/**
+ * Calculate total tokens from UsageData for deduplication comparison
+ * Uses complete formula: input + output + cache_creation + cache_read
+ * @param data - UsageData object
+ * @returns Total number of tokens
+ */
+export function getTotalTokensFromUsageData(data: UsageData): number {
+	const usage = data.message.usage;
+	return (
+		usage.input_tokens +
+		usage.output_tokens +
+		(usage.cache_creation_input_tokens ?? 0) +
+		(usage.cache_read_input_tokens ?? 0)
+	);
+}


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

rg -n "getTotalTokensFromUsageData" -g "!apps/ccusage/src/data-loader.ts"

Repository: ryoppippi/ccusage

Length of output: 43

Remove the export keyword from getTotalTokensFromUsageData. This function is not used outside this module and violates the guideline to export only symbols actually used by other modules.

coderabbitai bot reviewed Jan 27, 2026
