Add cost and token tracking per benchmark run #2
Open
Labels: good first issue (Good for newcomers)
Description
Summary
Add instrumentation to track token usage (input/output) and estimated cost per benchmark task and per full run. This enables cost comparison across providers and helps users estimate expenses before running large benchmarks.
What needs to happen
- Capture input and output token counts from each LLM call
- Map token counts to estimated cost using per-provider pricing
- Aggregate totals per task and per full benchmark run
- Output a cost summary at the end of each run (total tokens, total cost, avg cost/task)
- Save cost data alongside benchmark results for later analysis
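Here is a rough sketch of the capture, pricing, and aggregation steps above. It is a minimal Python sketch under several assumptions. The issue doesn't name the repository's language or LLM client wrapper, so `CallUsage`, `CostTracker`, `call_cost`, and `PRICING_PER_MILLION` are hypothetical names. The rates in the table are placeholders, not real prices. Per-call token counts would come from the usage metadata each provider returns with a response. For example, Anthropic's Messages API reports `usage.input_tokens` and `usage.output_tokens`.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical pricing table in USD per 1M tokens. Placeholder numbers only;
# real rates must be taken from each provider's published pricing.
PRICING_PER_MILLION = {
    "example-model": {"input": 3.00, "output": 15.00},
}


@dataclass
class CallUsage:
    """Token counts reported for a single LLM call (hypothetical record type)."""
    task_id: str
    model: str
    input_tokens: int
    output_tokens: int


def call_cost(usage: CallUsage) -> float:
    """Estimate the USD cost of one call from its token counts.

    Raises KeyError for a model missing from the table, so an unpriced
    model is noticed instead of silently being costed at zero.
    """
    rates = PRICING_PER_MILLION[usage.model]
    return (usage.input_tokens * rates["input"]
            + usage.output_tokens * rates["output"]) / 1_000_000


class CostTracker:
    """Keeps raw per-call usage and derives per-task and per-run totals."""

    def __init__(self) -> None:
        self.calls: list[CallUsage] = []

    def record(self, usage: CallUsage) -> None:
        # Call once per LLM response with the counts the provider returned.
        self.calls.append(usage)

    def per_task(self) -> dict[str, dict[str, float]]:
        # Each new task_id starts from zeroed counters.
        totals = defaultdict(
            lambda: {"input_tokens": 0, "output_tokens": 0, "cost": 0.0})
        for u in self.calls:
            t = totals[u.task_id]
            t["input_tokens"] += u.input_tokens
            t["output_tokens"] += u.output_tokens
            t["cost"] += call_cost(u)
        return dict(totals)

    def run_totals(self) -> dict[str, float]:
        tasks = self.per_task()
        input_tokens = sum(t["input_tokens"] for t in tasks.values())
        output_tokens = sum(t["output_tokens"] for t in tasks.values())
        cost = sum(t["cost"] for t in tasks.values())
        return {
            "tasks": len(tasks),
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "total_tokens": input_tokens + output_tokens,
            "cost": cost,
            "avg_cost_per_task": cost / len(tasks) if tasks else 0.0,
        }
```

The tracker stores raw per-call counts and computes costs only when totals are requested. If pricing changes, costs can be recomputed from saved counts without re-running the benchmark.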
Example output
```
Benchmark complete: 162 tasks
Total tokens: 1,245,000 (input: 980,000 / output: 265,000)
Estimated cost: $4.82
Avg cost/task: $0.03
Provider: claude-sonnet-4-6
```
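The following sketch renders the end-of-run summary in the layout of the example output above. It assumes the run totals are already available as plain numbers. `format_cost_summary` is a hypothetical helper whose line layout simply mirrors the example.

```python
def format_cost_summary(tasks: int, input_tokens: int, output_tokens: int,
                        cost_usd: float, model: str) -> str:
    """Build the multi-line cost summary printed at the end of a run."""
    total = input_tokens + output_tokens
    # Guard against an empty run so the average never divides by zero.
    avg = cost_usd / tasks if tasks else 0.0
    lines = [
        f"Benchmark complete: {tasks} tasks",
        f"Total tokens: {total:,} "
        f"(input: {input_tokens:,} / output: {output_tokens:,})",
        f"Estimated cost: ${cost_usd:.2f}",
        f"Avg cost/task: ${avg:.2f}",
        f"Provider: {model}",
    ]
    return "\n".join(lines)
```

Calling `format_cost_summary(162, 980_000, 265_000, 4.82, "claude-sonnet-4-6")` reproduces the example output line for line. Rounding the average to cents hides detail when per-task costs are small, so printing a few more decimal places may be worthwhile.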
Acceptance criteria
- Token counts are captured per LLM call
- Cost summary is printed at the end of each run
- Cost data is saved to results file
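The issue doesn't specify the results file format. Assuming it is JSON, one option is to attach the cost data under its own key in the same file, so results and costs stay together for later analysis. This is only a sketch: the function names and the `cost` / `per_task` / `run` key layout are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def build_results_payload(results: dict[str, Any],
                          per_task_cost: dict[str, dict[str, float]],
                          run_cost: dict[str, float]) -> dict[str, Any]:
    """Return a copy of the benchmark results with a 'cost' section attached."""
    payload = dict(results)  # shallow copy so the caller's dict is untouched
    payload["cost"] = {"per_task": per_task_cost, "run": run_cost}
    return payload


def save_results(path: Path, payload: dict[str, Any]) -> None:
    """Write the combined results and cost data as pretty-printed JSON."""
    path.write_text(json.dumps(payload, indent=2, sort_keys=True),
                    encoding="utf-8")


def load_run_cost(path: Path) -> dict[str, float]:
    """Read back only the run-level cost totals from a saved results file."""
    return json.loads(path.read_text(encoding="utf-8"))["cost"]["run"]
```

A separate file such as a sibling `cost.json` would also satisfy "saved alongside". Keeping everything in one file avoids the two drifting apart.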