Add cost and token tracking per benchmark run #2

@rajkumar42

Description

Summary

Add instrumentation to track token usage (input/output) and estimated cost per benchmark task and per full run. This enables cost comparison across providers and helps users estimate expenses before running large benchmarks.

What needs to happen

  • Capture input and output token counts from each LLM call
  • Map token counts to estimated cost using per-provider pricing
  • Aggregate totals per task and per full benchmark run
  • Output a cost summary at the end of each run (total tokens, total cost, avg cost/task)
  • Save cost data alongside benchmark results for later analysis
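The steps above could be sketched roughly as follows. This is a minimal illustration, not a proposed implementation: the `CostTracker` class name is hypothetical, and the pricing numbers are placeholders that would need to come from a maintained per-provider pricing table.

```python
from dataclasses import dataclass, field

# Hypothetical per-provider pricing in USD per 1M tokens -- placeholder
# values only; real rates would live in a maintained pricing table.
PRICING = {
    "claude-sonnet-4-6": {"input": 3.00, "output": 15.00},
}

@dataclass
class CostTracker:
    """Accumulates token counts and estimated cost across LLM calls."""
    provider: str
    input_tokens: int = 0
    output_tokens: int = 0
    task_costs: list = field(default_factory=list)

    def record_call(self, input_tokens: int, output_tokens: int) -> float:
        """Record one LLM call; return the estimated cost of that call."""
        rates = PRICING[self.provider]
        cost = (input_tokens * rates["input"]
                + output_tokens * rates["output"]) / 1_000_000
        self.input_tokens += input_tokens
        self.output_tokens += output_tokens
        return cost

    def record_task(self, cost: float) -> None:
        """Record the total estimated cost of one benchmark task."""
        self.task_costs.append(cost)

    def summary(self) -> str:
        """Render the end-of-run cost summary."""
        total = sum(self.task_costs)
        n = len(self.task_costs) or 1
        return (
            f"Benchmark complete: {len(self.task_costs)} tasks\n"
            f"Total tokens: {self.input_tokens + self.output_tokens:,} "
            f"(input: {self.input_tokens:,} / output: {self.output_tokens:,})\n"
            f"Estimated cost: ${total:.2f}\n"
            f"Avg cost/task: ${total / n:.2f}\n"
            f"Provider: {self.provider}"
        )
```

A per-task wrapper in the benchmark loop would call `record_call` once per LLM request and `record_task` once per task, then print `summary()` at the end of the run.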

Example output

Benchmark complete: 162 tasks
Total tokens: 1,245,000 (input: 980,000 / output: 265,000)
Estimated cost: $4.82
Avg cost/task: $0.03
Provider: claude-sonnet-4-6

Acceptance criteria

  • Token counts are captured per LLM call
  • Cost summary is printed at end of run
  • Cost data is saved to results file
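For the last criterion, one possible shape for persisting cost data alongside results, assuming results are serialized as JSON (the function name, file layout, and field names here are all illustrative):

```python
import json
from pathlib import Path

def save_results_with_costs(results: list, cost_summary: dict, path: str) -> None:
    """Write benchmark results plus a cost section to a single JSON file,
    so cost data can be analyzed later without re-running the benchmark."""
    payload = {"results": results, "cost": cost_summary}
    Path(path).write_text(json.dumps(payload, indent=2))
```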
