dtsong (Owner) commented on Jan 24, 2026

Summary

  • CalibrationWeights dataclass for tunable severity penalties in confidence scoring
  • calculate_confidence now accepts an optional weights parameter for calibrated scoring (see the first sketch below)
  • evals/calibration.py with confidence bucketing, ECE (Expected Calibration Error) computation, and weight suggestion (second sketch below)
  • load_calibration_data reads human-labeled outcomes from YAML files
  • Sample calibration data with 5 labeled review outcomes
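
A minimal sketch of the scoring API described above. The field names, penalty defaults, and findings shape are assumptions for illustration; the actual definitions live in the diff, which this conversation doesn't show:

```python
from dataclasses import dataclass

# Hypothetical field names and defaults -- illustrative, not taken from the diff.
@dataclass(frozen=True)
class CalibrationWeights:
    critical_penalty: float = 0.4
    major_penalty: float = 0.2
    minor_penalty: float = 0.05

def calculate_confidence(
    findings: list[dict], weights: CalibrationWeights | None = None
) -> float:
    """Start at 1.0, subtract a per-severity penalty for each finding,
    and clamp the result to [0, 1]."""
    w = weights or CalibrationWeights()
    penalty = {
        "critical": w.critical_penalty,
        "major": w.major_penalty,
        "minor": w.minor_penalty,
    }
    score = 1.0 - sum(penalty.get(f["severity"], 0.0) for f in findings)
    return max(0.0, min(1.0, score))
```

With no weights argument the defaults apply, which is how existing callers keep their old scores.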

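For context, Expected Calibration Error buckets predictions by confidence and takes the sample-weighted gap between each bucket's mean confidence and its observed accuracy. A self-contained version of that calculation, independent of the actual evals/calibration.py internals and omitting the weight-suggestion step:

```python
def expected_calibration_error(
    samples: list[tuple[float, bool]], n_buckets: int = 10
) -> float:
    """ECE: weighted average gap between mean predicted confidence and
    observed accuracy per bucket. Each sample pairs a predicted confidence
    with the human-labeled outcome (True if the review finding was correct)."""
    buckets: list[list[tuple[float, bool]]] = [[] for _ in range(n_buckets)]
    for conf, outcome in samples:
        idx = min(int(conf * n_buckets), n_buckets - 1)
        buckets[idx].append((conf, outcome))
    total = len(samples)
    ece = 0.0
    for bucket in buckets:
        if not bucket:
            continue
        avg_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(o for _, o in bucket) / len(bucket)
        ece += (len(bucket) / total) * abs(avg_conf - accuracy)
    return ece
```

A well-calibrated scorer drives this toward 0; one plausible weight-suggestion strategy is to nudge the severity penalties in whichever direction shrinks the per-bucket gaps.
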
Test plan

  • 16 new tests covering calibration samples, bucketing, analysis, weight suggestions, and data loading (assumed loader shape sketched below)
  • Existing confidence tests still pass with the new CalibrationWeights defaults
  • The full suite passes (405 tests, 95.77% coverage)
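
Since the sample YAML isn't reproduced in this conversation, here is an assumed shape for the labeled-outcome files that load_calibration_data reads; the schema is a guess, not taken from the diff:

```python
import yaml  # PyYAML

def load_calibration_data(path: str) -> list[tuple[float, bool]]:
    """Read human-labeled review outcomes from a YAML file.

    Assumed schema (illustrative only):
        samples:
          - confidence: 0.9   # predicted confidence for a review finding
            correct: true     # human label: was the finding right?
    """
    with open(path) as f:
        data = yaml.safe_load(f)
    return [(float(s["confidence"]), bool(s["correct"])) for s in data["samples"]]
```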

Closes #39 (Confidence calibration dataset)

🤖 Generated with Claude Code

dtsong merged commit 171e58b into main on Jan 24, 2026 (2 checks passed).
dtsong deleted the feat/39-confidence-calibration branch on January 24, 2026 at 20:12.