
Uncertainty engine: calibration roll cron (#101) #112

Merged
ssilvius merged 2 commits into feat/uncertainty-orphan-cron from feat/uncertainty-calibration-cron
Apr 28, 2026

@ssilvius
Contributor

Closes #101. Stacks on #111.

Pure bucketing in api/lib/uncertainty/calibration.ts:

  • bucketCohort splits witnessed predictions into 10 buckets of width 0.1 on claimed_confidence. For each non-empty bucket it emits the mean claimed confidence, mean correctness, count, and Brier score (the mean squared error between claimed confidence and correctness). The per-cohort orphan_count is attached to every emitted snapshot row so the dashboard surfaces the orphan share alongside reliability.
  • rollCalibration walks every active cohort (one with predictions in witnessed | orphaned), atomically replaces the cohort's prior snapshot rows, and returns counts of cohorts and rows touched.
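
For reviewers who want the bucketing math in one place, here is a minimal sketch of what bucketCohort could look like. The type names, field names, and the `bucketIndexFor` helper are hypothetical and not taken from calibration.ts. Skipping null-correctness rows and clamping confidence=1.0 into the top bucket are assumptions about how those edge cases are handled.

```typescript
// Hypothetical shapes; the real types in api/lib/uncertainty/calibration.ts may differ.
interface WitnessedPrediction {
  claimedConfidence: number;  // expected to lie in [0, 1]
  correctness: number | null; // 1 = correct, 0 = incorrect, null = not scored
}

interface BucketSnapshot {
  bucketIndex: number; // 0..9, bucket i covers [i * 0.1, (i + 1) * 0.1)
  meanClaimed: number;
  meanCorrectness: number;
  count: number;
  brier: number;       // mean of (claimed - correctness)^2 within the bucket
  orphanCount: number; // per-cohort value, repeated on every row
}

const BUCKET_COUNT = 10;

// Hypothetical helper: maps a confidence to its bucket index.
function bucketIndexFor(confidence: number): number {
  // floor(1.0 * 10) is 10, which is out of range, so clamp into the top bucket.
  const raw = Math.floor(confidence * BUCKET_COUNT);
  return Math.min(Math.max(raw, 0), BUCKET_COUNT - 1);
}

function bucketCohort(
  predictions: WitnessedPrediction[],
  orphanCount: number,
): BucketSnapshot[] {
  // One running accumulator per bucket.
  const sums = Array.from({ length: BUCKET_COUNT }, () => ({
    claimed: 0,
    correct: 0,
    sqErr: 0,
    count: 0,
  }));

  for (const p of predictions) {
    if (p.correctness === null) continue; // assumption: unscored rows are skipped
    const acc = sums[bucketIndexFor(p.claimedConfidence)]!;
    acc.claimed += p.claimedConfidence;
    acc.correct += p.correctness;
    acc.sqErr += (p.claimedConfidence - p.correctness) ** 2;
    acc.count += 1;
  }

  // Emit rows only for non-empty buckets.
  const rows: BucketSnapshot[] = [];
  sums.forEach((acc, bucketIndex) => {
    if (acc.count === 0) return;
    rows.push({
      bucketIndex,
      meanClaimed: acc.claimed / acc.count,
      meanCorrectness: acc.correct / acc.count,
      count: acc.count,
      brier: acc.sqErr / acc.count,
      orphanCount,
    });
  });
  return rows;
}
```

Under these assumptions, a cohort with one correct prediction at 0.9, one incorrect prediction at 1.0, and one unscored prediction yields a single row for the top bucket with count 2 and a Brier score of about 0.505. An empty cohort yields no rows.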

Exposed at POST /api/uncertainty/internal/calibration-roll, sharing the CRON_SECRET bearer guard with the orphan sweep. wrangler.jsonc adds a 03:15 UTC daily trigger alongside the hourly orphan sweep; both are invoked by an external scheduler against the internal endpoints since the Astro adapter does not expose a scheduled handler.
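
The shared bearer guard can be pictured roughly as follows. This is a sketch only: `isAuthorizedCron` and `constantTimeEqual` are hypothetical names, and the actual guard used by the orphan sweep may be structured differently. It assumes the scheduler sends `Authorization: Bearer <CRON_SECRET>`.

```typescript
// Hypothetical helper: compares two strings without exiting early on the
// first mismatching character. Differing lengths still return immediately.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// Hypothetical guard: true only when a secret is configured and the
// Authorization header carries exactly "Bearer <secret>".
function isAuthorizedCron(
  authorizationHeader: string | null,
  cronSecret: string | undefined,
): boolean {
  if (!cronSecret || authorizationHeader === null) return false;
  return constantTimeEqual(authorizationHeader, `Bearer ${cronSecret}`);
}
```

A route handler would pass in `request.headers.get("authorization")` and the configured CRON_SECRET, and answer 401 before doing any work when the guard returns false. Rejecting requests when the secret is unset keeps a misconfigured deployment from accepting `Bearer undefined`.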

Tests cover empty cohorts, single-bucket and multi-bucket spreads, mean / Brier math, null-correctness rows, the confidence=1.0 edge, and the orphan-attachment invariant (10 calibration tests).

ssilvius and others added 2 commits April 27, 2026 22:27
Closes #101. Phase 1 of the uncertainty engine spike.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@ssilvius ssilvius merged commit 868e856 into feat/uncertainty-orphan-cron Apr 28, 2026
1 check passed