Reference Stories

Stas edited this page Mar 7, 2026 · 2 revisions

What They Are

Reference stories are concrete, completed examples, one for each combination of task type and size (complexity level). They serve as calibration anchors: when estimating a new task, compare it against the reference story for the same type and size.

This is the single most effective calibration technique according to estimation research.

Format

coding / S:
  Title: "Add 404 error page"
  Actual rounds: 4
  Actual total: 1.2 hrs
  Notes: Single component, clear spec, no API

coding / M:
  Title: "Stripe payment integration"
  Actual rounds: 15
  Actual total: 4.5 hrs
  Notes: External API, webhook handling, error states

coding / L:
  Title: "Refactor data layer to ORM"
  Actual rounds: 38
  Actual total: 12 hrs
  Notes: 12 tables, raw SQL conversion, migration scripts

data-migration / L:
  Title: "MySQL to PostgreSQL migration"
  Actual rounds: 45
  Actual total: 3 days
  Notes: Schema translation, query rewrite, staged rollout

testing / M:
  Title: "E2E tests for checkout flow"
  Actual rounds: 18
  Actual total: 5 hrs
  Notes: Playwright setup, 8 test scenarios, CI integration
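
If the library is kept in code rather than on a wiki page, the format above maps naturally onto a small record type. The following is a minimal sketch, not a prescribed schema: the `ReferenceStory` class, its field names, and the `LIBRARY` index are hypothetical names chosen for illustration, and the totals are stored as text exactly as recorded because the examples mix hours and days.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceStory:
    """One completed task used as a calibration anchor (hypothetical schema)."""
    task_type: str      # e.g. "coding", "testing", "data-migration"
    size: str           # "S", "M", "L", or "XL"
    title: str
    actual_rounds: int
    actual_total: str   # kept as recorded, e.g. "1.2 hrs" or "3 days"
    notes: str


# The five example stories from the format listing above.
STORIES = [
    ReferenceStory("coding", "S", "Add 404 error page", 4, "1.2 hrs",
                   "Single component, clear spec, no API"),
    ReferenceStory("coding", "M", "Stripe payment integration", 15, "4.5 hrs",
                   "External API, webhook handling, error states"),
    ReferenceStory("coding", "L", "Refactor data layer to ORM", 38, "12 hrs",
                   "12 tables, raw SQL conversion, migration scripts"),
    ReferenceStory("data-migration", "L", "MySQL to PostgreSQL migration", 45, "3 days",
                   "Schema translation, query rewrite, staged rollout"),
    ReferenceStory("testing", "M", "E2E tests for checkout flow", 18, "5 hrs",
                   "Playwright setup, 8 test scenarios, CI integration"),
]

# Index by (task_type, size) so finding the anchor for a new task is one lookup.
LIBRARY = {(story.task_type, story.size): story for story in STORIES}

print(LIBRARY[("coding", "M")].title)
```

Keying on the (type, size) pair mirrors how the stories are labelled above ("coding / M") and makes a missing combination an explicit lookup failure rather than a silent default.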

How to Use Them

When estimating a new task:

  1. Identify the size and type
  2. Find the reference story for that combination
  3. Compare: "Is this new task bigger, smaller, or similar to the reference?"
  4. Adjust the estimate up or down from the reference story's actuals accordingly

This "outside view" counters optimism bias by grounding estimates in what actually happened, not what we hope will happen.
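
The four steps can be sketched as a lookup followed by a relative adjustment. This is an illustration under stated assumptions: the hours come from the example stories on this page, while the `ADJUSTMENT` multipliers and the `estimate_hours` function are hypothetical and not part of the method as described. A team would choose its own adjustment sizes.

```python
# Actual hours from the example reference stories on this page.
# The data-migration / L story is omitted because its total is recorded in days.
REFERENCE_HOURS = {
    ("coding", "S"): 1.2,
    ("coding", "M"): 4.5,
    ("coding", "L"): 12.0,
    ("testing", "M"): 5.0,
}

# ASSUMPTION: illustrative multipliers only; the page does not specify
# how large an adjustment should be.
ADJUSTMENT = {"smaller": 0.75, "similar": 1.0, "bigger": 1.25}


def estimate_hours(task_type: str, size: str, comparison: str) -> float:
    """Anchor on the reference story's actual hours, then adjust.

    comparison answers step 3: is the new task "smaller", "similar",
    or "bigger" than the reference story?
    """
    key = (task_type, size)
    if key not in REFERENCE_HOURS:
        raise LookupError(f"no reference story for {task_type} / {size}")
    return REFERENCE_HOURS[key] * ADJUSTMENT[comparison]


print(estimate_hours("coding", "M", "bigger"))
```

Starting from the recorded actual and only then adjusting keeps the estimate tied to what happened on the reference task, which is the point of the outside view.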

Building Your Library

Start with 1-2 reference stories per size band. Over time, build toward a full grid with a reference story for each task type at each size:

Task type         S    M    L    XL
coding
bug-fix
testing
infrastructure
data-migration
design
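
One way to track progress toward a full library is to list which type and size combinations still lack a reference story. The sketch below assumes the six task types and four sizes listed above; the `covered` set reflects the five example stories on this page, and `missing_cells` and `coverage_grid` are hypothetical helper names.

```python
SIZES = ["S", "M", "L", "XL"]
TASK_TYPES = ["coding", "bug-fix", "testing",
              "infrastructure", "data-migration", "design"]

# Combinations that already have a reference story (the examples on this page).
covered = {
    ("coding", "S"), ("coding", "M"), ("coding", "L"),
    ("data-migration", "L"), ("testing", "M"),
}


def missing_cells(covered_cells):
    """Return every (task_type, size) pair without a reference story."""
    return [(t, s) for t in TASK_TYPES for s in SIZES
            if (t, s) not in covered_cells]


def coverage_grid(covered_cells):
    """Render the grid as text: 'x' marks a covered cell, '.' an empty one."""
    lines = [f"{'':<16}" + "".join(f"{s:<4}" for s in SIZES)]
    for t in TASK_TYPES:
        marks = ["x" if (t, s) in covered_cells else "." for s in SIZES]
        lines.append(f"{t:<16}" + "".join(f"{m:<4}" for m in marks))
    return "\n".join(lines)


print(coverage_grid(covered))
print(len(missing_cells(covered)), "combinations still need a reference story")
```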

When to Update

Update a reference story when:

  • A better representative example is completed
  • The reference story's actuals are more than 30% off from your team's current norm
  • Agent capabilities have improved significantly (recalibrate agent rounds)
  • Team composition has changed substantially
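
The 30% rule is the only one of these triggers that can be checked mechanically. A minimal sketch follows, assuming the deviation is measured relative to the team's current norm (the page does not say which value is the baseline) and that both figures are in the same unit; `needs_update` is a hypothetical helper name.

```python
def needs_update(reference_actual: float, current_norm: float,
                 threshold: float = 0.30) -> bool:
    """Return True when the reference is more than `threshold` off the norm.

    ASSUMPTION: drift is computed relative to current_norm. Both values
    must use the same unit (hours, or agent rounds).
    """
    if current_norm <= 0:
        raise ValueError("current_norm must be positive")
    drift = abs(reference_actual - current_norm) / current_norm
    return drift > threshold


# A 12-hour reference against an 8-hour norm is 50% off, so it is flagged.
print(needs_update(12.0, 8.0))
# A 4.5-hour reference against a 4-hour norm is 12.5% off, so it is kept.
print(needs_update(4.5, 4.0))
```

The same check works for agent rounds, which may be useful when recalibrating after a change in agent capabilities.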
