Reference Stories
Stas edited this page Mar 7, 2026 · 2 revisions
Reference stories are concrete, completed examples at each complexity level per task type. They serve as calibration anchors — when estimating a new task, compare it against the reference story for that size/type.
This is the single most effective calibration technique according to estimation research.
coding / S:
Title: "Add 404 error page"
Actual rounds: 4
Actual total: 1.2 hrs
Notes: Single component, clear spec, no API
coding / M:
Title: "Stripe payment integration"
Actual rounds: 15
Actual total: 4.5 hrs
Notes: External API, webhook handling, error states
coding / L:
Title: "Refactor data layer to ORM"
Actual rounds: 38
Actual total: 12 hrs
Notes: 12 tables, raw SQL conversion, migration scripts
data-migration / L:
Title: "MySQL to PostgreSQL migration"
Actual rounds: 45
Actual total: 3 days
Notes: Schema translation, query rewrite, staged rollout
testing / M:
Title: "E2E tests for checkout flow"
Actual rounds: 18
Actual total: 5 hrs
Notes: Playwright setup, 8 test scenarios, CI integration
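If the reference stories are kept in a tool rather than only on this page, one possible representation is a small record type indexed by task type and size. This is a minimal sketch: the `ReferenceStory` class, its field names, and the `BY_KEY` index are assumptions for illustration, not part of any existing tooling. The actual total is stored verbatim as text because the stories above mix hours and days.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceStory:
    # Hypothetical record layout mirroring the fields listed on this page.
    task_type: str      # e.g. "coding", "testing"
    size: str           # "S", "M", "L", or "XL"
    title: str
    actual_rounds: int
    actual_total: str   # kept as written, since units differ (hrs vs days)
    notes: str


STORIES = [
    ReferenceStory("coding", "S", "Add 404 error page", 4, "1.2 hrs",
                   "Single component, clear spec, no API"),
    ReferenceStory("coding", "M", "Stripe payment integration", 15, "4.5 hrs",
                   "External API, webhook handling, error states"),
    ReferenceStory("coding", "L", "Refactor data layer to ORM", 38, "12 hrs",
                   "12 tables, raw SQL conversion, migration scripts"),
    ReferenceStory("data-migration", "L", "MySQL to PostgreSQL migration", 45, "3 days",
                   "Schema translation, query rewrite, staged rollout"),
    ReferenceStory("testing", "M", "E2E tests for checkout flow", 18, "5 hrs",
                   "Playwright setup, 8 test scenarios, CI integration"),
]

# Index by (task_type, size) so a story can be found for a given combination.
BY_KEY = {(s.task_type, s.size): s for s in STORIES}

story = BY_KEY[("coding", "M")]
print(f"{story.title}: {story.actual_rounds} rounds, {story.actual_total}")
```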
When estimating a new task:
- Identify the size and type
- Find the reference story for that combination
- Compare: "Is this new task bigger, smaller, or similar to the reference?"
- Adjust accordingly
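The steps above can be sketched as a lookup followed by a relative adjustment. The page only says "adjust accordingly", so the multipliers in `ADJUSTMENT` are hypothetical placeholders, as is the `estimate_rounds` function name; a team would replace them with its own judgment or history.

```python
# Actual rounds from the reference stories on this page, keyed by (type, size).
REFERENCE_ROUNDS = {
    ("coding", "S"): 4,
    ("coding", "M"): 15,
    ("coding", "L"): 38,
    ("data-migration", "L"): 45,
    ("testing", "M"): 18,
}

# ASSUMPTION: illustrative multipliers only; the source does not define them.
ADJUSTMENT = {"smaller": 0.75, "similar": 1.0, "bigger": 1.25}


def estimate_rounds(task_type, size, comparison):
    """Estimate rounds by comparing a new task against its reference story.

    comparison is one of "smaller", "similar", "bigger".
    """
    key = (task_type, size)
    if key not in REFERENCE_ROUNDS:
        raise LookupError(f"no reference story for {task_type} / {size}")
    if comparison not in ADJUSTMENT:
        raise ValueError(f"unknown comparison: {comparison!r}")
    return round(REFERENCE_ROUNDS[key] * ADJUSTMENT[comparison])


# A new medium coding task judged similar to "Stripe payment integration".
print(estimate_rounds("coding", "M", "similar"))
```

Raising an error when no reference story exists is a deliberate choice in this sketch: a missing anchor is a signal to add a story, not to guess.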
This "outside view" counters optimism bias by grounding estimates in what actually happened, not what we hope will happen.
Start with 1-2 reference stories per size band. Over time, build to cover each task type:
| | S | M | L | XL |
|---|---|---|---|---|
| coding | ○ | ○ | ○ | ○ |
| bug-fix | ○ | ○ | | |
| testing | ○ | ○ | | |
| infrastructure | ○ | ○ | | |
| data-migration | ○ | ○ | | |
| design | ○ | ○ | | |
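Tracking which slots of the grid above are still empty can be automated with a set of covered combinations. The following is a rough sketch under stated assumptions: the `targets` list, the function names, and the marker convention (`●` for covered, `○` for still open) are all chosen for illustration and are not defined by this page.

```python
SIZES = ["S", "M", "L", "XL"]

# Combinations that already have a reference story on this page.
COVERED = {
    ("coding", "S"),
    ("coding", "M"),
    ("coding", "L"),
    ("data-migration", "L"),
    ("testing", "M"),
}


def open_slots(targets, covered):
    """Return the target (task_type, size) pairs that still lack a story."""
    return [slot for slot in targets if slot not in covered]


def render_row(task_type, covered):
    """Render one grid row: filled marker if covered, hollow marker if not."""
    cells = ["●" if (task_type, size) in covered else "○" for size in SIZES]
    return "| " + " | ".join([task_type] + cells) + " |"


# ASSUMPTION: a hypothetical target list; each team picks its own.
targets = [("coding", s) for s in SIZES] + [("testing", "S"), ("testing", "M")]

print(open_slots(targets, COVERED))
print(render_row("coding", COVERED))
```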
Update a reference story when:
- A better representative example is completed
- The reference story's actuals differ by more than 30% from your team's current norm
- Agent capabilities have improved significantly (recalibrate agent rounds)
- Team composition has changed substantially
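The 30% rule can be checked mechanically. This sketch assumes the deviation is measured relative to the team's current norm (the page does not say which value is the baseline), and the `is_stale` name and the sample numbers are hypothetical.

```python
def is_stale(reference_actual, current_norm, threshold=0.30):
    """Return True when the reference actual is more than `threshold` off the norm.

    ASSUMPTION: deviation is computed relative to current_norm.
    Both values must use the same unit (for example rounds, or hours).
    """
    if current_norm <= 0:
        raise ValueError("current_norm must be positive")
    deviation = abs(reference_actual - current_norm) / current_norm
    return deviation > threshold


# Reference story took 15 rounds; similar tasks now typically take 10.
print(is_stale(15, 10))   # deviation is 50% of the norm
# Reference story took 15 rounds; similar tasks now typically take 13.
print(is_stale(15, 13))   # deviation is about 15% of the norm
```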