
Releases: Enreign/progressive-estimation

v0.5.0 — Deep Validation Calibration

12 Mar 11:35


What's New

Deep Statistical Validation

All formula parameters validated against 110k+ data points across 13 datasets using bootstrap confidence intervals and KS goodness-of-fit tests. 53 parameters audited across 11 analyses — 22 adjusted, 7 flagged, 24 confirmed.

Full findings: docs/deep-validation-findings.md

5 Research-Backed Parameter Changes

1. Confidence multipliers → size-dependent
Old: flat 1.0x / 1.4x / 1.8x for all sizes.
New: size-dependent (e.g., at the 90% band: S=2.9x, M=2.1x, L=2.0x, XL=2.2x).
Evidence: 84k estimate-actual pairs. The old 1.8x captured only 80-88% of tasks, not 90%.
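As a sketch, the new size-dependent multipliers amount to a simple lookup; only the 90% band is quoted in this release, and the dict/function names below are illustrative, not the skill's actual API.

```python
# 90% confidence-band multipliers by task size, from the v0.5.0 notes.
# Only the 90% band is given in the release; applying the multiplier to an
# expected value, as done here, is an assumption about how the skill uses it.
CONF_90 = {"S": 2.9, "M": 2.1, "L": 2.0, "XL": 2.2}

def upper_90(expected_hours: float, size: str) -> float:
    """Hypothetical 90% upper bound: expected value times the size multiplier."""
    return expected_hours * CONF_90[size]
```

For example, a medium task expected at 4 hours would get a 90% upper bound of 8.4 hours under this sketch.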

2. Agent effectiveness: M and L lowered
Old: M=0.7, L=0.5. New: M=0.5, L=0.35.
Evidence: METR Time Horizons — 24k runs, 228 tasks (Jan 2025–Feb 2026). Autonomous completion: M=0.25, L=0.15. Our values sit between METR rates and 1.0 (partial-credit acceleration).

3. PERT → log-normal weighting
Old: (min + 4×midpoint + max) / 6 (beta assumption).
New: (min + 4×geometric_mean + max) / 6 (log-normal).
Evidence: KS goodness-of-fit test (n=84k) — log-normal beats beta in all 4 size bands. Software effort has heavier right tails than beta captures.
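The weighting change can be sketched as follows; here `geometric_mean` is assumed to be sqrt(min × max), which may differ from the skill's exact definition in references/formulas.md.

```python
import math

def pert_lognormal(min_h: float, max_h: float) -> float:
    # Log-normal-weighted PERT: the midpoint is replaced by a geometric mean.
    # Assumption: geometric mean of min and max; the skill may define it differently.
    g = math.sqrt(min_h * max_h)
    return (min_h + 4 * g + max_h) / 6

def pert_beta(min_h: float, mid: float, max_h: float) -> float:
    # Old beta-assumption form, kept for comparison.
    return (min_h + 4 * mid + max_h) / 6
```

Because the geometric mean sits below the arithmetic midpoint for skewed ranges, the log-normal form pulls the expected value toward the low end while the wide max still dominates the right tail.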

4. Review minutes halved
Old: S=30, M=60, L=120, XL=240 (standard depth).
New: S=20, M=30, L=60, XL=120.
Evidence: Project-22 reviews (n=290 joined) — 17-20 min median regardless of size. A conservative reduction, since this data covers review of human-written code, not AI-generated code.

5. Output token ratio: S/M raised
Old: S=0.25, M=0.28. New: S=0.30, M=0.30.
Evidence: Aider leaderboard (n=23 models) — 0.31 median output ratio (excluding reasoning tokens).

New Validation Infrastructure

  • tests/deep_validation.py — 11-analysis script producing Parameter Audit Cards with bootstrap CIs
  • datasets/download_benchmarks.sh — downloads METR, OpenHands, Aider benchmark data
  • Bundled datasets: tokenomics.csv (arXiv 2601.14470), onprem-tokens.csv
  • 19 new tests (63 total) encoding research-backed properties

Data Sources

| Dataset | Size | What It Validates |
| --- | --- | --- |
| CESAW | 61k tasks | Distribution fitting, confidence multipliers, base rounds |
| SiP | 12k tasks | Task type multipliers, confidence multipliers |
| Project-22 | 616 stories + 1441 reviews | Review times, human fix ratio |
| Renzo Pomodoro | 10k tasks | Distribution fitting, confidence multipliers |
| METR Time Horizons | 24k runs / 228 tasks | Agent effectiveness by size |
| Aider Leaderboard | ~50 models / 93 entries | Output token ratio, cost model |
| Tokenomics (ChatDev) | 30 tasks / 6 phases | Per-phase token distribution |

Version

0.4.0 → 0.5.0

Files Changed

| File | Changes |
| --- | --- |
| references/formulas.md | 5 parameter updates, log-normal PERT, size-dependent multipliers |
| tests/test_formulas.py | Updated constants + 19 new validation tests (63 total) |
| tests/deep_validation.py | New — 11-analysis deep validation script |
| datasets/download_benchmarks.sh | New — benchmark data download script |
| datasets/benchmarks/*.csv | New — bundled token datasets |
| docs/deep-validation-findings.md | New — combined findings document |
| datasets/README.md | Added benchmark download instructions |
| .gitignore | Added benchmark data exclusions |
| SKILL.md | Version bump, updated descriptions |
| README.md | Updated parameters, research sources, file structure |
| evals/eval-regression.md | Updated baselines for new values |

v0.4.0 — Token Estimation, 4 New PM Tools

11 Mar 19:41
1a0edba


What's New

Token & Cost Estimation (Step 15)

  • Computes per-task token consumption based on complexity × maturity lookup tables
  • Splits into input/output tokens using complexity-specific output ratios
  • PERT expected tokens with min/max range
  • Optional API cost estimation across three model tiers:
    • Economy — Haiku, GPT-4o Mini, Gemini Flash
    • Standard — Sonnet, GPT-4o, Gemini 2.5 Pro
    • Premium — Opus, GPT-5
  • Includes per-model pricing reference table (last verified March 2026)
  • Tokens appear in the one-line summary: 10-26 agent rounds (~180k tokens)
  • Token Estimate and Model Tier rows added to breakdown table
  • Cost is shown only when show_cost=true (off by default)
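A minimal sketch of the input/output split, assuming the complexity-specific output ratio is applied to a total token figure; the function name and signature are illustrative, not the skill's actual estimate_tokens() API.

```python
def split_tokens(total_tokens: int, output_ratio: float) -> tuple[int, int]:
    # Split a total token budget into (input, output) using a
    # complexity-specific output ratio, e.g. 0.30 for S/M tasks in v0.5.0.
    output_tokens = round(total_tokens * output_ratio)
    return total_tokens - output_tokens, output_tokens
```

Under this sketch, a ~180k-token task at a 0.30 output ratio splits into roughly 126k input and 54k output tokens, which is what the per-tier cost tables would then price.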

4 New PM Tool Integrations

  • Asana — custom fields for sizing, risk, agent rounds, tokens; subtasks native
  • Azure DevOps — tags for size/risk, native Original Estimate, HTML table in Description
  • Zenhub — GitHub labels + body sections, Zenhub Estimate for points
  • Shortcut — labels for size/risk, custom fields on Team plan+, native Estimate for points

Total supported trackers: 10 (was 6)

New Inputs

  • model_tier — economy/standard/premium (default: standard)
  • show_cost — boolean (default: false)
  • Question #14 added to detailed questionnaire path

Test Coverage

  • estimate_tokens() function with full Step 15 math
  • Token estimation integrated into main estimate() pipeline
  • TestCase7 — exact token math, output ratios, cost arithmetic
  • TestCase8 — complexity scaling, multi-agent multiplier, maturity variation, PERT cost formula, tier pricing comparison
  • TestCase9 — integration: token_estimate present in estimate() output, correct structure, rounds consistency, show_cost toggle
  • 44 tests total (was 25), all passing
  • Original 6 test cases unchanged (backward compatible)

Version

0.3.0 → 0.4.0

Files Changed

| File | Changes |
| --- | --- |
| references/formulas.md | Token lookup tables, Step 15 formula, output schema fields |
| tests/test_formulas.py | estimate_tokens(), TestCase7-9, integration into estimate() |
| references/questionnaire.md | Question #14, mapping table update |
| references/output-schema.md | Token rows in templates, 4 new PM tool sections, batch Tokens column |
| SKILL.md | Tracker list, pipeline step, version bump |
| README.md | Feature list, tracker list |

v0.3.0 — Instant Mode, Cooperation Modes, Research Grounding

08 Mar 23:01


What's New

Three Estimation Modes

  • Instant — zero questions, infer everything from the task description
  • Quick — 4 questions with sensible defaults
  • Detailed — 13 questions, full control over every parameter

Cooperation Mode Detection

Auto-detects your team's working style from intake answers:

  • Human-only → story points for sizing and velocity
  • Hybrid → dual-track: points for sizing, hours for planning
  • Agent-first → plan by human review capacity in hours

Small Council Validation

Subagent perspectives (Optimist, Skeptic, Historian) review estimates for M+ tasks before output.

Sprint Velocity Tracking

Mode-aware velocity tracking with rolling averages, capacity planning, and divergence detection for hybrid teams.
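A rolling-average velocity could be computed as below; this is a sketch under assumptions (window size, function name), not the skill's actual implementation.

```python
def rolling_velocity(completed_points: list[float], window: int = 3) -> float:
    """Average points completed over the most recent sprints.

    Assumption: a simple trailing window; the skill's mode-aware tracking
    may weight sprints or track hours instead of points.
    """
    recent = completed_points[-window:]
    return sum(recent) / len(recent)
```

Divergence detection for hybrid teams would then compare the points-based and hours-based rolling averages against each other.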

Research Grounding

Formulas now cite 15+ empirical studies:

  • METR Time Horizons (R²=0.83, ~230 tasks) — agent effectiveness decay
  • METR RCT (16 devs, 246 tasks) — productivity measurement
  • Faros AI (10k devs, 1.2k teams) — review bottleneck validation
  • PairCoder (400+ tasks) — human time dominance confirmation
  • Agent effectiveness clarified as work acceleration, not autonomous completion rate
  • Bug-fix multiplier bumped 1.2x → 1.3x based on empirical data

GitHub Action

Auto-estimate issues labeled needs-estimate with PERT formulas.

AskUserQuestion Integration

Structured dropdowns for all user interactions when the tool is available.

Full Changelog

v0.1.1...v0.3.0

v0.2.0 — Multi-skill repo structure

07 Mar 20:28


Agent Skills v0.2.0

Breaking changes

  • Repo renamed from progressive-estimation to agent-skills (GitHub redirects old URLs)
  • Skill files moved to skills/progressive-estimation/ for multi-skill support

What's new

  • Multi-skill repo structure — ready for additional skills under skills/
  • New README — landing page listing all available skills
  • Updated install commands:
    # Via skills.sh
    npx skills add Enreign/agent-skills
    
    # Manual (Claude Code)
    git clone https://github.com/Enreign/agent-skills.git ~/.claude/skills/agent-skills

Skills included

  • progressive-estimation — PERT estimation for AI-assisted development work

⚠️ Early development — formulas, multipliers, and defaults may change between versions.

v0.1.1 — skills.sh compatibility + structured intake

07 Mar 20:17


Progressive Estimation v0.1.1

What's new

  • skills.sh compatibility — install with npx skills add Enreign/progressive-estimation
  • YAML frontmatter in SKILL.md for automatic skill discovery
  • Structured intake — uses AskUserQuestion tool when available for dropdown-style questionnaire instead of free-text back-and-forth
  • Updated banner — new Estimation artwork

Install

# Via skills.sh (recommended)
npx skills add Enreign/progressive-estimation

# Manual (Claude Code)
git clone https://github.com/Enreign/progressive-estimation.git ~/.claude/skills/progressive-estimation

Or download the .skill archive from this release.


⚠️ Early development — formulas, multipliers, and defaults may change between versions.

v0.1.0 — Initial Public Release

07 Mar 19:49


Progressive Estimation v0.1.0

First public release of the Progressive Estimation skill for AI coding assistants.

What's included

  • PERT three-point estimation with confidence bands (68%, 95%)
  • Four workflow paths: quick/detailed × single/batch
  • Agent effectiveness decay modeling based on METR research
  • Calibration system with PRED(25) tracking
  • Tracker output for Linear, JIRA, ClickUp, GitHub Issues, Monday, and GitLab
  • Anti-pattern guards for common estimation mistakes
  • Eval suite for regression, batch, quick, and hybrid scenarios
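The 68%/95% bands follow from classic beta-PERT, where sigma is (max − min)/6 (v0.5.0 later replaced the midpoint weighting with a log-normal form); a sketch:

```python
def pert_with_bands(min_h: float, mid: float, max_h: float) -> dict:
    # Classic beta-PERT expected value and standard deviation.
    mean = (min_h + 4 * mid + max_h) / 6
    sigma = (max_h - min_h) / 6  # standard PERT spread
    return {
        "expected": mean,
        "band_68": (mean - sigma, mean + sigma),      # ~1 sigma
        "band_95": (mean - 2 * sigma, mean + 2 * sigma),  # ~2 sigma
    }
```

For a 2/5/8-hour estimate this gives an expected 5 hours, a 68% band of 4-6 hours, and a 95% band of 3-7 hours.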

Install

git clone https://github.com/Enreign/progressive-estimation.git ~/.claude/skills/progressive-estimation

Or download the .skill archive from this release.


⚠️ Early development — formulas, multipliers, and defaults may change between versions.