Evals
Stas edited this page Mar 7, 2026 · 2 revisions
The skill includes evaluation prompts per the Claude Skills 2.0 framework.
Evals detect when formula changes or model updates cause estimate drift. They are descriptions of desired behaviors, not just test cases.
Tests: Quick path produces valid output with minimal input.
- Does not ask more than 4 questions
- Auto-assigns task type
- Produces one-line summary first
- Includes PERT expected and committed values
- Shows confidence bands
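The PERT expected value and confidence band checked above can be sketched as follows. This is a minimal illustration of the standard three-point PERT formula; the function name, field names, and the one-sigma band are assumptions, not the skill's actual output schema:

```python
def pert_estimate(optimistic: float, likely: float, pessimistic: float) -> dict:
    """Classic PERT three-point estimate (all values in hours)."""
    expected = (optimistic + 4 * likely + pessimistic) / 6
    std_dev = (pessimistic - optimistic) / 6
    return {
        "expected": expected,
        # One-sigma band around the expected value (illustrative choice)
        "band_low": expected - std_dev,
        "band_high": expected + std_dev,
    }

# e.g. O=2, M=4, P=10 gives expected = 28/6 ≈ 4.67 h, band ≈ [3.33, 6.0]
print(pert_estimate(2, 4, 10))
```

An eval can then assert that both the expected value and the band endpoints appear in the output and sit in plausible ranges.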
Tests: Detailed path with multi-team, confidence levels, org overhead.
- All 13 questions handled correctly
- Multi-human and multi-agent scaling applied
- Org overhead on human time only
- Cone of uncertainty spread applied
- Correct confidence multiplier (90% = 1.8×)
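The detailed-path adjustments above can be illustrated with a small sketch. Only the 90% = 1.8× multiplier is stated in the evals; the 50% multiplier, the overhead rate, and the function shape are placeholder assumptions:

```python
# Only 0.90 -> 1.8 is stated in the evals; 0.50 -> 1.0 is a placeholder.
CONFIDENCE_MULTIPLIER = {0.50: 1.0, 0.90: 1.8}

def adjusted_estimate(human_hours: float, agent_hours: float,
                      confidence: float, org_overhead: float = 0.20) -> float:
    """Apply org overhead to human time only, then the confidence multiplier."""
    human = human_hours * (1 + org_overhead)   # overhead hits human time only
    total = human + agent_hours                # agent time carries no org overhead
    return total * CONFIDENCE_MULTIPLIER[confidence]

# 90% confidence: (10 * 1.2 + 5) * 1.8 = 30.6 hours
print(adjusted_estimate(10, 5, 0.90))
```

The key behavior the detailed-path eval pins down is ordering: overhead is applied to the human share before the confidence multiplier scales the combined total.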
Tests: Batch mode with mixed types and dependencies.
- Processes 7 tasks in batch
- Auto-assigns task types (infrastructure, coding)
- Respects dependencies
- Identifies critical path
- Summary table + rollup + warnings
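Dependency handling and critical-path identification in batch mode amount to a longest-path walk over the task DAG. A sketch, with made-up task names and durations:

```python
from functools import lru_cache

def critical_path(durations: dict[str, float],
                  deps: dict[str, list[str]]) -> tuple[float, list[str]]:
    """Longest path through the dependency DAG = the batch's critical path."""
    @lru_cache(maxsize=None)
    def finish(task: str) -> float:
        # Earliest finish = own duration + latest finishing dependency
        return durations[task] + max((finish(d) for d in deps.get(task, [])),
                                     default=0.0)

    end = max(durations, key=finish)
    path = [end]
    while deps.get(path[-1]):
        path.append(max(deps[path[-1]], key=finish))
    return finish(end), list(reversed(path))

durations = {"infra": 4, "api": 6, "ui": 3, "tests": 2}
deps = {"api": ["infra"], "ui": ["api"], "tests": ["ui"]}
print(critical_path(durations, deps))  # (15.0, ['infra', 'api', 'ui', 'tests'])
```

The batch eval should report the same chain and rollup regardless of the order tasks are listed in, since the path depends only on the DAG.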
Tests: 6 known-good baselines for drift detection.
- Trivial S task
- Medium coding task
- Large data-migration task
- XL decomposition warning
- Batch consistency
- Confidence level comparison (50% vs 90%)
Run evals after any change to:
- formulas.md (lookup tables, multipliers)
- frameworks.md (round ranges, effectiveness values)
- SKILL.md (workflow changes)
- Model version updates
Paste the eval prompt into your AI coding client and compare the output against the expected behaviors listed in the eval file.
A regression is any output that:
- Falls outside the expected ranges by >50%
- Omits required output fields
- Changes the complexity assignment from baseline
- Fails to trigger expected warnings
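The regression criteria above can be expressed as a simple baseline comparison. The baseline structure and field names are hypothetical, and ">50%" is read here as more than half the range width outside the expected range:

```python
def find_regressions(baseline: dict, current: dict) -> list[str]:
    """Flag drift per the regression criteria above."""
    problems = []
    lo, hi = baseline["range"]
    slack = 0.5 * (hi - lo)  # >50% of range width outside counts as drift
    val = current.get("expected")
    if val is None:
        problems.append("missing required field: expected")
    elif val < lo - slack or val > hi + slack:
        problems.append(f"expected {val} drifted >50% outside [{lo}, {hi}]")
    if current.get("complexity") != baseline["complexity"]:
        problems.append("complexity assignment changed from baseline")
    for w in baseline.get("warnings", []):
        if w not in current.get("warnings", []):
            problems.append(f"expected warning not triggered: {w}")
    return problems

baseline = {"range": (4, 8), "complexity": "M", "warnings": ["decompose"]}
# This output trips all three checks: drift, complexity change, missing warning
print(find_regressions(baseline, {"expected": 15, "complexity": "L", "warnings": []}))
```

Running this after each formula or model change reduces the manual comparison of eval output to reading a short list of flagged problems.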
Full eval files: evals/