Merged
9 changes: 5 additions & 4 deletions README.md
@@ -35,7 +35,8 @@ Research-backed formulas. PERT statistics. Calibration feedback loops. Zero depe
- Supports **single tasks or batches** (paste 5 issues or 500)
- Produces **PERT expected values** with confidence bands, not just ranges
- Separates **"expected"** from **"committed"** estimates at your chosen confidence level
- Outputs in formats ready for **Linear, JIRA, ClickUp, GitHub Issues, Monday, and GitLab**
- Estimates **token consumption and API cost** per model tier (economy/standard/premium)
- Outputs in formats ready for **Linear, JIRA, ClickUp, GitHub Issues, Monday, GitLab, Asana, Azure DevOps, Zenhub, and Shortcut**
- Includes a **calibration system** to improve accuracy over time with actuals

## Quick Start
@@ -247,7 +248,7 @@ Estimates can be output in two modes for any supported tracker:
| **Embedded** (default) | Markdown table in description/body | None |
| **Native** | Maps to tracker-specific fields | Custom fields |

**Supported:** Linear, JIRA, ClickUp, GitHub Issues, Monday, GitLab
**Supported:** Linear, JIRA, ClickUp, GitHub Issues, Monday, GitLab, Asana, Azure DevOps, Zenhub, Shortcut

Embedded mode works everywhere immediately. Native mode requires custom fields for agent-specific metrics.

@@ -343,7 +344,7 @@ Evaluation prompts per the [Claude Skills 2.0](https://claude.com/blog/improving
| `eval-quick.md` | Quick path produces valid PERT output with minimal input |
| `eval-hybrid.md` | Detailed path handles multi-team, confidence levels, org overhead |
| `eval-batch.md` | Batch mode with mixed types, dependencies, and rollup |
| `eval-regression.md` | 6 baseline cases to detect drift after formula changes |
| `eval-regression.md` | 8 baseline cases to detect drift after formula changes |

Run evals after any change to formulas, frameworks, or the skill workflow.

@@ -354,7 +355,7 @@
Contributions welcome — see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines. Key areas:

- **Calibration data** — Share anonymized estimated vs. actual results to improve default ratios
- **Tracker mappings** — Additional tracker support (Asana, Notion, Shortcut, etc.)
- **Tracker mappings** — Additional tracker support (Notion, Basecamp, etc.)
- **Task types** — New multipliers for work categories not yet covered
- **Formulas** — Improvements backed by data or research
- **Evals** — Additional test cases, especially edge cases
7 changes: 4 additions & 3 deletions SKILL.md
@@ -4,7 +4,7 @@ description: "Adapts to your team's working mode — human-only, hybrid, or agen
license: MIT
metadata:
author: Enreign
version: "0.3.0"
version: "0.4.0"
---

# Progressive Estimation
@@ -131,7 +131,8 @@ The computation pipeline:
6. Apply cone of uncertainty spread to widen/narrow range
7. Compute PERT expected value and standard deviation
8. Apply confidence multiplier for committed estimate
9. Check anti-pattern guards and generate warnings
9. Compute token & cost estimates (Step 15)
10. Check anti-pattern guards and generate warnings

If the user requests a standalone deterministic calculator, generate one from
`formulas.md` in their preferred language. The generated script must:
@@ -183,7 +184,7 @@ Then provide:
- Tracker-formatted output (if requested)

Ask which tracker and mode:
- **Tracker**: Linear, JIRA, ClickUp, GitHub Issues, Monday, GitLab, or generic
- **Tracker**: Linear, JIRA, ClickUp, GitHub Issues, Monday, GitLab, Asana, Azure DevOps, Zenhub, Shortcut, or generic
- **Mode**: Native fields or embedded in description (default: embedded)

For batch output, produce a summary table first, then rollup, then warnings,
87 changes: 85 additions & 2 deletions references/formulas.md
@@ -21,6 +21,8 @@ standalone calculator scripts in any language.
| confidence_level | 50/80/90 | 80 | — |
| definition_phase | concept/requirements/design/ready | ready | — |
| org_size | solo-startup/growth/enterprise | solo-startup | — |
| model_tier | economy/standard/premium or specific model | standard | — |
| show_cost | boolean | false | — |

## Lookup Tables

@@ -164,6 +166,52 @@ enterprise: 1.3 (formal review, compliance, multi-team coordination)

Applied to human time only (planning, review, fix), not agent time.

### Tokens Per Round (thousands, by complexity × maturity)

```
S M L XL
exploratory: 8k 15k 25k 40k
partial: 6k 12k 20k 35k
mostly-automated: 5k 10k 18k 30k
```

### Output Token Ratio (by complexity)

```
S: 0.25 M: 0.28 L: 0.30 XL: 0.35
```

### Model Pricing (per 1M tokens, USD — last verified March 2026)

Representative models so users can pick the closest match:

```
Model Input Output Tier
─────────────────────────────────────────────────────
GPT-4o Mini $0.15 $0.60 economy
Gemini 2.5 Flash $0.30 $2.50 economy
Claude Haiku 4.5 $1.00 $5.00 economy
Gemini 2.5 Pro $1.25 $10.00 standard
GPT-4o $2.50 $10.00 standard
Claude Sonnet 4.6 $3.00 $15.00 standard
Claude Opus 4.6 $5.00 $25.00 premium
GPT-5 $1.25 $10.00 premium (capability, not price)
```

For the tier-based formula, use these representative rates:

```
Input Output
economy: $0.50 $2.50 (Haiku, GPT-4o-mini, Gemini Flash)
standard: $2.50 $12.00 (Sonnet, GPT-4o, Gemini 2.5 Pro)
premium: $5.00 $25.00 (Opus, GPT-5)
```

Note: "Premium" reflects capability tier (best available models), not
necessarily highest price. GPT-5 is premium-capability at standard pricing.
Pricing changes frequently — check provider pages before committing to
cost-based decisions.
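
For a generated calculator, the three lookup tables above can be transcribed directly. This is a minimal sketch; the dict names (`TOKENS_PER_ROUND_K`, etc.) are illustrative, not part of the skill's canonical interface:

```python
# Lookup tables from this document, as Python dicts.
TOKENS_PER_ROUND_K = {  # thousands of tokens, indexed [maturity][complexity]
    "exploratory":      {"S": 8, "M": 15, "L": 25, "XL": 40},
    "partial":          {"S": 6, "M": 12, "L": 20, "XL": 35},
    "mostly-automated": {"S": 5, "M": 10, "L": 18, "XL": 30},
}

OUTPUT_TOKEN_RATIO = {"S": 0.25, "M": 0.28, "L": 0.30, "XL": 0.35}

TIER_PRICING_PER_M = {  # USD per 1M tokens: (input, output)
    "economy":  (0.50, 2.50),
    "standard": (2.50, 12.00),
    "premium":  (5.00, 25.00),
}
```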

## Formulas

### Step 1: Agent Rounds
@@ -300,6 +348,29 @@ communication_overhead = 0.15 × (num_humans - 1)
adjusted_human_time = adjusted_human_time × (1 + communication_overhead)
```

### Step 15: Token & Cost Estimation

```
tokens_per_round = tokens_per_round_table[complexity][maturity]
output_ratio = output_token_ratio[complexity]

total_tokens_min = adjusted_rounds_min × tokens_per_round × num_agents
total_tokens_max = adjusted_rounds_max × tokens_per_round × num_agents

input_tokens_min = total_tokens_min × (1 - output_ratio)
input_tokens_max = total_tokens_max × (1 - output_ratio)
output_tokens_min = total_tokens_min × output_ratio
output_tokens_max = total_tokens_max × output_ratio

token_midpoint = (total_tokens_min + total_tokens_max) / 2
pert_expected_tokens = (total_tokens_min + 4 × token_midpoint + total_tokens_max) / 6

# Cost (only if show_cost == true)
cost_min = (input_tokens_min × input_price + output_tokens_min × output_price) / 1_000_000
cost_max = (input_tokens_max × input_price + output_tokens_max × output_price) / 1_000_000
pert_expected_cost = (cost_min + 4 × (cost_min + cost_max) / 2 + cost_max) / 6
```
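
A minimal Python sketch of this step, assuming `rounds_min`/`rounds_max` come from the earlier adjusted-rounds steps. The function name, inlined tables, and result layout are illustrative, not the skill's canonical API:

```python
def estimate_tokens_and_cost(complexity, maturity, rounds_min, rounds_max,
                             num_agents, tier, show_cost=False):
    # Tokens per round (thousands) and output ratio, from the lookup tables.
    tokens_per_round = {
        "exploratory":      {"S": 8, "M": 15, "L": 25, "XL": 40},
        "partial":          {"S": 6, "M": 12, "L": 20, "XL": 35},
        "mostly-automated": {"S": 5, "M": 10, "L": 18, "XL": 30},
    }[maturity][complexity] * 1000
    output_ratio = {"S": 0.25, "M": 0.28, "L": 0.30, "XL": 0.35}[complexity]

    total_min = rounds_min * tokens_per_round * num_agents
    total_max = rounds_max * tokens_per_round * num_agents
    midpoint = (total_min + total_max) / 2
    # PERT with the midpoint as most-likely collapses to the midpoint.
    pert_tokens = (total_min + 4 * midpoint + total_max) / 6

    result = {
        "total_tokens":  {"min": total_min, "max": total_max},
        "input_tokens":  {"min": total_min * (1 - output_ratio),
                          "max": total_max * (1 - output_ratio)},
        "output_tokens": {"min": total_min * output_ratio,
                          "max": total_max * output_ratio},
        "pert_expected_tokens": round(pert_tokens),
        "model_tier": tier,
        "cost_usd": None,
        "pert_expected_cost_usd": None,
    }
    if show_cost:
        in_price, out_price = {"economy": (0.50, 2.50),
                               "standard": (2.50, 12.00),
                               "premium": (5.00, 25.00)}[tier]

        def cost(total):
            return (total * (1 - output_ratio) * in_price
                    + total * output_ratio * out_price) / 1_000_000

        c_min, c_max = cost(total_min), cost(total_max)
        result["cost_usd"] = {"min": c_min, "max": c_max}
        result["pert_expected_cost_usd"] = (c_min + 4 * (c_min + c_max) / 2
                                            + c_max) / 6
    return result
```

For example, an M task at partial maturity with 10-26 rounds and one agent on the standard tier works out to 120k-312k total tokens, PERT-expected 216k.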

## Anti-Pattern Guards

After computing estimates, check for these patterns and append warnings:
@@ -427,7 +498,16 @@ Every estimation must produce these canonical fields:
"humans": int,
"agents": int
},
"story_points": int | null
"story_points": int | null,
"token_estimate": {
"total_tokens": { "min": int, "max": int },
"input_tokens": { "min": int, "max": int },
"output_tokens": { "min": int, "max": int },
"pert_expected_tokens": int,
"model_tier": "economy" | "standard" | "premium",
"cost_usd": { "min": float, "max": float } | null,
"pert_expected_cost_usd": float | null
}
}
```

@@ -444,7 +524,10 @@ For batch, wrap in:
"critical_path": string[],
"task_count": int,
"size_distribution": { "S": int, "M": int, "L": int, "XL": int },
"warnings": string[]
"warnings": string[],
"total_tokens": int,
"pert_expected_tokens": int,
"total_cost_usd": float | null
}
}
```
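
A hedged sketch of how a generated calculator might fill the batch token fields from per-task `token_estimate` blocks. The schema does not pin down whether `total_tokens` sums range midpoints, so that choice is an assumption here, as is the helper name:

```python
def batch_token_rollup(tasks):
    # Each task is assumed to carry the canonical token_estimate block.
    est = [t["token_estimate"] for t in tasks]
    pert_total = sum(e["pert_expected_tokens"] for e in est)
    # Assumption: batch total_tokens sums per-task range midpoints.
    mid_total = sum((e["total_tokens"]["min"] + e["total_tokens"]["max"]) // 2
                    for e in est)
    costs = [e["pert_expected_cost_usd"] for e in est]
    # Cost stays null unless every task carries one (show_cost on throughout).
    total_cost = sum(costs) if all(c is not None for c in costs) else None
    return {"total_tokens": mid_total,
            "pert_expected_tokens": pert_total,
            "total_cost_usd": total_cost}
```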
107 changes: 99 additions & 8 deletions references/output-schema.md
@@ -40,7 +40,7 @@ Output format adapts to the detected cooperation mode:

Single task:
```
Expected: ~4 hrs | Committed (80%): ~5.5 hrs | 10-26 agent rounds + 3 hrs human | Risk: medium | Size: M
Expected: ~4 hrs | Committed (80%): ~5.5 hrs | 10-26 agent rounds (~180k tokens) + 3 hrs human | Risk: medium | Size: M
```

Batch:
@@ -90,6 +90,7 @@ Ask the user: "Native fields or embedded in description? (default: embedded)"
| committed_hours | Custom field | "Committed Estimate (hrs)" |
| confidence_level | Custom field | "Confidence %" |
| priority | Priority | 1-4 mapping |
| token_estimate | Custom field | "Est. Tokens" |

**Embedded:**
```markdown
@@ -107,10 +108,17 @@ Ask the user: "Native fields or embedded in description? (default: embedded)"
| **Expected (PERT)** | **~4 hrs** |
| **Committed (80%)** | **~5.5 hrs** |
| Confidence Band (68%) | 3.4-5.0 hrs |
| Token Estimate | ~180k tokens |
| Model Tier | standard |
| Est. Cost | ~$1.20 |
| Risk | medium |
| Team | 1 human, 1 agent |
```

Token Estimate and Model Tier always appear in the breakdown table.
Est. Cost only appears if `show_cost == true`.
Cost does NOT appear in the one-line summary (too noisy).
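
One way a generated formatter could honor those rules; `estimate_rows` is a hypothetical helper operating on the canonical `token_estimate` fields, not part of the skill:

```python
def estimate_rows(est, show_cost=False):
    # Token Estimate and Model Tier rows are unconditional.
    rows = [("Token Estimate",
             f"~{est['pert_expected_tokens'] // 1000}k tokens"),
            ("Model Tier", est["model_tier"])]
    # Est. Cost row appears only when show_cost is on and a cost exists.
    if show_cost and est.get("pert_expected_cost_usd") is not None:
        rows.append(("Est. Cost", f"~${est['pert_expected_cost_usd']:.2f}"))
    return ["| {} | {} |".format(label, value) for label, value in rows]
```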

### Canonical → JIRA

**Native:**
@@ -126,6 +134,7 @@ Ask the user: "Native fields or embedded in description? (default: embedded)"
| human_review_minutes | Custom field | number type |
| pert_expected_hours | Custom field | "Expected Estimate (hrs)" |
| labels | Labels | array |
| token_estimate | Custom field | "Est. Tokens" (number) |

**Embedded:** Same markdown table in Description field.

@@ -143,6 +152,7 @@ Ask the user: "Native fields or embedded in description? (default: embedded)"
| agent_rounds | Custom field | number |
| human_review_minutes | Custom field | number |
| priority | Priority | 1-4 |
| token_estimate | Custom field | "Est. Tokens" (number) |

**Embedded:** Same markdown table in Description field.

@@ -160,6 +170,7 @@ Ask the user: "Native fields or embedded in description? (default: embedded)"
| agent_rounds | Body section | no custom fields |
| human_review_minutes | Body section | no custom fields |
| labels | Labels | — |
| token_estimate | Body section | no custom fields |

**Embedded:** Markdown table in issue Body. This is the recommended mode
for GitHub Issues since it has no custom field support.
@@ -180,6 +191,7 @@ for GitHub Issues since it has no custom field support.
| human_review_minutes | Numbers column | "Review (min)" |
| priority | Priority column | — |
| labels | Tags column | — |
| token_estimate | Numbers column | "Est. Tokens" |

**Embedded:** Markdown in Updates or Long Text column.

@@ -198,22 +210,101 @@ for GitHub Issues since it has no custom field support.
| agent_rounds | Description section | no custom fields in free tier |
| human_review_minutes | Description section | — |
| labels | Labels | scoped labels supported |
| token_estimate | Description section | no custom fields in free tier |

**Embedded:** Markdown table in Description. Use `/estimate` quick action
for time tracking integration.

### Canonical → Asana

**Native:**
| Canonical Field | Asana Field | Notes |
|----------------|------------|-------|
| title | Task Name | — |
| complexity | Custom field (Dropdown) | "Size" — S/M/L/XL |
| committed_hours | Custom field (Number) | "Committed Estimate (hrs)" |
| pert_expected_hours | Custom field (Number) | "Expected (hrs)" |
| risk_level | Custom field (Dropdown) | "Risk" — low/medium/high |
| risk_notes | Description | appended |
| subtasks | Subtasks | native |
| agent_rounds | Custom field (Number) | "Agent Rounds" |
| human_review_minutes | Custom field (Number) | "Review (min)" |
| token_estimate | Custom field (Number) | "Est. Tokens" |

**Embedded:** Markdown in Description. Quirks: custom fields are
project-scoped; time tracking requires a paid plan.

### Canonical → Azure DevOps

**Native:**
| Canonical Field | ADO Field | Notes |
|----------------|----------|-------|
| title | Title | — |
| complexity | Tags | `Size:M` |
| committed_hours | Original Estimate | hours (native) |
| pert_expected_hours | Custom field (Decimal) | "Expected Estimate (hrs)" |
| risk_level | Tags | `Risk:medium` |
| risk_notes | Description | HTML — use `<table>` |
| subtasks | Child work items | parent-child link |
| agent_rounds | Custom field (Integer) | "Agent Rounds" |
| story_points | Story Points | native on User Story |
| token_estimate | Custom field (Integer) | "Est. Tokens" |

**Embedded:** HTML table in Description (ADO renders HTML, not markdown).
Quirks: custom fields require process customization; the work item type
matters (User Story vs. Task).

### Canonical → Zenhub

**Native:**
| Canonical Field | Zenhub Field | Notes |
|----------------|-------------|-------|
| title | Issue Title | GitHub Issue title |
| complexity | Label | `size/M` (GitHub label) |
| committed_hours | Estimate | Zenhub story points field |
| pert_expected_hours | Body section | no custom fields |
| risk_level | Label | `risk/medium` (GitHub label) |
| risk_notes | Body | — |
| subtasks | Task list | `- [ ]` in body, or child issues |
| agent_rounds | Body section | no custom fields |
| story_points | Estimate | native Zenhub field (points) |
| token_estimate | Body section | no custom fields |

**Embedded:** Markdown in GitHub Issue body (recommended). Quirks: Zenhub
layers on top of GitHub Issues — uses GitHub labels + body for most data;
Estimate field is points-only; Epics are cross-repo issue collections.

### Canonical → Shortcut

**Native:**
| Canonical Field | Shortcut Field | Notes |
|----------------|---------------|-------|
| title | Story Name | — |
| complexity | Label | `size:M` |
| committed_hours | Custom field (Number) | "Committed (hrs)" |
| pert_expected_hours | Custom field (Number) | "Expected (hrs)" |
| risk_level | Label | `risk:medium` |
| risk_notes | Description | markdown supported |
| subtasks | Tasks (within Story) | checklist-style |
| agent_rounds | Custom field (Number) | "Agent Rounds" |
| story_points | Estimate | native field (points) |
| token_estimate | Custom field (Number) | "Est. Tokens" |

**Embedded:** Markdown in Description. Quirks: custom fields require Team
plan+; native Estimate is points, not hours; Stories contain Tasks
(checklist items).

## Batch Output Format

### Summary Table (Always First)

```
| # | Task | Size | Type | Rounds | Agent | Human | Expected | Committed (80%) | Risk | Deps |
|---|------|------|------|--------|-------|-------|----------|-----------------|------|------|
| 1 | Auth service | M | coding | 10-26 | 20-78m | 2-3h | ~4h | ~5.5h | med | — |
| 2 | Payment | L | coding | 26-65 | 52-195m | 4-8h | ~8h | ~11h | high | #1 |
| 3 | DB migration | L | data-mig | 26-65 | 52-195m | 4-8h | ~16h | ~22h | high | — |
|---|------|------|------|--------|-------|-------|----------|-----------------|------|------|
| | **Totals** | | | | | | **~28h** | **~38.5h** | | |
| # | Task | Size | Type | Rounds | Agent | Human | Tokens | Expected | Committed (80%) | Risk | Deps |
|---|------|------|------|--------|-------|-------|--------|----------|-----------------|------|------|
| 1 | Auth service | M | coding | 10-26 | 20-78m | 2-3h | ~180k | ~4h | ~5.5h | med | — |
| 2 | Payment | L | coding | 26-65 | 52-195m | 4-8h | ~520k | ~8h | ~11h | high | #1 |
| 3 | DB migration | L | data-mig | 26-65 | 52-195m | 4-8h | ~520k | ~16h | ~22h | high | — |
|---|------|------|------|--------|-------|-------|--------|----------|-----------------|------|------|
| | **Totals** | | | | | | **~1.2M** | **~28h** | **~38.5h** | | |
```
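
The `~180k` / `~1.2M` shorthand in the Tokens column can be produced with a small helper (the name is illustrative):

```python
def fmt_tokens(n):
    # Millions get one decimal place, with a trailing ".0" dropped.
    if n >= 1_000_000:
        return f"~{n / 1_000_000:.1f}M".replace(".0M", "M")
    # Everything smaller is rounded to whole thousands.
    return f"~{round(n / 1000)}k"
```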

### Rollup Block
9 changes: 9 additions & 0 deletions references/questionnaire.md
@@ -119,6 +119,14 @@ All quick-path questions, plus:
- Enterprise (50+ people) — formal review, compliance, multi-team coordination (1.3x)
13. **Dependencies**: "Is this blocked by or blocking other tasks?"
→ dependency graph for sequencing
14. **Model & cost**: "Which model tier are you using, and do you want cost estimates?"
→ `model_tier`, `show_cost`
- Economy (Haiku, GPT-4o Mini, Gemini Flash) — cheapest
- Standard (Sonnet, GPT-4o, Gemini 2.5 Pro) — default
- Premium (Opus, GPT-5) — most capable
- Or name a specific model from the pricing table
- Show cost: yes/no (default: no)
- If user names a specific model, map to its tier for the formula
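
A sketch of the tier-resolution logic this question implies, keyed on the models in the pricing table; the dict and function names are illustrative:

```python
# Specific models from the pricing table, mapped to their formula tier.
MODEL_TIER = {
    "gpt-4o mini": "economy", "gemini 2.5 flash": "economy",
    "claude haiku 4.5": "economy",
    "gemini 2.5 pro": "standard", "gpt-4o": "standard",
    "claude sonnet 4.6": "standard",
    "claude opus 4.6": "premium", "gpt-5": "premium",
}

def resolve_tier(answer, default="standard"):
    key = answer.strip().lower()
    # A tier name passes through; a known model maps to its tier;
    # anything else falls back to the default tier.
    if key in ("economy", "standard", "premium"):
        return key
    return MODEL_TIER.get(key, default)
```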

## Detailed Path — Batch

@@ -159,6 +167,7 @@ User can mark overrides or approve the whole table at once.
| Definition phase | spread_multiplier | ready (1.0x) | asked |
| Organization context | org_overhead | solo-startup (1.0x) | asked |
| Dependencies | sequencing | none | asked |
| Model & cost | model_tier, show_cost | standard, false | asked |

## Input Formats Accepted (Batch)
