P2: Add thinking level as estimation parameter #78

@haoranc

Summary

All calibration data (83+ validated dispatches) was collected with elevated thinking levels:

  • Claude Code: high thinking
  • Codex: extra high thinking

This is not captured anywhere in the estimation model. Thinking level affects:

  • Cost per turn: High thinking uses significantly more tokens per turn (~2x for Opus)
  • Turns needed: Higher thinking means fewer iterations, because the agent more often gets it right the first time
  • Duration per turn: Higher thinking means longer turns (~2-3 min vs ~1-1.5 min at default)
  • Quality: The consistent Q4-Q5 scores may be partly attributable to elevated thinking

Proposed Changes

1. New --thinking-level CLI parameter

ae estimate --thinking-level high "Add user authentication"
ae estimate --thinking-level default "Add user authentication"  # lower cost, more turns

Values: default, high, extra-high
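
A minimal sketch of how the flag could be parsed, assuming the `ae` CLI were written in Python with argparse (the issue does not say what `ae` is built with, so `build_parser` and the subcommand layout here are hypothetical). The flag defaults to `None` so that "not specified" stays distinguishable from an explicit choice.

```python
import argparse

# Allowed values, taken from the issue text.
THINKING_LEVELS = ("default", "high", "extra-high")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser for `ae estimate`; not the real ae implementation."""
    parser = argparse.ArgumentParser(prog="ae")
    subcommands = parser.add_subparsers(dest="command", required=True)

    estimate = subcommands.add_parser("estimate")
    estimate.add_argument(
        "--thinking-level",
        choices=THINKING_LEVELS,
        default=None,  # None means the user did not pass the flag
        help="thinking level the agent will run at",
    )
    estimate.add_argument("task", help="task description to estimate")
    return parser


args = build_parser().parse_args(
    ["estimate", "--thinking-level", "high", "Add user authentication"]
)
print(args.thinking_level, "|", args.task)
```

With `choices` set, argparse rejects any value outside the three listed levels and exits with a usage error.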

2. Thinking level modifier in PERT engine

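| Thinking Level | Cost/Turn Modifier | Turns Modifier | Net Effect                     |
|----------------|--------------------|----------------|--------------------------------|
| Default        | 1.0x               | 1.0x           | baseline                       |
| High           | 1.5-2.0x           | 0.7-0.8x       | ~1.2-1.6x cost, ~0.8x duration |
| Extra High     | 2.0-2.5x           | 0.6-0.7x       | ~1.5-1.8x cost, ~0.7x duration |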
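
One way the modifiers might plug into a PERT estimate is sketched below. The multiplier ranges come from the table; everything else is an assumption: the `ThinkingModifier` type, using the midpoint of each range, and applying the turns modifier to the PERT expected value are illustrative choices, not the engine's actual design. The PERT formula itself is the standard (O + 4M + P) / 6.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ThinkingModifier:
    cost_per_turn: Tuple[float, float]  # (low, high) multiplier on cost per turn
    turns: Tuple[float, float]          # (low, high) multiplier on turn count


# Ranges copied from the table; multipliers are relative to the Default row.
MODIFIERS = {
    "default": ThinkingModifier((1.0, 1.0), (1.0, 1.0)),
    "high": ThinkingModifier((1.5, 2.0), (0.7, 0.8)),
    "extra-high": ThinkingModifier((2.0, 2.5), (0.6, 0.7)),
}


def pert_expected(optimistic: float, most_likely: float, pessimistic: float) -> float:
    """Standard three-point PERT expected value."""
    return (optimistic + 4 * most_likely + pessimistic) / 6


def midpoint(bounds: Tuple[float, float]) -> float:
    return (bounds[0] + bounds[1]) / 2


def adjusted_turns(base_turns: float, level: str) -> float:
    """Scale a turn estimate by the midpoint of the level's turns modifier (assumption)."""
    return base_turns * midpoint(MODIFIERS[level].turns)


def net_cost_range(level: str) -> Tuple[float, float]:
    """Multiply the low ends and the high ends of the cost and turns modifiers."""
    m = MODIFIERS[level]
    return (m.cost_per_turn[0] * m.turns[0], m.cost_per_turn[1] * m.turns[1])


base_turns = pert_expected(optimistic=4, most_likely=6, pessimistic=14)
high_turns = adjusted_turns(base_turns, "high")
print(base_turns, high_turns, net_cost_range("high"))
```

Multiplying range endpoints this way gives only a rough bracket; the net figures in the table are the issue's own approximations and are not derived by this sketch.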

3. Document in calibration data

Note which thinking level each calibration entry was collected at. Current entries are all high/extra-high.
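
Backfilling existing entries could look roughly like this. The agent-to-level mapping follows the Summary (Claude Code at high, Codex at extra high); the entry shape, the key names, and the agent identifiers `claude-code` and `codex` are hypothetical, since the issue does not show the calibration data format.

```python
from typing import Dict, List

# Levels the existing calibration data was collected at, per the Summary.
AGENT_BASELINE_LEVEL = {
    "claude-code": "high",
    "codex": "extra-high",
}


def backfill_thinking_level(entries: List[Dict]) -> List[Dict]:
    """Return copies of the entries with a thinking_level field filled in.

    Entries that already record a level are left as they are; others
    inherit the level their agent was run at during calibration.
    """
    filled = []
    for entry in entries:
        level = entry.get("thinking_level") or AGENT_BASELINE_LEVEL[entry["agent"]]
        filled.append({**entry, "thinking_level": level})
    return filled


# Hypothetical sample entries, for illustration only.
sample = [
    {"dispatch_id": "d-001", "agent": "claude-code"},
    {"dispatch_id": "d-002", "agent": "codex"},
    {"dispatch_id": "d-003", "agent": "codex", "thinking_level": "default"},
]
for row in backfill_thinking_level(sample):
    print(row["dispatch_id"], row["thinking_level"])
```

An entry from an agent missing from the mapping raises `KeyError`, which surfaces data that needs a manual decision instead of silently guessing a level.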

4. Default behavior

If --thinking-level is not specified, default to high (matches calibration baseline). Flag in output: "Estimated at [thinking level] — calibration baseline."
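
The fallback and the output note might be sketched as follows. The function names are hypothetical, and appending the "calibration baseline" suffix only when the resolved level equals the baseline is an assumption; the issue specifies the message text but not how non-baseline levels should be labeled.

```python
from typing import Optional

# The issue proposes "high" as the default because it matches the calibration data.
CALIBRATION_BASELINE = "high"


def resolve_thinking_level(cli_value: Optional[str]) -> str:
    """Fall back to the calibration baseline when the flag was not given."""
    return cli_value if cli_value is not None else CALIBRATION_BASELINE


def estimate_note(level: str) -> str:
    """Build the note shown with an estimate.

    "\u2014" is the dash character used in the issue's message text.
    """
    note = f"Estimated at {level}"
    if level == CALIBRATION_BASELINE:
        note += " \u2014 calibration baseline."
    return note


level = resolve_thinking_level(None)
print(estimate_note(level))
```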

Context

The Iris /estimate skill has been updated to document this gap. The agents.md memory file now records default thinking levels per agent. This issue tracks the product-side implementation.

Priority

P2: important for accuracy, but existing estimates are already implicitly correct for users running high/extra-high thinking (our primary use case). This becomes P1 if external users with different thinking configurations report estimate drift.
