Status: Phase 4 spike (single-agent rollout)
File: governance/model-routing.json
Decouple agent definitions from hard-pinned model strings. Today every `*.agent.md` file carries a literal `model: <vendor model> (copilot)` line, so swapping a model (for cost, capability, or availability reasons) requires editing 13 files. Model routing introduces a logical-role indirection: agents reference the kind of model they need rather than a specific build.
governance/model-routing.json is the canonical source of truth. It declares 10 logical roles covering all 13 ControlFlow agents, with the following per-role shape:
```json
{
  "primary": "<model string>",
  "fallbacks": ["<same-family alt>", "<cross-family alt>"],
  "cost_tier": "low | medium | high",
  "latency_tier": "fast | medium | slow",
  "consumers": ["<agent-file.agent.md>", ...]
}
```

The 10 roles are:
| Role | Consumers |
|---|---|
| `orchestration-capable` | Orchestrator |
| `capable-planner` | Planner |
| `capable-implementer` | CoreImplementer, PlatformEngineer |
| `ui-implementer` | UIImplementer |
| `documentation` | TechnicalWriter |
| `capable-reviewer` | CodeReviewer, PlanAuditor, AssumptionVerifier |
| `review-readonly` | ExecutabilityVerifier |
| `browser-testing` | BrowserTester |
| `fast-readonly` | CodeMapper |
| `research-capable` | Researcher |
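Because each role lists its own `consumers`, an agent-to-role lookup can be derived from the role map instead of being maintained by hand. The following is a minimal sketch, not the repository's implementation: `buildAgentRoleIndex` is a hypothetical helper, the consumer file names are illustrative, and the one-role-per-agent rule is an assumption read off the table above.

```javascript
// Hypothetical excerpt: role name -> role object (only `consumers` shown).
// The file names are illustrative; the real consumers lists may differ.
const roles = {
  "capable-implementer": {
    consumers: ["CoreImplementer.agent.md", "PlatformEngineer.agent.md"],
  },
  "fast-readonly": { consumers: ["CodeMapper.agent.md"] },
};

// Invert role -> consumers into agent file -> role.
// Assumption: an agent file may be claimed by at most one role.
function buildAgentRoleIndex(roleMap) {
  const index = new Map();
  for (const [roleName, role] of Object.entries(roleMap)) {
    for (const agentFile of role.consumers ?? []) {
      if (index.has(agentFile)) {
        throw new Error(
          `${agentFile} is claimed by both ${index.get(agentFile)} and ${roleName}`
        );
      }
      index.set(agentFile, roleName);
    }
  }
  return index;
}

const agentRoleIndex = buildAgentRoleIndex(roles);
console.log(agentRoleIndex.get("CodeMapper.agent.md")); // fast-readonly
```

Throwing on a duplicate claim keeps the lookup unambiguous; a real check might prefer to collect all conflicts before reporting.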
While internal subagent dispatch resolves models dynamically, top-level user entry points (handling the initial chat request) execute using the literal model: frontmatter value. The operating mode ensures premium requests are spent on agents that decide quality at the control plane:
- `Planner` is pinned to `GPT-5.5` so plan quality, decomposition, and risk framing use the strongest available planning model.
- `CodeReviewer`, `PlanAuditor`, and `AssumptionVerifier` rely on premium adversarial reviewers (`Claude Opus 4.7`).
- `ExecutabilityVerifier`, `Orchestrator`, and the implementation agents typically resolve to cheaper defaults (`Claude Sonnet 4.6`, `GPT-5.4`, `GPT-5.4 mini`, `Gemini 3.1 Pro`) to contain premium usage.
This yields a pragmatic split:
- Premium tokens are spent on planning and on finding flaws.
- Routine orchestration and implementation stay cheaper, managed through dynamic subagent dispatch logic.
During the rollout window, agents add a model_role: line to their frontmatter alongside the existing model: line:
```yaml
---
description: '...'
tools: [...]
model: GPT-5.4 mini (copilot)
model_role: browser-testing
---
```

Both lines coexist. The `model:` line is what VS Code Copilot currently consumes; `model_role:` is the logical-layer indirection validated by evals.
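One way to picture a rollout check is a small script that confirms both lines are present and that the role name is known. This is only a sketch under stated assumptions: `readFrontmatter` and `checkRolloutFrontmatter` are hypothetical helpers (they are not taken from the evals), and the parser handles only flat `key: value` lines.

```javascript
// Naive frontmatter reader: handles only flat `key: value` lines between the
// first two `---` markers. Real agent files may need a proper YAML parser.
function readFrontmatter(text) {
  const lines = text.split(/\r?\n/);
  if (lines[0].trim() !== "---") return {};
  const fields = {};
  for (const line of lines.slice(1)) {
    if (line.trim() === "---") break;
    const sep = line.indexOf(":");
    if (sep > 0) fields[line.slice(0, sep).trim()] = line.slice(sep + 1).trim();
  }
  return fields;
}

// Hypothetical rollout check: both lines present, and the role is known.
function checkRolloutFrontmatter(text, knownRoles) {
  const fm = readFrontmatter(text);
  const problems = [];
  if (!fm.model) problems.push("missing model:");
  if (!fm.model_role) problems.push("missing model_role:");
  else if (!knownRoles.includes(fm.model_role)) {
    problems.push(`unknown role ${fm.model_role}`);
  }
  return problems;
}

const agentFile = [
  "---",
  "description: '...'",
  "tools: [...]",
  "model: GPT-5.4 mini (copilot)",
  "model_role: browser-testing",
  "---",
  "Agent body...",
].join("\n");

console.log(checkRolloutFrontmatter(agentFile, ["browser-testing", "fast-readonly"])); // []
```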
VS Code Copilot defaults to reading the literal model: value from frontmatter. However, within ControlFlow, prompt-driven runtime resolution is active for subagent dispatch.
When Orchestrator or Planner dispatches a subagent via `agent/runSubagent`, it resolves the model first:
- It loads `governance/model-routing.json`.
- It looks up the target agent in `agent_role_index`.
- It applies the `by_tier` complexity rule to determine the required model string.
- It passes the resolved `primary` model explicitly as the `model` parameter to `agent/runSubagent`, overriding the agent's frontmatter at call time.
While global VS Code Copilot execution (e.g., triggering an agent directly from chat) still relies on the frontmatter fallback, all internal orchestrated pipeline dispatches strictly enforce the logical routing graph dynamically.
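The resolution is prompt-driven, so no single function performs it, but the logic of the dispatch steps can be sketched in code. Everything here is an assumption made for illustration: the `roles` key, the shape of `agent_role_index` (agent name to role name), the `LARGE` override value, the treatment of an undeclared tier as inheriting the default, and `runSubagentStub`, which merely stands in for `agent/runSubagent`.

```javascript
// Hypothetical, trimmed routing document. The `roles` key and the
// agent_role_index shape (agent name -> role name) are assumptions.
const routing = {
  agent_role_index: { BrowserTester: "browser-testing" },
  roles: {
    "browser-testing": {
      primary: "GPT-5.4 mini",
      fallbacks: [],
      by_tier: { LARGE: { primary: "GPT-5.4", fallbacks: [] } }, // illustrative override
    },
  },
};

// Steps 1-3: agent -> role -> tier entry -> model string.
// Assumption: a tier with no by_tier entry inherits the role default.
function resolveModelForAgent(doc, agentName, tier) {
  const roleName = doc.agent_role_index[agentName];
  if (roleName === undefined) throw new Error(`no role for agent ${agentName}`);
  const role = doc.roles[roleName];
  const entry = role.by_tier?.[tier];
  const inherits = !entry || entry.inherit_from === "default";
  return inherits ? role.primary : entry.primary;
}

// Stand-in for agent/runSubagent; only the `model` parameter is named in the text.
function runSubagentStub(request) {
  return `dispatch ${request.agent} with model ${request.model}`;
}

// Step 4: pass the resolved model explicitly at call time.
function dispatch(doc, agentName, tier) {
  const model = resolveModelForAgent(doc, agentName, tier);
  return runSubagentStub({ agent: agentName, model });
}

console.log(dispatch(routing, "BrowserTester", "SMALL")); // dispatch BrowserTester with model GPT-5.4 mini
console.log(dispatch(routing, "BrowserTester", "LARGE")); // dispatch BrowserTester with model GPT-5.4
```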
The by_tier object describes model overrides based on the complexity tier of the task (TRIVIAL, SMALL, MEDIUM, LARGE). Because internal control plane logic resolves this matrix dynamically during subagent dispatch, this is an active runtime switch for Orchestrator and Planner.
Each key corresponds to a complexity tier, and its value is either a full override ({primary, fallbacks, cost_tier, latency_tier}) or {inherit_from: "default"}.
The resolution rule for a given role and tier is:
```
resolve(role, tier) = by_tier[tier] === {inherit_from: "default"} ? role.primary/fallbacks : by_tier[tier]
```
For example, `capable-planner` at `TRIVIAL` complexity might use a faster model like Sonnet:

```json
"by_tier": {
  "TRIVIAL": {
    "primary": "Claude 3.5 Sonnet",
    "fallbacks": ["GPT-4o mini"],
    "cost_tier": "low",
    "latency_tier": "fast"
  },
  "LARGE": {
    "inherit_from": "default"
  }
}
```

At `LARGE` complexity, it inherits the default (which might be Opus).
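The resolution rule can be written out as a small function. Note that the `===` in the rule is shorthand: in JavaScript two object literals never compare equal with `===`, so this sketch inspects the `inherit_from` field instead. The default `primary` and `fallbacks` values are placeholders, and treating an undeclared tier as inheriting the default is an assumption, not something the schema is known to guarantee.

```javascript
// Role object following the per-role shape. The default primary/fallbacks are
// placeholders; the by_tier block mirrors the example above.
const capablePlanner = {
  primary: "<default primary>",
  fallbacks: ["<default fallback>"],
  cost_tier: "high",
  latency_tier: "slow",
  by_tier: {
    TRIVIAL: {
      primary: "Claude 3.5 Sonnet",
      fallbacks: ["GPT-4o mini"],
      cost_tier: "low",
      latency_tier: "fast",
    },
    LARGE: { inherit_from: "default" },
  },
};

// Returns the effective {primary, fallbacks, cost_tier, latency_tier} for a tier.
function resolve(role, tier) {
  const entry = role.by_tier?.[tier];
  // Assumption: an undeclared tier behaves like {inherit_from: "default"}.
  if (entry === undefined || entry.inherit_from === "default") {
    const { primary, fallbacks, cost_tier, latency_tier } = role;
    return { primary, fallbacks, cost_tier, latency_tier };
  }
  // Only "default" is a valid inherit_from target in this sketch.
  if ("inherit_from" in entry) {
    throw new Error(`unsupported inherit_from target: ${entry.inherit_from}`);
  }
  return entry; // full override
}

console.log(resolve(capablePlanner, "TRIVIAL").primary); // Claude 3.5 Sonnet
console.log(resolve(capablePlanner, "LARGE").primary); // <default primary>
```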
The prompt-driven runtime resolution module (Orchestrator/Planner dynamic lookup) is complete.
Remaining prerequisites for Stage D (auto-tuning observability):
- (a) accumulate ≥50 task telemetry entries via the NDJSON sink at `plans/artifacts/observability/`
- (b) expand the `governance/model-routing.json` schema with `inherit_from` targets beyond `"default"` (e.g., other roles or tier mixes)

Related artifacts:

- Phase 1 spike artifact: `plans/artifacts/model-routing-stage-c/phase-1-spike-result.md`
- Validation helper: `evals/drift-checks.mjs` → `validateByTierShape`
| Tier | `cost_tier` | `latency_tier` |
|---|---|---|
| `low` | Inexpensive per-call; suitable for high-volume read-only or smoke tasks. | `fast`: sub-second to a few seconds typical first-token. |
| `medium` | Mid-range per-call; default for implementer and review-readonly work. | `medium`: a few seconds typical first-token. |
| `high` | Expensive per-call; reserve for planning, deep review, or research. | `slow`: multi-second first-token; long completions expected. |
These tiers are advisory and intended to inform future cost-aware routing (Phase 8+).
fallbacks lists alternate models in preferred order, used when the primary is unavailable (rate-limited, capability-gated, or model removed from Copilot):
- The first fallback is typically a same-family alternative (e.g., Claude Sonnet 4.6 → Claude Opus 4.7) preserving prompt compatibility.
- The second is a cross-family alternative (e.g., Claude → GPT) accepting potentially larger behavior shifts in exchange for availability.
Fallback resolution is not runtime-enforced today; the list documents the intended chain so future routing logic can implement it deterministically without re-deriving safe substitutions.
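Since the chain is not enforced at runtime today, the following is only a sketch of how future routing logic might walk it. `pickAvailableModel` and the `isAvailable` callback are hypothetical names; how availability would actually be detected is left open, and the cross-family entry is illustrative.

```javascript
// Walk the primary, then the fallbacks in listed order, and return the first
// model the caller reports as available.
function pickAvailableModel(role, isAvailable) {
  const chain = [role.primary, ...(role.fallbacks ?? [])];
  for (const model of chain) {
    if (isAvailable(model)) return model;
  }
  return null; // nothing in the chain is usable; the caller decides what to do
}

const role = {
  primary: "Claude Sonnet 4.6",
  fallbacks: ["Claude Opus 4.7", "GPT-5.4"], // same-family first, then cross-family
};

// Pretend the primary is rate-limited.
const available = new Set(["Claude Opus 4.7", "GPT-5.4"]);
console.log(pickAvailableModel(role, (m) => available.has(m))); // Claude Opus 4.7
```

Returning `null` rather than throwing leaves the exhausted-chain policy to the caller, which seems appropriate while the behavior is still undecided.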
reasoning_effort_hint is an advisory-only metadata field added per-role as a sibling of primary, fallbacks, cost_tier, latency_tier, and consumers.
Allowed values: `low | medium | high`
- Consumers MAY use this hint to bias per-call reasoning effort (e.g., number of thinking tokens, chain-of-thought depth).
- Consumers MUST ignore it safely if the value is unrecognized or if the underlying runtime does not support effort control.
- The field is NOT passed through the delegation protocol and is NOT enforced at runtime.
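The MAY and MUST rules above suggest a defensive reader on the consumer side. A minimal sketch, assuming a hypothetical `readEffortHint` helper and a boolean flag for whether the runtime supports effort control:

```javascript
const KNOWN_EFFORTS = new Set(["low", "medium", "high"]);

// Returns a usable effort value, or undefined when the hint should be ignored
// (unsupported runtime, missing field, or unrecognized value).
function readEffortHint(role, runtimeSupportsEffort) {
  if (!runtimeSupportsEffort) return undefined;
  const hint = role.reasoning_effort_hint;
  return KNOWN_EFFORTS.has(hint) ? hint : undefined;
}

console.log(readEffortHint({ reasoning_effort_hint: "high" }, true)); // high
console.log(readEffortHint({ reasoning_effort_hint: "extreme" }, true)); // undefined
console.log(readEffortHint({ reasoning_effort_hint: "high" }, false)); // undefined
```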
The field lives at the per-role level, as a sibling of `primary`, `fallbacks`, `cost_tier`, `latency_tier`, and `consumers`. It is not placed inside `by_tier` sub-objects or `consumers` arrays.

```json
"capable-planner": {
  "primary": "...",
  "fallbacks": [...],
  "cost_tier": "high",
  "latency_tier": "slow",
  "consumers": [...],
  "reasoning_effort_hint": "high",
  "by_tier": { ... }
}
```

- Repository agent-engineering index: `docs/agent-engineering/README.md` (authored in Phase 10).
- Drift detection: `evals/validate.mjs` and the upcoming `evals/scenarios/model-routing-alignment.json`.
- Plan: `plans/controlflow-comprehensive-revision-plan.md` Phase 4.
- Spike record: `plans/artifacts/model-resolver/phase-1-spike.md`.