diff --git a/docker-compose.yml b/docker-compose.yml index 153cc2be..2adb82bf 100644 --- a/docker-compose.yml +++ b/docker-compose.yml @@ -24,6 +24,9 @@ # POSTGRES_PASSWORD=ergon \ # TEST_HARNESS_SECRET=real-llm-secret \ # OPENROUTER_API_KEY="$OPENROUTER_API_KEY" \ +# OPENAI_API_KEY="$OPENAI_API_KEY" \ +# EXA_API_KEY="$EXA_API_KEY" \ +# HF_API_KEY="$HF_API_KEY" \ # docker compose up -d # # Observability stack (otel + jaeger) on demand: @@ -88,6 +91,9 @@ services: - OTEL_SERVICE_NAME=ergon-core - E2B_API_KEY=${E2B_API_KEY:-} - OPENROUTER_API_KEY=${OPENROUTER_API_KEY:-} + - OPENAI_API_KEY=${OPENAI_API_KEY:-} + - EXA_API_KEY=${EXA_API_KEY:-} + - HF_API_KEY=${HF_API_KEY:-} # Put /app on sys.path so editable source mounts resolve in the API # container while the smoke fixtures live in ergon_core.test_support. - PYTHONPATH=/app diff --git a/docs/architecture/03_providers.md b/docs/architecture/03_providers.md index 7a957547..89d9a90a 100644 --- a/docs/architecture/03_providers.md +++ b/docs/architecture/03_providers.md @@ -8,9 +8,7 @@ The providers layer is Ergon's boundary between runtime code and external execut | Name | Kind | Location | Freeze status | Owner | | --- | --- | --- | --- | --- | -| `_BACKEND_REGISTRY` | module-level dict | `ergon_core/core/providers/generation/model_resolution.py` | Frozen shape; entries grow via registration. | Providers layer. | | `resolve_model_target` | function | `ergon_core/core/providers/generation/model_resolution.py` | Public, frozen signature. Returns `ResolvedModel`. | Providers layer. | -| `register_model_backend` | function | `ergon_core/core/providers/generation/model_resolution.py` | Public, frozen signature. | Providers layer; callers are backend modules executing at import time. | | `BaseSandboxManager` | abstract class + singleton | `ergon_core/core/providers/sandbox/manager.py` | Shape stable; `event_sink` activation path in flux. | Providers layer. 
| | `DefaultSandboxManager` | concrete class | `ergon_core/core/providers/sandbox/manager.py` | Frozen. | Providers layer. | | `SWEBenchSandboxManager`, `MiniF2FSandboxManager`, `ResearchRubricsSandboxManager` | concrete subclasses | `ergon_builtins/` | Owned per benchmark; singletons. | Benchmark authors. | @@ -19,11 +17,11 @@ The providers layer is Ergon's boundary between runtime code and external execut | `SandboxResourcePublisher` | class | `ergon_core/core/providers/sandbox/resource_publisher.py` | Frozen API; storage backend swappable via `ERGON_BLOB_ROOT`. | Providers layer. | | `TransformersModel` | `pydantic_ai.models.Model` subclass | `ergon_builtins/ergon_builtins/models/transformers_backend.py` | Frozen. | ML team (TRL training loop callers). | -### 2.1 Generation registry +### 2.1 Model target resolution -`_BACKEND_REGISTRY` is a prefix-keyed dispatch table of resolver callables. `resolve_model_target` splits the target on its first colon, dispatches to the resolver, and returns a `ResolvedModel` wrapping either a `pydantic_ai.models.Model` instance or a passthrough string. Unknown prefixes fall through to a passthrough `ResolvedModel` — PydanticAI's own `infer_model` is invoked on use. Backends mutate the registry at import time; the builtins pack registers all four in a single loop at `ergon_builtins/ergon_builtins/registry.py:81`. +`resolve_model_target` is the single dispatch point for model target strings. It splits the target on its first colon and returns a `ResolvedModel` wrapping a concrete `pydantic_ai.models.Model` instance. Unknown prefixes raise immediately instead of falling through to PydanticAI inference. -The four prefixes registered today are `vllm:*` (local vLLM server via PydanticAI's `OpenAIChatModel`), `openai:*` / `anthropic:*` / `google:*` (passthrough to `infer_model`), and `transformers:*` (custom `TransformersModel` for TRL-trained checkpoints not served over vLLM). 
+The supported prefixes are `vllm:*`, `openai-compatible:*`, and the cloud provider prefixes `openai:*` / `anthropic:*` / `google:*`. Cloud provider prefixes always route through OpenRouter via PydanticAI's OpenRouter provider; they do not call the direct OpenAI, Anthropic, or Google APIs. Workers are expected to contain no hardcoded SDK client constructions (`AsyncOpenAI`, `anthropic.Client`, `genai.Client`). This is an invariant (Section 4), not a coincidence, and it is currently honored; enforcement is grep discipline. @@ -87,7 +85,7 @@ The decentralized shape means `ergon benchmark setup` iterates over whatever sub Worker.execute() | +-> resolve_model_target(self.model) --> ResolvedModel - | (prefix dispatch; 4 backends + fallthrough to infer_model) + | (explicit prefix dispatch; cloud targets route via OpenRouter) | +-> ManagerClass() (singleton; returns cached instance) | ManagerClass().create(sandbox_key=task_id, run_id=run_id, ...) @@ -126,7 +124,7 @@ Movement of data across this diagram: ## 4. Invariants 1. **One entry point to LLM resolution.** Every model reference goes through `resolve_model_target`. Enforced by grep discipline and review; no runtime check. -2. **Backends register at import time.** `register_model_backend` must be called before any caller hits `resolve_model_target`. Enforced by the builtins pack running its registration loop at import, before any worker module imports. +2. **Cloud provider prefixes use OpenRouter.** `openai:*`, `anthropic:*`, and `google:*` model targets are OpenRouter-hosted. Direct cloud SDK model routing is intentionally outside the grammar. 3. **Singleton managers hold authoritative sandbox state.** A subclass's class-level state is the only source of truth for in-process reconnect. Enforced by `__new__` caching the instance and `get_sandbox` reading the class dict. Applies only within a single Python process; cross-process actors must use `terminate_by_sandbox_id` or provision their own sandbox. 4.
**Sandbox lifecycle is per-task.** Enforced by `create` accepting `sandbox_key` and by the worker runtime persisting `sandbox_id` on the execution row. 5. **Sandbox lives across evaluator fan-out.** Teardown runs at the end of `check_evaluators`, not at worker completion, not in `finalize_success`. Enforced by the evaluator harness, not by the manager itself. @@ -146,10 +144,9 @@ Movement of data across this diagram: ### 5.1 Add a new LLM backend -1. Write a resolver that maps `"myprefix:foo"` to a `pydantic_ai.models.Model` instance wrapped in `ResolvedModel`. -2. Register it in the builtins-pack registration loop so `register_model_backend` is called at import time. -3. Ensure the builtins pack is imported before any worker that references `myprefix:*` model ids. -4. Add an entry to `LLMProvider` and `PROVIDER_KEY_MAP` in `ergon_cli/onboarding/profile.py` so onboarding prompts for the key or server URL. +1. Add an explicit prefix branch in `resolve_model_target` and keep the constructor logic in a sibling module under `ergon_core/core/providers/generation/`. +2. Return a concrete `pydantic_ai.models.Model` instance wrapped in `ResolvedModel`. +3. Add an entry to `LLMProvider` and `PROVIDER_KEY_MAP` in `ergon_cli/onboarding/profile.py` so onboarding prompts for the key or server URL. ### 5.2 Add a new sandbox manager diff --git a/docs/architecture/06_builtins.md b/docs/architecture/06_builtins.md index b7c082fe..43073cba 100644 --- a/docs/architecture/06_builtins.md +++ b/docs/architecture/06_builtins.md @@ -52,10 +52,11 @@ runnable — not a catalog of registered implementations. Rubric nesting is not supported and there are no plans to change that. - Third-party users primarily extend at the Criterion layer. -- Model backend registry. - - Concrete LLM backends register via - `register_model_backend(prefix, resolver)` at import time. - - Freeze status: stable API; adding a backend is additive. +- Model target resolution. 
+ - Builtins do not register cloud model backends. Model target strings are + resolved centrally by `resolve_model_target` in `ergon_core`. + - Freeze status: stable API; adding a backend is additive inside the + providers layer. - ReAct toolkit composition. - There is one concrete ReAct worker class — `ReActWorker` (slug `react-v1`, @@ -145,8 +146,8 @@ Benchmark loader → Task instances → Worker - **New worker.** Add under `ergon_builtins/workers/baselines/` if it is cross-benchmark; alongside the benchmark otherwise. The contract is which task schemas it supports. -- **New model backend.** Call `register_model_backend(prefix, resolver)` at - import time; prefer short, stable prefixes. +- **New model backend.** Add an explicit `resolve_model_target` branch in + `ergon_core/core/providers/generation/`; prefer short, stable prefixes. - **New Criterion.** Place in `ergon_builtins/evaluators/criteria/` if reusable, alongside the benchmark if benchmark-specific. This is the layer third-party users most often extend. diff --git a/docs/experiments/rq1-cli-specialism/changelog.md b/docs/experiments/rq1-cli-specialism/changelog.md new file mode 100644 index 00000000..11cb5f91 --- /dev/null +++ b/docs/experiments/rq1-cli-specialism/changelog.md @@ -0,0 +1,297 @@ +# RQ1 CLI Specialism Overnight Changelog + +## Goal + +Use the PR #39 workflow-CLI ResearchRubrics agent to produce rollout-card artifacts that support RQ1: returns remain a useful guardrail, but rollout cards preserve richer delegation and role-specialism behaviour that scalar returns discard. 
+ +## 2026-04-26 23:30 UTC+1 - Preflight + +- Worktree: `/Users/charliemasters/Desktop/synced_vm_002/ergon/.worktrees/feature/finish-agent-workflow-cli` +- Branch: `feature/finish-agent-workflow-cli` +- PR: https://github.com/DeepFlow-research/ergon/pull/39 +- Commit at start: `ae7a0a8 Finish agent workflow CLI task editing` +- PR checks: all current checks passing by `gh pr checks 39`: + - `Integration tests (Python)`: pass + - `Lint + type-check (Frontend)`: pass + - `Lint + type-check (Python)`: pass + - `Unit tests (Python)`: pass + - `smoke [minif2f]`: pass + - `smoke [researchrubrics]`: pass + - `smoke [swebench-verified]`: pass +- Local `.env`: not present in the PR worktree. Real-LLM commands source `/Users/charliemasters/Desktop/synced_vm_002/ergon/.env` without copying it. +- Required keys after sourcing main `.env`: `OPENROUTER_API_KEY`, `EXA_API_KEY`, and `E2B_API_KEY` are set. +- Local services: + - `docker compose ps` in the worktree showed no compose-owned services. + - `http://127.0.0.1:3001/` responded. + - `http://127.0.0.1:9000/` responded with HTTP 404, which still indicates a process is listening; harness fixture treats connection success as stack-up. + +## Run Log + +Runs append below. Each entry should include command, env knobs, rollout artifact path, run ID, terminal status, score notes, graph/subtask notes, and prompt/config changes. + +## 2026-04-26 23:36 UTC+1 - Preflight Smoke Blocker + +- Command: + - `ERGON_REAL_LLM=1 ERGON_REAL_LLM_BUDGET_USD=50 uv run pytest tests/real_llm/benchmarks/test_smoke_stub.py -v -s --assume-stack-up` +- Result: + - Failed during test collection before any benchmark/model spend. +- Root cause: + - `telemetry.models` imports `ergon_core.api.json_types`, which executes `ergon_core.api.__init__`. + - `ergon_core.api.__init__` eagerly imported `RunResourceView` from `api.run_resource`. + - `api.run_resource` imports `RunResourceKind` from `telemetry.models` while `telemetry.models` is partially initialized. 
- Fix:
  - Added `tests/unit/runtime/test_import_boundaries.py` as a regression test.
  - Changed `ergon_core/ergon_core/api/__init__.py` to lazily expose `RunResourceKind` and `RunResourceView` via `__getattr__`.
- Verification:
  - `uv run pytest tests/unit/runtime/test_import_boundaries.py -q` -> `1 passed`
  - `uv run ruff format ergon_core/ergon_core/api/__init__.py tests/unit/runtime/test_import_boundaries.py && uv run ruff check ergon_core/ergon_core/api/__init__.py tests/unit/runtime/test_import_boundaries.py` -> `All checks passed`
- Commit:
  - `e23c276 Fix run resource API import boundary`

## 2026-04-26 23:45 UTC+1 - Stack Rebuild

- Rebuilt the shared `ergon` compose project from the PR #39 worktree:
  - `COMPOSE_PROJECT_NAME=ergon docker compose up -d --build --wait`
- Reason:
  - The running stack was built before PR #39, so the API/Inngest runtime might not recognize `researchrubrics-workflow-cli-react`.
- Result:
  - `ergon-api-1`, `ergon-dashboard-1`, `ergon-inngest-dev-1`, and `ergon-postgres-1` are running.
  - API root returns HTTP 404 but the process is reachable; the real-LLM fixture only requires connection success.

## 2026-04-26 23:47 UTC+1 - Baseline Workflow-CLI Batch 1

- Intent:
  - Run 5 ResearchRubrics samples with the current PR #39 workflow-CLI prompt.
- Command:
  - `ERGON_REAL_LLM=1 ERGON_REAL_LLM_BUDGET_USD=50 ERGON_REAL_LLM_WORKER=researchrubrics-workflow-cli-react ERGON_REAL_LLM_LIMIT=5 uv run pytest tests/real_llm/benchmarks/test_researchrubrics.py -v -s --assume-stack-up`
- Status:
  - Failed after creating run `2626bae9-b058-4b1b-9803-8e6186468023`.
- Failure:
  - Harness endpoint `GET /api/test/read/run/2626bae9-b058-4b1b-9803-8e6186468023/state` returned HTTP 500.
  - Local DB/API inspection showed `psycopg2.errors.UndefinedColumn: column run_resources.copied_from_resource_id does not exist`.
- Root cause:
  - The long-lived local Postgres DB was stamped at Alembic head `0a1b2c3d4e5f`, but was missing the effect of the already-existing migration `a2b3c4d5e6f7_add_copied_from_resource_id.py`. This is local schema drift, not a missing migration in the branch.
- Local repair:
  - Applied idempotent local DDL:
    - `ALTER TABLE run_resources ADD COLUMN IF NOT EXISTS copied_from_resource_id UUID NULL`
    - `CREATE INDEX IF NOT EXISTS ix_run_resources_copied_from_resource_id ON run_resources (copied_from_resource_id)`
    - Added the FK constraint `fk_run_resources_copied_from_resource_id_run_resources` where it was absent.
  - Verification: information schema now reports one `copied_from_resource_id` column.
- Post-repair canary:
  - `ERGON_REAL_LLM=1 ERGON_REAL_LLM_BUDGET_USD=50 uv run pytest tests/real_llm/benchmarks/test_smoke_stub.py -v -s --assume-stack-up`
  - Result: `1 passed` in 27.15s.

## 2026-04-26 23:45 UTC+1 - Baseline Workflow-CLI Batch 1b

- Intent:
  - Retry 5 ResearchRubrics samples after local schema repair.
- Command:
  - `ERGON_REAL_LLM=1 ERGON_REAL_LLM_BUDGET_USD=50 ERGON_REAL_LLM_WORKER=researchrubrics-workflow-cli-react ERGON_REAL_LLM_LIMIT=5 uv run pytest tests/real_llm/benchmarks/test_researchrubrics.py -v -s --assume-stack-up`
- Status:
  - Passed, but not useful for headline RQ1 evidence.
- Rollout:
  - Directory: `tests/real_llm/.rollouts/20260426T224530Z-3caf7e5c-e09f-47a8-8afb-58fd2693b761/`
  - Run ID: `3caf7e5c-e09f-47a8-8afb-58fd2693b761`
  - Wall clock: 235.6s
  - Budget: $0.477609
- Findings:
  - The hardcoded `researchrubrics` benchmark loaded only 2 private/default smoke rows: `smoke-001`, `smoke-002`.
  - Graph had 2 root nodes, 0 edges, 0 child subtasks, 2 resources, 1 evaluation.
  - Worker did call `workflow inspect task-tree` once per task, but did not spawn/coordinate specialist subtasks.
  - Evaluator returned score 0.0 because the API container did not have `OPENAI_API_KEY`.
+- Fixes after analysis: + - `ResearchRubricsBenchmark._payload_from_row` now accepts vanilla dataset rows with `prompt` when `ablated_prompt` is absent. + - `tests/real_llm/benchmarks/test_researchrubrics.py` now honors `ERGON_REAL_LLM_BENCHMARK`, defaulting to `researchrubrics`. + - `docker-compose.yml` now passes `OPENAI_API_KEY`, `EXA_API_KEY`, and `HF_API_KEY` to the API container alongside the existing E2B/OpenRouter keys. + - Focused tests: `uv run pytest tests/unit/state/test_research_rubrics_benchmark.py -q` -> `10 passed`. + - Vanilla load check: `ResearchRubricsVanillaBenchmark(limit=5)` -> 5 rows loaded. + - Stack rebuilt with exported env; API container verified all provider keys present. + +## 2026-04-27 00:00 UTC+1 - Vanilla 5-Sample Workflow-CLI Batch 1 + +- Intent: + - Run the actual 5-row ScaleAI ResearchRubrics benchmark after enabling vanilla rows and backend evaluator env. +- Command: + - `ERGON_REAL_LLM=1 ERGON_REAL_LLM_BUDGET_USD=50 ERGON_REAL_LLM_BENCHMARK=researchrubrics-vanilla ERGON_REAL_LLM_WORKER=researchrubrics-workflow-cli-react ERGON_REAL_LLM_LIMIT=5 uv run pytest tests/real_llm/benchmarks/test_researchrubrics.py -v -s --assume-stack-up` +- Status: + - Run reached terminal `failed`, but pytest timed out waiting for resources/evaluations because all tasks failed before report persistence. +- Rollout: + - Directory: `tests/real_llm/.rollouts/20260426T230154Z-ab57a0df-2a6d-4174-95f5-87185f717707/` + - Run ID: `ab57a0df-2a6d-4174-95f5-87185f717707` + - Row counts: 5 graph nodes, 0 graph edges, 25 mutations, 121 context events, 10 sandbox events, 0 resources, 0 evaluations. +- Findings: + - This was the intended 5 real-row ScaleAI benchmark: five sample IDs were created. + - Behavior was rich but not successful: 116 tool calls total, including 111 `exa_search` and 5 `workflow inspect task-tree`. + - No task called `write_report_draft` or `final_result`; all failed with generic `Worker execution failed`. 
+ - Failure mode appears to be search-budget exhaustion / max-iteration behavior on large vanilla rubrics, not missing provider keys. + - No child subtasks: the workflow tool was available but graph editing was not manager-capable, and the prompt only suggested inspection/resource-copying. +- Core/harness fixes: + - `_wait_for_post_terminal_artifacts` now returns for terminal `failed`/`cancelled` runs with no running executions, so failed-before-output rollouts still dump artifacts. + - `_require_keys` now includes `openai_api_key`. + - Broke a context-event import cycle by storing context `turn_logprobs` as open JSON payloads instead of importing `TokenLogprob` from `ergon_core.api.generation`. + - Added import-boundary coverage for context models. + - Tests: `uv run pytest tests/unit/runtime/test_import_boundaries.py tests/unit/state/test_research_rubrics_benchmark.py -q` -> `12 passed`. + +## 2026-04-27 00:04 UTC+1 - Prompt Hillclimb Variant 1 + +- Prompt/tool changes: + - Workflow-CLI ReAct worker now passes `manager_capable=True` to `make_workflow_cli_tool`. + - Prompt asks level-0 tasks to create exactly three specialist child tasks before research: + - source scout + - rubric compliance checker + - synthesis reviewer + - Prompt tells non-root tasks not to create recursive children. + - Prompt caps own work to at most 6 `exa_search` calls before writing `final_output/report.md`. +- Verification: + - `uv run pytest tests/unit/state/test_research_rubrics_workers.py tests/unit/state/test_workflow_cli_tool.py -q` -> `10 passed`. + - API restarted; provider keys still present in container. 
+- Next run command: + - `ERGON_REAL_LLM=1 ERGON_REAL_LLM_BUDGET_USD=50 ERGON_REAL_LLM_BENCHMARK=researchrubrics-vanilla ERGON_REAL_LLM_WORKER=researchrubrics-workflow-cli-react ERGON_REAL_LLM_LIMIT=5 uv run pytest tests/real_llm/benchmarks/test_researchrubrics.py -v -s --assume-stack-up` +- Diagnostic run: + - Run ID: `4d3721d0-aacb-4f04-bea9-9217c0549f9e` + - Stopped pytest manually after confirming it was polluted by the async workflow bridge bug. + - Positive signal: at least one root task attempted the desired `workflow manage add-task` specialist pattern before searching: + - source scout + - rubric compliance checker + - synthesis reviewer + - Bug found: agent-side workflow manage commands called the sync CLI bridge, which used `asyncio.run()` inside an already-running event loop. API log showed `RuntimeWarning: coroutine '_handle_manage' was never awaited`. +- Core fix: + - Added `execute_workflow_command_async(...)` in `ergon_cli.commands.workflow`. + - `execute_workflow_command(...)` now remains a sync wrapper for CLI callers. + - `make_workflow_cli_tool(...)` now awaits the async executor. + - Tests: `uv run pytest tests/unit/cli/test_workflow_cli.py tests/unit/state/test_workflow_cli_tool.py -q` -> `10 passed`. + +## 2026-04-27 00:13 UTC+1 - Prompt Hillclimb Variant 1b + +- Intent: + - Re-run Variant 1 with the fixed async workflow bridge. +- Status: + - Cancelled after diagnostic success and provider failures. +- Diagnostic result: + - Run ID: `9a83787a-dac2-45a1-9d3f-823f65984716` + - Early poll showed 20 graph nodes: 5 roots + 15 level-1 specialist children. + - Each root created source scout, rubric compliance, and synthesis reviewer children. + - This is the desired RQ1 graph-specialism signal. + - However, several roots failed on provider/schema errors (`finish_reason=None`) before reports/evaluations landed; remaining children were pending/blocked. + - Cancelled run via `uv run ergon run cancel 9a83787a-dac2-45a1-9d3f-823f65984716`. 
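The async-bridge fix above can be sketched as follows. The function names mirror `execute_workflow_command_async` / `execute_workflow_command` from `ergon_cli.commands.workflow`, but the bodies are hypothetical stand-ins, not the real implementations:

```python
import asyncio

async def execute_workflow_command_async(command: str) -> str:
    """Async executor; stands in for the real dispatch into the workflow service."""
    await asyncio.sleep(0)  # placeholder for real awaitable work
    return f"ok: {command}"

def execute_workflow_command(command: str) -> str:
    """Sync wrapper for CLI callers only.

    asyncio.run() starts a fresh event loop, so calling this from code that
    is already running under an event loop raises RuntimeError and leaves
    the coroutine never awaited -- the failure seen in the API log.
    """
    return asyncio.run(execute_workflow_command_async(command))

async def workflow_tool(command: str) -> str:
    """Agent-side tool path: await the async executor directly."""
    return await execute_workflow_command_async(command)
```

CLI entry points keep calling the sync wrapper; the agent tool awaits the async executor, so no nested `asyncio.run()` occurs under the agent's live event loop.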
+ +## 2026-04-27 00:22 UTC+1 - Prompt Hillclimb Variant 1c + +- Intent: + - Keep the specialist-subtask prompt, but switch from OpenRouter Sonnet to direct OpenAI to avoid the `finish_reason=None` OpenRouter/PydanticAI failure. +- Command: + - `ERGON_REAL_LLM=1 ERGON_REAL_LLM_BUDGET_USD=50 ERGON_REAL_LLM_MODEL=openai:gpt-4o-mini ERGON_REAL_LLM_BENCHMARK=researchrubrics-vanilla ERGON_REAL_LLM_WORKER=researchrubrics-workflow-cli-react ERGON_REAL_LLM_LIMIT=5 uv run pytest tests/real_llm/benchmarks/test_researchrubrics.py -v -s --assume-stack-up` +- Status: + - Cancelled after partial artifact dump. +- Rollout: + - Directory: `tests/real_llm/.rollouts/20260426T232803Z-3b258073-ab38-4a22-ac18-766c27d8aa1e/` + - Run ID: `3b258073-ab38-4a22-ac18-766c27d8aa1e` + - Row counts: 11 graph nodes, 37 mutations, 90 context events, 3 resources, 2 evaluations. +- Findings: + - Direct OpenAI avoided the OpenRouter `finish_reason=None` issue. + - Two root tasks completed and produced evaluations: + - score `0.11382113821138211`, passed `true` + - score `0.014084507042253521`, passed `false` + - Three roots failed before final output; two failed roots created specialist children, which were blocked by parent failure. + - This is a partial "rich behavior vs return" data point: returns are low/partial, but rollout-card structure exposes role-specialist decomposition not captured by scalar return. + +## 2026-04-27 00:29 UTC+1 - Prompt Hillclimb Variant 1d + +- Intent: + - Same specialist prompt, direct OpenAI, stronger model (`openai:gpt-4o`) to improve returns while preserving graph-specialism signal. 
+- Command: + - `ERGON_REAL_LLM=1 ERGON_REAL_LLM_BUDGET_USD=50 ERGON_REAL_LLM_MODEL=openai:gpt-4o ERGON_REAL_LLM_BENCHMARK=researchrubrics-vanilla ERGON_REAL_LLM_WORKER=researchrubrics-workflow-cli-react ERGON_REAL_LLM_LIMIT=5 uv run pytest tests/real_llm/benchmarks/test_researchrubrics.py -v -s --assume-stack-up` +- Status: + - Cancelled after partial artifact dump because dynamic child tasks remained pending. +- Rollout: + - Directory: `tests/real_llm/.rollouts/20260426T233740Z-356b7189-229b-4ef4-849c-f3c87964feb4/` + - Run ID: `356b7189-229b-4ef4-849c-f3c87964feb4` + - Row counts: 20 graph nodes, 43 mutations, 62 context events, 5 resources, 4 evaluations. +- Findings: + - Best evidence so far: 5 roots, 15 specialist children, 4/5 root reports completed, 4 evaluations landed. + - Scores: + - `0.1267605633802817`, passed `true` + - `0.11382113821138211`, passed `true` + - `0.07142857142857142`, passed `false` + - `0.0`, passed `false` + - Dynamic children remained `pending` rather than being scheduled after creation. +- Core fix: + - `WorkflowService.add_task` now emits `task/ready` for the created dynamic node after commit. + - Added an injectable task-ready dispatcher and unit test coverage. + - Tests: `uv run pytest tests/unit/runtime/test_workflow_service.py tests/unit/cli/test_workflow_cli.py tests/unit/state/test_workflow_cli_tool.py -q` -> `22 passed`. + +## 2026-04-27 00:38 UTC+1 - Prompt Hillclimb Variant 1e + +- Intent: + - Same GPT-4o specialist prompt, now with dynamic child scheduling fixed. +- Command: + - `ERGON_REAL_LLM=1 ERGON_REAL_LLM_BUDGET_USD=50 ERGON_REAL_LLM_MODEL=openai:gpt-4o ERGON_REAL_LLM_BENCHMARK=researchrubrics-vanilla ERGON_REAL_LLM_WORKER=researchrubrics-workflow-cli-react ERGON_REAL_LLM_LIMIT=5 uv run pytest tests/real_llm/benchmarks/test_researchrubrics.py -v -s --assume-stack-up` +- Status: + - Pytest artifact-dump wrapper passed, but run terminal status is `failed`. 
- Rollout:
  - Directory: `tests/real_llm/.rollouts/20260426T233920Z-0700b668-a640-49f2-80f9-a5c87bc160a9/`
  - Run ID: `0700b668-a640-49f2-80f9-a5c87bc160a9`
  - Row counts: 20 graph nodes, 70 mutations, 257 context events, 5 resources, 5 evaluations, 20 executions.
  - Run summary: `final_score=0.7134802212615627`, `normalized_score=0.14269604425231255`, `evaluators_count=5`.
- Findings:
  - Scheduling fix worked: Inngest logs show dynamic `task/ready` events for child `node_id`s and `task-execute` initialized for those children.
  - Graph-specialism signal preserved: 5 roots and 15 specialist children.
  - Returns improved versus prior failed/partial variants: all 5 root tasks completed and all 5 root evaluations landed; 1/5 evaluations passed.
  - Remaining failure mode: most specialist children started and then failed generically (`Worker execution failed`), causing the overall run to fail even though root reports/evaluations landed.
  - Root cause for child recursion: the prompt told agents to inspect `task-tree`; child agents can see other level-0 roots in that output and at least one child incorrectly called `manage add-task`.
- Prompt fix for next run:
  - Delegation decision now uses only `workflow("inspect task-workspace --format json")` and `task_workspace.task.level`.
  - Prompt explicitly says to ignore level-0 tasks shown elsewhere in task-tree.
  - Non-root specialist children are told not to call `workflow("manage add-task")`, to use at most 2 workflow inspections and 3 `exa_search` calls, and to write `final_output/report.md`.
- Verification:
  - Red test first: `uv run pytest tests/unit/state/test_research_rubrics_workers.py::TestResearcherWorker::test_workflow_cli_prompt_uses_current_task_level_for_delegation -q` failed on the missing `task-workspace --format json` instruction.
  - Green tests: `uv run pytest tests/unit/state/test_research_rubrics_workers.py tests/unit/state/test_workflow_cli_tool.py -q` -> `11 passed`.
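The Variant 1d scheduling fix (emit `task/ready` only after commit, via an injectable dispatcher) can be sketched as below. Class and field names are illustrative, not the real `WorkflowService` API:

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class WorkflowServiceSketch:
    # Injectable dispatcher: production wires this to the Inngest
    # task/ready event sender; unit tests inject a recorder.
    dispatch_task_ready: Optional[Callable[[str], None]] = None
    committed_nodes: list = field(default_factory=list)

    def add_task(self, slug: str) -> str:
        node_id = f"node-{slug}"
        # ... write the dynamic node via the graph repository ...
        self.committed_nodes.append(node_id)  # stands in for the commit
        # Emit task/ready only after commit so the executor never races an
        # uncommitted node; without this, dynamic children stay pending
        # forever (the Variant 1d failure mode).
        if self.dispatch_task_ready is not None:
            self.dispatch_task_ready(node_id)
        return node_id

# Test-style usage: inject a recording dispatcher.
ready_events = []
svc = WorkflowServiceSketch(dispatch_task_ready=ready_events.append)
svc.add_task("source-scout")
```

The default `None` dispatcher keeps unit tests hermetic while production injects the real event sender.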
+ +## 2026-04-27 00:55 UTC+1 - Prompt Hillclimb Variant 1f + +- Intent: + - Same GPT-4o specialist prompt, but with delegation keyed to the current task workspace rather than global task-tree rows. +- Command: + - `ERGON_REAL_LLM=1 ERGON_REAL_LLM_BUDGET_USD=50 ERGON_REAL_LLM_MODEL=openai:gpt-4o ERGON_REAL_LLM_BENCHMARK=researchrubrics-vanilla ERGON_REAL_LLM_WORKER=researchrubrics-workflow-cli-react ERGON_REAL_LLM_LIMIT=5 uv run pytest tests/real_llm/benchmarks/test_researchrubrics.py -v -s --assume-stack-up` +- Status: + - Pytest artifact-dump wrapper passed, but run terminal status is `failed`. +- Rollout: + - Directory: `tests/real_llm/.rollouts/20260426T234424Z-7fc055f5-03c3-4cab-8117-04e844696482/` + - Run ID: `7fc055f5-03c3-4cab-8117-04e844696482` + - Row counts: 20 graph nodes, 70 mutations, 235 context events, 5 resources, 5 evaluations, 20 executions. + - Run summary: `final_score=0.7597894539417135`, `normalized_score=0.1519578907883427`, `evaluators_count=5`. +- Findings: + - Best overnight evidence so far. + - Graph-specialism signal: 5 roots created exactly 15 specialist children; `manage add-task` appears exactly 15 times and no recursive child creation was observed. + - Return guardrail: all 5 root tasks completed, all 5 root evaluations landed, and the aggregate normalized score improved slightly over 1e (`0.1519578907883427` vs `0.14269604425231255`). + - Specialist execution improved but remains noisy: 5/15 children completed, 10/15 failed with generic `Worker execution failed`, so the run-level status is still `failed`. + - This supports the RQ1 story: the scalar terminal status is poor, but the rollout card exposes a stable specialist-delegation pattern, role-specific child descriptions, root report completion, and recoverable child-worker behavior. 
+- Backend harness endpoint check: + - `GET http://127.0.0.1:9000/api/test/read/run/7fc055f5-03c3-4cab-8117-04e844696482/state` returned HTTP 200 with `status=failed`, `graph_nodes=20`, `mutations=70`, `evaluations=5`, `executions=20`, `resource_count=5`, `context_event_count=235`. + - The same path on dashboard port `3001` returned 404; the harness route is backend API, not dashboard. + +## Morning Handoff Notes + +- Best variant: Prompt Hillclimb Variant 1f. +- Best artifact path: `tests/real_llm/.rollouts/20260426T234424Z-7fc055f5-03c3-4cab-8117-04e844696482/` +- Candidate RQ1 headline evidence: + - Returns/status alone: run is `failed`, but all 5 root tasks completed and all 5 evaluations landed. + - Rollout-card structure: 5 root tasks, exactly 15 specialist child tasks, 70 graph mutations, 235 context events. + - Specialism behavior: root tasks consistently decomposed into source-scout, rubric-checker/compliance, and synthesis-reviewer roles. + - Cross-community analysis hook: this single rollout card supports post-hoc role-diversity / worker-specialism measurements that are invisible in terminal status or scalar return. +- Main residual issue: + - Dynamic specialist children now schedule and some complete, but child failures still propagate run failure. Next core-code direction would be either (a) make advisory child tasks non-fatal for parent benchmark return, or (b) harden child-worker prompting/tooling so specialist children reliably write `final_output/report.md`. + +## 2026-04-27 10:05 UTC+1 - Model Resolution Refactor + +- Intent: + - Make Ergon cloud model targets (`openai:*`, `anthropic:*`, `google:*`) route through OpenRouter instead of direct provider APIs or PydanticAI fallback inference. +- Changes: + - Upgraded `pydantic-ai` from `0.7.2` to `0.8.1`, the latest resolvable version in this environment. + - Centralized dispatch in `ergon_core/core/providers/generation/model_resolution.py`. 
+ - Added `openrouter.py` for OpenRouter-hosted cloud targets and `openai_compatible.py` for `vllm:` plus `openai-compatible:` endpoint targets. + - Removed the builtins model-backend registration path and the old `cloud_passthrough.py` / `vllm_backend.py` modules. +- Note: + - The installed PydanticAI version exposes `OpenRouterProvider` but not `OpenRouterModel`; the implementation uses `OpenAIChatModel(..., provider=OpenRouterProvider(...))`, which gives the desired OpenRouter routing semantics. + + diff --git a/docs/real-llm-rollout-harness.md b/docs/real-llm-rollout-harness.md index ad49aead..d28cf49f 100644 --- a/docs/real-llm-rollout-harness.md +++ b/docs/real-llm-rollout-harness.md @@ -195,7 +195,7 @@ async def test_researchrubrics_rollout( "--evaluator", "research-rubric", "--model", os.environ.get( "ERGON_REAL_LLM_MODEL", - "openrouter:anthropic/claude-sonnet-4.6", + "anthropic:claude-sonnet-4.6", ), "--limit", "1", ], @@ -215,16 +215,13 @@ async def test_researchrubrics_rollout( ## Spike results -**1. OpenRouter model routing — works out of the box.** -`pydantic_ai.models.infer_model("openrouter:anthropic/claude-sonnet-4.6")` -resolves to an `OpenAIModel` backed by -`pydantic_ai.providers.openrouter.OpenRouterProvider`. The only -requirement is `OPENROUTER_API_KEY` in the process env, which -`settings.py:82-83` already exports from `settings.openrouter_api_key`. -`resolve_model_target`'s fallback branch passes `openrouter:*` strings -straight through to pydantic-ai. **No backend registration needed.** An -optional one-line `"openrouter": resolve_cloud` entry in -`MODEL_BACKENDS` is nice-to-have for symmetry, not required. +**1. OpenRouter model routing.** +`resolve_model_target("anthropic:claude-sonnet-4.6")` resolves to a +PydanticAI chat model backed by +`pydantic_ai.providers.openrouter.OpenRouterProvider`. 
Cloud provider +prefixes (`openai:`, `anthropic:`, `google:`) are OpenRouter-hosted in +Ergon; use `OPENROUTER_API_KEY` in the process env and do not route +through direct provider APIs. **2. Exa inside the sandbox — confirmed not wired.** The plumbing exists but nothing populates it: @@ -342,10 +339,9 @@ Files: `write_manifest`, `_rollout_dir`. 2. `tests/real_llm/benchmarks/test_researchrubrics.py` — the 30-line trigger above. -3. *(optional)* `ergon_builtins/registry_core.py` — one-line - `"openrouter": resolve_cloud` entry in `MODEL_BACKENDS` for - symmetry with the other cloud prefixes. Not required — pydantic-ai - handles the prefix natively via the fallback branch. +3. Model targets resolve centrally in `resolve_model_target`; use + provider-prefixed targets such as `anthropic:claude-sonnet-4.6`. + Cloud provider prefixes route through OpenRouter. Estimated effort: **half a day** on top of the pre-work PR. diff --git a/docs/superpowers/plans/2026-04-26-finish-agent-workflow-cli.md b/docs/superpowers/plans/2026-04-26-finish-agent-workflow-cli.md new file mode 100644 index 00000000..23ad8eef --- /dev/null +++ b/docs/superpowers/plans/2026-04-26-finish-agent-workflow-cli.md @@ -0,0 +1,100 @@ +# Finish Agent Workflow CLI Implementation Plan + +> **For agentic workers:** REQUIRED SUB-SKILL: Use superpowers:subagent-driven-development (recommended) or superpowers:executing-plans to implement this plan task-by-task. Steps use checkbox (`- [ ]`) syntax for tracking. + +**Goal:** Finish `ergon workflow` as an agent-facing CLI for task editing and resource copying in one PR off `main`. + +**Architecture:** Extend the already-merged V1 instead of replacing it. Keep scoped reads and mutation policy in `WorkflowService`, command parsing/rendering in `ergon_cli.commands.workflow`, and model-facing scope injection in `workflow_cli_tool`. All commands stay current-run/current-node scoped unless an injected manager-capable context explicitly permits broader graph edits. 
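A minimal sketch of the scoping rule in the Architecture note above: leaf scope by default, graph edits only under a manager-capable context. The command spellings, prefix list, and error messages here are assumptions; the real checks live in `workflow_cli_tool`:

```python
# Hypothetical graph-edit command prefixes; real spellings may differ.
GRAPH_EDIT_PREFIXES = (
    "manage add-task",
    "manage add-edge",
    "manage update-task-description",
    "manage restart-task",
    "manage abandon-task",
)

def check_workflow_command(command: str, manager_capable: bool) -> None:
    """Reject commands the injected scope does not permit."""
    # Multiline commands are rejected outright (Task 3 behavior).
    if "\n" in command:
        raise ValueError("multiline commands are rejected")
    # Leaf agents may inspect and materialize visible resources; only a
    # manager-capable context may edit the graph.
    if command.startswith(GRAPH_EDIT_PREFIXES) and not manager_capable:
        raise PermissionError("graph edits require a manager-capable context")
```

Under this sketch, `inspect ...` and `manage materialize-resource ...` pass for leaf agents, while `manage add-task` requires `manager_capable=True`.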
+ +**Tech Stack:** Python, argparse, SQLModel, existing `WorkflowGraphRepository`, existing run graph tables, pydantic DTOs, pytest. + +--- + +## Current Baseline + +Already merged: + +- `ergon workflow ...` top-level command. +- `WorkflowService` with task/resource inspection and `materialize_resource`. +- `workflow(command)` pydantic-ai wrapper with injected run/node/execution/sandbox scope. +- ResearchRubrics workflow worker registration. + +## Implementation Tasks + +### Task 1: Real Task Editing Commands + +**Files:** +- Modify: `ergon_core/ergon_core/core/runtime/services/workflow_dto.py` +- Modify: `ergon_core/ergon_core/core/runtime/services/workflow_service.py` +- Modify: `ergon_cli/ergon_cli/commands/workflow.py` +- Test: `tests/unit/runtime/test_workflow_service.py` +- Test: `tests/unit/cli/test_workflow_cli.py` + +- [ ] Add a `WorkflowMutationRef` DTO with `action`, `dry_run`, `node`, `edge`, `message`, and `suggested_commands`. +- [ ] Add service methods for `add_task`, `add_edge`, `update_task_description`, `restart_task`, and `abandon_task`. +- [ ] Use `WorkflowGraphRepository` for graph writes and mutation logging. +- [ ] Keep `--dry-run` behavior identical to real command validation but without writes. +- [ ] Add CLI parser arguments for task slug, description, worker, source/target, and status fields. +- [ ] Add text and JSON renderers for mutation results. +- [ ] Verify with focused unit tests before moving on. + +### Task 2: Resource Copying Completion + +**Files:** +- Modify: `ergon_core/ergon_core/core/runtime/services/workflow_dto.py` +- Modify: `ergon_core/ergon_core/core/runtime/services/workflow_service.py` +- Modify: `ergon_cli/ergon_cli/commands/workflow.py` +- Test: `tests/unit/runtime/test_workflow_service.py` +- Test: `tests/unit/cli/test_workflow_cli.py` + +- [ ] Add `inspect resource-location`. +- [ ] Add `inspect task-workspace`. 
+- [ ] Harden `materialize-resource` destination handling: reject absolute paths, `..`, and paths outside `/workspace`. +- [ ] Preserve source resource bytes and row unchanged. +- [ ] Ensure copied resource rows use `RunResourceKind.IMPORT`, `copied_from_resource_id`, and metadata with source resource, source task, and sandbox destination. +- [ ] Add JSON/text outputs for resource location and task workspace. +- [ ] Verify with unit tests and one integration-style sandbox-manager-injected test. + +### Task 3: Agent Wrapper Permissions + +**Files:** +- Modify: `ergon_builtins/ergon_builtins/tools/workflow_cli_tool.py` +- Test: `tests/unit/state/test_workflow_cli_tool.py` + +- [ ] Add a permission mode to the wrapper: leaf agents can inspect and materialize visible resources; manager-capable agents can use graph edit commands. +- [ ] Reject user-supplied scope/context flags before command execution. +- [ ] Reject multiline commands. +- [ ] Return structured, model-readable failure strings instead of leaking tracebacks. +- [ ] Verify wrapper tests for allowed inspect, allowed materialize, denied graph edit, and allowed manager graph edit. + +### Task 4: Acceptance Coverage + +**Files:** +- Modify: existing smoke fixture workers only as needed. +- Modify: existing E2E assertions only as needed. +- Test: focused unit tests plus existing smoke tests. + +- [ ] Ensure one deterministic no-LLM smoke path calls `workflow("inspect task-tree")`. +- [ ] Ensure one deterministic no-LLM smoke path calls `workflow("inspect resource-list --scope input")`. +- [ ] Ensure one deterministic no-LLM smoke path dry-runs `manage materialize-resource`. +- [ ] Keep real-LLM rollout optional, using `researchrubrics-workflow-cli-react`. +- [ ] Run focused workflow tests, Python unit tests touched by runtime changes, frontend contract generation if schemas change, and CI-fast-compatible checks. 
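+The destination hardening in Task 2 (reject absolute paths, `..`, and anything escaping `/workspace`) can be sketched as follows; the function name and error messages are illustrative assumptions, not the `WorkflowService` implementation:

```python
from pathlib import PurePosixPath

# Validate an agent-supplied materialize-resource destination. Only the
# "/workspace" root comes from the task text above; the rest is a sketch.
def validate_destination(dest: str) -> str:
    path = PurePosixPath(dest)
    if path.is_absolute():
        raise ValueError("destination must be relative to /workspace")
    if ".." in path.parts:
        raise ValueError("destination must not contain '..'")
    resolved = PurePosixPath("/workspace") / path
    # Belt and braces: the joined path must still live under /workspace.
    if resolved.parts[:2] != ("/", "workspace"):
        raise ValueError("destination escapes /workspace")
    return str(resolved)


print(validate_destination("final_output/report.md"))
# → /workspace/final_output/report.md
```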
+ +## Verification Commands + +Run incrementally: + +```bash +uv run pytest tests/unit/runtime/test_workflow_service.py -v +uv run pytest tests/unit/cli/test_workflow_cli.py -v +uv run pytest tests/unit/state/test_workflow_cli_tool.py -v +``` + +Before PR: + +```bash +uv run pytest tests/unit/runtime/test_workflow_service.py tests/unit/cli/test_workflow_cli.py tests/unit/state/test_workflow_cli_tool.py -v +uv run pytest tests/unit/runtime tests/unit/cli tests/unit/state -q +pnpm --dir ergon-dashboard run typecheck +``` + diff --git a/ergon_builtins/AGENTS.md b/ergon_builtins/AGENTS.md index cfb169a3..98174ada 100644 --- a/ergon_builtins/AGENTS.md +++ b/ergon_builtins/AGENTS.md @@ -108,16 +108,16 @@ and is instantiated directly by `researchrubrics-researcher`; it is not in --- -## Model backends (`MODEL_BACKENDS` in registry_core.py) +## Model targets (`resolve_model_target`) | prefix | file | notes | |---|---|---| -| `vllm:` | `models/vllm_backend.py` | Points at a running vLLM server; supports logprobs. | -| `openai:`, `anthropic:`, `google:` | `models/cloud_passthrough.py` | Passes through to pydantic-ai's provider. No logprobs. | -| *(no prefix)* | fallthrough | Handed to pydantic-ai's `infer_model` — may pick a default or fail. | +| `vllm:[#]` | `ergon_core/core/providers/generation/openai_compatible.py` | Points at a running vLLM server; supports logprobs. | +| `openai-compatible:#` | `ergon_core/core/providers/generation/openai_compatible.py` | Generic OpenAI-compatible endpoints such as Ollama. | +| `openai:`, `anthropic:`, `google:` | `ergon_core/core/providers/generation/openrouter.py` | Always routed through OpenRouter, not direct cloud APIs. | Default when `--model` is omitted: `openai:gpt-4o` -(`ergon_core/core/providers/generation/model_resolution.py:57`). +(`ergon_core/core/providers/generation/model_resolution.py`). 
--- diff --git a/ergon_builtins/ergon_builtins/benchmarks/researchrubrics/benchmark.py b/ergon_builtins/ergon_builtins/benchmarks/researchrubrics/benchmark.py index 6d343710..b9b11107 100644 --- a/ergon_builtins/ergon_builtins/benchmarks/researchrubrics/benchmark.py +++ b/ergon_builtins/ergon_builtins/benchmarks/researchrubrics/benchmark.py @@ -107,10 +107,11 @@ def _payload_from_row( row: Mapping[str, Any], # slopcop: ignore[no-typing-any] ) -> ResearchRubricsTaskPayload: """Convert one raw HuggingFace row into the benchmark payload schema.""" + ablated_prompt = row.get("ablated_prompt") or row["prompt"] return ResearchRubricsTaskPayload( sample_id=row["sample_id"], domain=str(row.get("domain", "")), - ablated_prompt=row["ablated_prompt"], + ablated_prompt=ablated_prompt, rubrics=[ RubricCriterion( criterion=r["criterion"], diff --git a/ergon_builtins/ergon_builtins/models/cloud_passthrough.py b/ergon_builtins/ergon_builtins/models/cloud_passthrough.py deleted file mode 100644 index e7620a1d..00000000 --- a/ergon_builtins/ergon_builtins/models/cloud_passthrough.py +++ /dev/null @@ -1,14 +0,0 @@ -"""Cloud passthrough: resolves ``openai:``, ``anthropic:``, etc. 
by passing through to PydanticAI.""" - -from ergon_core.core.providers.generation.model_resolution import ResolvedModel - - -def resolve_cloud( - target: str, - *, - model_name: str | None = None, - policy_version: str | None = None, - api_key: str | None = None, -) -> ResolvedModel: - """Pass cloud model targets through to PydanticAI's infer_model.""" - return ResolvedModel(model=target, supports_logprobs=False) diff --git a/ergon_builtins/ergon_builtins/models/vllm_backend.py b/ergon_builtins/ergon_builtins/models/vllm_backend.py deleted file mode 100644 index 0488f5a1..00000000 --- a/ergon_builtins/ergon_builtins/models/vllm_backend.py +++ /dev/null @@ -1,58 +0,0 @@ -"""vLLM backend: resolves ``vllm:http://...`` targets to OpenAI-compatible PydanticAI models.""" - -import json -import logging -import urllib.error -import urllib.request - -from ergon_core.core.providers.generation.model_resolution import ResolvedModel -from pydantic_ai.models.openai import OpenAIModel as OpenAIChatModel -from pydantic_ai.providers.openai import OpenAIProvider - -logger = logging.getLogger(__name__) - - -def resolve_vllm( - target: str, - *, - model_name: str | None = None, - policy_version: str | None = None, - api_key: str | None = None, -) -> ResolvedModel: - """Resolve a ``vllm:http://...`` target to a PydanticAI model.""" - endpoint = target[5:].rstrip("/") - resolved_name = model_name or _discover_model_name(endpoint) - provider = OpenAIProvider( - base_url=f"{endpoint}/v1", - api_key=api_key or "not-needed", - ) - model = OpenAIChatModel(model_name=resolved_name, provider=provider) - logger.info( - "Resolved vLLM model: endpoint=%s model_name=%s policy_version=%s", - endpoint, - resolved_name, - policy_version, - ) - return ResolvedModel(model=model, policy_version=policy_version, supports_logprobs=True) - - -def _discover_model_name(endpoint: str) -> str: - """Query ``/v1/models`` to discover the served model name.""" - url = f"{endpoint}/v1/models" - try: - with 
urllib.request.urlopen(url, timeout=5) as resp: - body = json.loads(resp.read()) - models = body.get("data", []) - if models: - name = models[0].get("id", "default") - logger.info("Discovered vLLM model name: %s", name) - return name - except ( - urllib.error.HTTPError, - urllib.error.URLError, - TimeoutError, - OSError, - json.JSONDecodeError, - ): - logger.warning("Could not discover vLLM model name from %s, using 'default'", url) - return "default" diff --git a/ergon_builtins/ergon_builtins/registry.py b/ergon_builtins/ergon_builtins/registry.py index aa340e2f..f91f9ddb 100644 --- a/ergon_builtins/ergon_builtins/registry.py +++ b/ergon_builtins/ergon_builtins/registry.py @@ -8,10 +8,6 @@ import structlog from ergon_core.api import Benchmark, Evaluator, Worker -from ergon_core.core.providers.generation.model_resolution import ( - ResolvedModel, - register_model_backend, -) from ergon_core.core.providers.sandbox.manager import BaseSandboxManager from ergon_builtins.registry_core import ( @@ -20,9 +16,6 @@ from ergon_builtins.registry_core import ( EVALUATORS as _core_evaluators, ) -from ergon_builtins.registry_core import ( - MODEL_BACKENDS as _core_model_backends, -) from ergon_builtins.registry_core import ( SANDBOX_MANAGERS as _core_sandbox_managers, ) @@ -42,19 +35,6 @@ EVALUATORS: dict[str, type[Evaluator]] = {**_core_evaluators} SANDBOX_MANAGERS: dict[str, type[BaseSandboxManager]] = {**_core_sandbox_managers} -_model_backends: dict[str, Callable[..., ResolvedModel]] = {**_core_model_backends} - -# -- Capability: local-models ---------------------------------------------- - -try: - from ergon_builtins.registry_local_models import ( - MODEL_BACKENDS as _local_model_backends, - ) - - _model_backends.update(_local_model_backends) -except ImportError: - log.info("ergon-builtins[local-models] not installed; local transformers inference unavailable") - # -- Capability: data ------------------------------------------------------ try: @@ -80,11 +60,6 @@ 
"ergon-builtins[data] not installed; gdpeval and researchrubrics benchmarks unavailable" ) -# -- Register model backends ----------------------------------------------- - -for prefix, resolver in _model_backends.items(): - register_model_backend(prefix, resolver) - # -- Install hints for slugs that require optional capabilities ------------- INSTALL_HINTS: dict[str, str] = { diff --git a/ergon_builtins/ergon_builtins/registry_core.py b/ergon_builtins/ergon_builtins/registry_core.py index 7be86868..67ea3697 100644 --- a/ergon_builtins/ergon_builtins/registry_core.py +++ b/ergon_builtins/ergon_builtins/registry_core.py @@ -10,7 +10,6 @@ from uuid import UUID from ergon_core.api import Benchmark, Evaluator, Worker -from ergon_core.core.providers.generation.model_resolution import ResolvedModel from ergon_core.core.providers.sandbox.manager import BaseSandboxManager from ergon_builtins.benchmarks.gdpeval.rubric import StagedRubric @@ -25,8 +24,6 @@ ) from ergon_builtins.benchmarks.swebench_verified.toolkit import SWEBenchToolkit from ergon_builtins.evaluators.rubrics.swebench_rubric import SWEBenchRubric -from ergon_builtins.models.cloud_passthrough import resolve_cloud -from ergon_builtins.models.vllm_backend import resolve_vllm from ergon_builtins.workers.baselines.react_prompts import ( MINIF2F_SYSTEM_PROMPT, SWEBENCH_SYSTEM_PROMPT, @@ -178,10 +175,3 @@ def _swebench_react( "minif2f": Path(__file__).parent / "benchmarks/minif2f/sandbox", "swebench-verified": Path(__file__).parent / "benchmarks/swebench_verified/sandbox", } - -MODEL_BACKENDS: dict[str, Callable[..., ResolvedModel]] = { - "vllm": resolve_vllm, - "openai": resolve_cloud, - "anthropic": resolve_cloud, - "google": resolve_cloud, -} diff --git a/ergon_builtins/ergon_builtins/registry_local_models.py b/ergon_builtins/ergon_builtins/registry_local_models.py deleted file mode 100644 index e45abd5d..00000000 --- a/ergon_builtins/ergon_builtins/registry_local_models.py +++ /dev/null @@ -1,16 +0,0 @@ 
-"""Components that require the [local-models] capability (torch + transformers). - -Eager, fully-typed imports. This module will fail to import if torch/ -transformers/outlines are not installed — that's by design. The composition -layer in registry.py handles the ImportError gracefully. -""" - -from collections.abc import Callable - -from ergon_core.core.providers.generation.model_resolution import ResolvedModel - -from ergon_builtins.models.transformers_backend import resolve_transformers - -MODEL_BACKENDS: dict[str, Callable[..., ResolvedModel]] = { - "transformers": resolve_transformers, -} diff --git a/ergon_builtins/ergon_builtins/tools/workflow_cli_tool.py b/ergon_builtins/ergon_builtins/tools/workflow_cli_tool.py index 0b540b6c..f15a0984 100644 --- a/ergon_builtins/ergon_builtins/tools/workflow_cli_tool.py +++ b/ergon_builtins/ergon_builtins/tools/workflow_cli_tool.py @@ -1,20 +1,29 @@ from collections.abc import Awaitable, Callable +import shlex from typing import Protocol from uuid import UUID from ergon_cli.commands.workflow import ( WorkflowCommandContext, WorkflowCommandOutput, - execute_workflow_command, + execute_workflow_command_async, ) from ergon_core.api.worker_context import WorkerContext from ergon_core.core.persistence.shared.db import get_session from ergon_core.core.runtime.services.workflow_service import WorkflowService from sqlmodel import Session +_MANAGER_ONLY_ACTIONS = { + "add-task", + "add-edge", + "update-task-description", + "restart-task", + "abandon-task", +} + class WorkflowCommandExecutor(Protocol): - def __call__( + async def __call__( self, command: str, *, @@ -29,9 +38,10 @@ def make_workflow_cli_tool( worker_context: WorkerContext, sandbox_task_key: UUID, benchmark_type: str, - execute_command: WorkflowCommandExecutor = execute_workflow_command, + execute_command: WorkflowCommandExecutor = execute_workflow_command_async, session_factory: Callable[[], Session] = get_session, service_factory: Callable[[], WorkflowService] = 
WorkflowService, + manager_capable: bool = False, ) -> Callable[[str], Awaitable[str]]: """Build an agent-facing ``workflow(command)`` callable. @@ -44,19 +54,25 @@ async def workflow(command: str) -> str: """Inspect workflow topology/resources or dry-run workflow management commands.""" if worker_context.node_id is None: raise ValueError("workflow tool requires WorkerContext.node_id") + denial = _denial_reason(command, manager_capable=manager_capable) + if denial is not None: + return f"workflow denied: {denial}" - output = execute_command( - command, - context=WorkflowCommandContext( - run_id=worker_context.run_id, - node_id=worker_context.node_id, - execution_id=worker_context.execution_id, - sandbox_task_key=sandbox_task_key, - benchmark_type=benchmark_type, - ), - session_factory=session_factory, - service=service_factory(), - ) + try: + output = await execute_command( + command, + context=WorkflowCommandContext( + run_id=worker_context.run_id, + node_id=worker_context.node_id, + execution_id=worker_context.execution_id, + sandbox_task_key=sandbox_task_key, + benchmark_type=benchmark_type, + ), + session_factory=session_factory, + service=service_factory(), + ) + except Exception as exc: # slopcop: ignore[no-broad-except] + return f"workflow failed: {type(exc).__name__}: {exc}" if output.exit_code != 0: detail = output.stderr or output.stdout return f"workflow exited {output.exit_code}: {detail}".strip() @@ -65,3 +81,16 @@ async def workflow(command: str) -> str: return output.stdout return workflow + + +def _denial_reason(command: str, *, manager_capable: bool) -> str | None: + if "\n" in command or "\r" in command: + return "multiline commands are not allowed" + try: + argv = shlex.split(command) + except ValueError as exc: + return f"could not parse command: {exc}" + if len(argv) >= 3 and argv[0] == "manage" and argv[1] in _MANAGER_ONLY_ACTIONS: + if not manager_capable: + return f"{argv[1]} requires a manager-capable workflow tool" + return None diff --git 
a/ergon_builtins/ergon_builtins/workers/baselines/training_stub_worker.py b/ergon_builtins/ergon_builtins/workers/baselines/training_stub_worker.py index 459351c8..37ec781e 100644 --- a/ergon_builtins/ergon_builtins/workers/baselines/training_stub_worker.py +++ b/ergon_builtins/ergon_builtins/workers/baselines/training_stub_worker.py @@ -21,11 +21,11 @@ ModelResponsePart, TextPart, ThinkingPart, - TokenLogprob, ToolCallPart, ToolReturnPart, UserPromptPart, ) +from ergon_core.core.providers.generation.types import TokenLogprob class TrainingStubWorker(Worker): diff --git a/ergon_builtins/ergon_builtins/workers/research_rubrics/workflow_cli_react_worker.py b/ergon_builtins/ergon_builtins/workers/research_rubrics/workflow_cli_react_worker.py index 302a9bde..37259000 100644 --- a/ergon_builtins/ergon_builtins/workers/research_rubrics/workflow_cli_react_worker.py +++ b/ergon_builtins/ergon_builtins/workers/research_rubrics/workflow_cli_react_worker.py @@ -45,14 +45,39 @@ "- workflow: Inspect current-run task topology and resources\n\n" "Write your final report to 'final_output/report.md' using write_report_draft. " "Include a # Findings section and a ## Sources section with citations.\n\n" + "Hard operating budget: use at most 6 exa_search calls for your own work. " + "After that, write the report from the evidence you have. Prefer targeted " + "queries over broad exploration.\n\n" "Use workflow(command) to inspect this run before " "deciding what context is missing. Useful commands include: " - "`inspect task-tree`, `inspect resource-list --scope input`, " + "`inspect task-workspace --format json`, `inspect task-tree`, " + "`inspect resource-list --scope input`, " "`inspect resource-list --scope visible --limit 20`, " + "`inspect resource-location --resource-id `, " "`inspect next-actions`, and " "`manage materialize-resource --resource-id --dry-run`. " "Use `--format json` when you need stable IDs. 
Resource copies are snapshots: " - "materialized files become resources owned by this task, not edits to the source." + "materialized files become resources owned by this task, not edits to the source.\n\n" + "First call `workflow(\"inspect task-workspace --format json\")`. Use only " + "`task_workspace.task.level` from that response to decide whether this current " + "task may delegate. Ignore level-0 tasks shown elsewhere in task-tree. If " + "`task_workspace.task.level` is exactly 0, create exactly three specialist " + "child tasks before researching: " + "(1) a source scout for finding citations, " + "(2) a rubric compliance checker for mapping requirements to an outline, and " + "(3) a synthesis reviewer for risks, gaps, and counterclaims. " + "Use `workflow(\"manage add-task --task-slug --worker worker " + "--description ''\")` for each child. " + "Give each child a role-specific description that includes the original task " + "goal and asks for a concise markdown report in `final_output/report.md`. " + "Then continue your own report; do not wait for child results unless visible " + "resources are already available.\n\n" + "If your current `task_workspace.task.level` is not 0, you are already a " + "specialist child. You must do only your assigned specialist work; do not call " + "`workflow(\"manage add-task` under any " + "circumstances. Do not inspect the workflow repeatedly. Use at most 2 " + "workflow inspections and at most 3 exa_search calls, then write your " + "specialist markdown report to `final_output/report.md`." 
) @@ -121,6 +146,7 @@ async def publisher_sync() -> list[RunResourceView]: worker_context=context, sandbox_task_key=self.task_id, benchmark_type="researchrubrics", + manager_capable=True, ) self.tools = [*rr_toolkit.build_tools(), *graph_toolkit.build_tools(), workflow_tool] diff --git a/ergon_cli/ergon_cli/commands/workflow.py b/ergon_cli/ergon_cli/commands/workflow.py index 27a32d9f..21ec9559 100644 --- a/ergon_cli/ergon_cli/commands/workflow.py +++ b/ergon_cli/ergon_cli/commands/workflow.py @@ -8,6 +8,7 @@ from ergon_core.api.json_types import JsonObject from ergon_core.core.persistence.shared.db import get_session from ergon_core.core.runtime.services.workflow_service import WorkflowService +from ergon_core.core.runtime.services.workflow_dto import WorkflowMutationRef from pydantic import BaseModel from sqlmodel import Session from collections.abc import Callable @@ -59,10 +60,17 @@ def build_workflow_parser() -> argparse.ArgumentParser: resource_content.add_argument("--max-bytes", type=int, default=100_000) resource_content.add_argument("--format", choices=["text", "json"], default="text") + resource_location = inspect_sub.add_parser("resource-location") + resource_location.add_argument("--resource-id", required=True) + resource_location.add_argument("--format", choices=["text", "json"], default="text") + task_tree = inspect_sub.add_parser("task-tree") task_tree.add_argument("--format", choices=["text", "json"], default="text") task_tree.add_argument("--parent-node-id", default=None) + task_workspace = inspect_sub.add_parser("task-workspace") + task_workspace.add_argument("--format", choices=["text", "json"], default="text") + dependencies = inspect_sub.add_parser("task-dependencies") dependencies.add_argument( "--direction", choices=["upstream", "downstream", "both"], default="both" @@ -81,8 +89,32 @@ def build_workflow_parser() -> argparse.ArgumentParser: materialize.add_argument("--dry-run", action="store_true") materialize.add_argument("--format", 
choices=["text", "json"], default="text") - for action in ("add-task", "add-edge", "restart-task", "abandon-task"): + add_task = manage_sub.add_parser("add-task") + add_task.add_argument("--task-slug", required=True) + add_task.add_argument("--description", required=True) + add_task.add_argument("--worker", required=True) + add_task.add_argument("--parent-node-id", default=None) + add_task.add_argument("--dry-run", action="store_true") + add_task.add_argument("--format", choices=["text", "json"], default="text") + add_task.add_argument("--reason", default=None) + + add_edge = manage_sub.add_parser("add-edge") + add_edge.add_argument("--source-task-slug", required=True) + add_edge.add_argument("--target-task-slug", required=True) + add_edge.add_argument("--dry-run", action="store_true") + add_edge.add_argument("--format", choices=["text", "json"], default="text") + add_edge.add_argument("--reason", default=None) + + update_description = manage_sub.add_parser("update-task-description") + update_description.add_argument("--task-slug", required=True) + update_description.add_argument("--description", required=True) + update_description.add_argument("--dry-run", action="store_true") + update_description.add_argument("--format", choices=["text", "json"], default="text") + update_description.add_argument("--reason", default=None) + + for action in ("restart-task", "abandon-task"): parser_for_action = manage_sub.add_parser(action) + parser_for_action.add_argument("--task-slug", required=True) parser_for_action.add_argument("--dry-run", action="store_true") parser_for_action.add_argument("--format", choices=["text", "json"], default="text") parser_for_action.add_argument("--reason", default=None) @@ -96,6 +128,23 @@ def execute_workflow_command( context: WorkflowCommandContext, session_factory: Callable[[], Session], service: WorkflowService, +) -> WorkflowCommandOutput: + return asyncio.run( # slopcop: ignore[no-async-from-sync] -- CLI sync bridge + 
execute_workflow_command_async( + command, + context=context, + session_factory=session_factory, + service=service, + ) + ) + + +async def execute_workflow_command_async( + command: str, + *, + context: WorkflowCommandContext, + session_factory: Callable[[], Session], + service: WorkflowService, ) -> WorkflowCommandOutput: argv = shlex.split(command) _reject_context_flags(argv) @@ -105,9 +154,7 @@ def execute_workflow_command( if args.group == "inspect": return _handle_inspect(args, context=context, session=session, service=service) if args.group == "manage": - return asyncio.run( # slopcop: ignore[no-async-from-sync] -- CLI/tool sync bridge - _handle_manage(args, context=context, session=session, service=service) - ) + return await _handle_manage(args, context=context, session=session, service=service) finally: _close_session(session) raise ValueError(f"unsupported workflow command group: {args.group}") @@ -184,6 +231,22 @@ def _handle_inspect( if args.format == "json": return _format_output({"content": content.decode(errors="replace")}, [], "json") return WorkflowCommandOutput(stdout=content.decode(errors="replace")) + if args.action == "resource-location": + location = service.get_resource_location( + session, + run_id=context.run_id, + resource_id=UUID(args.resource_id), + ) + return _format_output( + {"resource_location": _dump(location)}, + text_lines=[ + f"resource {location.resource.name}", + f"producer={location.producer_task_slug or '-'}", + f"local={location.local_file_path}", + f"default_sandbox_path={location.default_sandbox_path}", + ], + output_format=args.format, + ) if args.action == "task-tree": parent = UUID(args.parent_node_id) if args.parent_node_id else None tasks = service.list_tasks(session, run_id=context.run_id, parent_node_id=parent) @@ -195,6 +258,28 @@ def _handle_inspect( ], output_format=args.format, ) + if args.action == "task-workspace": + workspace = service.get_task_workspace( + session, + run_id=context.run_id, + 
node_id=context.node_id, + ) + lines = [ + f"task {workspace.task.task_slug} status={workspace.task.status}", + ] + if workspace.latest_execution is not None: + lines.append( + "execution " + f"{workspace.latest_execution.execution_id} " + f"status={workspace.latest_execution.status}" + ) + lines.extend(f"own: {resource.name}" for resource in workspace.own_resources) + lines.extend(f"input: {resource.name}" for resource in workspace.input_resources) + return _format_output( + {"task_workspace": _dump(workspace)}, + text_lines=lines, + output_format=args.format, + ) if args.action == "task-dependencies": deps = service.list_dependencies( session, @@ -249,18 +334,52 @@ async def _handle_manage( text_lines=[f"{result.source_resource_id} -> {result.sandbox_path}"], output_format=args.format, ) - try: - dry_run = args.dry_run - except AttributeError: - dry_run = False - if dry_run: - payload: JsonObject = { - "action": args.action, - "dry_run": True, - "message": "Graph lifecycle command validated; no changes applied.", - } - return _format_output(payload, [str(payload["message"])], args.format) - raise ValueError(f"{args.action} requires --dry-run in workflow CLI v1") + if args.action == "add-task": + result = await service.add_task( + session, + run_id=context.run_id, + parent_node_id=UUID(args.parent_node_id) if args.parent_node_id else context.node_id, + task_slug=args.task_slug, + description=args.description, + assigned_worker_slug=args.worker, + dry_run=args.dry_run, + ) + return _mutation_output(result, args.format) + if args.action == "add-edge": + result = await service.add_edge( + session, + run_id=context.run_id, + source_task_slug=args.source_task_slug, + target_task_slug=args.target_task_slug, + dry_run=args.dry_run, + ) + return _mutation_output(result, args.format) + if args.action == "update-task-description": + result = await service.update_task_description( + session, + run_id=context.run_id, + task_slug=args.task_slug, + description=args.description, 
+            dry_run=args.dry_run,
+        )
+        return _mutation_output(result, args.format)
+    if args.action == "restart-task":
+        result = await service.restart_task(
+            session,
+            run_id=context.run_id,
+            task_slug=args.task_slug,
+            dry_run=args.dry_run,
+        )
+        return _mutation_output(result, args.format)
+    if args.action == "abandon-task":
+        result = await service.abandon_task(
+            session,
+            run_id=context.run_id,
+            task_slug=args.task_slug,
+            dry_run=args.dry_run,
+        )
+        return _mutation_output(result, args.format)
+    raise ValueError(f"unsupported manage action: {args.action}")
 
 
 def _format_output(
@@ -273,6 +392,11 @@ def _format_output(
     return WorkflowCommandOutput(stdout="\n".join(text_lines))
 
 
+def _mutation_output(result: WorkflowMutationRef, output_format: str) -> WorkflowCommandOutput:
+    payload: JsonObject = {"mutation": _dump(result)}
+    return _format_output(payload, [result.message], output_format)
+
+
 def _dump(value: BaseModel | JsonObject) -> JsonObject:
     if isinstance(value, BaseModel):
         return cast(JsonObject, value.model_dump(mode="json"))
diff --git a/ergon_core/ergon_core/api/__init__.py b/ergon_core/ergon_core/api/__init__.py
index fe6ad7f0..61aa0602 100644
--- a/ergon_core/ergon_core/api/__init__.py
+++ b/ergon_core/ergon_core/api/__init__.py
@@ -1,5 +1,7 @@
 """Object-first Ergon public API surface."""
 
+from typing import TYPE_CHECKING
+
 from ergon_core.api.benchmark import Benchmark
 from ergon_core.api.benchmark_deps import BenchmarkDeps
 from ergon_core.api.criterion import Criterion
@@ -10,13 +12,15 @@
 from ergon_core.api.experiment import Experiment
 from ergon_core.api.handles import ExperimentRunHandle, PersistedExperimentDefinition
 from ergon_core.api.results import CriterionResult, TaskEvaluationResult, WorkerOutput
-from ergon_core.api.run_resource import RunResourceKind, RunResourceView
 from ergon_core.api.task_types import BenchmarkTask, EmptyTaskPayload
 from ergon_core.api.types import Tool
 from ergon_core.api.worker import Worker
 from ergon_core.api.worker_context import WorkerContext
 from ergon_core.api.worker_spec import WorkerSpec
 
+if TYPE_CHECKING:
+    from ergon_core.api.run_resource import RunResourceKind, RunResourceView
+
 __all__ = [
     "Benchmark",
     "BenchmarkDeps",
@@ -44,3 +48,13 @@
     "WorkerOutput",
     "WorkerSpec",
 ]
+
+
+def __getattr__(name: str) -> object:
+    if name in {"RunResourceKind", "RunResourceView"}:
+        from ergon_core.api.run_resource import RunResourceKind, RunResourceView
+
+        globals()["RunResourceKind"] = RunResourceKind
+        globals()["RunResourceView"] = RunResourceView
+        return globals()[name]
+    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
diff --git a/ergon_core/ergon_core/api/generation.py b/ergon_core/ergon_core/api/generation.py
index 449f24ed..c54f2618 100644
--- a/ergon_core/ergon_core/api/generation.py
+++ b/ergon_core/ergon_core/api/generation.py
@@ -15,19 +15,10 @@
 from typing import Annotated, Any, Literal
 
 from ergon_core.api.json_types import JsonObject
+from ergon_core.core.providers.generation.types import TokenLogprob
 from pydantic import BaseModel, Field
 
 
-class TokenLogprob(BaseModel):
-    """Per-token log probability from the serving backend."""
-
-    model_config = {"frozen": True}
-
-    token: str
-    logprob: float
-    top_logprobs: list[JsonObject] = Field(default_factory=list)
-
-
 # ---------------------------------------------------------------------------
 # Request parts (ModelRequest input — what went INTO the model)
 # ---------------------------------------------------------------------------
diff --git a/ergon_core/ergon_core/core/persistence/context/event_payloads.py b/ergon_core/ergon_core/core/persistence/context/event_payloads.py
index db19d6b1..b2f58bd7 100644
--- a/ergon_core/ergon_core/core/persistence/context/event_payloads.py
+++ b/ergon_core/ergon_core/core/persistence/context/event_payloads.py
@@ -7,7 +7,7 @@
 
 from typing import Annotated, Any, Literal
 
-from ergon_core.api.generation import TokenLogprob
+from ergon_core.core.providers.generation.types import TokenLogprob
 from pydantic import BaseModel, Field
 
 # Exported type alias — use everywhere event_type is stored as a string field.
@@ -49,7 +49,7 @@ class ToolCallPayload(BaseModel):
     args: dict[str, Any]  # slopcop: ignore[no-typing-any]
     turn_id: str  # links events from the same generation call
     turn_token_ids: list[int] | None = None  # None if another event in this turn holds them
-    turn_logprobs: list[TokenLogprob] | None = None  # None if another event in this turn holds them
+    turn_logprobs: list[TokenLogprob] | None = None
 
 
 class ToolResultPayload(BaseModel):
diff --git a/ergon_core/ergon_core/core/providers/generation/__init__.py b/ergon_core/ergon_core/core/providers/generation/__init__.py
index 585bef15..5a166577 100644
--- a/ergon_core/ergon_core/core/providers/generation/__init__.py
+++ b/ergon_core/ergon_core/core/providers/generation/__init__.py
@@ -2,8 +2,7 @@
 
 from ergon_core.core.providers.generation.model_resolution import (
     ResolvedModel,
-    register_model_backend,
     resolve_model_target,
 )
 
-__all__ = ["ResolvedModel", "register_model_backend", "resolve_model_target"]
+__all__ = ["ResolvedModel", "resolve_model_target"]
diff --git a/ergon_core/ergon_core/core/providers/generation/model_resolution.py b/ergon_core/ergon_core/core/providers/generation/model_resolution.py
index 93f312df..bdd59f18 100644
--- a/ergon_core/ergon_core/core/providers/generation/model_resolution.py
+++ b/ergon_core/ergon_core/core/providers/generation/model_resolution.py
@@ -1,20 +1,8 @@
-"""Prefix-based model target resolution.
-
-Dispatches ``model_target`` strings to the appropriate backend based on
-their prefix (``vllm:``, ``transformers:``, ``openai:``, etc.).
-
-Concrete backend implementations live in ``ergon_builtins.models``.
-This module owns the contract (``ResolvedModel``) and the dispatch logic.
-"""
-
-import logging
-from typing import Callable
+"""Prefix-based model target resolution."""
 
 import pydantic_ai.models
 from pydantic import BaseModel
 
-logger = logging.getLogger(__name__)
-
 
 class ResolvedModel(BaseModel):
     """A resolved model target with backend metadata.
@@ -32,16 +20,6 @@ class ResolvedModel(BaseModel):
     supports_logprobs: bool = False
 
 
-# Backend resolver registry: prefix -> callable
-# Populated by ergon_builtins.registry at import time.
-_BACKEND_REGISTRY: dict[str, Callable[..., ResolvedModel]] = {}
-
-
-def register_model_backend(prefix: str, resolver: Callable[..., ResolvedModel]) -> None:
-    """Register a model backend resolver for a given prefix."""
-    _BACKEND_REGISTRY[prefix] = resolver
-
-
 def resolve_model_target(
     model_target: str | None,
     *,
@@ -51,20 +29,42 @@ def resolve_model_target(
 ) -> ResolvedModel:
     """Resolve a ``model_target`` string to a PydanticAI-compatible model.
 
-    Dispatches by prefix to registered backends. Unrecognised prefixes
-    are passed through to PydanticAI's ``infer_model``.
+    Cloud provider targets (``openai:*``, ``anthropic:*``, ``google:*``)
+    intentionally resolve to OpenRouter-hosted models. Direct cloud provider
+    API access is not part of Ergon's model-target grammar.
     """
+    target = model_target or "openai:gpt-4o"
+    prefix = target.split(":", 1)[0] if ":" in target else ""
+
+    if prefix == "vllm":
+        from ergon_core.core.providers.generation.openai_compatible import resolve_vllm
+
+        return resolve_vllm(
+            target, model_name=model_name, policy_version=policy_version, api_key=api_key
+        )
+
+    if prefix == "openai-compatible":
+        from ergon_core.core.providers.generation.openai_compatible import (
+            resolve_openai_compatible,
+        )
+
+        return resolve_openai_compatible(
+            target, model_name=model_name, policy_version=policy_version, api_key=api_key
+        )
+
+    if prefix in {"openai", "anthropic", "google"}:
+        from ergon_core.core.providers.generation.openrouter import resolve_cloud_via_openrouter
+
+        return resolve_cloud_via_openrouter(
+            target, model_name=model_name, policy_version=policy_version, api_key=api_key
+        )
 
-    prefix = target.split(":")[0] if ":" in target else ""
+    if prefix == "openrouter":
+        from ergon_core.core.providers.generation.openrouter import resolve_openrouter_alias
 
-    resolver = _BACKEND_REGISTRY.get(prefix)
-    if resolver is not None:
-        return resolver(
-            target,
-            model_name=model_name,
-            policy_version=policy_version,
-            api_key=api_key,
+        return resolve_openrouter_alias(
+            target, model_name=model_name, policy_version=policy_version, api_key=api_key
         )
 
-    return ResolvedModel(model=target, supports_logprobs=False)
+    raise ValueError(f"Unsupported model target: {target!r}")
diff --git a/ergon_core/ergon_core/core/providers/generation/openai_compatible.py b/ergon_core/ergon_core/core/providers/generation/openai_compatible.py
new file mode 100644
index 00000000..8a4b8727
--- /dev/null
+++ b/ergon_core/ergon_core/core/providers/generation/openai_compatible.py
@@ -0,0 +1,100 @@
+"""OpenAI-compatible endpoint resolution for local and custom model servers."""
+
+import json
+import logging
+import urllib.error
+import urllib.request
+
+from pydantic_ai.models.openai import OpenAIChatModel
+from pydantic_ai.providers.openai import OpenAIProvider
+
+from ergon_core.core.providers.generation.model_resolution import ResolvedModel
+
+logger = logging.getLogger(__name__)
+
+
+def resolve_openai_compatible(
+    target: str,
+    *,
+    model_name: str | None = None,
+    policy_version: str | None = None,
+    api_key: str | None = None,
+) -> ResolvedModel:
+    """Resolve ``openai-compatible:<base-url>#<model-name>`` targets."""
+
+    base_url, parsed_model_name = _split_endpoint_target(
+        target,
+        prefix="openai-compatible:",
+        require_model_name=True,
+    )
+    resolved_name = model_name or parsed_model_name
+    if resolved_name is None:
+        raise ValueError("openai-compatible target requires a model name")
+    provider = OpenAIProvider(base_url=base_url.rstrip("/"), api_key=api_key or "not-needed")
+    model = OpenAIChatModel(model_name=resolved_name, provider=provider)
+    return ResolvedModel(model=model, policy_version=policy_version, supports_logprobs=False)
+
+
+def resolve_vllm(
+    target: str,
+    *,
+    model_name: str | None = None,
+    policy_version: str | None = None,
+    api_key: str | None = None,
+) -> ResolvedModel:
+    """Resolve ``vllm:<endpoint>[#<model-name>]`` targets."""
+
+    endpoint, parsed_model_name = _split_endpoint_target(
+        target,
+        prefix="vllm:",
+        require_model_name=False,
+    )
+    endpoint = endpoint.rstrip("/")
+    resolved_name = model_name or parsed_model_name or _discover_model_name(endpoint)
+    provider = OpenAIProvider(base_url=f"{endpoint}/v1", api_key=api_key or "not-needed")
+    model = OpenAIChatModel(model_name=resolved_name, provider=provider)
+    logger.info(
+        "Resolved vLLM model: endpoint=%s model_name=%s policy_version=%s",
+        endpoint,
+        resolved_name,
+        policy_version,
+    )
+    return ResolvedModel(model=model, policy_version=policy_version, supports_logprobs=True)
+
+
+def _split_endpoint_target(
+    target: str,
+    *,
+    prefix: str,
+    require_model_name: bool,
+) -> tuple[str, str | None]:
+    body = target.removeprefix(prefix)
+    endpoint, separator, model_name = body.partition("#")
+    if not endpoint:
+        raise ValueError(f"{prefix} target requires a base URL")
+    if require_model_name and not (separator and model_name):
+        raise ValueError(f"{prefix}<base-url>#<model-name> target requires a model name")
+    return endpoint, model_name or None
+
+
+def _discover_model_name(endpoint: str) -> str:
+    """Query ``/v1/models`` to discover the served model name."""
+
+    url = f"{endpoint}/v1/models"
+    try:
+        with urllib.request.urlopen(url, timeout=5) as resp:
+            body = json.loads(resp.read())
+            models = body.get("data", [])
+            if models:
+                name = models[0].get("id", "default")
+                logger.info("Discovered vLLM model name: %s", name)
+                return name
+    except (
+        urllib.error.HTTPError,
+        urllib.error.URLError,
+        TimeoutError,
+        OSError,
+        json.JSONDecodeError,
+    ):
+        logger.warning("Could not discover vLLM model name from %s, using 'default'", url)
+    return "default"
diff --git a/ergon_core/ergon_core/core/providers/generation/openrouter.py b/ergon_core/ergon_core/core/providers/generation/openrouter.py
new file mode 100644
index 00000000..34f8946f
--- /dev/null
+++ b/ergon_core/ergon_core/core/providers/generation/openrouter.py
@@ -0,0 +1,55 @@
+"""OpenRouter-hosted cloud model resolution."""
+
+from pydantic_ai.models.openai import OpenAIChatModel
+from pydantic_ai.providers.openrouter import OpenRouterProvider
+
+from ergon_core.core.providers.generation.model_resolution import ResolvedModel
+from ergon_core.core.settings import settings
+
+CLOUD_PROVIDER_PREFIXES = frozenset({"openai", "anthropic", "google"})
+
+
+def resolve_cloud_via_openrouter(
+    target: str,
+    *,
+    model_name: str | None = None,
+    policy_version: str | None = None,
+    api_key: str | None = None,
+) -> ResolvedModel:
+    """Resolve ``openai:*``, ``anthropic:*``, and ``google:*`` through OpenRouter."""
+
+    provider_prefix, separator, provider_model_name = target.partition(":")
+    if not separator or not provider_model_name:
+        raise ValueError(f"Unsupported model target: {target!r}")
+    if provider_prefix not in CLOUD_PROVIDER_PREFIXES:
+        raise ValueError(f"Unsupported cloud provider target: {target!r}")
+
+    openrouter_model_name = model_name or f"{provider_prefix}/{provider_model_name}"
+    provider = _openrouter_provider(api_key)
+    model = OpenAIChatModel(model_name=openrouter_model_name, provider=provider)
+    return ResolvedModel(model=model, policy_version=policy_version, supports_logprobs=False)
+
+
+def resolve_openrouter_alias(
+    target: str,
+    *,
+    model_name: str | None = None,
+    policy_version: str | None = None,
+    api_key: str | None = None,
+) -> ResolvedModel:
+    """Resolve legacy ``openrouter:<provider>/<model-name>`` targets through OpenRouter."""
+
+    provider_model_name = target.removeprefix("openrouter:")
+    if not provider_model_name:
+        raise ValueError("openrouter:<provider>/<model-name> target requires a model name")
+
+    provider = _openrouter_provider(api_key)
+    model = OpenAIChatModel(model_name=model_name or provider_model_name, provider=provider)
+    return ResolvedModel(model=model, policy_version=policy_version, supports_logprobs=False)
+
+
+def _openrouter_provider(api_key: str | None) -> OpenRouterProvider:
+    resolved_api_key = api_key or settings.openrouter_api_key
+    if resolved_api_key:
+        return OpenRouterProvider(api_key=resolved_api_key)
+    return OpenRouterProvider()
diff --git a/ergon_core/ergon_core/core/providers/generation/pydantic_ai_format.py b/ergon_core/ergon_core/core/providers/generation/pydantic_ai_format.py
index 243dc2b7..744440b6 100644
--- a/ergon_core/ergon_core/core/providers/generation/pydantic_ai_format.py
+++ b/ergon_core/ergon_core/core/providers/generation/pydantic_ai_format.py
@@ -15,8 +15,8 @@
 rather than re-implementing the parsing.
 """
 
-from ergon_core.api.generation import TokenLogprob
 from ergon_core.api.json_types import JsonObject
+from ergon_core.core.providers.generation.types import TokenLogprob
 
 
 def extract_logprobs(
diff --git a/ergon_core/ergon_core/core/providers/generation/types.py b/ergon_core/ergon_core/core/providers/generation/types.py
new file mode 100644
index 00000000..cf206095
--- /dev/null
+++ b/ergon_core/ergon_core/core/providers/generation/types.py
@@ -0,0 +1,17 @@
+"""Shared generation provider value types."""
+
+from pydantic import BaseModel, Field
+
+type JsonScalar = str | int | float | bool | None
+type JsonValue = JsonScalar | list[JsonValue] | dict[str, JsonValue]
+type JsonObject = dict[str, JsonValue]
+
+
+class TokenLogprob(BaseModel):
+    """Per-token log probability from the serving backend."""
+
+    model_config = {"frozen": True}
+
+    token: str
+    logprob: float
+    top_logprobs: list[JsonObject] = Field(default_factory=list)
diff --git a/ergon_core/ergon_core/core/runtime/services/workflow_dto.py b/ergon_core/ergon_core/core/runtime/services/workflow_dto.py
index d5b45aaa..33dd2ac3 100644
--- a/ergon_core/ergon_core/core/runtime/services/workflow_dto.py
+++ b/ergon_core/ergon_core/core/runtime/services/workflow_dto.py
@@ -13,6 +13,7 @@ class WorkflowTaskRef(BaseModel):
     level: int
     parent_node_id: UUID | None = None
     assigned_worker_slug: str | None = None
+    description: str | None = None
 
 
 class WorkflowExecutionRef(BaseModel):
@@ -82,3 +83,32 @@ class WorkflowMaterializedResourceRef(BaseModel):
     sandbox_path: str
     dry_run: bool = False
     source_mutated: bool = False
+
+
+class WorkflowMutationRef(BaseModel):
+    model_config = {"frozen": True}
+
+    action: str
+    dry_run: bool
+    node: WorkflowTaskRef | None = None
+    edge: WorkflowDependencyRef | None = None
+    message: str
+    suggested_commands: list[str] = Field(default_factory=list)
+
+
+class WorkflowResourceLocationRef(BaseModel):
+    model_config = {"frozen": True}
+
+    resource: WorkflowResourceRef
+    producer_task_slug: str | None = None
+    local_file_path: str
+    default_sandbox_path: str
+
+
+class WorkflowTaskWorkspaceRef(BaseModel):
+    model_config = {"frozen": True}
+
+    task: WorkflowTaskRef
+    latest_execution: WorkflowExecutionRef | None = None
+    own_resources: list[WorkflowResourceRef] = Field(default_factory=list)
+    input_resources: list[WorkflowResourceRef] = Field(default_factory=list)
diff --git a/ergon_core/ergon_core/core/runtime/services/workflow_service.py b/ergon_core/ergon_core/core/runtime/services/workflow_service.py
index a9aaff6b..cbe0a6ca 100644
--- a/ergon_core/ergon_core/core/runtime/services/workflow_service.py
+++ b/ergon_core/ergon_core/core/runtime/services/workflow_service.py
@@ -1,23 +1,33 @@
-from collections.abc import Callable
+from collections.abc import Awaitable, Callable
 from pathlib import PurePosixPath
 from typing import Literal
-from uuid import UUID
+from uuid import UUID, uuid4
 
+import inngest
 from ergon_core.core.persistence.graph.models import RunGraphEdge, RunGraphNode
 from ergon_core.core.persistence.shared.enums import TaskExecutionStatus
 from ergon_core.core.persistence.telemetry.models import (
+    RunRecord,
     RunResource,
     RunResourceKind,
     RunTaskExecution,
 )
 from ergon_core.core.providers.sandbox.manager import BaseSandboxManager, DefaultSandboxManager
+from ergon_core.core.runtime.events.task_events import TaskReadyEvent
+from ergon_core.core.runtime.inngest_client import inngest_client
+from ergon_core.core.runtime.services.graph_dto import GraphEdgeDto, GraphNodeDto, MutationMeta
+from ergon_core.core.runtime.services.graph_repository import WorkflowGraphRepository
 from ergon_core.core.runtime.services.workflow_dto import (
     WorkflowBlockerRef,
     WorkflowDependencyRef,
+    WorkflowExecutionRef,
     WorkflowMaterializedResourceRef,
+    WorkflowMutationRef,
     WorkflowNextActionRef,
+    WorkflowResourceLocationRef,
     WorkflowResourceRef,
     WorkflowTaskRef,
+    WorkflowTaskWorkspaceRef,
 )
 from sqlmodel import Session, col, select
@@ -37,8 +47,12 @@ def __init__(
         self,
         *,
         sandbox_manager_factory: Callable[[str], BaseSandboxManager] | None = None,
+        graph_repository: WorkflowGraphRepository | None = None,
+        task_ready_dispatcher: Callable[[UUID, UUID, UUID], Awaitable[None]] | None = None,
     ) -> None:
         self._sandbox_manager_factory = sandbox_manager_factory or self._sandbox_manager_for
+        self._graph_repo = graph_repository or WorkflowGraphRepository()
+        self._task_ready_dispatcher = task_ready_dispatcher or self._dispatch_task_ready
 
     def list_tasks(
         self,
@@ -156,6 +170,60 @@ def read_resource_bytes(
         with open(resource.file_path, "rb") as handle:
             return handle.read(max_bytes)
 
+    def get_resource_location(
+        self,
+        session: Session,
+        *,
+        run_id: UUID,
+        resource_id: UUID,
+    ) -> WorkflowResourceLocationRef:
+        resource = self._resource_in_run(session, run_id=run_id, resource_id=resource_id)
+        producer = self._producer_node_for_resource(session, resource)
+        copied_name = self._copy_name(resource.name)
+        default_path = self._sandbox_destination(
+            destination=None,
+            producer_slug=producer.task_slug if producer is not None else "unknown",
+            copied_name=copied_name,
+        )
+        return WorkflowResourceLocationRef(
+            resource=self._resource_ref(session, resource),
+            producer_task_slug=producer.task_slug if producer is not None else None,
+            local_file_path=resource.file_path,
+            default_sandbox_path=default_path,
+        )
+
+    def get_task_workspace(
+        self,
+        session: Session,
+        *,
+        run_id: UUID,
+        node_id: UUID,
+    ) -> WorkflowTaskWorkspaceRef:
+        node = self._resolve_node(session, run_id=run_id, node_id=node_id, task_slug=None)
+        latest = self.get_latest_execution(session, node_id=node_id)
+        own_resources: list[WorkflowResourceRef] = []
+        if latest is not None:
+            own_rows = list(
+                session.exec(
+                    select(RunResource)
+                    .where(RunResource.run_id == run_id)
+                    .where(RunResource.task_execution_id == latest.id),
+                ).all(),
+            )
+            own_rows.sort(key=lambda resource: (resource.created_at, resource.id), reverse=True)
+            own_resources = [self._resource_ref(session, resource) for resource in own_rows]
+        return WorkflowTaskWorkspaceRef(
+            task=self._task_ref(node),
+            latest_execution=self._execution_ref(latest) if latest is not None else None,
+            own_resources=own_resources,
+            input_resources=self.list_resources(
+                session,
+                run_id=run_id,
+                node_id=node_id,
+                scope="input",
+            ),
+        )
+
     def get_task_blockers(
         self,
         session: Session,
@@ -203,6 +271,215 @@ def get_next_actions(
             )
         ]
 
+    async def add_task(
+        self,
+        session: Session,
+        *,
+        run_id: UUID,
+        parent_node_id: UUID,
+        task_slug: str,
+        description: str,
+        assigned_worker_slug: str,
+        dry_run: bool,
+    ) -> WorkflowMutationRef:
+        parent = self._resolve_node(
+            session,
+            run_id=run_id,
+            node_id=parent_node_id,
+            task_slug=None,
+        )
+        node_ref = WorkflowTaskRef(
+            node_id=uuid4(),
+            task_slug=task_slug,
+            status=TaskExecutionStatus.PENDING.value,
+            level=parent.level + 1,
+            parent_node_id=parent.id,
+            assigned_worker_slug=assigned_worker_slug,
+            description=description,
+        )
+        if dry_run:
+            return WorkflowMutationRef(
+                action="add-task",
+                dry_run=True,
+                node=node_ref,
+                message=f"Would add task {task_slug}",
+            )
+
+        created = await self._graph_repo.add_node(
+            session,
+            run_id,
+            task_slug=task_slug,
+            instance_key=parent.instance_key,
+            description=description,
+            status=TaskExecutionStatus.PENDING.value,
+            assigned_worker_slug=assigned_worker_slug,
+            parent_node_id=parent.id,
+            level=parent.level + 1,
+            meta=self._meta("add-task"),
+        )
+        session.commit()
+        definition_id = self._resolve_definition_id(session, run_id)
+        await self._task_ready_dispatcher(run_id, definition_id, created.id)
+        return WorkflowMutationRef(
+            action="add-task",
+            dry_run=False,
+            node=self._task_ref_from_graph(created),
+            message=f"Added task {task_slug}",
+        )
+
+    async def add_edge(
+        self,
+        session: Session,
+        *,
+        run_id: UUID,
+        source_task_slug: str,
+        target_task_slug: str,
+        dry_run: bool,
+    ) -> WorkflowMutationRef:
+        source = self._resolve_node(
+            session,
+            run_id=run_id,
+            node_id=None,
+            task_slug=source_task_slug,
+        )
+        target = self._resolve_node(
+            session,
+            run_id=run_id,
+            node_id=None,
+            task_slug=target_task_slug,
+        )
+        edge_ref = WorkflowDependencyRef(
+            edge_id=uuid4(),
+            edge_status="pending",
+            source=self._task_ref(source),
+            target=self._task_ref(target),
+        )
+        if dry_run:
+            return WorkflowMutationRef(
+                action="add-edge",
+                dry_run=True,
+                edge=edge_ref,
+                message=f"Would add dependency {source_task_slug} -> {target_task_slug}",
+            )
+
+        created = await self._graph_repo.add_edge(
+            session,
+            run_id,
+            source_node_id=source.id,
+            target_node_id=target.id,
+            status="pending",
+            meta=self._meta("add-edge"),
+        )
+        session.commit()
+        return WorkflowMutationRef(
+            action="add-edge",
+            dry_run=False,
+            edge=self._dependency_ref_from_graph(session, run_id, created),
+            message=f"Added dependency {source_task_slug} -> {target_task_slug}",
+        )
+
+    async def update_task_description(
+        self,
+        session: Session,
+        *,
+        run_id: UUID,
+        task_slug: str,
+        description: str,
+        dry_run: bool,
+    ) -> WorkflowMutationRef:
+        node = self._resolve_node(session, run_id=run_id, node_id=None, task_slug=task_slug)
+        if dry_run:
+            return WorkflowMutationRef(
+                action="update-task-description",
+                dry_run=True,
+                node=self._task_ref(node).model_copy(update={"description": description}),
+                message=f"Would update description for {task_slug}",
+            )
+
+        updated = await self._graph_repo.update_node_field(
+            session,
+            run_id=run_id,
+            node_id=node.id,
+            field="description",
+            value=description,
+            meta=self._meta("update-task-description"),
+        )
+        session.commit()
+        return WorkflowMutationRef(
+            action="update-task-description",
+            dry_run=False,
+            node=self._task_ref_from_graph(updated),
+            message=f"Updated description for {task_slug}",
+        )
+
+    async def restart_task(
+        self,
+        session: Session,
+        *,
+        run_id: UUID,
+        task_slug: str,
+        dry_run: bool,
+    ) -> WorkflowMutationRef:
+        return await self._set_task_status(
+            session,
+            run_id=run_id,
+            task_slug=task_slug,
+            action="restart-task",
+            status=TaskExecutionStatus.PENDING.value,
+            dry_run=dry_run,
+        )
+
+    async def abandon_task(
+        self,
+        session: Session,
+        *,
+        run_id: UUID,
+        task_slug: str,
+        dry_run: bool,
+    ) -> WorkflowMutationRef:
+        return await self._set_task_status(
+            session,
+            run_id=run_id,
+            task_slug=task_slug,
+            action="abandon-task",
+            status=TaskExecutionStatus.CANCELLED.value,
+            dry_run=dry_run,
+        )
+
+    async def _set_task_status(
+        self,
+        session: Session,
+        *,
+        run_id: UUID,
+        task_slug: str,
+        action: str,
+        status: str,
+        dry_run: bool,
+    ) -> WorkflowMutationRef:
+        node = self._resolve_node(session, run_id=run_id, node_id=None, task_slug=task_slug)
+        if dry_run:
+            return WorkflowMutationRef(
+                action=action,
+                dry_run=True,
+                node=self._task_ref(node).model_copy(update={"status": status}),
+                message=f"Would set {task_slug} to {status}",
+            )
+        await self._graph_repo.update_node_status(
+            session,
+            run_id=run_id,
+            node_id=node.id,
+            new_status=status,
+            meta=self._meta(action),
+        )
+        session.commit()
+        refreshed = self._resolve_node(session, run_id=run_id, node_id=None, task_slug=task_slug)
+        return WorkflowMutationRef(
+            action=action,
+            dry_run=False,
+            node=self._task_ref(refreshed),
+            message=f"Set {task_slug} to {status}",
+        )
+
     async def materialize_resource(  # slopcop: ignore[max-function-params] -- mirrors CLI scope fields
         self,
         session: Session,
@@ -278,6 +555,57 @@ def _task_ref(node: RunGraphNode) -> WorkflowTaskRef:
             level=node.level,
             parent_node_id=node.parent_node_id,
             assigned_worker_slug=node.assigned_worker_slug,
+            description=node.description,
         )
+
+    @staticmethod
+    def _task_ref_from_graph(node: GraphNodeDto) -> WorkflowTaskRef:
+        return WorkflowTaskRef(
+            node_id=node.id,
+            task_slug=node.task_slug,
+            status=node.status,
+            level=node.level,
+            parent_node_id=node.parent_node_id,
+            assigned_worker_slug=node.assigned_worker_slug,
+            description=node.description,
+        )
+
+    def _dependency_ref_from_graph(
+        self,
+        session: Session,
+        run_id: UUID,
+        edge: GraphEdgeDto,
+    ) -> WorkflowDependencyRef:
+        nodes = self._nodes_by_id(session, run_id)
+        return WorkflowDependencyRef(
+            edge_id=edge.id,
+            edge_status=edge.status,
+            source=self._task_ref(nodes[edge.source_node_id]),
+            target=self._task_ref(nodes[edge.target_node_id]),
+        )
+
+    @staticmethod
+    def _meta(action: str) -> MutationMeta:
+        return MutationMeta(actor="workflow-cli", reason=action)
+
+    def _resolve_definition_id(self, session: Session, run_id: UUID) -> UUID:
+        run = session.get(RunRecord, run_id)
+        if run is None:
+            raise ValueError(f"run {run_id} not found")
+        return run.experiment_definition_id
+
+    async def _dispatch_task_ready(self, run_id: UUID, definition_id: UUID, node_id: UUID) -> None:
+        event = TaskReadyEvent(
+            run_id=run_id,
+            definition_id=definition_id,
+            task_id=None,
+            node_id=node_id,
+        )
+        await inngest_client.send(
+            inngest.Event(
+                name=TaskReadyEvent.name,
+                data=event.model_dump(mode="json"),
+            )
+        )
 
     def _resource_ref(self, session: Session, resource: RunResource) -> WorkflowResourceRef:
@@ -298,6 +626,15 @@ def _resource_ref(self, session: Session, resource: RunResource) -> WorkflowReso
             created_at=resource.created_at,
         )
 
+    @staticmethod
+    def _execution_ref(execution: RunTaskExecution) -> WorkflowExecutionRef:
+        return WorkflowExecutionRef(
+            execution_id=execution.id,
+            status=execution.status,
+            attempt_number=execution.attempt_number,
+            final_assistant_message=execution.final_assistant_message,
+        )
+
     @staticmethod
     def _resource_in_run(session: Session, *, run_id: UUID, resource_id: UUID) -> RunResource:
         resource = session.get(RunResource, resource_id)
diff --git a/ergon_core/pyproject.toml b/ergon_core/pyproject.toml
index 19b4002a..107b52fd 100644
--- a/ergon_core/pyproject.toml
+++ b/ergon_core/pyproject.toml
@@ -15,7 +15,7 @@ dependencies = [
     "uvicorn>=0.24.0",
    "e2b-code-interpreter",
    "openai",
-    "pydantic-ai",
+    "pydantic-ai>=0.8.1",
    "litellm",
    "opentelemetry-api",
    "opentelemetry-sdk",
diff --git a/tests/real_llm/benchmarks/test_researchrubrics.py b/tests/real_llm/benchmarks/test_researchrubrics.py
index f8a842b1..c535853a 100644
--- a/tests/real_llm/benchmarks/test_researchrubrics.py
+++ b/tests/real_llm/benchmarks/test_researchrubrics.py
@@ -37,6 +37,7 @@
     RunRecord,
     RunResource,
     RunTaskEvaluation,
+    RunTaskExecution,
 )
 from ergon_core.core.providers.generation.openrouter_budget import OpenRouterBudget
 from ergon_core.core.settings import settings
@@ -52,9 +53,9 @@
 pytestmark = [pytest.mark.real_llm, pytest.mark.asyncio]
 
-# Default to Sonnet 4.6 via OpenRouter. Override with ERGON_REAL_LLM_MODEL
-# to roll out against a different model without editing the test.
-_DEFAULT_MODEL = "openrouter:anthropic/claude-sonnet-4.6"
+# Cloud provider prefixes resolve through OpenRouter. Override with
+# ERGON_REAL_LLM_MODEL to roll out against a different model.
+_DEFAULT_MODEL = "anthropic:claude-sonnet-4.6"
 
 # Wall-clock caps. Real-LLM + real-sandbox rollouts are slow; keep
 # these generous enough to absorb E2B startup + Exa retries but bounded
@@ -99,6 +100,7 @@ def _wait_for_post_terminal_artifacts(run_id: UUID) -> None:
     deadline = time.monotonic() + _POST_TERMINAL_ARTIFACT_TIMEOUT_SECONDS
     while time.monotonic() < deadline:
         with get_session() as session:
+            run = session.get(RunRecord, run_id)
             resources = len(
                 list(session.exec(select(RunResource).where(RunResource.run_id == run_id)).all())
             )
@@ -109,8 +111,22 @@ def _wait_for_post_terminal_artifacts(run_id: UUID) -> None:
                     ).all()
                 )
             )
+            executions = list(
+                session.exec(select(RunTaskExecution).where(RunTaskExecution.run_id == run_id)).all()
+            )
             if resources > 0 and evaluations > 0:
                 return
+            run_status = str(getattr(run.status, "value", run.status)).lower() if run else ""
+            running_executions = {
+                "pending",
+                "running",
+                "executing",
+            }
+            if run_status in {"failed", "cancelled"} and not any(
+                str(getattr(execution.status, "value", execution.status)).lower() in running_executions
+                for execution in executions
+            ):
+                return
         time.sleep(2)
@@ -128,7 +144,7 @@ async def test_researchrubrics_rollout(
     state inside the time budget.
     """
     model = os.environ.get("ERGON_REAL_LLM_MODEL", _DEFAULT_MODEL)
-    benchmark = "researchrubrics"
+    benchmark = os.environ.get("ERGON_REAL_LLM_BENCHMARK", "researchrubrics")
     worker = os.environ.get("ERGON_REAL_LLM_WORKER", "researchrubrics-researcher")
     evaluator = "research-rubric"
     limit = os.environ.get("ERGON_REAL_LLM_LIMIT", "1")
diff --git a/tests/unit/cli/test_workflow_cli.py b/tests/unit/cli/test_workflow_cli.py
index 4c587413..1c6cf652 100644
--- a/tests/unit/cli/test_workflow_cli.py
+++ b/tests/unit/cli/test_workflow_cli.py
@@ -1,11 +1,18 @@
 import json
-from dataclasses import dataclass
 from datetime import UTC, datetime
 from uuid import uuid4
 
 import pytest
 from ergon_cli.commands.workflow import WorkflowCommandContext, execute_workflow_command
-from ergon_core.core.runtime.services.workflow_dto import WorkflowResourceRef
+from ergon_core.core.runtime.services.workflow_dto import (
+    WorkflowExecutionRef,
+    WorkflowMutationRef,
+    WorkflowResourceLocationRef,
+    WorkflowResourceRef,
+    WorkflowTaskRef,
+    WorkflowTaskWorkspaceRef,
+)
+from pydantic import BaseModel
 
 
 class _Session:
@@ -13,12 +20,14 @@ def close(self) -> None:
         pass
 
 
-@dataclass
-class _Service:
-    resource: WorkflowResourceRef
+class _Service(BaseModel):
+    model_config = {"arbitrary_types_allowed": True}
+
+    resource: WorkflowResourceRef | None
 
     def list_resources(self, session, *, run_id, node_id, scope, kind=None, max_depth=3, limit=50):
         assert isinstance(session, _Session)
+        assert self.resource is not None
         assert run_id == self.resource.run_id
         assert node_id == self.resource.node_id
         assert scope == "visible"
@@ -56,7 +65,7 @@ def test_resource_list_json_uses_injected_context() -> None:
             benchmark_type="researchrubrics",
         ),
         session_factory=_Session,
-        service=_Service(resource),
+        service=_Service(resource=resource),
     )
 
     payload = json.loads(output.stdout)
@@ -66,6 +75,188 @@
     assert payload["resources"][0]["task_slug"] == "research"
 
 
+def test_manage_add_task_json_plumbs_cli_arguments_to_service() -> None:
+    expected_run_id = uuid4()
+    expected_parent_node_id = uuid4()
+    created_node_id = uuid4()
+
+    class Service:
+        async def add_task(
+            self,
+            session,
+            *,
+            run_id,
+            parent_node_id,
+            task_slug,
+            description,
+            assigned_worker_slug,
+            dry_run,
+        ):
+            assert isinstance(session, _Session)
+            assert run_id == expected_run_id
+            assert parent_node_id == expected_parent_node_id
+            assert task_slug == "new_leaf"
+            assert description == "New leaf"
+            assert assigned_worker_slug == "researchrubrics-researcher"
+            assert dry_run is True
+            return WorkflowMutationRef(
+                action="add-task",
+                dry_run=True,
+                node=WorkflowTaskRef(
+                    node_id=created_node_id,
+                    task_slug="new_leaf",
+                    status="pending",
+                    level=2,
+                    parent_node_id=expected_parent_node_id,
+                    assigned_worker_slug="researchrubrics-researcher",
+                    description="New leaf",
+                ),
+                message="Would add task new_leaf",
+            )
+
+    output = execute_workflow_command(
+        "manage add-task --task-slug new_leaf --description 'New leaf' "
+        "--worker researchrubrics-researcher "
+        f"--parent-node-id {expected_parent_node_id} --dry-run --format json",
+        context=WorkflowCommandContext(
+            run_id=expected_run_id,
+            node_id=expected_parent_node_id,
+            execution_id=uuid4(),
+            sandbox_task_key=uuid4(),
+            benchmark_type="researchrubrics",
+        ),
+        session_factory=_Session,
+        service=Service(),
+    )
+
+    payload = json.loads(output.stdout)
+    assert payload["mutation"]["action"] == "add-task"
+    assert payload["mutation"]["node"]["task_slug"] == "new_leaf"
+    assert payload["mutation"]["dry_run"] is True
+
+
+def test_resource_location_json_uses_injected_run_scope() -> None:
+    run_id = uuid4()
+    node_id = uuid4()
+    resource_id = uuid4()
+    resource = WorkflowResourceRef(
+        resource_id=resource_id,
+        run_id=run_id,
+        task_execution_id=uuid4(),
+        node_id=node_id,
+        task_slug="producer",
+        kind="report",
+        name="paper.txt",
+        mime_type="text/plain",
+        size_bytes=12,
+        file_path="/tmp/paper.txt",
+        content_hash="sha256:abc",
+        copied_from_resource_id=None,
+        created_at=datetime(2026, 4, 26, tzinfo=UTC),
+    )
+
+    class Service:
+        def get_resource_location(self, session, *, run_id, resource_id):
+            assert isinstance(session, _Session)
+            assert run_id == resource.run_id
+            assert resource_id == resource.resource_id
+            return WorkflowResourceLocationRef(
+                resource=resource,
+                producer_task_slug="producer",
+                local_file_path="/tmp/paper.txt",
+                default_sandbox_path="/workspace/imported/producer/paper (copy).txt",
+            )
+
+    output = execute_workflow_command(
+        f"inspect resource-location --resource-id {resource_id} --format json",
+        context=WorkflowCommandContext(
+            run_id=run_id,
+            node_id=node_id,
+            execution_id=uuid4(),
+            sandbox_task_key=uuid4(),
+            benchmark_type="researchrubrics",
+        ),
+        session_factory=_Session,
+        service=Service(),
+    )
+
+    payload = json.loads(output.stdout)
+    assert payload["resource_location"]["producer_task_slug"] == "producer"
+    assert payload["resource_location"]["default_sandbox_path"].startswith("/workspace/imported")
+
+
+def test_task_workspace_text_lists_own_and_input_resources() -> None:
+    run_id = uuid4()
+    node_id = uuid4()
+    execution_id = uuid4()
+
+    class Service:
+        def get_task_workspace(self, session, *, run_id, node_id):
+            assert isinstance(session, _Session)
+            return WorkflowTaskWorkspaceRef(
+                task=WorkflowTaskRef(
+                    node_id=node_id,
+                    task_slug="current",
+                    status="running",
+                    level=1,
+                    description="Current",
+                ),
+                latest_execution=WorkflowExecutionRef(
+                    execution_id=execution_id,
+                    status="running",
+                    attempt_number=1,
+                    final_assistant_message=None,
+                ),
+                own_resources=[
+                    WorkflowResourceRef(
+                        resource_id=uuid4(),
+                        run_id=run_id,
+                        task_execution_id=execution_id,
+                        node_id=node_id,
+                        task_slug="current",
+                        kind="report",
+                        name="own.txt",
+                        mime_type="text/plain",
+                        size_bytes=3,
+                        file_path="/tmp/own.txt",
+                        created_at=datetime(2026, 4, 26, tzinfo=UTC),
+                    )
+                ],
+                input_resources=[
+                    WorkflowResourceRef(
+                        resource_id=uuid4(),
+                        run_id=run_id,
+                        task_execution_id=uuid4(),
+                        node_id=uuid4(),
+                        task_slug="upstream",
+                        kind="report",
+                        name="input.txt",
+                        mime_type="text/plain",
+                        size_bytes=5,
+                        file_path="/tmp/input.txt",
+                        created_at=datetime(2026, 4, 26, tzinfo=UTC),
+                    )
+                ],
+            )
+
+    output = execute_workflow_command(
+        "inspect task-workspace",
+        context=WorkflowCommandContext(
+            run_id=run_id,
+            node_id=node_id,
+            execution_id=execution_id,
+            sandbox_task_key=uuid4(),
+            benchmark_type="researchrubrics",
+        ),
+        session_factory=_Session,
+        service=Service(),
+    )
+
+    assert "task current status=running" in output.stdout
+    assert "own: own.txt" in output.stdout
+    assert "input: input.txt" in output.stdout
+
+
 def test_agent_command_rejects_user_supplied_context_flags() -> None:
     with pytest.raises(ValueError, match="scope/context flags are injected"):
         execute_workflow_command(
diff --git a/tests/unit/providers/test_model_resolution.py b/tests/unit/providers/test_model_resolution.py
new file mode 100644
index 00000000..6b17227f
--- /dev/null
+++ b/tests/unit/providers/test_model_resolution.py
@@ -0,0 +1,47 @@
+import pytest
+
+from ergon_core.core.providers.generation.model_resolution import resolve_model_target
+
+
+def test_cloud_provider_targets_resolve_to_openrouter_provider() -> None:
+    from pydantic_ai.models.openai import OpenAIChatModel
+    from pydantic_ai.providers.openrouter import OpenRouterProvider
+
+    resolved = resolve_model_target("openai:gpt-4o", api_key="test-openrouter-key")
+
+    assert isinstance(resolved.model, OpenAIChatModel)
+    assert isinstance(resolved.model._provider, OpenRouterProvider)
+    assert resolved.model.model_name == "openai/gpt-4o"
+    assert resolved.model.system == "openrouter"
+    assert resolved.supports_logprobs is False
+
+
+def test_anthropic_target_resolves_to_openrouter_namespace() -> None:
+    from pydantic_ai.models.openai import OpenAIChatModel
+
+    resolved = resolve_model_target(
+        "anthropic:claude-sonnet-4.6", api_key="test-openrouter-key"
+    )
+
+    assert isinstance(resolved.model, OpenAIChatModel)
+    assert resolved.model.model_name == "anthropic/claude-sonnet-4.6"
+
+
+def test_vllm_endpoint_target_resolves_to_openai_compatible_model() -> None:
+    from pydantic_ai.models.openai import OpenAIChatModel
+
+    resolved = resolve_model_target("vllm:http://localhost:8000#served-model")
+
+    assert isinstance(resolved.model, OpenAIChatModel)
+    assert resolved.model.model_name == "served-model"
+    assert resolved.supports_logprobs is True
+
+
+def test_openai_compatible_target_requires_model_name() -> None:
+    with pytest.raises(ValueError, match="model name"):
+        resolve_model_target("openai-compatible:http://localhost:11434/v1")
+
+
+def test_unknown_model_target_prefix_is_rejected() -> None:
+    with pytest.raises(ValueError, match="Unsupported model target"):
+        resolve_model_target("mystery:model")
diff --git a/tests/unit/runtime/test_import_boundaries.py b/tests/unit/runtime/test_import_boundaries.py
new file mode 100644
index 00000000..edf3245b
--- /dev/null
+++ b/tests/unit/runtime/test_import_boundaries.py
@@ -0,0 +1,25 @@
+def test_telemetry_models_import_before_run_resource_api() -> None:
+    from ergon_core.core.persistence.telemetry.models import RunResource
+
+    from ergon_core.api.run_resource import RunResourceView
+
+    assert RunResource.__tablename__ == "run_resources"
+    assert RunResourceView.__name__ == "RunResourceView"
+
+
+def test_context_models_import_without_worker_cycle() -> None:
+    from ergon_core.core.persistence.context.models import RunContextEvent
+
+    assert RunContextEvent.__tablename__ == "run_context_events"
+
+
+def test_context_event_payloads_use_shared_logprob_type_without_api_cycle() -> None:
+    from typing import get_args
+
+    from 
ergon_core.core.persistence.context.event_payloads import ToolCallPayload + from ergon_core.core.providers.generation.types import TokenLogprob + + annotation_args = get_args(ToolCallPayload.model_fields["turn_logprobs"].annotation) + list_annotation = next(arg for arg in annotation_args if get_args(arg)) + + assert get_args(list_annotation) == (TokenLogprob,) diff --git a/tests/unit/runtime/test_workflow_service.py b/tests/unit/runtime/test_workflow_service.py index 69e5a73a..c5d25bed 100644 --- a/tests/unit/runtime/test_workflow_service.py +++ b/tests/unit/runtime/test_workflow_service.py @@ -31,6 +31,7 @@ def _node( *, run_id: UUID, slug: str, + description: str | None = None, status: str = "completed", parent_node_id: UUID | None = None, level: int = 0, @@ -39,7 +40,7 @@ def _node( run_id=run_id, instance_key="instance", task_slug=slug, - description=f"Task {slug}", + description=description or f"Task {slug}", status=status, assigned_worker_slug="worker", parent_node_id=parent_node_id, @@ -47,6 +48,15 @@ def _node( ) +def _edge(*, run_id: UUID, source_node_id: UUID, target_node_id: UUID) -> RunGraphEdge: + return RunGraphEdge( + run_id=run_id, + source_node_id=source_node_id, + target_node_id=target_node_id, + status="satisfied", + ) + + def _execution( *, run_id: UUID, @@ -301,3 +311,271 @@ async def test_materialize_resource_dry_run_keeps_copy_name_for_explicit_destina assert result.sandbox_path == "/workspace/selected/paper (copy).pdf" assert result.copied_resource_id is None + + +def test_resource_location_describes_producer_and_workspace_destination(tmp_path: Path) -> None: + session = _session() + run_id = _run(session) + producer = _node(run_id=run_id, slug="producer") + session.add(producer) + session.flush() + producer_exec = _execution(run_id=run_id, node_id=producer.id) + session.add(producer_exec) + session.flush() + source = _resource( + run_id=run_id, + execution_id=producer_exec.id, + name="paper.pdf", + path=tmp_path / "paper.pdf", + 
content=b"paper", + ) + session.add(source) + session.commit() + + location = WorkflowService().get_resource_location( + session, + run_id=run_id, + resource_id=source.id, + ) + + assert location.resource.resource_id == source.id + assert location.producer_task_slug == "producer" + assert location.default_sandbox_path == "/workspace/imported/producer/paper (copy).pdf" + assert location.local_file_path == source.file_path + + +def test_task_workspace_reports_latest_execution_and_resources(tmp_path: Path) -> None: + session = _session() + run_id = _run(session) + current = _node(run_id=run_id, slug="current", status="running") + upstream = _node(run_id=run_id, slug="upstream") + session.add_all([current, upstream]) + session.flush() + current_exec = _execution( + run_id=run_id, + node_id=current.id, + status=TaskExecutionStatus.RUNNING, + ) + upstream_exec = _execution(run_id=run_id, node_id=upstream.id) + session.add_all([current_exec, upstream_exec]) + session.flush() + session.add(_edge(run_id=run_id, source_node_id=upstream.id, target_node_id=current.id)) + session.add_all( + [ + _resource( + run_id=run_id, + execution_id=current_exec.id, + name="own.txt", + path=tmp_path / "own.txt", + content=b"own", + ), + _resource( + run_id=run_id, + execution_id=upstream_exec.id, + name="input.txt", + path=tmp_path / "input.txt", + content=b"input", + ), + ] + ) + session.commit() + + workspace = WorkflowService().get_task_workspace( + session, + run_id=run_id, + node_id=current.id, + ) + + assert workspace.task.task_slug == "current" + assert workspace.latest_execution is not None + assert workspace.latest_execution.execution_id == current_exec.id + assert [resource.name for resource in workspace.own_resources] == ["own.txt"] + assert [resource.name for resource in workspace.input_resources] == ["input.txt"] + + +@pytest.mark.asyncio +async def test_materialize_resource_rejects_parent_directory_destination( + tmp_path: Path, +) -> None: + session = _session() + run_id = 
_run(session) + producer = _node(run_id=run_id, slug="producer") + consumer = _node(run_id=run_id, slug="consumer") + session.add_all([producer, consumer]) + session.flush() + producer_exec = _execution(run_id=run_id, node_id=producer.id) + consumer_exec = _execution( + run_id=run_id, + node_id=consumer.id, + status=TaskExecutionStatus.RUNNING, + ) + session.add_all([producer_exec, consumer_exec]) + session.flush() + source = _resource( + run_id=run_id, + execution_id=producer_exec.id, + name="paper.pdf", + path=tmp_path / "paper.pdf", + content=b"paper", + ) + session.add(source) + session.commit() + + with pytest.raises(ValueError, match="destination must stay inside /workspace"): + await WorkflowService().materialize_resource( + session, + run_id=run_id, + current_node_id=consumer.id, + current_execution_id=consumer_exec.id, + sandbox_task_key=consumer.id, + benchmark_type="test", + resource_id=source.id, + destination="../escape/paper.pdf", + dry_run=True, + ) + + +@pytest.mark.asyncio +async def test_add_task_dry_run_does_not_write_node() -> None: + session = _session() + run_id = _run(session) + parent = _node(run_id=run_id, slug="parent", level=1) + session.add(parent) + session.commit() + + result = await WorkflowService().add_task( + session, + run_id=run_id, + parent_node_id=parent.id, + task_slug="child", + description="Child task", + assigned_worker_slug="worker", + dry_run=True, + ) + + nodes = session.exec(select(RunGraphNode).where(RunGraphNode.run_id == run_id)).all() + assert len(nodes) == 1 + assert result.action == "add-task" + assert result.dry_run is True + assert result.node is not None + assert result.node.task_slug == "child" + assert result.node.parent_node_id == parent.id + assert result.node.level == 2 + + +@pytest.mark.asyncio +async def test_add_task_writes_node_and_mutation() -> None: + session = _session() + run_id = _run(session) + parent = _node(run_id=run_id, slug="parent", level=1) + session.add(parent) + session.commit() + 
dispatched = [] + + async def dispatch_task_ready(run_id, definition_id, node_id): + dispatched.append((run_id, definition_id, node_id)) + + result = await WorkflowService(task_ready_dispatcher=dispatch_task_ready).add_task( + session, + run_id=run_id, + parent_node_id=parent.id, + task_slug="child", + description="Child task", + assigned_worker_slug="worker", + dry_run=False, + ) + + assert result.dry_run is False + assert result.node is not None + child = session.get(RunGraphNode, result.node.node_id) + assert child is not None + assert child.task_slug == "child" + assert child.description == "Child task" + assert child.parent_node_id == parent.id + assert child.level == 2 + assert child.status == TaskExecutionStatus.PENDING.value + run = session.get(RunRecord, run_id) + assert run is not None + assert dispatched == [(run_id, run.experiment_definition_id, child.id)] + + +@pytest.mark.asyncio +async def test_add_edge_writes_dependency_between_slugs() -> None: + session = _session() + run_id = _run(session) + source = _node(run_id=run_id, slug="source") + target = _node(run_id=run_id, slug="target") + session.add_all([source, target]) + session.commit() + + result = await WorkflowService().add_edge( + session, + run_id=run_id, + source_task_slug="source", + target_task_slug="target", + dry_run=False, + ) + + assert result.action == "add-edge" + assert result.edge is not None + edge = session.get(RunGraphEdge, result.edge.edge_id) + assert edge is not None + assert edge.source_node_id == source.id + assert edge.target_node_id == target.id + assert edge.status == "pending" + + +@pytest.mark.asyncio +async def test_update_task_description_changes_only_description() -> None: + session = _session() + run_id = _run(session) + node = _node(run_id=run_id, slug="target", description="Old") + session.add(node) + session.commit() + + result = await WorkflowService().update_task_description( + session, + run_id=run_id, + task_slug="target", + description="New description", + 
dry_run=False, + ) + + refreshed = session.get(RunGraphNode, node.id) + assert refreshed is not None + assert refreshed.description == "New description" + assert refreshed.task_slug == "target" + assert result.node is not None + assert result.node.description == "New description" + + +@pytest.mark.asyncio +async def test_restart_and_abandon_task_update_node_status() -> None: + session = _session() + run_id = _run(session) + failed = _node(run_id=run_id, slug="failed", status="failed") + running = _node(run_id=run_id, slug="running", status="running") + session.add_all([failed, running]) + session.commit() + + restarted = await WorkflowService().restart_task( + session, + run_id=run_id, + task_slug="failed", + dry_run=False, + ) + abandoned = await WorkflowService().abandon_task( + session, + run_id=run_id, + task_slug="running", + dry_run=False, + ) + + failed_row = session.get(RunGraphNode, failed.id) + running_row = session.get(RunGraphNode, running.id) + assert failed_row is not None + assert running_row is not None + assert failed_row.status == TaskExecutionStatus.PENDING.value + assert running_row.status == TaskExecutionStatus.CANCELLED.value + assert restarted.action == "restart-task" + assert abandoned.action == "abandon-task" diff --git a/tests/unit/state/test_research_rubrics_benchmark.py b/tests/unit/state/test_research_rubrics_benchmark.py index d56502c3..83ec4933 100644 --- a/tests/unit/state/test_research_rubrics_benchmark.py +++ b/tests/unit/state/test_research_rubrics_benchmark.py @@ -82,6 +82,40 @@ def __getitem__(self, idx): ) ] + def test_load_rows_accepts_vanilla_prompt_field(self, monkeypatch: pytest.MonkeyPatch): + class FakeTrainDataset: + def __len__(self): + return 1 + + def __getitem__(self, idx): + assert idx == 0 + return { + "sample_id": "vanilla-sample", + "domain": "planning", + "prompt": "Plan a day in Washington DC.", + "rubrics": [ + {"criterion": "Includes a timed itinerary.", "axis": "quality", "weight": 5.0}, + ], + } + + 
monkeypatch.setattr( + "ergon_builtins.benchmarks.researchrubrics.benchmark.load_dataset", + lambda *args, **kwargs: {"train": FakeTrainDataset()}, + ) + + rows = ResearchRubricsBenchmark(dataset_name="ScaleAI/researchrubrics")._load_rows() + + assert rows == [ + ResearchRubricsTaskPayload( + sample_id="vanilla-sample", + domain="planning", + ablated_prompt="Plan a day in Washington DC.", + rubrics=[ + {"criterion": "Includes a timed itinerary.", "axis": "quality", "weight": 5.0} + ], + ) + ] + class TestResearchRubricsRubric: """Verify task-payload-driven rubric construction.""" diff --git a/tests/unit/state/test_research_rubrics_workers.py b/tests/unit/state/test_research_rubrics_workers.py index 65e31f7d..be78d1c6 100644 --- a/tests/unit/state/test_research_rubrics_workers.py +++ b/tests/unit/state/test_research_rubrics_workers.py @@ -14,6 +14,7 @@ ResearchRubricsResearcherWorker, ) from ergon_builtins.workers.research_rubrics.workflow_cli_react_worker import ( + _WORKFLOW_PROMPT, ResearchRubricsWorkflowCliReActWorker, ) from ergon_builtins.benchmarks.researchrubrics.toolkit_types import ( @@ -157,6 +158,12 @@ async def test_workflow_cli_worker_adds_workflow_tool(self): assert worker.type_slug == "researchrubrics-workflow-cli-react" assert "workflow" in tool_names + def test_workflow_cli_prompt_uses_current_task_level_for_delegation(self): + assert "inspect task-workspace --format json" in _WORKFLOW_PROMPT + assert "task_workspace.task.level is exactly 0" in _WORKFLOW_PROMPT + assert "Ignore level-0 tasks shown elsewhere in task-tree" in _WORKFLOW_PROMPT + assert "do not call `workflow(\"manage add-task" in _WORKFLOW_PROMPT + @pytest.mark.asyncio async def test_report_write_uses_manager_public_file_api(self): task_id = uuid4() diff --git a/tests/unit/state/test_workflow_cli_tool.py b/tests/unit/state/test_workflow_cli_tool.py index a52f7a6b..85883e76 100644 --- a/tests/unit/state/test_workflow_cli_tool.py +++ b/tests/unit/state/test_workflow_cli_tool.py @@ -7,16 
+7,17 @@ @pytest.mark.asyncio async def test_workflow_tool_injects_worker_context() -> None: + task_key = uuid4() context = WorkerContext( run_id=uuid4(), - task_id=uuid4(), + task_id=task_key, execution_id=uuid4(), sandbox_id="sandbox", node_id=uuid4(), ) seen = {} - def execute(command, *, context, session_factory, service): + async def execute(command, *, context, session_factory, service): seen["command"] = command seen["context"] = context @@ -29,7 +30,7 @@ class Output: workflow = make_workflow_cli_tool( worker_context=context, - sandbox_task_key=context.task_id, + sandbox_task_key=task_key, benchmark_type="researchrubrics", execute_command=execute, ) @@ -39,21 +40,22 @@ class Output: assert seen["context"].run_id == context.run_id assert seen["context"].node_id == context.node_id assert seen["context"].execution_id == context.execution_id - assert seen["context"].sandbox_task_key == context.task_id + assert seen["context"].sandbox_task_key == task_key assert seen["context"].benchmark_type == "researchrubrics" @pytest.mark.asyncio async def test_workflow_tool_reports_nonzero_exit() -> None: + task_key = uuid4() context = WorkerContext( run_id=uuid4(), - task_id=uuid4(), + task_id=task_key, execution_id=uuid4(), sandbox_id="sandbox", node_id=uuid4(), ) - def execute(command, *, context, session_factory, service): + async def execute(command, *, context, session_factory, service): class Output: stdout = "" stderr = "bad command" @@ -63,9 +65,97 @@ class Output: workflow = make_workflow_cli_tool( worker_context=context, - sandbox_task_key=context.task_id, + sandbox_task_key=task_key, benchmark_type="researchrubrics", execute_command=execute, ) assert await workflow("inspect nope") == "workflow exited 2: bad command" + + +@pytest.mark.asyncio +async def test_leaf_workflow_tool_rejects_graph_edit_commands() -> None: + task_key = uuid4() + context = WorkerContext( + run_id=uuid4(), + task_id=task_key, + execution_id=uuid4(), + sandbox_id="sandbox", + 
node_id=uuid4(), + ) + + async def execute(command, *, context, session_factory, service): + raise AssertionError("denied commands must not reach executor") + + workflow = make_workflow_cli_tool( + worker_context=context, + sandbox_task_key=task_key, + benchmark_type="researchrubrics", + execute_command=execute, + ) + + result = await workflow("manage add-task --task-slug child --description Child --worker worker") + + assert result.startswith("workflow denied:") + assert "manager-capable" in result + + +@pytest.mark.asyncio +async def test_manager_workflow_tool_allows_graph_edit_commands() -> None: + task_key = uuid4() + context = WorkerContext( + run_id=uuid4(), + task_id=task_key, + execution_id=uuid4(), + sandbox_id="sandbox", + node_id=uuid4(), + ) + seen = {} + + async def execute(command, *, context, session_factory, service): + seen["command"] = command + + class Output: + stdout = "ok" + stderr = "" + exit_code = 0 + + return Output() + + workflow = make_workflow_cli_tool( + worker_context=context, + sandbox_task_key=task_key, + benchmark_type="researchrubrics", + execute_command=execute, + manager_capable=True, + ) + + assert await workflow("manage restart-task --task-slug child --dry-run") == "ok" + assert seen["command"] == "manage restart-task --task-slug child --dry-run" + + +@pytest.mark.asyncio +async def test_workflow_tool_rejects_multiline_commands() -> None: + task_key = uuid4() + context = WorkerContext( + run_id=uuid4(), + task_id=task_key, + execution_id=uuid4(), + sandbox_id="sandbox", + node_id=uuid4(), + ) + + async def execute(command, *, context, session_factory, service): + raise AssertionError("multiline commands must not reach executor") + + workflow = make_workflow_cli_tool( + worker_context=context, + sandbox_task_key=task_key, + benchmark_type="researchrubrics", + execute_command=execute, + manager_capable=True, + ) + + assert await workflow("inspect task-tree\ninspect next-actions") == ( + "workflow denied: multiline commands are 
not allowed" + ) diff --git a/uv.lock b/uv.lock index d7bfcd87..9f248d29 100644 --- a/uv.lock +++ b/uv.lock @@ -1128,7 +1128,7 @@ requires-dist = [ { name = "outlines", marker = "extra == 'dev'" }, { name = "psycopg2-binary", specifier = ">=2.9.9" }, { name = "pydantic", specifier = ">=2.5.0" }, - { name = "pydantic-ai" }, + { name = "pydantic-ai", specifier = ">=0.8.1" }, { name = "pydantic-settings", specifier = ">=2.1.0" }, { name = "sqlmodel", specifier = ">=0.0.14" }, { name = "structlog", specifier = ">=23.2.0" }, @@ -1489,6 +1489,19 @@ http = [ { name = "aiohttp" }, ] +[[package]] +name = "genai-prices" +version = "0.0.57" +source = { registry = "https://pypi.org/simple" } +dependencies = [ + { name = "httpx" }, + { name = "pydantic" }, +] +sdist = { url = "https://files.pythonhosted.org/packages/be/30/11f3d683cf3b1d9612475ad8bfffe3423ce9f50fc617733109033e73a038/genai_prices-0.0.57.tar.gz", hash = "sha256:6e101e9c53975557ceffa237b0995787d81fe75aac12410f2898504188bcad89", size = 66555, upload-time = "2026-04-21T13:42:52.554Z" } +wheels = [ + { url = "https://files.pythonhosted.org/packages/a9/fe/d0095040c120d97cb63d055224ecd4e913dc5655315c203c8e83bf13aa86/genai_prices-0.0.57-py3-none-any.whl", hash = "sha256:14e50fb69cdc5a06ddb2a6df5a7fe06741b9e44304ce3f1728f56abdf1856cca", size = 69654, upload-time = "2026-04-21T13:42:51.236Z" }, +] + [[package]] name = "genson" version = "1.3.0" @@ -2635,14 +2648,14 @@ wheels = [ [[package]] name = "nexus-rpc" -version = "1.4.0" +version = "1.1.0" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "typing-extensions" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/35/d5/cd1ffb202b76ebc1b33c1332a3416e55a39929006982adc2b1eb069aaa9b/nexus_rpc-1.4.0.tar.gz", hash = "sha256:3b8b373d4865671789cc43623e3dc0bcbf192562e40e13727e17f1c149050fba", size = 82367, upload-time = "2026-02-25T22:01:34.053Z" } +sdist = { url = 
"https://files.pythonhosted.org/packages/ef/66/540687556bd28cf1ec370cc6881456203dfddb9dab047b8979c6865b5984/nexus_rpc-1.1.0.tar.gz", hash = "sha256:d65ad6a2f54f14e53ebe39ee30555eaeb894102437125733fb13034a04a44553", size = 77383, upload-time = "2025-07-07T19:03:58.368Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/11/52/6327a5f4fda01207205038a106a99848a41c83e933cd23ea2cab3d2ebc6c/nexus_rpc-1.4.0-py3-none-any.whl", hash = "sha256:14c953d3519113f8ccec533a9efdb6b10c28afef75d11cdd6d422640c40b3a49", size = 29645, upload-time = "2026-02-25T22:01:33.122Z" }, + { url = "https://files.pythonhosted.org/packages/bf/2f/9e9d0dcaa4c6ffa22b7aa31069a8a264c753ff8027b36af602cce038c92f/nexus_rpc-1.1.0-py3-none-any.whl", hash = "sha256:d1b007af2aba186a27e736f8eaae39c03aed05b488084ff6c3d1785c9ba2ad38", size = 27743, upload-time = "2025-07-07T19:03:57.556Z" }, ] [[package]] @@ -3312,17 +3325,16 @@ wheels = [ [[package]] name = "protobuf" -version = "6.33.6" +version = "5.29.6" source = { registry = "https://pypi.org/simple" } -sdist = { url = "https://files.pythonhosted.org/packages/66/70/e908e9c5e52ef7c3a6c7902c9dfbb34c7e29c25d2f81ade3856445fd5c94/protobuf-6.33.6.tar.gz", hash = "sha256:a6768d25248312c297558af96a9f9c929e8c4cee0659cb07e780731095f38135", size = 444531, upload-time = "2026-03-18T19:05:00.988Z" } +sdist = { url = "https://files.pythonhosted.org/packages/7e/57/394a763c103e0edf87f0938dafcd918d53b4c011dfc5c8ae80f3b0452dbb/protobuf-5.29.6.tar.gz", hash = "sha256:da9ee6a5424b6b30fd5e45c5ea663aef540ca95f9ad99d1e887e819cdf9b8723", size = 425623, upload-time = "2026-02-04T22:54:40.584Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/fc/9f/2f509339e89cfa6f6a4c4ff50438db9ca488dec341f7e454adad60150b00/protobuf-6.33.6-cp310-abi3-win32.whl", hash = "sha256:7d29d9b65f8afef196f8334e80d6bc1d5d4adedb449971fefd3723824e6e77d3", size = 425739, upload-time = "2026-03-18T19:04:48.373Z" }, - { url = 
"https://files.pythonhosted.org/packages/76/5d/683efcd4798e0030c1bab27374fd13a89f7c2515fb1f3123efdfaa5eab57/protobuf-6.33.6-cp310-abi3-win_amd64.whl", hash = "sha256:0cd27b587afca21b7cfa59a74dcbd48a50f0a6400cfb59391340ad729d91d326", size = 437089, upload-time = "2026-03-18T19:04:50.381Z" }, - { url = "https://files.pythonhosted.org/packages/5c/01/a3c3ed5cd186f39e7880f8303cc51385a198a81469d53d0fdecf1f64d929/protobuf-6.33.6-cp39-abi3-macosx_10_9_universal2.whl", hash = "sha256:9720e6961b251bde64edfdab7d500725a2af5280f3f4c87e57c0208376aa8c3a", size = 427737, upload-time = "2026-03-18T19:04:51.866Z" }, - { url = "https://files.pythonhosted.org/packages/ee/90/b3c01fdec7d2f627b3a6884243ba328c1217ed2d978def5c12dc50d328a3/protobuf-6.33.6-cp39-abi3-manylinux2014_aarch64.whl", hash = "sha256:e2afbae9b8e1825e3529f88d514754e094278bb95eadc0e199751cdd9a2e82a2", size = 324610, upload-time = "2026-03-18T19:04:53.096Z" }, - { url = "https://files.pythonhosted.org/packages/9b/ca/25afc144934014700c52e05103c2421997482d561f3101ff352e1292fb81/protobuf-6.33.6-cp39-abi3-manylinux2014_s390x.whl", hash = "sha256:c96c37eec15086b79762ed265d59ab204dabc53056e3443e702d2681f4b39ce3", size = 339381, upload-time = "2026-03-18T19:04:54.616Z" }, - { url = "https://files.pythonhosted.org/packages/16/92/d1e32e3e0d894fe00b15ce28ad4944ab692713f2e7f0a99787405e43533a/protobuf-6.33.6-cp39-abi3-manylinux2014_x86_64.whl", hash = "sha256:e9db7e292e0ab79dd108d7f1a94fe31601ce1ee3f7b79e0692043423020b0593", size = 323436, upload-time = "2026-03-18T19:04:55.768Z" }, - { url = "https://files.pythonhosted.org/packages/c4/72/02445137af02769918a93807b2b7890047c32bfb9f90371cbc12688819eb/protobuf-6.33.6-py3-none-any.whl", hash = "sha256:77179e006c476e69bf8e8ce866640091ec42e1beb80b213c3900006ecfba6901", size = 170656, upload-time = "2026-03-18T19:04:59.826Z" }, + { url = "https://files.pythonhosted.org/packages/d4/88/9ee58ff7863c479d6f8346686d4636dd4c415b0cbeed7a6a7d0617639c2a/protobuf-5.29.6-cp310-abi3-win32.whl", hash = 
"sha256:62e8a3114992c7c647bce37dcc93647575fc52d50e48de30c6fcb28a6a291eb1", size = 423357, upload-time = "2026-02-04T22:54:25.805Z" }, + { url = "https://files.pythonhosted.org/packages/1c/66/2dc736a4d576847134fb6d80bd995c569b13cdc7b815d669050bf0ce2d2c/protobuf-5.29.6-cp310-abi3-win_amd64.whl", hash = "sha256:7e6ad413275be172f67fdee0f43484b6de5a904cc1c3ea9804cb6fe2ff366eda", size = 435175, upload-time = "2026-02-04T22:54:28.592Z" }, + { url = "https://files.pythonhosted.org/packages/06/db/49b05966fd208ae3f44dcd33837b6243b4915c57561d730a43f881f24dea/protobuf-5.29.6-cp38-abi3-macosx_10_9_universal2.whl", hash = "sha256:b5a169e664b4057183a34bdc424540e86eea47560f3c123a0d64de4e137f9269", size = 418619, upload-time = "2026-02-04T22:54:30.266Z" }, + { url = "https://files.pythonhosted.org/packages/b7/d7/48cbf6b0c3c39761e47a99cb483405f0fde2be22cf00d71ef316ce52b458/protobuf-5.29.6-cp38-abi3-manylinux2014_aarch64.whl", hash = "sha256:a8866b2cff111f0f863c1b3b9e7572dc7eaea23a7fae27f6fc613304046483e6", size = 320284, upload-time = "2026-02-04T22:54:31.782Z" }, + { url = "https://files.pythonhosted.org/packages/e3/dd/cadd6ec43069247d91f6345fa7a0d2858bef6af366dbd7ba8f05d2c77d3b/protobuf-5.29.6-cp38-abi3-manylinux2014_x86_64.whl", hash = "sha256:e3387f44798ac1106af0233c04fb8abf543772ff241169946f698b3a9a3d3ab9", size = 320478, upload-time = "2026-02-04T22:54:32.909Z" }, + { url = "https://files.pythonhosted.org/packages/5a/cb/e3065b447186cb70aa65acc70c86baf482d82bf75625bf5a2c4f6919c6a3/protobuf-5.29.6-py3-none-any.whl", hash = "sha256:6b9edb641441b2da9fa8f428760fc136a49cf97a52076010cf22a2ff73438a86", size = 173126, upload-time = "2026-02-04T22:54:39.462Z" }, ] [[package]] @@ -3605,22 +3617,23 @@ email = [ [[package]] name = "pydantic-ai" -version = "0.7.2" +version = "0.8.1" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "pydantic-ai-slim", extra = ["ag-ui", "anthropic", "bedrock", "cli", "cohere", "evals", "google", "groq", "huggingface", "mcp", 
"mistral", "openai", "retries", "temporal", "vertexai"] }, ] -sdist = { url = "https://files.pythonhosted.org/packages/6f/d0/ca0dbea87aa677192fa4b663532bd37ae8273e883c55b661b786dbb52731/pydantic_ai-0.7.2.tar.gz", hash = "sha256:d215c323741d47ff13c6b48aa75aedfb8b6b5f9da553af709675c3078a4be4fc", size = 43763306, upload-time = "2025-08-14T22:59:58.912Z" } +sdist = { url = "https://files.pythonhosted.org/packages/56/d7/fcc18ce80008e888404a3615f973aa3f39b98384d61b03621144c9f4c2d4/pydantic_ai-0.8.1.tar.gz", hash = "sha256:05974382082ee4f3706909d06bdfcc5e95f39e29230cc4d00e47429080099844", size = 43772581, upload-time = "2025-08-29T14:46:23.201Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/a3/77/402a278b9694cdfaeb5bf0ed4e0fee447de624aa67126ddcce8d98dc6062/pydantic_ai-0.7.2-py3-none-any.whl", hash = "sha256:a6e5d0994aa87385a05fdfdad7fda1fd14576f623635e4000883c4c7856eba13", size = 10188, upload-time = "2025-08-14T22:59:50.653Z" }, + { url = "https://files.pythonhosted.org/packages/f9/04/802b8cf834dffcda8baabb3b76c549243694a83346c3f54e47a3a4d519fb/pydantic_ai-0.8.1-py3-none-any.whl", hash = "sha256:5fa923097132aa69b4d6a310b462dc091009c7b87705edf4443d37b887d5ef9a", size = 10188, upload-time = "2025-08-29T14:46:11.137Z" }, ] [[package]] name = "pydantic-ai-slim" -version = "0.7.2" +version = "0.8.1" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "eval-type-backport" }, + { name = "genai-prices" }, { name = "griffe" }, { name = "httpx" }, { name = "opentelemetry-api" }, @@ -3628,9 +3641,9 @@ dependencies = [ { name = "pydantic-graph" }, { name = "typing-inspection" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/96/39/87500c5e038296fe1becf62ac24f7e62dd5a1fb7fe63a9e29c58a2898b1a/pydantic_ai_slim-0.7.2.tar.gz", hash = "sha256:636ca32c8928048ba1173963aab6b7eb33b71174bbc371ad3f2096fee4c48dfe", size = 211787, upload-time = "2025-08-14T23:00:02.67Z" } +sdist = { url = 
"https://files.pythonhosted.org/packages/a2/91/08137459b3745900501b3bd11852ced6c81b7ce6e628696d75b09bb786c5/pydantic_ai_slim-0.8.1.tar.gz", hash = "sha256:12ef3dcbe5e1dad195d5e256746ef960f6e59aeddda1a55bdd553ee375ff53ae", size = 218906, upload-time = "2025-08-29T14:46:27.517Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/ea/93/fc3723a7cde4a8edb2d060fb8abeba22270ae61984796ab653fdd05baca0/pydantic_ai_slim-0.7.2-py3-none-any.whl", hash = "sha256:f5749d63bf4c2deac45371874df30d1d76a1572ce9467f6505926ecb835da583", size = 289755, upload-time = "2025-08-14T22:59:53.346Z" },
+    { url = "https://files.pythonhosted.org/packages/11/ce/8dbadd04f578d02a9825a46e931005743fe223736296f30b55846c084fab/pydantic_ai_slim-0.8.1-py3-none-any.whl", hash = "sha256:fc7edc141b21fe42bc54a2d92c1127f8a75160c5e57a168dba154d3f4adb963f", size = 297821, upload-time = "2025-08-29T14:46:14.647Z" },
 ]

 [package.optional-dependencies]
@@ -3647,6 +3660,7 @@ bedrock = [
 cli = [
     { name = "argcomplete" },
     { name = "prompt-toolkit" },
+    { name = "pyperclip" },
     { name = "rich" },
 ]
 cohere = [
@@ -3739,7 +3753,7 @@ wheels = [

 [[package]]
 name = "pydantic-evals"
-version = "0.7.2"
+version = "0.8.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "anyio" },
@@ -3749,9 +3763,9 @@ dependencies = [
     { name = "pyyaml" },
     { name = "rich" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/32/b7/005b1b23b96abf2bce880a4c10496c00f8ebd67690f6888e576269059f54/pydantic_evals-0.7.2.tar.gz", hash = "sha256:0cf7adee67b8a12ea0b41e5162c7256ae0f6a237acb1eea161a74ed6cf61615a", size = 44086, upload-time = "2025-08-14T23:00:03.606Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/6c/9d/460a1f2c9f5f263e9d8e9661acbd654ccc81ad3373ea43048d914091a817/pydantic_evals-0.8.1.tar.gz", hash = "sha256:c398a623c31c19ce70e346ad75654fcb1517c3f6a821461f64fe5cbbe0813023", size = 43933, upload-time = "2025-08-29T14:46:28.903Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/7c/6f/3b844991fc1223f9c3b201f222397b0d115e236389bd90ced406ebc478ea/pydantic_evals-0.7.2-py3-none-any.whl", hash = "sha256:c7497d89659c35fbcaefbeb6f457ae09d62e36e161c4b25a462808178b7cfa92", size = 52753, upload-time = "2025-08-14T22:59:55.018Z" },
+    { url = "https://files.pythonhosted.org/packages/6f/f9/1d21c4687167c4fa76fd3b1ed47f9bc2d38fd94cbacd9aa3f19e82e59830/pydantic_evals-0.8.1-py3-none-any.whl", hash = "sha256:6c76333b1d79632f619eb58a24ac656e9f402c47c75ad750ba0230d7f5514344", size = 52602, upload-time = "2025-08-29T14:46:16.602Z" },
 ]

 [[package]]
@@ -3774,7 +3788,7 @@ pycountry = [

 [[package]]
 name = "pydantic-graph"
-version = "0.7.2"
+version = "0.8.1"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "httpx" },
@@ -3782,9 +3796,9 @@ dependencies = [
     { name = "pydantic" },
     { name = "typing-inspection" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/cf/a9/8a918b4dc2cd55775d854e076823fa9b60a390e4fbec5283916346556754/pydantic_graph-0.7.2.tar.gz", hash = "sha256:f90e4ec6f02b899bf6f88cc026dafa119ea5041ab4c62ba81497717c003a946e", size = 21804, upload-time = "2025-08-14T23:00:04.834Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/bd/97/b35b7cb82d9f1bb6d5c6d21bba54f6196a3a5f593373f3a9c163a3821fd7/pydantic_graph-0.8.1.tar.gz", hash = "sha256:c61675a05c74f661d4ff38d04b74bd652c1e0959467801986f2f85dc7585410d", size = 21675, upload-time = "2025-08-29T14:46:29.839Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/12/d7/639c69dda9e4b4cf376c9f45e5eae96721f2dc2f2dc618fb63142876dce4/pydantic_graph-0.7.2-py3-none-any.whl", hash = "sha256:b6189500a465ce1bce4bbc65ac5871149af8e0f81a15d54540d3dfc0cc9b2502", size = 27392, upload-time = "2025-08-14T22:59:56.564Z" },
+    { url = "https://files.pythonhosted.org/packages/3d/e3/5908643b049bb2384d143885725cbeb0f53707d418357d4d1ac8d2c82629/pydantic_graph-0.8.1-py3-none-any.whl", hash = "sha256:f1dd5db0fe22f4e3323c04c65e2f0013846decc312b3efc3196666764556b765", size = 27239, upload-time = "2025-08-29T14:46:18.317Z" },
 ]

 [[package]]
@@ -3871,6 +3885,15 @@ wheels = [
     { url = "https://files.pythonhosted.org/packages/29/7d/5945b5af29534641820d3bd7b00962abbbdfee84ec7e19f0d5b3175f9a31/pynacl-1.6.2-cp38-abi3-win_arm64.whl", hash = "sha256:834a43af110f743a754448463e8fd61259cd4ab5bbedcf70f9dabad1d28a394c", size = 184801, upload-time = "2026-01-01T17:32:36.309Z" },
 ]

+[[package]]
+name = "pyperclip"
+version = "1.11.0"
+source = { registry = "https://pypi.org/simple" }
+sdist = { url = "https://files.pythonhosted.org/packages/e8/52/d87eba7cb129b81563019d1679026e7a112ef76855d6159d24754dbd2a51/pyperclip-1.11.0.tar.gz", hash = "sha256:244035963e4428530d9e3a6101a1ef97209c6825edab1567beac148ccc1db1b6", size = 12185, upload-time = "2025-09-26T14:40:37.245Z" }
+wheels = [
+    { url = "https://files.pythonhosted.org/packages/df/80/fc9d01d5ed37ba4c42ca2b55b4339ae6e200b456be3a1aaddf4a9fa99b8c/pyperclip-1.11.0-py3-none-any.whl", hash = "sha256:299403e9ff44581cb9ba2ffeed69c7aa96a008622ad0c46cb575ca75b5b84273", size = 11063, upload-time = "2025-09-26T14:40:36.069Z" },
+]
+
 [[package]]
 name = "pytest"
 version = "9.0.3"
@@ -4935,7 +4958,7 @@ wheels = [

 [[package]]
 name = "temporalio"
-version = "1.25.0"
+version = "1.16.0"
 source = { registry = "https://pypi.org/simple" }
 dependencies = [
     { name = "nexus-rpc" },
@@ -4943,13 +4966,12 @@ dependencies = [
     { name = "types-protobuf" },
     { name = "typing-extensions" },
 ]
-sdist = { url = "https://files.pythonhosted.org/packages/de/9c/3782bab0bf11a40b550147c19a5d1a476c17405391751982408902d9f138/temporalio-1.25.0.tar.gz", hash = "sha256:a3bbec1dcc904f674402cfa4faae480fda490b1c53ea5440c1f1996c562016fb", size = 2152534, upload-time = "2026-04-08T18:53:55.388Z" }
+sdist = { url = "https://files.pythonhosted.org/packages/f3/32/375ab75d0ebb468cf9c8abbc450a03d3a8c66401fc320b338bd8c00d36b4/temporalio-1.16.0.tar.gz", hash = "sha256:dd926f3e30626fd4edf5e0ce596b75ecb5bbe0e4a0281e545ac91b5577967c91", size = 1733873, upload-time = "2025-08-21T22:12:50.879Z" }
 wheels = [
-    { url = "https://files.pythonhosted.org/packages/19/e3/5676dd10d1164b6d6ca8752314054097b89c5da931e936af402a7b15236c/temporalio-1.25.0-cp310-abi3-macosx_10_12_x86_64.whl", hash = "sha256:6dc1bc8e1773b1a833d86a7ede2dd90ef4e031ced5b748b59e7f09a5bf9b327d", size = 13943906, upload-time = "2026-04-08T18:53:30.022Z" },
-    { url = "https://files.pythonhosted.org/packages/89/50/7cbf7f845973be986ec165348f72f7a409750842a04d554965a39be5cb4f/temporalio-1.25.0-cp310-abi3-macosx_11_0_arm64.whl", hash = "sha256:3c8fdcf79ea5ae8ae2cf6f48072e4a86c3e0f4778f6a8a066c6ff1d336587db4", size = 13298719, upload-time = "2026-04-08T18:53:35.95Z" },
-    { url = "https://files.pythonhosted.org/packages/d2/31/d474bab8535552add6ed289911bf1ffae5d7071823ece1069842190fcaed/temporalio-1.25.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:141f37aaafd7d090ba5c8776e4e9bc60df1fbc64b9f50c8f00e905a436588ddc", size = 13555435, upload-time = "2026-04-08T18:53:41.36Z" },
-    { url = "https://files.pythonhosted.org/packages/2a/c8/e7dc053d6107bf2a037a3c9fe7b86639a25dcb888bde0e1ca366901ee47f/temporalio-1.25.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:ff7ca5bb80264976477d4dc7a839b3d22af8577ae92306526a061481db49bf92", size = 14052050, upload-time = "2026-04-08T18:53:46.44Z" },
-    { url = "https://files.pythonhosted.org/packages/08/70/9340ed3a578321cbc153041d34834bb1ec3f1f3e3d9cded47cd1b7c3e403/temporalio-1.25.0-cp310-abi3-win_amd64.whl", hash = "sha256:9411534279a2e64847231b6059c214bff4d57cfd1532bd09f333d0b1603daa7f", size = 14299684, upload-time = "2026-04-08T18:53:52.482Z" },
+    { url = "https://files.pythonhosted.org/packages/e0/36/12bb7234c83ddca4b8b032c8f1a9e07a03067c6ed6d2ddb39c770a4c87c6/temporalio-1.16.0-cp39-abi3-macosx_11_0_arm64.whl", hash = "sha256:547c0853310350d3e5b5b9c806246cbf2feb523f685b05bf14ec1b0ece8a7bb6", size = 12540769, upload-time = "2025-08-21T22:11:24.551Z" },
+    { url = "https://files.pythonhosted.org/packages/3c/16/a7d402435b8f994979abfeffd3f5ffcaaeada467ac16438e61c51c9f7abe/temporalio-1.16.0-cp39-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl", hash = "sha256:5b05bb0d06025645aed6f936615311a6774eb8dc66280f32a810aac2283e1258", size = 12968631, upload-time = "2025-08-21T22:11:48.375Z" },
+    { url = "https://files.pythonhosted.org/packages/11/6f/16663eef877b61faa5fd917b3a63497416ec4319195af75f6169a1594479/temporalio-1.16.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl", hash = "sha256:0a08aed4e0f6c2b6bfc779b714e91dfe8c8491a0ddb4c4370627bb07f9bddcfd", size = 13164612, upload-time = "2025-08-21T22:12:16.366Z" },
+    { url = "https://files.pythonhosted.org/packages/af/0e/8c6704ca7033aa09dc084f285d70481d758972cc341adc3c84d5f82f7b01/temporalio-1.16.0-cp39-abi3-win_amd64.whl", hash = "sha256:7c190362b0d7254f1f93fb71456063e7b299ac85a89f6227758af82c6a5aa65b", size = 13177058, upload-time = "2025-08-21T22:12:44.239Z" },
 ]

 [[package]]