Improve Codex multi-agent planning, question flow, and runtime DX by ratley · Pull Request #16 · happycatlabs/orca

ratley · 2026-03-17T05:41:34Z

Summary

update Orca to the newer Codex app-server/client surface and improve Codex binary selection/runtime diagnostics
add end-to-end clarification-question handling: persist pending questions, emit onQuestion, show them in orca status, and resume the same live run from orca answer
improve planning/review/execution prompt guidance for multi-agent-aware Codex runs and tighten the README/docs around the current Codex-only flow

Testing

bun test src
npm run typecheck:tsc
npm run build
live smoke for waiting_for_answer flow

Dependency

depends on Update codex-client for latest app-server features codex-client#1 being merged/published before release tagging

github-actions · 2026-03-17T05:44:18Z

Review findings

BUG: Cancelled run status reverts to `waiting_for_answer`

src/agents/codex/session.ts:825-828

if (currentRun.overallStatus === "cancelled") {
  rejectUserInputRequest(request.requestId, `Run ${interactionContext.runId} was cancelled while waiting for input.`);
  await clearPendingQuestion(request.requestId, "waiting_for_answer");
  return;
}

When a run is cancelled while waiting for user input, the code correctly detects the "cancelled" status and rejects the Codex request — but then calls clearPendingQuestion(request.requestId, "waiting_for_answer"), which writes overallStatus: "waiting_for_answer" back to the store, overwriting the "cancelled" status.

clearPendingQuestion (line 764-772) unconditionally sets overallStatus from its second argument:

await interactionContext.store.updateRun(interactionContext.runId, {
  overallStatus,          // ← "waiting_for_answer" overwrites "cancelled"
  pendingQuestion: undefined,
});

Failure scenario: User runs orca cancel --run <id> while a run is in waiting_for_answer. The store briefly shows "cancelled", but the question-flow handler immediately reverts it to "waiting_for_answer". The run appears stuck waiting for input instead of being cancelled. Subsequent orca status will show the wrong state.

Fix: Either skip clearPendingQuestion entirely when cancelled (just clear pendingQuestion without changing overallStatus), or guard clearPendingQuestion against downgrading a terminal status like "cancelled".

github-actions · 2026-03-20T10:30:50Z

BUG

src/core/question-flow.ts:127 accepts the documented {"answers": ...} form without verifying that every pending question ID is present and valid. orca answer runs that parser before writing answer.txt (src/cli/commands/answer.ts:104), so a typo or partial payload like {"answers":{"backedn":{"answers":["bun"]}}} is reported as success and then forwarded to Codex (src/agents/codex/session.ts:847). That sends an answer set that is missing the real question IDs, which can leave the clarification unresolved even though Orca has already treated the prompt as answered.
src/core/question-flow.ts:96 makes single-question answers that start with { go through the JSON path instead of being treated as plain text. That breaks legitimate non-interactive answers that are JSON/config/code snippets, e.g. orca answer <run-id> '{"useMigration":true}': the parser now either throws a JSON parse error or complains that the pending question ID is missing. Single-question answers used to accept arbitrary text, so this is a real regression for current CLI workflows.

github-actions · 2026-03-20T10:40:02Z

BUG

src/agents/codex/session.ts:760, src/agents/codex/session.ts:850, src/core/planner.ts:290 — planning/review turns now share the same interactive question handler, but clearPendingQuestion() can only restore overallStatus to "running". If Codex asks a clarification question during the planning gate or task-graph review, answering it makes orca status report a live run as Mode: plan / Overall Status: running until planning finishes. That is a real run-state regression against the documented lifecycle, and any polling/automation that waits for planning to finish will see execution as started before a task graph exists.
src/agents/codex/codex-path.ts:146, src/agents/codex/codex-path.ts:170 — the new binary chooser does not actually search all installed codex CLIs. It returns the first codex found on PATH, then compares only that path plus three hard-coded locations. A newer binary later on PATH (for example ~/.local/bin, pnpm, or another managed install) is never considered, so Orca can still launch the older incompatible CLI even though this PR claims it will auto-pick the newest available binary.

RISK

src/core/question-flow.ts:14, src/cli/commands/answer.ts:59, src/cli/commands/answer.ts:110 — isSecret is preserved from Codex prompts but never honored. When a current requestUserInput question is marked secret, Orca still collects it with plain input() and writes the response verbatim to <run>/answer.txt. That is a reproducible plaintext secret leak for API keys/passwords on the new question-flow path.

github-actions · 2026-03-20T11:01:48Z

BUG

src/core/question-flow.ts:45, src/core/question-flow.ts:76, src/cli/commands/status.ts:68: orca status now shows pending questions, but it never prints each question's id. That breaks the README-documented multi-question answer flow (orca answer <run-id> '{"answers":{...}}'), because users cannot discover which keys to send in the JSON payload from the CLI. Any run that asks 2+ questions will be effectively unanswerable unless the user has an onQuestion hook or reads status.json by hand.

RISK

src/agents/codex/session.ts:501, docs/codex-app-server.md:953: loadCodexListedSkills() now hard-requires skill.path from skills/list and silently drops entries without it. The new bundled app-server docs for skills/list only show name, description, enabled, interface, and dependencies, so if codex-client 0.1.5 follows that surface, Orca will stop loading every app-server-discovered skill, including codex.perCwdExtraUserRoots skills.

github-actions · 2026-03-20T11:12:34Z

BUG

src/agents/codex/session.ts:1136-1153: executeTask now treats any turn.status === "completed" as task success and returns { outcome: "done" } without parsing the assistant's required terminal JSON marker. A Codex turn can complete successfully at the protocol level while still reporting {"outcome":"failed","error":"..."} in its message body, so task failures will be misclassified as successes, retries will be skipped, and Orca can finalize a run as completed with unfinished work.

RISK

src/agents/codex/session.ts:992-1047 and 1058-1059: the new clarification waiter never stops when app-server clears a pending tool/requestUserInput via serverRequest:resolved. The bundled protocol doc explicitly says this notification is emitted when a request is cleared by turn completion or interruption before the user answers (docs/codex-app-server.md:888-890), but Orca only clears persisted state; the polling loop keeps waiting for answer.txt/IPC input forever because it only exits on cancellation or a parsed answer. In that case the live orca run can hang even though Codex has already moved on or finished the turn.

github-actions · 2026-03-20T11:25:21Z

BUG

src/agents/codex/session.ts:479 and src/agents/codex/session.ts:395: codex.perCwdExtraUserRoots no longer delivers usable skill instructions to Codex. The app-server docs show skills/list can return metadata-only entries without a path (docs/codex-app-server.md:940), and this code falls back to inlining only description/interface/dependencies text when that happens. Because turns now send only a plain text blob instead of skill input items, shared skills discovered only through skills/list never get their real SKILL.md body loaded. Users following the new README.md:208 guidance for app-server-visible extra skill roots will get a name/description stub instead of the actual workflow.

BUG

src/core/codex-config.ts:71 and src/agents/codex/session.ts:867: an explicit project opt-out cannot disable multi-agent-aware prompts once ~/.codex/config.toml has [features].multi_agent = true. isCodexMultiAgentActive() only checks for config.codex.multiAgent === true; any explicit false still falls through to the global file and returns true. That contradicts the documented contract in README.md:191, which says the global flag is only used when codex.multiAgent is not set in Orca config, and it means users cannot force single-agent planning/review/execution prompts per project after enabling multi-agent globally once.

RISK

src/agents/codex/session.ts:745 and src/agents/codex/session.ts:964: the new secret-answer flow persists the IPC socket path and bearer token into run state so orca answer can read them back. In any project-local or otherwise shared runsDir setup, any local process that can read that run directory can reuse the token and inject an arbitrary answer into the live run, so the new “secret” path only avoids writing the answer to answer.txt; it does not actually protect the prompt from other local readers.

github-actions · 2026-03-20T20:09:00Z

BUG

src/agents/codex/session.ts:400 and src/agents/codex/session.ts:599 remove skill input items entirely and fall back to metadata-only text when skills/list does not return a readable path. The new app-server reference explicitly documents that the skill item is what injects full skill instructions (docs/codex-app-server.md:909), and its skills/list example omits path entirely (docs/codex-app-server.md:953). That means any app-server-visible skill that Orca cannot read directly from disk now degrades to a short description/interface blob instead of the actual SKILL.md, so those skills silently stop influencing planning/execution for real users.

BUG

src/agents/codex/session.ts:1081 and src/agents/codex/session.ts:1145 can resurrect a cancelled run. When a run is cancelled while waiting for tool/requestUserInput, the loop rejects the pending request, but the unconditional serverRequest:resolved handler always calls clearPendingQuestion(..., resumedOverallStatus), which rewrites overallStatus back to planning/running. The app-server docs say serverRequest/resolved is emitted when a user-input request is answered or cleared (docs/codex-app-server.md:888), so cancelling a clarification prompt can leave status.json showing an active run again instead of staying cancelled.

github-actions · 2026-03-21T01:12:52Z

BUG

src/agents/codex/session.ts:152, src/agents/codex/session.ts:574: the new fallback success check now requires a completed fileChange item for any task that looks like it edits files. That is stricter than the execution contract in the same prompt, which explicitly allows arbitrary shell-based edits. A normal successful turn that edits files via commandExecution (cat > file, sed -i, scripts, etc.) and even passes validation will still be converted to outcome: "failed" if Codex forgets the trailing JSON marker, because no fileChange item was recorded. This turns a recoverable response-format miss into a hard task failure on real file-edit runs.
src/core/task-runner.ts:219, src/cli/commands/run.ts:524, src/cli/commands/run.ts:613: runs with review.execution.enabled = false now fire completion side effects twice. run.ts passes deferCompletion: reviewConfig.enabled, so with review disabled the task runner already marks the run completed, emits onComplete, and writes the session summary. After that, runCommandHandler still unconditionally emits onComplete again and rewrites the summary for every non-failed/non-cancelled run. Any onComplete hook that posts notifications, opens PRs, or triggers CI will now execute twice in that configuration.

github-actions · 2026-03-21T01:52:03Z

BUG

src/agents/codex/session.ts:1515 and src/agents/codex/session.ts:1529 start a brand-new Codex thread before the clarification pass and again before the real task execution. That breaks the persistent-per-run session contract documented in README.md and in src/agents/codex/session.ts:1140, because dependent tasks no longer inherit prior turn context. A concrete failure case is a multi-step run where task 1 makes a design choice or gathers nuanced clarification and task 2 says to continue with that choice; task 2 now runs in a fresh thread and only gets whatever was squeezed into clarificationContext, so real run decisions can be lost.

RISK

src/agents/codex/session.ts:1184 forces every non-execution turn into collaborationMode: { mode: "plan" } whenever Orca is running interactively, and src/cli/commands/run.ts:541 sends the post-execution reviewer through that path via runPrompt(prompt, "review"). The review prompt explicitly asks Codex to inspect changes and apply fixes, but this code now routes it through the plan preset instead of a normal/reviewer turn. In the review.execution.onFindings: "auto_fix" flow, that can leave the reviewer unable to perform the fixes it is being asked to make, so the loop stalls on findings that Orca previously auto-corrected.

Change summary: this PR updates Orca to the newer Codex surface, adds persisted clarification-question handling with orca answer/orca status, and reworks runtime diagnostics plus multi-agent/review prompt guidance.

github-actions · 2026-03-21T02:31:41Z

RISK

src/agents/codex/session.ts:1190, src/agents/codex/session.ts:1218, src/core/planner.ts:290, src/cli/commands/run.ts:486: normal orca plan/orca run execution now always routes through the new collaboration-mode/question-flow surface whenever a live run context exists, and there is no compatibility fallback if the selected Codex binary does not support it. A user pinned to an older Codex via ORCA_CODEX_PATH (or only having an older app-server build installed) will now fail during planning/consultation or the pre-execution clarification pass before any task runs, instead of degrading to the previous non-question-flow behavior. This is a real regression for existing Orca setups that intentionally override the binary path.

github-actions · 2026-03-21T10:52:41Z

BUG

src/core/planner.ts:299 and src/agents/codex/session.ts:1668: planning-time clarification answers are not preserved across the planning pipeline. runPlanner() now enables request_user_input during decidePlanningNeed(), but the adapter wrappers are explicitly stateless and create a fresh Codex thread for each call. If the gate asks a blocking question and returns needsPlan=true, the later planSpec() call starts from the original spec in a new thread, so the user's answer is no longer in context. The concrete failure mode is an ambiguous spec that asks for a choice during the planning gate (for example runtime/framework), then either re-asks that same question or generates a task graph that ignores the answer because the actual planner never sees it.

ratley added 3 commits March 12, 2026 01:09

Improve Codex multi-agent planning and runtime DX

7ecded4

Implement Orca clarification question flow

2ab121d

Tighten README for question flow and Codex docs

97fc21e

ratley mentioned this pull request Mar 17, 2026

Sync Orca docs for question hooks and answer flow ratley/orca-web#6

Open

ratley added 7 commits March 16, 2026 23:00

Prepare Orca 0.2.25 for codex-client 0.1.5

faadcca

Stop tracking local session logs and TODO

124de52

Preserve cancelled runs during question flow

e74d658

Replace Claude review with Codex workflow

b48c5bc

Stabilize Codex review workflow

c1c493e

Align Codex review workflow with working runner setup

d1c252f

Fix Codex review prompt generation

f872d41

Harden question answer parsing

5c7ff0c

Fix review feedback on question flow and Codex path resolution

83c1256

Fix status question ids and skills list metadata fallback

ef20a21

Fix Codex task outcome parsing and resolved input waits

d6b5a09

Fix Codex skill and secret question regressions

48e7fdc

Harden Orca e2e question flow and smoke coverage

e864bec

Fix Codex fallback completion and duplicate onComplete

7e46623

Restore Codex thread persistence and review turn routing

c7b8733

Fallback when Codex collaboration mode is unsupported

84ff460

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve Codex multi-agent planning, question flow, and runtime DX#16

Improve Codex multi-agent planning, question flow, and runtime DX#16
ratley wants to merge 19 commits intomasterfrom
codex/release-orca-0.2.25

ratley commented Mar 17, 2026

Uh oh!

github-actions bot commented Mar 17, 2026

Uh oh!

github-actions bot commented Mar 20, 2026

Uh oh!

github-actions bot commented Mar 20, 2026

Uh oh!

github-actions bot commented Mar 20, 2026

Uh oh!

github-actions bot commented Mar 20, 2026

Uh oh!

github-actions bot commented Mar 20, 2026

Uh oh!

github-actions bot commented Mar 20, 2026

Uh oh!

github-actions bot commented Mar 21, 2026

Uh oh!

github-actions bot commented Mar 21, 2026

Uh oh!

github-actions bot commented Mar 21, 2026

Uh oh!

github-actions bot commented Mar 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ratley commented Mar 17, 2026

Summary

Testing

Dependency

Uh oh!

github-actions bot commented Mar 17, 2026

Review findings

BUG: Cancelled run status reverts to waiting_for_answer

Uh oh!

github-actions bot commented Mar 20, 2026

Uh oh!

github-actions bot commented Mar 20, 2026

Uh oh!

github-actions bot commented Mar 20, 2026

Uh oh!

github-actions bot commented Mar 20, 2026

Uh oh!

github-actions bot commented Mar 20, 2026

Uh oh!

github-actions bot commented Mar 20, 2026

Uh oh!

github-actions bot commented Mar 21, 2026

Uh oh!

github-actions bot commented Mar 21, 2026

Uh oh!

github-actions bot commented Mar 21, 2026

Uh oh!

github-actions bot commented Mar 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

BUG: Cancelled run status reverts to `waiting_for_answer`