Skip to content

fix: route /codex:rescue through the Agent tool to stop Skill recursion (#234)#235

Merged
dkundel-openai merged 2 commits intoopenai:mainfrom
pengyou200902:fix/rescue-command-agent-routing
Apr 18, 2026
Merged

fix: route /codex:rescue through the Agent tool to stop Skill recursion (#234)#235
dkundel-openai merged 2 commits intoopenai:mainfrom
pengyou200902:fix/rescue-command-agent-routing

Conversation

@pengyou200902
Copy link
Copy Markdown
Contributor

@pengyou200902 pengyou200902 commented Apr 16, 2026

Fixes #234.

What was broken

/codex:rescue was marked context: fork, so the command body ran inside a forked general-purpose subagent. The body said:

Route this request to the codex:codex-rescue subagent.

When the main Claude Code agent called Skill(codex:rescue) programmatically (as opposed to the user typing the slash command), the forked runner read that ambiguous prose and guessed the transport. It tried Skill(codex:codex-rescue) — unknown skill — then fell back to Skill(codex:rescue), which re-entered this same command and recursed until the user cancelled. No Codex job was ever created.

The user-facing symptom, quoted from the issue:

UI appears frozen; no Codex job is created… After ~3+ minutes, user must cancel manually.

Why the fix is two coordinated changes and not one

The issue's "Suggested fix" section lists two options:

  1. Name the transport explicitly (use the Agent tool).
  2. Restrict allowed-tools to exclude Skill.

In practice, neither alone is sufficient. Forked general-purpose subagents do not expose the Agent tool, so instructing the fork to "use Agent" just trades the hang for a clean failure — the main agent then retries on its own. The minimum fix is therefore two coordinated changes:

  • Drop context: fork. The command body now runs inline in the calling agent's context, where the Agent tool is in scope.
  • Name the transport explicitly. Invoke codex:codex-rescue via Agent(subagent_type: "codex:codex-rescue"), add Agent to allowed-tools, and call out that Skill(codex:codex-rescue) / Skill(codex:rescue) are not valid paths.

Everything else in rescue.md — resume-candidate check, --background / --wait / --resume / --fresh handling, operating rules, model/effort pass-through — is unchanged. agents/codex-rescue.md and skills/codex-cli-runtime/SKILL.md are not touched.

Diff

plugins/codex/commands/rescue.md: 4 lines changed (one frontmatter swap, two body sentences rewritten).

-context: fork
-allowed-tools: Bash(node:*), AskUserQuestion
+allowed-tools: Bash(node:*), AskUserQuestion, Agent

-Route this request to the `codex:codex-rescue` subagent.
+Invoke the `codex:codex-rescue` subagent via the `Agent` tool
+(`subagent_type: "codex:codex-rescue"`), forwarding the raw user
+request as the prompt.
+`codex:codex-rescue` is a subagent, not a skill — do not call
+`Skill(codex:codex-rescue)` … or `Skill(codex:rescue)` (that
+re-enters this command and hangs the session). The command runs
+inline so the `Agent` tool stays in scope; forked general-purpose
+subagents do not expose it.

tests/commands.test.mjs: pin the new allowed-tools, pin the explicit subagent_type: "codex:codex-rescue" prose, pin the Skill(codex:codex-rescue) ban, and assert context: fork is absent. Total +9 / −1.

Not in scope

During iteration on this fix I observed adjacent duplicate-task-call patterns inside codex:codex-rescue (model-name drift retried on Codex 4xx, long prompts failing on zsh shell-quoting, main-agent retry on upstream failure). Those are separate bugs and deserve separate issues and separate PRs where they can be discussed on their own merits. I deliberately kept this PR scoped to #234's routing hang to keep the diff small, reviewable, and directly tied to the filed issue.

Test plan

  • node --test tests/commands.test.mjs -t "rescue command absorbs continue semantics" passes.
  • Negative: with the pre-fix rescue.md restored, the new assertions fail on allowed-toolsAgent missing, then would also fail on the subagent_type and no-context: fork assertions.
  • node --test tests/commands.test.mjs as a whole: 7/8 — the one failure is a pre-existing result "$ARGUMENTS" quoting assertion unrelated to rescue (introduced in fix: quote $ARGUMENTS in cancel, result, and status commands #168, also fails on pristine upstream/main).
  • Live end-to-end requires a fresh Claude Code session with this branch's plugin installed; not runnable from the session that prepared this PR.

Commits

One commit: cd4cb60 fix: route /codex:rescue through the Agent tool to stop Skill recursion (#234).

@pengyou200902 pengyou200902 requested a review from a team April 16, 2026 23:03
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: eb24951c89

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread plugins/codex/agents/codex-rescue.md Outdated
pengyou200902 added a commit to pengyou200902/codex-plugin-cc that referenced this pull request Apr 18, 2026
Address openai#235 review comment (Codex bot, P2): the new character-for-character
model rule in `agents/codex-rescue.md` conflicts with the `spark` alias line
immediately above it. For a user request like "use spark", line 31 says to
emit `--model gpt-5.3-codex-spark`, while line 33 says never to emit a model
name the user did not literally type. That is nondeterministic and undermines
the no-duplicate-call goal.

Rephrase the literal-copy rule to explicitly exempt the documented `spark`
alias, and mirror the fix in `skills/codex-cli-runtime/SKILL.md`. Pin the
exemption with a regression assertion so a future edit can't silently
reintroduce the contradiction.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c582eda025

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread plugins/codex/agents/codex-rescue.md Outdated
…on (openai#234)

`/codex:rescue` previously combined two things that together caused a hang:

- `context: fork` in the frontmatter, which spawns a `general-purpose`
  subagent for the command body.
- Body prose "Route this request to the `codex:codex-rescue` subagent."
  without naming the transport.

When the main agent called `Skill(codex:rescue)` programmatically, the
fork resolved the ambiguous prose by trying `Skill(codex:codex-rescue)`
(unknown skill) and then falling back to `Skill(codex:rescue)`, which
re-entered this command and hung the session until the user cancelled.
No Codex job was ever created.

Naming the transport as `Agent(codex:codex-rescue)` alone is not enough:
forked general-purpose subagents do not expose the `Agent` tool, so the
forked runner cannot reach the subagent that way either. The minimal fix
is therefore two coordinated changes:

- Drop `context: fork` so the command body runs inline in the calling
  agent's context, where `Agent` is in scope.
- Say explicitly "use the `Agent` tool with `subagent_type:
  "codex:codex-rescue"`", and call out that `Skill(codex:codex-rescue)`
  and `Skill(codex:rescue)` are not valid routing paths. Add `Agent`
  to `allowed-tools` so the call does not prompt for permission.

Everything else in rescue.md (resume-candidate check, flag handling,
background/foreground semantics, operating rules) is unchanged. The
`codex:codex-rescue` subagent itself is unchanged.

Tests pin the new allow-list, the explicit `subagent_type`, the ban on
`Skill(codex:codex-rescue)`, and the absence of `context: fork`. The
existing "run the `codex:codex-rescue` subagent in the background"
assertion continues to hold since that sentence still reads correctly
with the Agent-tool transport.

Fixes openai#234
@pengyou200902 pengyou200902 force-pushed the fix/rescue-command-agent-routing branch from c582eda to cd4cb60 Compare April 18, 2026 07:39
@pengyou200902
Copy link
Copy Markdown
Contributor Author

A note on the context: fork removal in case reviewers wonder whether that layer is load-bearing:

Original topology (with context: fork): /codex:rescue spawned a general-purpose fork; that fork was supposed to dispatch to codex:codex-rescue; Codex ran inside the codex:codex-rescue subagent. Two layers of isolation.

The bug: forked general-purpose subagents do not expose the Agent tool, so from inside the fork there is no way to reach the codex:codex-rescue subagent — which is the single place this plugin defines the task forwarder. The fork's only option is to guess a Skill path, which is exactly what drove the recursion in #234.

After the fix: the command body runs inline in the caller, which does have Agent, so it calls Agent(subagent_type: "codex:codex-rescue") directly. The codex:codex-rescue subagent is still its own Agent-tool subagent with its own sandbox, its own tool set, and its own tool_result boundary — Codex execution isolation is preserved, just one layer down instead of two.

Consistency check: every other command in plugins/codex/commands/ already runs inline — review, adversarial-review, setup, status, result, cancel. rescue.md was the only one with context: fork, and review / adversarial-review do comparable or more orchestration. So the fork was an outlier, not a pattern.

Trade-off acknowledged: the calling agent now sees the rescue.md body and the task-resume-candidate / AskUserQuestion / Agent tool transcripts in its own context during one rescue call, rather than behind a fork. That's a small amount of additional context usage per invocation, matching every other command in this plugin.

Alternative considered: keeping the fork and having it call Bash(node codex-companion.mjs task …) directly would sidestep the missing-Agent issue without removing context: fork, but it duplicates the forwarder logic that lives in agents/codex-rescue.md, creates two sources of truth, and is a structurally larger diff. Happy to switch to that shape if the maintainers would rather keep the fork — just let me know.

@dkundel-openai dkundel-openai merged commit bb38412 into openai:main Apr 18, 2026
1 check passed
@pengyou200902 pengyou200902 deleted the fix/rescue-command-agent-routing branch April 18, 2026 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Skill(codex:rescue) infinite-recurses when invoked programmatically from main agent

2 participants