Skip to content

fix(daemon): handle ACP -32603 errors gracefully in session/set_model#492

Merged
mrcfps merged 1 commit intonexu-io:mainfrom
goemonwanwan:fix/acp-32603-error-recovery
May 5, 2026
Merged

fix(daemon): handle ACP -32603 errors gracefully in session/set_model#492
mrcfps merged 1 commit intonexu-io:mainfrom
goemonwanwan:fix/acp-32603-error-recovery

Conversation

@goemonwanwan
Copy link
Copy Markdown
Contributor

Summary

When a non-default model is selected (e.g. ollama-cloud:kimi-k2.6), Hermes ACP returns {"jsonrpc":"2.0","id":3,"error":{"code":-32603,"message":"Internal error"}} for the session/set_model request. Open Design's acp.ts parser routes all JSON-RPC errors through fail(), which kills the child process. Since the prompt is never sent, the 180-second stage timer fires and the UI shows "ACP response timed out after 180000ms".

Closes #443

Changes

Three patches in apps/daemon/src/acp.ts:

Patch 1 — detectAcpModels parser (~L167)

Suppress -32603 during model detection. These are harmless noise from agent cleanup.

if (raw.error?.code === -32603) return;

Patch 2 — attachAcpSession parser (~L299)

Distinguish between -32603 on the expected response ID vs unexpected IDs:

  • Unexpected ID: suppress as cleanup race-condition noise
  • Expected ID (session/set_model response): fall through to the id-matching logic for recovery

Also adds a finished guard to suppress late-arriving errors after response completion.

Patch 3 — Recovery block (~L351)

When session/set_model fails with -32603, skip model switching and proceed with the default model by calling sendPrompt().

Tested

Verified with Hermes Agent v0.12.0 (ACP JSON-RPC, streamFormat: acp-json-rpc):

[ACP DEBUG] writeRpc: id=1 method=initialize timeoutLabel=initialize
[ACP DEBUG] [ACP RECV RESULT] id=1 keys=agentCapabilities,agentInfo,authMethods,protocolVersion
[ACP DEBUG] writeRpc: id=2 method=session/new timeoutLabel=session/new
[ACP DEBUG] [ACP RECV RESULT] id=2 keys=models,sessionId
[ACP DEBUG] writeRpc: id=3 method=session/set_model timeoutLabel=session/set_model
[ACP DEBUG] [ACP RECV] method=session/update id=null
[ACP DEBUG] [ACP RECV ERROR] id=3 code=-32603 msg=Internal error
[ACP DEBUG] -32603 on expected id=3 — falling through to recovery logic
[ACP DEBUG] session/set_model failed with -32603, falling back to default model and sending prompt
[ACP DEBUG] sendPrompt: id=4 sessionId=... model=ollama-cloud:kimi-k2.6
[ACP DEBUG] writeRpc: id=4 method=session/prompt timeoutLabel=session/prompt
[ACP DEBUG] [ACP RECV] method=session/update id=null (streaming starts)
[ACP DEBUG] [ACP RECV RESULT] id=4 keys=stopReason,usage (completed in ~30s)

Design rationale

Per specs/current/runtime-adapter.md, session/set_model is optional. Agents that don't support it (or reject specific models) should not break the conversation flow. This PR degrades gracefully by falling back to the default model.

@lefarcen
Copy link
Copy Markdown
Contributor

lefarcen commented May 5, 2026

Hi @goemonwanwan! 🎉

Thanks for the contribution — excellent diagnostic work on the -32603 error handling and graceful degradation approach.

I will run a deep review and get back to you within 24h.

Thanks for making open-design better!
— open-design team

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f0849fea89

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment thread apps/daemon/src/acp.ts
Copy link
Copy Markdown
Contributor

@lefarcen lefarcen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @goemonwanwan! 👋

Excellent diagnostic work on the root cause — the state-machine approach is sound. However, the recovery logic has a critical correctness bug that could duplicate user actions, plus two edge cases that convert real failures into timeouts.

Summary: The PR solves the immediate -32603 crash but the recovery branch is too broad — it triggers for any -32603 after a non-default model was requested, not just session/set_model failures. This can duplicate prompts or loop.

See inline comments for specifics. All fixable with surgical scoping.

Comment thread apps/daemon/src/acp.ts
Comment thread apps/daemon/src/acp.ts
Comment thread apps/daemon/src/acp.ts Outdated
Copy link
Copy Markdown
Contributor

@mrcfps mrcfps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the focused ACP recovery patch. I found one state-machine issue that can turn real prompt failures into repeated prompt sends, so I’m requesting a small adjustment before merge.

Generated by Looper 0.5.4 · runner=reviewer · agent=opencode

Comment thread apps/daemon/src/acp.ts
Address review feedback from lefarcen and mrcfps on PR nexu-io#492:

P1: Track setModelRequestId to scope the recovery block to the exact
session/set_model request. This prevents duplicate prompt sends if
session/prompt returns -32603 (which would otherwise match on
expectedId + non-default model conditions).

P1: Add promptRequestId === null guard so the recovery path only
triggers before a prompt has been sent.

P2: In detectAcpModels, only suppress -32603 on unexpected ids.
Expected-id -32603 errors (initialize, session/new) are real probe
failures and should reject immediately rather than causing a silent
15s timeout.

P2: In attachAcpSession, expected-id -32603 errors that don't match
setModelRequestId now call fail() instead of falling through. This
prevents initialize/session/new/session/prompt failures from being
silently swallowed.
@goemonwanwan goemonwanwan force-pushed the fix/acp-32603-error-recovery branch from f0849fe to d9c62d4 Compare May 5, 2026 03:42
Copy link
Copy Markdown
Contributor

@lefarcen lefarcen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @goemonwanwan! ✅

Excellent work addressing all the review feedback:

P1 Fixed: The recovery block is now correctly scoped to setModelRequestId and checks promptRequestId === null — no more duplicate-prompt risk.

P2 Fixed: Expected-id -32603 errors for initialize/session/new/session/prompt now fail immediately instead of being swallowed.

P2 Fixed: detectAcpModels now fails expected-id -32603 errors (real probe failures) while suppressing cleanup noise.

The state-machine logic is clean and surgical. This PR is ready from my side — deferring final approval to a maintainer. 🎉

Copy link
Copy Markdown
Contributor

@mrcfps mrcfps left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@goemonwanwan I reviewed the updated ACP recovery diff in apps/daemon/src/acp.ts, including the setModelRequestId scoping, promptRequestId === null guard, and expected-vs-cleanup -32603 handling in both session attachment and model detection. The earlier duplicate-prompt and swallowed-error paths are addressed cleanly, and I didn’t find any remaining actionable issues in the changed ranges. Nice, focused follow-up on a tricky state-machine edge case. 😊

Generated by Looper 0.5.4 · runner=reviewer · agent=opencode

Copy link
Copy Markdown
Contributor

@lefarcen lefarcen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock — all P1/P2 issues confirmed addressed in d9c62d4 (see prior comment). LGTM.

@mrcfps mrcfps merged commit b559857 into nexu-io:main May 5, 2026
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug-fix Fixes an existing bug

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ACP session/set_model returns -32603 Internal error then ACP response timed out after 180000ms

3 participants