fix(daemon): handle ACP -32603 errors gracefully in session/set_model#492
Conversation
|
Hi @goemonwanwan! 🎉 Thanks for the contribution — excellent diagnostic work on the I will run a deep review and get back to you within 24h. Thanks for making open-design better! |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f0849fea89
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
lefarcen
left a comment
There was a problem hiding this comment.
Hi @goemonwanwan! 👋
Excellent diagnostic work on the root cause — the state-machine approach is sound. However, the recovery logic has a critical correctness bug that could duplicate user actions, plus two edge cases that convert real failures into timeouts.
Summary: The PR solves the immediate -32603 crash but the recovery branch is too broad — it triggers for any -32603 after a non-default model was requested, not just session/set_model failures. This can duplicate prompts or loop.
See inline comments for specifics. All fixable with surgical scoping.
mrcfps
left a comment
There was a problem hiding this comment.
Thanks for the focused ACP recovery patch. I found one state-machine issue that can turn real prompt failures into repeated prompt sends, so I’m requesting a small adjustment before merge.
Generated by Looper 0.5.4 · runner=reviewer · agent=opencodeAddress review feedback from lefarcen and mrcfps on PR nexu-io#492: P1: Track setModelRequestId to scope the recovery block to the exact session/set_model request. This prevents duplicate prompt sends if session/prompt returns -32603 (which would otherwise match on expectedId + non-default model conditions). P1: Add promptRequestId === null guard so the recovery path only triggers before a prompt has been sent. P2: In detectAcpModels, only suppress -32603 on unexpected ids. Expected-id -32603 errors (initialize, session/new) are real probe failures and should reject immediately rather than causing a silent 15s timeout. P2: In attachAcpSession, expected-id -32603 errors that don't match setModelRequestId now call fail() instead of falling through. This prevents initialize/session/new/session/prompt failures from being silently swallowed.
f0849fe to
d9c62d4
Compare
lefarcen
left a comment
There was a problem hiding this comment.
Hi @goemonwanwan! ✅
Excellent work addressing all the review feedback:
P1 Fixed: The recovery block is now correctly scoped to setModelRequestId and checks promptRequestId === null — no more duplicate-prompt risk.
P2 Fixed: Expected-id -32603 errors for initialize/session/new/session/prompt now fail immediately instead of being swallowed.
P2 Fixed: detectAcpModels now fails expected-id -32603 errors (real probe failures) while suppressing cleanup noise.
The state-machine logic is clean and surgical. This PR is ready from my side — deferring final approval to a maintainer. 🎉
mrcfps
left a comment
There was a problem hiding this comment.
@goemonwanwan I reviewed the updated ACP recovery diff in apps/daemon/src/acp.ts, including the setModelRequestId scoping, promptRequestId === null guard, and expected-vs-cleanup -32603 handling in both session attachment and model detection. The earlier duplicate-prompt and swallowed-error paths are addressed cleanly, and I didn’t find any remaining actionable issues in the changed ranges. Nice, focused follow-up on a tricky state-machine edge case. 😊
Summary
When a non-default model is selected (e.g.
ollama-cloud:kimi-k2.6), Hermes ACP returns{"jsonrpc":"2.0","id":3,"error":{"code":-32603,"message":"Internal error"}}for thesession/set_modelrequest. Open Design'sacp.tsparser routes all JSON-RPC errors throughfail(), which kills the child process. Since the prompt is never sent, the 180-second stage timer fires and the UI shows "ACP response timed out after 180000ms".Closes #443
Changes
Three patches in
apps/daemon/src/acp.ts:Patch 1 —
detectAcpModelsparser (~L167)Suppress
-32603during model detection. These are harmless noise from agent cleanup.Patch 2 —
attachAcpSessionparser (~L299)Distinguish between
-32603on the expected response ID vs unexpected IDs:session/set_modelresponse): fall through to the id-matching logic for recoveryAlso adds a
finishedguard to suppress late-arriving errors after response completion.Patch 3 — Recovery block (~L351)
When
session/set_modelfails with-32603, skip model switching and proceed with the default model by callingsendPrompt().Tested
Verified with Hermes Agent v0.12.0 (ACP JSON-RPC,
streamFormat: acp-json-rpc):Design rationale
Per
specs/current/runtime-adapter.md,session/set_modelis optional. Agents that don't support it (or reject specific models) should not break the conversation flow. This PR degrades gracefully by falling back to the default model.