Handle Abrupt Server Disconnect in Middle of Tool Call #1321
base: main
fix: handle transport exceptions and ensure proper cleanup of in-flig…
try:
    await ctx.read_stream_writer.send(e)
finally:
    await ctx.read_stream_writer.aclose()
Closing on any type of exception seems too broad. We should be more explicit about the failure cases that definitely mean the session is unusable, since some errors might be recoverable.
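To make the suggestion concrete, here is a minimal sketch of closing the stream only for an explicit set of failures. It is not the SDK's code: `FATAL_ERRORS`, `forward_exception`, and the `None` sentinel are hypothetical, an `asyncio.Queue` stands in for the read stream writer, and stdlib exceptions stand in for the transport errors. Which errors are actually fatal for this transport is still the open question in this thread.

```python
import asyncio

# Hypothetical classification: stdlib exceptions used as stand-ins for
# transport errors that leave the connection unusable.
FATAL_ERRORS = (ConnectionError, EOFError)


async def forward_exception(read_stream: asyncio.Queue, exc: Exception) -> bool:
    """Forward exc to the session; close the stream only if exc is fatal.

    Returns True when the stream was closed.
    """
    await read_stream.put(exc)
    if isinstance(exc, FATAL_ERRORS):
        # None is an illustrative "stream closed" sentinel for this sketch.
        await read_stream.put(None)
        return True
    return False


async def demo():
    stream = asyncio.Queue()
    closed_on_reset = await forward_exception(stream, ConnectionResetError("peer reset"))
    closed_on_parse = await forward_exception(stream, ValueError("malformed event"))
    return closed_on_reset, closed_on_parse, stream.qsize()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With this shape, a malformed event is still reported to the session but does not tear the stream down, while a reset does.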
# Transport-level exception. Forward it to the incoming
# handler for logging/observation, then fail all
# in-flight requests so callers don't hang forever.
await self._handle_incoming(message)
error = ErrorData(code=CONNECTION_CLOSED, message=str(message))
# Send error to any pending request response streams immediately
for id, stream in list(self._response_streams.items()):
    try:
        await stream.send(JSONRPCError(jsonrpc="2.0", id=id, error=error))
        await stream.aclose()
    except Exception:
        pass
self._response_streams.clear()
# Break out of the receive loop; connection is no longer usable.
break
This seems like a duplication of all the logic we have at the bottom of _receive_loop (L440-450), except for the break.
The changes here also don't seem necessary to make the tests you added pass - do we need this?
Propagates transport disconnects (e.g. server killed mid tool call) as CONNECTION_CLOSED errors instead of hanging indefinitely. Adds regression test.
Motivation and Context
A call_tool over the StreamableHTTP transport would hang forever if the server closed the SSE/HTTP connection mid-response (e.g. process crash/kill). The underlying RemoteProtocolError was logged but in‑flight requests never completed. This change converts that low-level failure into a deterministic JSON-RPC error for every pending request, matching expected resilience semantics.
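The difference from the caller's side can be sketched roughly as follows. This is an illustration only, not the SDK's API: `call_tool` here is a hypothetical stand-in that awaits a plain `asyncio.Future`, `RequestFailed` is an invented exception type, and the numeric value of `CONNECTION_CLOSED` is an assumption.

```python
import asyncio

CONNECTION_CLOSED = -32000  # assumed value for the SDK's error code


class RequestFailed(Exception):
    """Hypothetical error type carrying a JSON-RPC style error code."""

    def __init__(self, code: int, message: str):
        super().__init__(message)
        self.code = code


async def call_tool(pending: asyncio.Future, timeout: float):
    # Stand-in for a session request: wait until the response arrives.
    return await asyncio.wait_for(pending, timeout)


async def demo():
    loop = asyncio.get_running_loop()

    # Before: nothing resolves the request when the transport dies, so the
    # caller only escapes through its own timeout (or hangs without one).
    orphaned = loop.create_future()
    try:
        await call_tool(orphaned, timeout=0.05)
        before = "completed"
    except asyncio.TimeoutError:
        before = "timed out"

    # After: the transport failure is turned into an error for the request.
    failed = loop.create_future()
    loop.call_soon(
        failed.set_exception,
        RequestFailed(CONNECTION_CLOSED, "peer closed connection"),
    )
    try:
        await call_tool(failed, timeout=5)
        after = "completed"
    except RequestFailed as exc:
        after = f"error {exc.code}"

    return before, after


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In the second case the caller gets a prompt, inspectable error instead of depending on a timeout it may not have set.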
How Has This Been Tested?
Breaking Changes
None. Behavior only changes for abnormal transport termination; normal flows unaffected. No public API changes.
Types of changes
Checklist
Additional context
Implementation details:
In _handle_sse_response (client transport) on exception: send the exception into the session read stream then close the writer to signal termination.
In BaseSession._receive_loop: on receiving an Exception, broadcast a synthesized JSONRPCError (code CONNECTION_CLOSED) to all pending request streams and clear them.
Integrated test placed alongside existing StreamableHTTP tests for consistency. Potential follow-up: introduce optional reconnection/resumption logic for recoverable disconnects.
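For readers who want the session-side behaviour in isolation, here is a self-contained sketch under stated assumptions. `MiniSession`, `ErrorReply`, and `_fail_pending` are hypothetical names, `asyncio` futures and a queue replace the SDK's response streams, and the `CONNECTION_CLOSED` value is assumed. The sketch keeps a single cleanup path at the end of the loop, which is one possible way to address the duplication raised in review, not necessarily what the PR should do.

```python
import asyncio
from dataclasses import dataclass

CONNECTION_CLOSED = -32000  # assumed value for the SDK's error code


@dataclass
class ErrorReply:
    """Hypothetical stand-in for a JSON-RPC error response."""
    id: int
    code: int
    message: str


class MiniSession:
    """Toy session: a queue of incoming messages plus pending request futures."""

    def __init__(self):
        self.incoming = asyncio.Queue()
        self.pending = {}

    def _fail_pending(self, reason: str) -> None:
        # Resolve every in-flight request with an error so no caller hangs.
        for request_id, future in list(self.pending.items()):
            if not future.done():
                future.set_result(ErrorReply(request_id, CONNECTION_CLOSED, reason))
        self.pending.clear()

    async def receive_loop(self) -> None:
        reason = "Connection closed"
        try:
            while True:
                message = await self.incoming.get()
                if isinstance(message, Exception):
                    # Transport failure: remember why, then stop receiving.
                    reason = str(message) or reason
                    break
                request_id, payload = message
                future = self.pending.pop(request_id, None)
                if future is not None:
                    future.set_result(payload)
        finally:
            # Single cleanup path, shared by normal shutdown and failures.
            self._fail_pending(reason)


async def demo():
    session = MiniSession()
    loop = asyncio.get_running_loop()
    first, second = loop.create_future(), loop.create_future()
    session.pending = {1: first, 2: second}

    task = asyncio.create_task(session.receive_loop())
    await session.incoming.put((1, "ok"))  # normal response for request 1
    await session.incoming.put(ConnectionResetError("server killed"))
    await task

    return first.result(), second.result(), len(session.pending)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Request 1 completes normally, while request 2, still in flight when the transport fails, receives the synthesized error carrying the exception text.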