Skip to content

Conversation

EiffL
Copy link

@EiffL EiffL commented Aug 28, 2025

Propagates transport disconnects (e.g. server killed mid tool call) as CONNECTION_CLOSED errors instead of hanging indefinitely. Adds regression test.

Motivation and Context

A call_tool over the StreamableHTTP transport would hang forever if the server closed the SSE/HTTP connection mid-response (e.g. process crash/kill). The underlying RemoteProtocolError was logged but in‑flight requests never completed. This change converts that low-level failure into a deterministic JSON-RPC error for every pending request, matching expected resilience semantics.

How Has This Been Tested?

  • Added integrated test test_streamable_http_mid_call_disconnect:
    • Starts real server process.
    • Invokes long-running tool (wait_for_lock_with_notification).
    • Kills server mid-call.
    • Asserts an McpError with code CONNECTION_CLOSED (previously would hang).
  • All existing tests pass locally.

Breaking Changes

None. Behavior only changes for abnormal transport termination; normal flows unaffected. No public API changes.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Implementation details:

In _handle_sse_response (client transport) on exception: send the exception into the session read stream then close the writer to signal termination.

In BaseSession._receive_loop: on receiving an Exception, broadcast a synthesized JSONRPCError (code CONNECTION_CLOSED) to all pending request streams and clear them.

Integrated test placed alongside existing StreamableHTTP tests for consistency. Potential follow-up: introduce optional reconnection/resumption logic for recoverable disconnects.

EiffL and others added 2 commits August 29, 2025 01:23
fix: handle transport exceptions and ensure proper cleanup of in-flig…
@EiffL EiffL requested review from a team and felixweinberger August 28, 2025 23:53
try:
await ctx.read_stream_writer.send(e)
finally:
await ctx.read_stream_writer.aclose()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Closing on any type of exception seems to broad - we should be more explicit about the failure cases that definitely mean the session is unusable, there might be recoverable errors.

Comment on lines +339 to +353
# Transport-level exception. Forward it to the incoming
# handler for logging/observation, then fail all
# in-flight requests so callers don't hang forever.
await self._handle_incoming(message)
error = ErrorData(code=CONNECTION_CLOSED, message=str(message))
# Send error to any pending request response streams immediately
for id, stream in list(self._response_streams.items()):
try:
await stream.send(JSONRPCError(jsonrpc="2.0", id=id, error=error))
await stream.aclose()
except Exception:
pass
self._response_streams.clear()
# Break out of the receive loop; connection is no longer usable.
break
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a duplication of all the logic we have at the bottom of _receive_loop (L440-450) (except for the break)

The changes here also don't seem necessary to make the tests you added pass - do we need this?

@felixweinberger felixweinberger added needs more work Not ready to be merged yet, needs additional changes. needs more eyes Needs alignment among maintainers whether this is something we want to add needs maintainer action Potentially serious issue - needs proactive fix and maintainer attention labels Sep 23, 2025
@felixweinberger felixweinberger self-assigned this Sep 30, 2025
@felixweinberger felixweinberger removed the needs maintainer action Potentially serious issue - needs proactive fix and maintainer attention label Sep 30, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs more eyes Needs alignment among maintainers whether this is something we want to add needs more work Not ready to be merged yet, needs additional changes.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants