fix: harden MCP exit stack cleanup to prevent cross-task cancel-scope errors #2805
Open
GhimBoon wants to merge 5 commits into Chainlit:main from
…eaks

When an MCP connection fails (network error, auth failure, initialization timeout), the AsyncExitStack is never closed because the except block raises HTTPException before cleanup. The abandoned exit stack is later garbage-collected in a different asyncio task, causing:

    RuntimeError: Attempted to exit cancel scope in a different task than it was entered in

This corrupts anyio's cancel-scope stack and can spin a CPU core at 100%.

Changes:
- Add safe_mcp_exit_stack_close() helper that suppresses the cross-task cancel scope RuntimeError from anyio during MCP cleanup
- connect_mcp: track whether exit_stack was stored via a flag; close it in a finally block when the connection was not successfully stored
- connect_mcp: properly delete the old session entry when reconnecting
- disconnect_mcp: use safe_mcp_exit_stack_close instead of bare try/except
- WebsocketSession.delete: use safe_mcp_exit_stack_close for consistent cleanup
- Add tests for safe_mcp_exit_stack_close and cancel-scope error handling

Fixes Chainlit#2182
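For readers without the diff open, the helper described in the first bullet might look roughly like the sketch below. This is not the PR's actual implementation: the return value, the `CANCEL_SCOPE_MSG` constant, and the substring match are assumptions made for illustration. Only the function name and the error text come from the description above.

```python
import asyncio
from contextlib import AsyncExitStack

# Prefix of the anyio error quoted in the commit message (assumed match rule).
CANCEL_SCOPE_MSG = "Attempted to exit cancel scope in a different task"


async def safe_mcp_exit_stack_close(exit_stack: AsyncExitStack) -> bool:
    """Close the stack. Return False if the known cross-task error was suppressed."""
    try:
        await exit_stack.aclose()
    except RuntimeError as exc:
        if CANCEL_SCOPE_MSG in str(exc):
            # Known, harmless-at-this-point failure: swallow it.
            return False
        # Any other RuntimeError is unexpected and should stay visible.
        raise
    return True


async def _demo() -> None:
    # A stack with nothing registered closes cleanly.
    stack = AsyncExitStack()
    print(await safe_mcp_exit_stack_close(stack))


if __name__ == "__main__":
    asyncio.run(_demo())
```

Matching on the message text is brittle if anyio ever rewords the error, so a real implementation may prefer a narrower or more robust check.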
2 issues found across 3 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="backend/chainlit/server.py">
<violation number="1" location="backend/chainlit/server.py:1422">
P2: exit_stack_stored is set to True before on_mcp_connect runs; if the callback raises, the finally block skips cleanup and the partially initialized session remains stored, leaking the exit stack/session.</violation>
</file>
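One way to address the ordering concern above is to mark the connection as committed only after the callback returns. The sketch below is a simplified stand-in, not the code in `server.py`: `mcp_sessions`, `closed_names`, `connect`, and `on_connect` are hypothetical names used only for illustration.

```python
import asyncio
from contextlib import AsyncExitStack
from typing import Awaitable, Callable, Dict, List

# Hypothetical stand-ins for the per-session MCP registry and for demo bookkeeping.
mcp_sessions: Dict[str, AsyncExitStack] = {}
closed_names: List[str] = []


async def connect(name: str, on_connect: Callable[[], Awaitable[None]]) -> None:
    exit_stack = AsyncExitStack()
    # Demo-only marker so callers can observe that the stack was closed.
    exit_stack.callback(closed_names.append, name)
    committed = False
    try:
        # ... transports and the client session would be entered on exit_stack here ...
        mcp_sessions[name] = exit_stack
        await on_connect()  # user callback; may raise
        committed = True    # set only after the callback succeeded
    finally:
        if not committed:
            # Roll back the partially initialized entry and release resources.
            mcp_sessions.pop(name, None)
            await exit_stack.aclose()
```

With this ordering, a failing callback leaves neither a registry entry nor an open exit stack behind, while a successful connection keeps its stack open for later disconnect.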
<file name="backend/chainlit/session.py">
<violation number="1" location="backend/chainlit/session.py:34">
P2: BaseExceptionGroup is referenced without a compatibility import, but the project supports Python 3.10 where BaseExceptionGroup is undefined. This will raise NameError during exception matching on 3.10, breaking MCP exit stack cleanup.</violation>
</file>
1 issue found across 2 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="backend/chainlit/session.py">
<violation number="1" location="backend/chainlit/session.py:47">
P2: BaseExceptionGroup is no longer caught; since it inherits from BaseException (not Exception), BaseExceptionGroup-wrapped cancel-scope errors will bypass `except Exception`, so `_is_cancel_scope_error` never runs and cleanup errors can propagate again.</violation>
</file>
Summary
This change fixes intermittent MCP connection/disconnection failures that surfaced as `RuntimeError: Attempted to exit cancel scope in a different task than it was entered in`.
Root cause
MCP AsyncExitStack cleanup was not consistently guaranteed in all failure paths, and some exceptions are wrapped in BaseExceptionGroup, which prevented targeted suppression/handling of the known cancel-scope mismatch case.
What changed
Validation
Impact
Fixes #2182
Summary by cubic
Hardened MCP exit stack cleanup to prevent cross-task cancel-scope errors and resource leaks across connect, disconnect, and reconnect. Fixes intermittent 400s and runtime errors; successful connections are unaffected.
Bug Fixes
Refactors
Written for commit c4bb6e6. Summary will update on new commits.