
Conversation

@LucaButBoring (Contributor) commented Sep 25, 2025

This PR implements the required changes for modelcontextprotocol/modelcontextprotocol#1391, which adds asynchronous tool execution.

This is a large PR, and if the associated SEP is accepted, we may want to break it into several smaller PRs for SDK reviewers. I generally kept separate commits for each step of the implementation to make it easier to review in its current form.

Motivation and Context

Today, most applications integrate with tools in a straightforward but naive manner, having agents invoke tools synchronously within the conversation instead of allowing agents to multitask where possible. We believe there are a few reasons for this, including the lack of clarity around tool interfaces (single tool or multiple for job tracking), model failures when manually polling on operations, and the lack of a way to retrieve results with a well-defined TTL, among other problems (described in more detail in the linked issue). Here, we introduce an alternative API that establishes a clear integration path for async job-style use cases that typically run on the order of minutes to hours.

The ultra high-level overview is as follows:

  • Tools now support synchronous or asynchronous invocation modes
  • A single tool only advertises itself as either sync or async to a given client, controlled by protocol version
  • Sync tools behave just like they always did
  • Async tools are split into start/poll/retrieve stages:
    • Starting a call:
      • tools/call begins an async tool call
      • The result is a CallToolResult containing an operation token, which is used to interact with the async tool call across multiple RPC calls
    • Polling:
      • The operation token is used to call tools/async/status, which returns the current operation status
      • The client should poll this method until the status reaches a terminal value
    • Result retrieval:
      • The operation token is used to call tools/async/result, which has the final tool output

Whether a tool is sync, async, or both (on old/new protocol versions) is defined by tool implementors. This lets remote server operators decide based on how long each tool is expected to take to execute, rather than potentially serving HTTP requests with widely varying execution times on the same endpoint. It also makes the "time contract" of a tool much clearer to client applications, so fast tools can still be executed synchronously while long-running tools are immediately backgrounded.
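
For illustration, here is a rough sketch of the three stages as JSON-RPC payloads, written as Python dicts. The method names (tools/call, tools/async/status, tools/async/result) come from this PR; the exact field names and status values are my reading of the draft and may not match the final schema.

# Hypothetical wire-level sketch; field names are illustrative, not normative.
start_request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "data_processing_tool", "arguments": {"dataset": "customer_data.csv"}},
}
# The immediate CallToolResult carries an operation token instead of the final output:
start_result = {
    "content": [{"type": "text", "text": "Starting data processing..."}],
    "operation": {"token": "op-123", "keepAlive": 3600},
}

# Poll until the status reaches a terminal value (e.g. "completed", "failed", "canceled"):
status_request = {"jsonrpc": "2.0", "id": 2, "method": "tools/async/status", "params": {"token": "op-123"}}

# Then fetch the final tool output, which is only retained for the keepAlive window:
result_request = {"jsonrpc": "2.0", "id": 3, "method": "tools/async/result", "params": {"token": "op-123"}}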

Usage

Defining an async-compatible tool is just a matter of adjusting the @mcp.tool() decorator to include an invocation_modes parameter, a list containing "sync" and/or "async":

import asyncio

from mcp.server.fastmcp import Context, FastMCP

mcp = FastMCP("Demo")  # server name is illustrative


@mcp.tool(invocation_modes=["async", "sync"])
async def data_processing_tool(dataset: str, operations: list[str], ctx: Context) -> dict[str, str]:
    await ctx.info(f"Starting data processing pipeline for {dataset}")

    results: dict[str, str] = {}
    total_ops = len(operations)

    for i, operation in enumerate(operations):
        await ctx.debug(f"Executing operation: {operation}")
        await asyncio.sleep(0.5 + (i * 0.2))  # Simulate processing time
        progress = (i + 1) / total_ops  # Report progress
        await ctx.report_progress(progress, 1.0, f"Completed {operation}")
        results[operation] = f"Result of {operation} on {dataset}"  # Store result

    await ctx.info("Data processing pipeline complete!")
    return results

If invocation_modes contains "async", the tool is async-compatible and clients on new protocol versions will only call it in async mode; if it contains "sync", the tool is sync-compatible and will be called in sync mode when async mode is not supported. A client never gets to choose between the two modes itself.
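
For example (a sketch based on the decorator parameter introduced in this PR; the tool names and bodies are hypothetical, and the default mode is assumed to be sync-only):

@mcp.tool(invocation_modes=["async"])  # async-only: hidden from clients that don't support async tools
async def deep_analysis(corpus: str, ctx: Context) -> str:
    ...


@mcp.tool()  # assumed to default to sync-only, behaving exactly as before
def quick_calc(expression: str) -> str:
    ...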

Behind the scenes, the SDK branches to run the tool either synchronously (as today) or asynchronously (immediate return with job tracking), depending on whether the client's protocol version supports async tools.

To control how long results remain retrievable via tools/async/result, we can use the keep_alive parameter:

@mcp.tool(invocation_modes=["async", "sync"], keep_alive=30)  # retain result for 30s following completion

We can also customize the content returned in the immediate CallToolResult with the immediate_result parameter:

import mcp.types as types


async def immediate_feedback(operation: str) -> list[types.ContentBlock]:
    return [types.TextContent(type="text", text=f"Starting {operation}... This may take a moment.")]

@mcp.tool(invocation_modes=["async", "sync"], immediate_result=immediate_feedback)
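# (Sketch only: the decorated function was omitted above; this hypothetical
# body is just to show how the pieces fit together.)
async def long_export(operation: str, ctx: Context) -> str:
    await ctx.info(f"Running {operation}...")
    return f"{operation} finished"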

On the client side, we just add the polling and result retrieval like so:

import asyncio

from mcp import ClientSession


async def demonstrate_data_processing(session: ClientSession):
    """Demonstrate data processing pipeline."""
    print("\n=== Data Processing Pipeline Demo ===")

    # Just like before
    operations = ["validate", "clean", "transform", "analyze", "export"]
    result = await session.call_tool(
        "data_processing_tool", arguments={"dataset": "customer_data.csv", "operations": operations}
    )

    # We could choose to send the immediate result content to an agent from here before continuing

    # New parts
    if result.operation:
        token = result.operation.token
        print(f"Data processing started with token: {token}")

        # Poll for completion
        while True:
            status = await session.get_operation_status(token)
            print(f"Status: {status.status}")

            if status.status == "completed":
                final_result = await session.get_operation_result(token)

                # Show structured result if available
                if final_result.result.structuredContent:
                    print("Processing results:")
                    for op, result_text in final_result.result.structuredContent.items():
                        print(f"  {op}: {result_text}")
                break
            elif status.status == "failed":
                print(f"Processing failed: {status.error}")
                break
            elif status.status in ("canceled", "unknown"):
                print(f"Processing ended with status: {status.status}")
                break

            await asyncio.sleep(0.8)

How Has This Been Tested?

Unit tests, integration tests, and new example snippets.

Breaking Changes

Existing users will not need to update their applications to continue using synchronous tool calls. Asynchronous tool calls will require minor code changes that will be documented.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

There were several implementation decisions we may want to discuss further, some due to ambiguity in the proposal (which will be revised again) and some due to fitting the design into the existing SDK.

  • I added a faux-version called next to deal with the requirement that sessions on the current-latest version always advertise as sync-only. The tests and examples explicitly set the advertised client protocol version to next when calling async-only tools.
  • Reusing the existing tools/call method creates some ambiguities in how outputSchema should be handled, as the immediate tool call result (communicating an accepted state) no longer has meaningful structuredContent. The output that should be validated is actually the result of GetOperationPayloadResult, so for now I'm skipping validation of the immediate CallToolResult (only in async execution) and only validating GetOperationPayloadResult (sync tool executions are always validated, just like before).
  • keepAlive should have a sentinel value representing "no expiration," and I'm leaning towards None. However, in SDK implementations, that becomes somewhat ambiguous with sync tool calls, which also implicitly have a keepAlive of None already. For now, I default it to 1 hour if not specified/None, but this should probably be changed before this is merged.
  • In streamable HTTP (sHTTP), the SDK sends tool-related server messages on the same SSE stream the server used to respond to the client's CallToolRequest, by attaching a related_request_id to the stream for fast lookups and session resumption. To support sampling and elicitation, we keep a map of operation tokens to their original request IDs so related calls reuse the same event store entry.
  • The client session needs to cache a mapping of in-flight operation tokens to tool names for validating structuredContent in async tool calls, as it otherwise has no way to look up the cached outputSchema. We could consider including a toolName in GetOperationPayloadResult to avoid the inconvenience, but in this draft I'm using a cache expiry based on keepAlive to avoid holding that mapping forever.
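
As a rough illustration of that last point, the client-side cache could be little more than a token-to-(tool name, deadline) map that is pruned on access. The class and method names below are hypothetical, not the ones used in this PR:

import time


class OperationToolCache:
    """Sketch: remembers which tool produced an operation token so the client can
    validate structuredContent against the cached outputSchema later."""

    def __init__(self) -> None:
        self._entries: dict[str, tuple[str, float]] = {}  # token -> (tool_name, expires_at)

    def remember(self, token: str, tool_name: str, keep_alive_seconds: float) -> None:
        self._entries[token] = (tool_name, time.monotonic() + keep_alive_seconds)

    def lookup(self, token: str) -> str | None:
        self._prune()
        entry = self._entries.get(token)
        return entry[0] if entry is not None else None

    def _prune(self) -> None:
        now = time.monotonic()
        self._entries = {t: e for t, e in self._entries.items() if e[1] > now}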

@LucaButBoring (Contributor, author)

Latest commit implements working stream binding to support elicitation and sampling. There's definitely room for refactoring that one 😔

@felixweinberger added labels "pending publish" (Draft PRs need to be published for team to review) and "pending SEP approval" (When a PR is attached as an implementation detail to a SEP, we mark it as such for triage) on Sep 26, 2025
@LucaButBoring (Contributor, author)

Added support for configuring the immediate result and made some changes to avoid smuggling parameters through _meta (now we do it through unserialized fields).

@LucaButBoring marked this pull request as ready for review on September 29, 2025
@LucaButBoring requested review from a team and ochafik on September 29, 2025
@Kludex requested a review from Copilot on September 30, 2025
@Copilot (Copilot AI) left a comment

Pull Request Overview

This PR implements asynchronous tool execution for the MCP (Model Context Protocol) Python SDK, enabling long-running operations that execute in the background while clients poll for status and results. The implementation introduces operation tokens for tracking execution state and supports configurable keep-alive durations for result availability.

Key changes include:

  • Added async operation management system with token-based tracking
  • Extended MCP types to support async operation parameters and results
  • Implemented immediate feedback capability for async tools
  • Added client-side polling mechanisms for operation status and results

Reviewed Changes

Copilot reviewed 35 out of 35 changed files in this pull request and generated 3 comments.

File summary:
  • src/mcp/types.py: Extended protocol types with async operation support and operation tokens
  • src/mcp/shared/async_operations.py: Core async operation management classes for client and server
  • src/mcp/client/session.py: Client-side async operation tracking and polling methods
  • src/mcp/server/fastmcp/server.py: FastMCP integration with async tools and invocation mode filtering
  • src/mcp/server/fastmcp/tools/base.py: Tool base class extensions for async modes and immediate results
  • src/mcp/server/lowlevel/server.py: Low-level server async operation handlers and execution logic
  • tests/server/fastmcp/test_server.py: Comprehensive test coverage for async tool functionality
  • examples/snippets/servers/async_tool_*.py: Example implementations demonstrating async tool patterns
Comments suppressed due to low confidence (2)

src/mcp/server/fastmcp/tools/base.py:162

  • Function _is_async_callable is referenced on line 120 but not defined in this file. Either add the function definition or import it properly.
def _is_async_callable(obj: Any) -> bool:

tests/shared/test_progress_notifications.py:1

  • [nitpick] The operation_token=None parameter addition maintains backward compatibility but consider adding a comment explaining when this would be non-None for clarity in test scenarios.
from typing import Any, cast


logger.exception(f"Async execution failed for {tool_name}")
self.async_operations.fail_operation(operation.token, str(e))

asyncio.create_task(execute_async())
Member:

Let's stop using asyncio in this codebase. Use anyio, please.

@LucaButBoring (author):

Fixed here; still need to fix AsyncOperationManager. Still trying to figure out how to do that properly with cancellation scopes.
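
(For reference, one common anyio pattern for running fire-and-forget work with per-operation cancellation is sketched below; this is hypothetical and not the code in this PR.)

from collections.abc import Awaitable, Callable

import anyio
from anyio.abc import TaskGroup


class OperationRunner:
    """Sketch: run background operations under one task group, cancellable by token."""

    def __init__(self) -> None:
        self._scopes: dict[str, anyio.CancelScope] = {}

    def start(self, tg: TaskGroup, token: str, fn: Callable[[], Awaitable[None]]) -> None:
        tg.start_soon(self._run, token, fn)

    async def _run(self, token: str, fn: Callable[[], Awaitable[None]]) -> None:
        with anyio.CancelScope() as scope:
            self._scopes[token] = scope
            try:
                await fn()
            finally:
                self._scopes.pop(token, None)

    def cancel(self, token: str) -> None:
        scope = self._scopes.get(token)
        if scope is not None:
            scope.cancel()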

@Kludex (Member) commented Oct 2, 2025:

I can help if you need help.

@Kludex (Member) commented Sep 30, 2025

Why would a tool have 2 flavors (async & sync)?

Is there a hint to the client about when it should try to retrieve the result, like what Retry-After is for HTTP?

@LucaButBoring (Contributor, author)

> Is there a hint to the client about when it should try to retrieve the result, like what Retry-After is for HTTP?

As in for how often it should poll? There is not, but that's a good callout - I'll amend the proposal to include that.

@LucaButBoring (Contributor, author)

> Why would a tool have 2 flavors (async & sync)?

In the case where a tool is being migrated from sync to async and may serve clients that don't yet support the latest (or next, in this case) protocol version, it's useful to support both modes on the same tool - clients that don't support the latest version get a single long request, while clients that do support it use async tool call semantics.

There will be some use cases where that's desirable, and some where it's not, so it's optional behavior.

@Kludex (Member) commented Oct 2, 2025

If the choice of using async/sync is made by the client, why should we add flavors when defining a tool on the server side?

@Kludex (Member) commented Oct 2, 2025

> If the choice of using async/sync is made by the client, why should we add flavors when defining a tool on the server side?

Answering my own question... You need to signal in the tool definition which flavor it supports.

@Kludex (Member) commented Oct 2, 2025

Okay. I don't think we should be adding multiple flavors to a tool; I think it makes things more complicated. Also, the spec supports the invocationMode keyword, not invocationModes, which I think makes sense.

I think a better API is to have a new method e.g. async_tool (although "async" is not the best keyword for this in Python). Is it still possible to find a synonym to "async"? "long-run"? 😅


token: str
"""Server-generated token to use for checking status and retrieving results."""
keepAlive: int
Member:

Suggested change
keepAlive: int
keep_alive: int = Field(alias="keepAlive")


name: str
arguments: dict[str, Any] | None = None
operation_params: AsyncRequestProperties | None = Field(serialization_alias="operation", default=None)
Member:

What's the problem with the real name "operation"?

@LucaButBoring (author):

The base RequestParams type also has an _operation field used for association metadata, which needs to be aliased to operation so it isn't treated as a private/protected field by pyright wherever it gets used.

error: "_operation" is protected and used outside of the class in which it is declared (reportPrivateUsage)

I didn't want to ignore pyright just because of a naming conflict, but now that I'm looking at this again there are only 3 places where we'd need to do so.
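
(For readers following along, the aliasing workaround mentioned above boils down to the usual pydantic pattern of a public Python name serialized under the wire name; a minimal sketch, not the SDK's actual model:)

from typing import Any

from pydantic import BaseModel, Field


class ExampleParams(BaseModel):
    """Minimal sketch of the aliasing workaround, not the SDK's actual model."""

    name: str
    arguments: dict[str, Any] | None = None
    # Public Python attribute, written to the wire as "operation" to avoid a
    # leading-underscore field that pyright would flag as private usage.
    operation_params: Any | None = Field(default=None, serialization_alias="operation")


print(ExampleParams(name="tool", operation_params={"keepAlive": 60}).model_dump(by_alias=True, exclude_none=True))
# -> {'name': 'tool', 'operation': {'keepAlive': 60}}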

self.async_operations.fail_operation(operation.token, str(e))

async with anyio.create_task_group() as tg:
    tg.start_soon(execute_async)
Member:

I think we need to make this implementation a bit more abstract.

What if I want to execute my tasks on another machine/node/process than the MCP server? The world also seems to like durable execution products like Temporal a lot.

I think we need something like what this design proposes: https://github.com/pydantic/fasta2a/tree/main?tab=readme-ov-file#design

@LucaButBoring (author):

I made AsyncOperationManager a Server parameter with the intention that consumers would implement something like that for this use case, but that's not enough on its own since it's missing a broker to enable that pattern, yeah. Will adjust this accordingly.
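
(For context, a pluggable design in the spirit of the fasta2a link above might split the responsibilities roughly like the Protocols below; the names are entirely hypothetical and not part of this PR.)

from typing import Any, Protocol


class OperationStorage(Protocol):
    """Persists operation state and results so another process can serve status/result calls."""

    async def save_status(self, token: str, status: str) -> None: ...
    async def save_result(self, token: str, result: Any, keep_alive: int) -> None: ...
    async def load_status(self, token: str) -> str: ...
    async def load_result(self, token: str) -> Any: ...


class OperationBroker(Protocol):
    """Hands tool executions to workers (in-process task group, a queue, Temporal, ...)."""

    async def enqueue(self, token: str, tool_name: str, arguments: dict[str, Any]) -> None: ...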

@LucaButBoring (Contributor, author)

> I think a better API is to have a new method e.g. async_tool (although "async" is not the best keyword for this in Python). Is it still possible to find a synonym to "async"? "long-run"? 😅

On the SEP, we're going to be moving to referring to these uniformly as long-running operations, so an @mcp.long_running would work.

Do we want to force tool implementors to make a clean break from existing tools, though? Or are you suggesting that in the case where both modes are desired on a tool in an interim state, they would add both @mcp.tool and @mcp.long_running to the same function?

If so, that then requires the tool manager to support adding the same tool more than once, and updating the parameters if long_running is present and a tool is already registered (since the decorators could be applied in either order).

@Kludex (Member) commented Oct 2, 2025

> Do we want to force tool implementors to make a clean break from existing tools, though?

From what I understood, since invocationMode is a string, you can only have a tool that is either "long-running" or "short-running". If that's the case, then a short-running tool is different from a long-running tool. So... I'm not sure if I'm missing something, or if the spec reflects something different.

> Or are you suggesting that in the case where both modes are desired on a tool in an interim state, they would add both @mcp.tool and @mcp.long_running to the same function?

I wouldn't like to add two decorators. 🤔


If a tool can be both short- and long-running, why isn't invocationMode a boolean, e.g. supportsAsync?

@LucaButBoring (Contributor, author) commented Oct 2, 2025

> From what I understood, since invocationMode is a string, you can only have a tool that is either "long-running" or "short-running". If that's the case, then a short-running tool is different from a long-running tool. So... I'm not sure if I'm missing something, or if the spec reflects something different.

A tool is exactly one or the other from the perspective of a single client. ListTools must show exactly one execution mode. In addition, version negotiation is used to hide long-running tools from clients that don't yet support them.

However, we can still support hybrid tools for backwards-compatibility purposes if the server operator allows it. Essentially, because the core tool implementation is the same @mcp.tool, we can wrap a tool in the functionality it needs to behave in either way within the server SDK, but only allow a client to use one or the other depending on its version.

There are a few sections of the SEP that discuss this; here's one I think is useful:

Old Clients (pre-async support):
// tools/list response (filtered)
{
  tools: [
    // only supports sync execution
    { name: "search_web", description: "Search the web" },
    // only supports sync execution
    { name: "quick_calc", description: "Fast calculation" },
    // supports both sync and async - invocationMode field is hidden from old clients
    { name: "get_weather", description: "Get weather" }
    // async-capable tools hidden from old clients
  ]
}

New Clients (async support):
// tools/list response (complete)
{
  tools: [
    // explicitly only supports sync execution
    { name: "search_web", description: "Search the web", invocationMode: "sync" },
    // implicitly only supports sync execution
    { name: "quick_calc", description: "Fast calculation" },
    // new clients see the async invocation mode and should assume async-only execution
    { name: "get_weather", description: "Get weather", invocationMode: "async" },
    // tool is async-only, and is not shown to old clients
    { name: "deep_analysis", description: "Complex analysis", invocationMode: "async" }
    // All tools visible, async capabilities declared
  ]
}

Note that some tools are present in the "old clients" list despite presenting as LRO-only in the "new clients" list. When an old client invokes such a tool, it won't get the immediate_result response, or even an operation token - the server simply won't return a response until the tool implementation completes.

> If a tool can be both short- and long-running, why isn't invocationMode a boolean, e.g. supportsAsync?

This was the original idea, actually, but for futureproofing it's been changed to an enum.
