-
Notifications
You must be signed in to change notification settings - Fork 172
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add request cancellation and cleanup #167
Conversation
0c6ec6d
to
d50255e
Compare
0d7be0a
to
0a93799
Compare
src/mcp/shared/session.py
Outdated
async def _cleanup_loop(self) -> None: | ||
"""Periodically clean up completed and cancelled requests.""" | ||
while True: | ||
with anyio.move_on_after(self._cleanup_interval): | ||
# Clean up completed requests | ||
self._in_flight = { | ||
req_id: responder | ||
for req_id, responder in self._in_flight.items() | ||
if responder.in_flight | ||
} | ||
await anyio.sleep(self._cleanup_interval) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this done on completion or cancellation instead? The idea of using a timer for this seems weird to me, and leaves a window (the length of the interval) of possibly unbounded memory growth.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
src/mcp/shared/session.py
Outdated
# Clean up completed requests | ||
self._in_flight = { | ||
req_id: responder | ||
for req_id, responder in self._in_flight.items() | ||
if responder.in_flight | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OTOH, this seems excessive to do on every new incoming message. Why can't we tie it to completion and cancellation?
This doesn't seem to solve #88 for me. I can reproduce triggering a call tool from claude with a call tool handle that simply waits for 200 s and returns a valid message. Claude receives the response but the server crashes @server.call_tool()
async def handle_call_tool(
name: str, arguments: dict | None,
) -> list[types.TextContent | types.ImageContent | types.EmbeddedResource]:
await asyncio.sleep(200)
response = {"msg": "success"}
return [types.TextContent(type="text", text=f"{response}")] |
This commit adds support for request cancellation and tracking of in-flight requests in the MCP protocol implementation. The key architectural changes are: 1. Request Lifecycle Management: - Added _in_flight dictionary to BaseSession to track active requests - Requests are tracked from receipt until completion/cancellation - Added proper cleanup via on_complete callback 2. Cancellation Support: - Added CancelledNotification handling in _receive_loop - Implemented cancel() method in RequestResponder - Uses anyio.CancelScope for robust cancellation - Sends error response on cancellation 3. Request Context: - Added request_ctx ContextVar for request context - Ensures proper cleanup after request handling - Maintains request state throughout lifecycle 4. Error Handling: - Improved error propagation for cancelled requests - Added proper cleanup of cancelled requests - Maintains consistency of in-flight tracking This change enables clients to cancel long-running requests and servers to properly clean up resources when requests are cancelled. Github-Issue:#88
d50255e
to
827e494
Compare
Hi @dsp-ant, unfortunately, this still fails with the snippet I posted when using Claude. It seems that the important bit of the trace is this: | pydantic_core._pydantic_core.ValidationError: 6 validation errors for ClientNotification
| CancelledNotification.method
| Input should be 'notifications/cancelled' [type=literal_error, input_value='cancelled', input_type=str]
| For further information visit https://errors.pydantic.dev/2.10/v/literal_error
| ProgressNotification.method
| Input should be 'notifications/progress' [type=literal_error, input_value='cancelled', input_type=str]
| For further information visit https://errors.pydantic.dev/2.10/v/literal_error
| ProgressNotification.params.progressToken
| Field required [type=missing, input_value={'requestId': 19, 'reason... -2: Request timed out'}, input_type=dict]
| For further information visit https://errors.pydantic.dev/2.10/v/missing
| ProgressNotification.params.progress
| Field required [type=missing, input_value={'requestId': 19, 'reason... -2: Request timed out'}, input_type=dict]
| For further information visit https://errors.pydantic.dev/2.10/v/missing
| InitializedNotification.method
| Input should be 'notifications/initialized' [type=literal_error, input_value='cancelled', input_type=str]
| For further information visit https://errors.pydantic.dev/2.10/v/literal_error
| RootsListChangedNotification.method
| Input should be 'notifications/roots/list_changed' [type=literal_error, input_value='cancelled', input_type=str]
| For further information visit https://errors.pydantic.dev/2.10/v/literal_error From that is seems that the client (Claude) is sending the cancelled notification wrong (if that makes sense), sending If the problem is indeed triggered from the client side (you'd need to clarify this to me, as I'm very much not an expert on MCP in general or this implementation in particular), then crashing the server does not seem the way to go, meaning the exception should be handled by the SDK. If on the other hand the problem has a server-side origin, we'd need to find were it comes from. I'm happy to test any advancements you have, I'm sorry I cannot be of more help ATM. |
@EduPonz Yes. After fixing actual cancellation and thanks to your perfect repro, I was able to pinpoint the problem. It is caused by validation failing for cancellation messages from Claude Desktop, since Claude Desktop relies on a very old SDK verseion that sends wrong cancellation. This showed that the SDK isnt handling the error gracefuly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Broadly looks good. Do we need to update everything to require using RequestResponder
as a context manager? e.g., MCP client code, docs, …?
And should it be an error to use RequestResponder
without entering the context? We could track this with a flag.
# Long enough to ensure timeout | ||
await anyio.sleep(0.2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm quite allergic to actually sleeping in tests, due to the flakiness and slowness. Is there any other way we can exercise these? Would a timeout of 0 work? Or a timeout in the past vs. a timeout in the far future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me fix this. This was Claude written. After a lot of back and forth, the right way is to use events and wait for them to trigger.
Do you know were they keep their code or some place to open a ticket? I was looking for it but did not find much. It's strange that such a popular client is not up to spec... |
a73fc7d
to
08cfbe5
Compare
b32e455
to
733db0c
Compare
Summary
Test plan
Depends on #166