Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add request cancellation and cleanup #167

Merged
merged 4 commits into from
Feb 4, 2025
Merged

feat: add request cancellation and cleanup #167

merged 4 commits into from
Feb 4, 2025

Conversation

dsp-ant
Copy link
Member

@dsp-ant dsp-ant commented Jan 23, 2025

Summary

  • Add support for request cancellation via notifications
  • Implement periodic cleanup of in-flight requests
  • Track request state throughout lifecycle
  • Fixes Random error thrown on response #88

Test plan

  • Test request cancellation
  • Verify cleanup of completed requests
  • Test concurrent request handling

Depends on #166

@dsp-ant dsp-ant linked an issue Jan 23, 2025 that may be closed by this pull request
Base automatically changed from davidsp/refactor to main January 24, 2025 09:54
Comment on lines 228 to 238
async def _cleanup_loop(self) -> None:
"""Periodically clean up completed and cancelled requests."""
while True:
with anyio.move_on_after(self._cleanup_interval):
# Clean up completed requests
self._in_flight = {
req_id: responder
for req_id, responder in self._in_flight.items()
if responder.in_flight
}
await anyio.sleep(self._cleanup_interval)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why isn't this done on completion or cancellation instead? The idea of using a timer for this seems weird to me, and leaves a window (the length of the interval) of possibly unbounded memory growth.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines 249 to 254
# Clean up completed requests
self._in_flight = {
req_id: responder
for req_id, responder in self._in_flight.items()
if responder.in_flight
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OTOH, this seems excessive to do on every new incoming message. Why can't we tie it to completion and cancellation?

@EduPonz
Copy link

EduPonz commented Jan 29, 2025

This doesn't seem to solve #88 for me. I can reproduce triggering a call tool from claude with a call tool handle that simply waits for 200 s and returns a valid message. Claude receives the response but the server crashes

@server.call_tool()
async def handle_call_tool(
    name: str, arguments: dict | None,
) -> list[types.TextContent | types.ImageContent | types.EmbeddedResource]:
    await asyncio.sleep(200)
    response = {"msg": "success"}
    return [types.TextContent(type="text", text=f"{response}")]

This commit adds support for request cancellation and tracking of
in-flight requests in the MCP protocol implementation. The key
architectural changes are:

1. Request Lifecycle Management:
   - Added _in_flight dictionary to BaseSession to track active requests
   - Requests are tracked from receipt until completion/cancellation
   - Added proper cleanup via on_complete callback

2. Cancellation Support:
   - Added CancelledNotification handling in _receive_loop
   - Implemented cancel() method in RequestResponder
   - Uses anyio.CancelScope for robust cancellation
   - Sends error response on cancellation

3. Request Context:
   - Added request_ctx ContextVar for request context
   - Ensures proper cleanup after request handling
   - Maintains request state throughout lifecycle

4. Error Handling:
   - Improved error propagation for cancelled requests
   - Added proper cleanup of cancelled requests
   - Maintains consistency of in-flight tracking

This change enables clients to cancel long-running requests and
servers to properly clean up resources when requests are cancelled.

Github-Issue:#88
@EduPonz
Copy link

EduPonz commented Feb 4, 2025

Hi @dsp-ant, unfortunately, this still fails with the snippet I posted when using Claude. It seems that the important bit of the trace is this:

| pydantic_core._pydantic_core.ValidationError: 6 validation errors for ClientNotification
| CancelledNotification.method
|   Input should be 'notifications/cancelled' [type=literal_error, input_value='cancelled', input_type=str]
|     For further information visit https://errors.pydantic.dev/2.10/v/literal_error
| ProgressNotification.method
|   Input should be 'notifications/progress' [type=literal_error, input_value='cancelled', input_type=str]
|     For further information visit https://errors.pydantic.dev/2.10/v/literal_error
| ProgressNotification.params.progressToken
|   Field required [type=missing, input_value={'requestId': 19, 'reason... -2: Request timed out'}, input_type=dict]
|     For further information visit https://errors.pydantic.dev/2.10/v/missing
| ProgressNotification.params.progress
|   Field required [type=missing, input_value={'requestId': 19, 'reason... -2: Request timed out'}, input_type=dict]
|     For further information visit https://errors.pydantic.dev/2.10/v/missing
| InitializedNotification.method
|   Input should be 'notifications/initialized' [type=literal_error, input_value='cancelled', input_type=str]
|     For further information visit https://errors.pydantic.dev/2.10/v/literal_error
| RootsListChangedNotification.method
|   Input should be 'notifications/roots/list_changed' [type=literal_error, input_value='cancelled', input_type=str]
|     For further information visit https://errors.pydantic.dev/2.10/v/literal_error

From that is seems that the client (Claude) is sending the cancelled notification wrong (if that makes sense), sending cancelled instead of notifications/cancelled.

If the problem is indeed triggered from the client side (you'd need to clarify this to me, as I'm very much not an expert on MCP in general or this implementation in particular), then crashing the server does not seem the way to go, meaning the exception should be handled by the SDK. If on the other hand the problem has a server-side origin, we'd need to find were it comes from. I'm happy to test any advancements you have, I'm sorry I cannot be of more help ATM.

@EduPonz
Copy link

EduPonz commented Feb 4, 2025

It seems that a73fc7d does the trick at least for me, thanks @dsp-ant!!

@dsp-ant
Copy link
Member Author

dsp-ant commented Feb 4, 2025

@EduPonz Yes. After fixing actual cancellation and thanks to your perfect repro, I was able to pinpoint the problem. It is caused by validation failing for cancellation messages from Claude Desktop, since Claude Desktop relies on a very old SDK verseion that sends wrong cancellation. This showed that the SDK isnt handling the error gracefuly.

jspahrsummers
jspahrsummers previously approved these changes Feb 4, 2025
Copy link
Member

@jspahrsummers jspahrsummers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Broadly looks good. Do we need to update everything to require using RequestResponder as a context manager? e.g., MCP client code, docs, …?

And should it be an error to use RequestResponder without entering the context? We could track this with a flag.

Comment on lines +43 to +47
# Long enough to ensure timeout
await anyio.sleep(0.2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm quite allergic to actually sleeping in tests, due to the flakiness and slowness. Is there any other way we can exercise these? Would a timeout of 0 work? Or a timeout in the past vs. a timeout in the far future?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me fix this. This was Claude written. After a lot of back and forth, the right way is to use events and wait for them to trigger.

@EduPonz
Copy link

EduPonz commented Feb 4, 2025

@EduPonz Yes. After fixing actual cancellation and thanks to your perfect repro, I was able to pinpoint the problem. It is caused by validation failing for cancellation messages from Claude Desktop, since Claude Desktop relies on a very old SDK verseion that sends wrong cancellation. This showed that the SDK isnt handling the error gracefuly.

Do you know were they keep their code or some place to open a ticket? I was looking for it but did not find much. It's strange that such a popular client is not up to spec...

@dsp-ant dsp-ant merged commit c58adfe into main Feb 4, 2025
5 checks passed
@dsp-ant dsp-ant deleted the davidsp/88-v2 branch February 4, 2025 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Random error thrown on response
3 participants