-
Notifications
You must be signed in to change notification settings - Fork 109
Add retry logic for ConversationInfo validation race condition #1559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Add retry logic for ConversationInfo validation race condition #1559
Conversation
Coverage Report •
|
||||||||||||||||||||
e9e5ffc to
6db70d0
Compare
Add retry with exponential backoff (0.1s, 0.2s, 0.4s, 0.8s) in _compose_conversation_info() when ConversationInfo validation fails due to transient race conditions where model_dump() returns incomplete data during concurrent state mutations. This is a workaround for issue #1557. A TODO comment with a link to the issue has been added for future reference. Fixes #1557 Co-authored-by: openhands <[email protected]>
6db70d0 to
cc5ffd3
Compare
|
[Automatic Post]: It has been a while since there was any activity on this PR. @simonrosenberg, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
2 similar comments
|
[Automatic Post]: It has been a while since there was any activity on this PR. @simonrosenberg, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
|
[Automatic Post]: It has been a while since there was any activity on this PR. @simonrosenberg, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up. |
Summary
This PR adds a simple workaround for the transient ConversationInfo validation error described in #1557.
Problem
During SWT-bench eval,
GET /api/conversations/{id}returned HTTP 500 due to apydantic_core.ValidationErrorwhen constructingConversationInfo. The error indicated that required fieldsid,agent, andworkspacewere missing from the input dict.This is a race condition where
ConversationState.model_dump()can return incomplete data during concurrent state mutations.Solution
Add a
_get_state_dump_with_retry()function that:state.model_dump()and checks if all required fields (id,agent,workspace) are presentRuntimeErrorwith a helpful error message linking to the issueThis is a workaround - the proper fix would be to implement locking in
ConversationState.model_dump()as proposed in PR #1558. A TODO comment with a link to the issue has been added for future reference.Testing
_get_state_dump_with_retry()function:test_returns_immediately_when_all_fields_present- verifies no retry when fields are presenttest_retries_when_fields_missing_then_succeeds- verifies retry after 0.1stest_retries_twice_then_succeeds- verifies retry after 0.1s and 0.3stest_raises_error_after_all_retries_exhausted- verifies error after all retriesAll existing tests continue to pass.
Fixes #1557
@simonrosenberg can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:47d135d-pythonRun
All tags pushed for this build
About Multi-Architecture Support
47d135d-python) is a multi-arch manifest supporting both amd64 and arm6447d135d-python-amd64) are also available if needed