-
Notifications
You must be signed in to change notification settings - Fork 105
Fix conversation restore failing due to secret serialization/deserialization mismatch #1672
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…ization mismatch (ALL-4846)
## Problem
V1 conversations fail to restore from persistence with a Pydantic validation
error when the conversation state contains secrets in the secret_registry.
The error occurs because secrets are serialized as redacted values ('**********')
but the StaticSecret model requires a non-null 'value' field.
## Root Cause
1. Secrets are serialized without cipher or expose_secrets context, resulting
in '**********' (redacted) values
2. During deserialization, validate_secret() converts '**********' to None
3. StaticSecret.value was a required field (not Optional), causing validation
to fail when None was returned
Similar issue existed in LookupSecret where assertions would fail for redacted
secret headers.
## Fix
1. Made StaticSecret.value optional with None default:
`value: SecretStr | None = None`
2. Updated StaticSecret.get_value() to handle None values
3. Updated LookupSecret header validators to skip redacted headers instead
of asserting (which would crash)
This allows conversations to be restored successfully, though the secrets
will need to be re-provided since they were redacted during serialization.
Co-authored-by: openhands <[email protected]>
…tence
This enhancement builds on the previous fix (ALL-4846) by adding optional
cipher support to ConversationState and LocalConversation. When a cipher
is provided, secrets are encrypted during serialization and decrypted
during deserialization, preserving the actual secret values across
save/restore cycles.
Changes:
- ConversationState:
- Added _cipher private attribute
- Updated _save_base_state() to use cipher context for encryption
- Updated create() to accept cipher parameter and pass it to
model_validate() for decryption
- LocalConversation:
- Added cipher parameter to __init__()
- Pass cipher to ConversationState.create()
- EventService:
- Pass cipher from EventService to LocalConversation
This allows the agent server to use the same cipher for both:
1. meta.json (StoredConversation) - already supported
2. base_state.json (ConversationState) - now supported
Without a cipher, secrets are still redacted (as '**********') and the
previous fix ensures redacted secrets can be deserialized to None without
crashing.
Co-authored-by: openhands <[email protected]>
Coverage Report •
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
all-hands-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall approach is solid and tests pass. The two-part fix (making secrets optional + adding cipher support) correctly addresses the serialization/deserialization bug. A few suggestions below to improve production safety and debuggability.
|
@OpenHands Please implement the suggestions left by all-hands-bot |
|
I'm on it! tofarr can track my progress at all-hands.dev |
- Add warning when saving state without cipher but with secrets - Add debug logging when headers are skipped during serialization/deserialization - Add edge case tests for cipher mismatches: - Save with cipher, load without cipher - Save without cipher, load with cipher - Save with cipher A, load with cipher B Co-authored-by: openhands <[email protected]>
|
I have successfully implemented all the suggestions from the all-hands-bot PR review. Here's a summary of the changes pushed to PR #1672: Changes Made1. Added warning when secrets exist but no cipher is provided (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
@OpenHands Read the full diff of this PR and replace ALL occurrences of “ALL-4846” with “issue 1505”. Double check the diff when you’re done. Commit and push. |
|
I'm on it! enyst can track my progress at all-hands.dev |
Co-authored-by: openhands <[email protected]>
Final summary (new work since my last summary)Double-checks performed
Changes made
Validation
Version control
|
neubig
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, LGTM just one small comment
| session_api_key = os.getenv(SESSION_API_KEY_ENV) | ||
| if session_api_key: | ||
| return SecretStr(session_api_key) | ||
| session_api_key = os.getenv("OH_SESSION_API_KEYS_0") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't understand the necessity for this, could you at least leave a comment here to explain why this two part solution is necessary?
Also,
session_api_key = os.getenv(SESSION_API_KEY_ENV) or os.getenv("OH_SESSION_API_KEYS_0")
may be more concise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’d also love to know why are more than 1 needed? …KEY_0 suggests than more the one is used
(While I can’t find it right now, I’ve seen that people have documented a variable OH_SESSION_KEY I believe, for LLM key encryption. I wonder if such key still works?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having more than one seasion_api_key allowed here allows key rotation. The first one in the list is the default, but others are permitted keys. This is not a workflow we use at present, but will likely become necessary as we start to move into an architecture where sandboxes contain multiple conversations, persist longer and are tied to a user/org.
The actual change here was to support the key format we already use in the docker container in the OpenHands repo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The single SESSION_API_KEY is still supported for backwards compatibility - this has not changed.
I think I need to add a readme in a separate PR describing all this
Summary
This PR fixes an issue where V1 conversations fail to restore from persistence when the conversation state contains secrets in the
secret_registry.Problem
Secrets were serialized as redacted values (
**********) butStaticSecret.valuewas a required field that couldn't acceptNoneafter validation converted**********toNone.Solution (Two-part fix)
Part 1: Make secrets optional (prevents crash)
StaticSecret.valueoptional withNonedefaultStaticSecret.get_value()to handleNonevalues gracefullyLookupSecretheader validators to skip redacted headers instead of crashingPart 2: Add cipher support (preserves secrets)
cipherparameter toConversationState.create()andLocalConversation.__init__()EventServicenow passes its cipher toLocalConversationHow it works now
With cipher (recommended for production):
Without cipher (fallback):
**********during serializationStaticSecret(value=None)Files Changed
openhands-sdk/openhands/sdk/secret/secrets.py- Make StaticSecret.value optional, fix LookupSecret headersopenhands-sdk/openhands/sdk/conversation/state.py- Add cipher support for encryption/decryptionopenhands-sdk/openhands/sdk/conversation/impl/local_conversation.py- Add cipher parameteropenhands-agent-server/openhands/agent_server/event_service.py- Pass cipher to LocalConversationtests/sdk/conversation/local/test_state_serialization.py- Add regression teststests/sdk/conversation/test_secret_source.py- Add regression testsChecklist
@tofarr can click here to continue refining the PR
Fixes
Testing
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:6ca6a99-pythonRun
All tags pushed for this build
About Multi-Architecture Support
6ca6a99-python) is a multi-arch manifest supporting both amd64 and arm646ca6a99-python-amd64) are also available if needed