
@tofarr
Collaborator

@tofarr tofarr commented Jan 9, 2026

Summary

This PR fixes an issue where V1 conversations fail to restore from persistence when the conversation state contains secrets in the secret_registry.

Problem

Secrets were serialized as redacted values (**********), but StaticSecret.value was a required field, so validation failed once ********** was converted to None.

Solution (Two-part fix)

Part 1: Make secrets optional (prevents crash)

  • Made StaticSecret.value optional with None default
  • Updated StaticSecret.get_value() to handle None values gracefully
  • Fixed LookupSecret header validators to skip redacted headers instead of crashing
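A minimal sketch of the Part 1 behaviour for StaticSecret, assuming a plain-Python stand-in for the Pydantic model. The ********** marker comes from this PR; the `SecretValue` wrapper (a stand-in for `pydantic.SecretStr`), the `validate_secret` signature, and the `from_serialized` helper are hypothetical illustrations, not the SDK's actual code.

```python
from dataclasses import dataclass
from typing import Optional

REDACTED = "**********"  # marker written when secrets are serialized without a cipher


class SecretValue:
    """Hypothetical stand-in for pydantic.SecretStr (hides the value in repr)."""

    def __init__(self, secret: str) -> None:
        self._secret = secret

    def get_secret_value(self) -> str:
        return self._secret

    def __repr__(self) -> str:
        return "SecretValue('**********')"


def validate_secret(raw: Optional[str]) -> Optional[str]:
    """Hypothetical stand-in: map the redaction marker to None."""
    if raw is None or raw == REDACTED:
        return None
    return raw


@dataclass
class StaticSecret:
    # Optional with a None default, so a redacted value no longer fails validation
    value: Optional[SecretValue] = None

    @classmethod
    def from_serialized(cls, data: dict) -> "StaticSecret":
        raw = validate_secret(data.get("value"))
        return cls(value=SecretValue(raw) if raw is not None else None)

    def get_value(self) -> Optional[str]:
        # Return None instead of failing when the secret was lost to redaction
        return None if self.value is None else self.value.get_secret_value()
```

Loading a redacted payload now yields a usable object whose `get_value()` is `None`, which is the "loads successfully, but secrets need to be re-provided" case.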

Part 2: Add cipher support (preserves secrets)

  • Added cipher parameter to ConversationState.create() and LocalConversation.__init__()
  • When a cipher is provided, secrets are encrypted on save and decrypted on load
  • EventService now passes its cipher to LocalConversation

How it works now

  1. With cipher (recommended for production):

    • Secrets are encrypted when saving state
    • Secrets are decrypted when loading state
    • Secret values are preserved across restarts
  2. Without cipher (fallback):

    • Secrets are redacted as ********** during serialization
    • Redacted secrets deserialize to StaticSecret(value=None)
    • Conversation loads successfully, but secrets need to be re-provided
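The two modes above can be sketched as a save/load round trip for a single secret. This assumes a cipher exposing string `encrypt`/`decrypt` methods (the SDK's real cipher interface may differ), and `Base64Cipher` is a placeholder chosen only to keep the example self-contained: base64 is encoding, not encryption. `save_secret` and `load_secret` are hypothetical names.

```python
import base64
from typing import Optional, Protocol

REDACTED = "**********"


class Cipher(Protocol):
    # Assumed interface; not necessarily the SDK's actual cipher class
    def encrypt(self, plaintext: str) -> str: ...

    def decrypt(self, ciphertext: str) -> str: ...


class Base64Cipher:
    """Placeholder for illustration only: base64 provides no confidentiality."""

    def encrypt(self, plaintext: str) -> str:
        return base64.b64encode(plaintext.encode()).decode()

    def decrypt(self, ciphertext: str) -> str:
        return base64.b64decode(ciphertext.encode()).decode()


def save_secret(value: str, cipher: Optional[Cipher]) -> str:
    # With a cipher the value is encrypted; without one it is redacted
    return cipher.encrypt(value) if cipher is not None else REDACTED


def load_secret(stored: str, cipher: Optional[Cipher]) -> Optional[str]:
    if stored == REDACTED:
        return None  # redacted secrets load as None and must be re-provided
    if cipher is None:
        return stored  # no cipher on load: the stored value is passed through as-is
    return cipher.decrypt(stored)
```

Only the first path preserves the secret across a restart; the fallback path trades the secret value for a conversation that still loads.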

Files Changed

  • openhands-sdk/openhands/sdk/secret/secrets.py - Make StaticSecret.value optional, fix LookupSecret headers
  • openhands-sdk/openhands/sdk/conversation/state.py - Add cipher support for encryption/decryption
  • openhands-sdk/openhands/sdk/conversation/impl/local_conversation.py - Add cipher parameter
  • openhands-agent-server/openhands/agent_server/event_service.py - Pass cipher to LocalConversation
  • tests/sdk/conversation/local/test_state_serialization.py - Add regression tests
  • tests/sdk/conversation/test_secret_source.py - Add regression tests

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?


Fixes

Testing


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

| Variant | Architectures | Base Image |
|---|---|---|
| java | amd64, arm64 | eclipse-temurin:17-jdk |
| python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 |
| golang | amd64, arm64 | golang:1.21-bookworm |

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:6ca6a99-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-6ca6a99-python \
  ghcr.io/openhands/agent-server:6ca6a99-python

All tags pushed for this build

ghcr.io/openhands/agent-server:6ca6a99-golang-amd64
ghcr.io/openhands/agent-server:6ca6a99-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:6ca6a99-golang-arm64
ghcr.io/openhands/agent-server:6ca6a99-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:6ca6a99-java-amd64
ghcr.io/openhands/agent-server:6ca6a99-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:6ca6a99-java-arm64
ghcr.io/openhands/agent-server:6ca6a99-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:6ca6a99-python-amd64
ghcr.io/openhands/agent-server:6ca6a99-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:6ca6a99-python-arm64
ghcr.io/openhands/agent-server:6ca6a99-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:6ca6a99-golang
ghcr.io/openhands/agent-server:6ca6a99-java
ghcr.io/openhands/agent-server:6ca6a99-python

About Multi-Architecture Support

  • Each variant tag (e.g., 6ca6a99-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 6ca6a99-python-amd64) are also available if needed

…ization mismatch (ALL-4846)

## Problem
V1 conversations fail to restore from persistence with a Pydantic validation
error when the conversation state contains secrets in the secret_registry.
The error occurs because secrets are serialized as redacted values ('**********')
but the StaticSecret model requires a non-null 'value' field.

## Root Cause
1. Secrets are serialized without cipher or expose_secrets context, resulting
   in '**********' (redacted) values
2. During deserialization, validate_secret() converts '**********' to None
3. StaticSecret.value was a required field (not Optional), causing validation
   to fail when None was returned

Similar issue existed in LookupSecret where assertions would fail for redacted
secret headers.

## Fix
1. Made StaticSecret.value optional with None default:
   `value: SecretStr | None = None`
2. Updated StaticSecret.get_value() to handle None values
3. Updated LookupSecret header validators to skip redacted headers instead
   of asserting (which would crash)

This allows conversations to be restored successfully, though the secrets
will need to be re-provided since they were redacted during serialization.

Co-authored-by: openhands <[email protected]>
…tence

This enhancement builds on the previous fix (ALL-4846) by adding optional
cipher support to ConversationState and LocalConversation. When a cipher
is provided, secrets are encrypted during serialization and decrypted
during deserialization, preserving the actual secret values across
save/restore cycles.

Changes:
- ConversationState:
  - Added _cipher private attribute
  - Updated _save_base_state() to use cipher context for encryption
  - Updated create() to accept cipher parameter and pass it to
    model_validate() for decryption

- LocalConversation:
  - Added cipher parameter to __init__()
  - Pass cipher to ConversationState.create()

- EventService:
  - Pass cipher from EventService to LocalConversation

This allows the agent server to use the same cipher for both:
1. meta.json (StoredConversation) - already supported
2. base_state.json (ConversationState) - now supported

Without a cipher, secrets are still redacted (as '**********') and the
previous fix ensures redacted secrets can be deserialized to None without
crashing.

Co-authored-by: openhands <[email protected]>
@github-actions
Contributor

github-actions bot commented Jan 9, 2026

Coverage

Coverage Report

| File | Stmts | Miss | Cover | Missing |
|---|---|---|---|---|
| openhands-agent-server/openhands/agent_server | | | | |
| config.py | 60 | 4 | 93% | 23, 30, 33, 157 |
| event_service.py | 314 | 158 | 49% | 55–56, 75–77, 81–86, 89–92, 107, 123, 127, 131–132, 139, 141, 148–149, 157–160, 167–169, 186, 210–211, 214–215, 217–219, 221, 226, 229–230, 233–235, 238, 242–244, 246, 248, 259–262, 275–276, 279–280, 283, 286–288, 291–292, 295–296, 300, 303, 307, 311–312, 314, 331–332, 349, 351, 355–357, 361, 370–371, 373, 377, 383, 385, 393–398, 448, 450–453, 462, 478, 485, 489, 500–501, 511–514, 516–517, 521, 523, 527–530, 535–537, 539, 543–546, 550–553, 561–564, 583–584, 586–593, 595–596, 605–606, 608–609, 616–617, 619–620, 624, 630, 640–641, 648 |
| openhands-sdk/openhands/sdk/conversation | | | | |
| state.py | 173 | 6 | 96% | 144, 283, 329–331, 445 |
| openhands-sdk/openhands/sdk/conversation/impl | | | | |
| local_conversation.py | 260 | 18 | 93% | 251, 256, 298–299, 303–304, 384–385, 388, 534, 542, 544, 554, 556–558, 699–700 |
| openhands-sdk/openhands/sdk/secret | | | | |
| secrets.py | 64 | 2 | 96% | 85, 88 |
| TOTAL | 14993 | 4420 | 70% | |

@openhands-ai

openhands-ai bot commented Jan 9, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Pre-commit checks

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1672 at branch `fix/ALL-4846-secret-serialization-deserialization`

Feel free to include any additional details that might help me get this PR into a better state.


@tofarr tofarr marked this pull request as ready for review January 9, 2026 22:17
Collaborator

@all-hands-bot all-hands-bot left a comment


Overall approach is solid and tests pass. The two-part fix (making secrets optional + adding cipher support) correctly addresses the serialization/deserialization bug. A few suggestions below to improve production safety and debuggability.

@tofarr
Collaborator Author

tofarr commented Jan 9, 2026

@OpenHands Please implement the suggestions left by all-hands-bot

@openhands-ai

openhands-ai bot commented Jan 9, 2026

I'm on it! tofarr can track my progress at all-hands.dev

@enyst enyst changed the title Fix conversation restore failing due to secret serialization/deserialization mismatch (ALL-4846) Fix conversation restore failing due to secret serialization/deserialization mismatch Jan 9, 2026
- Add warning when saving state without cipher but with secrets
- Add debug logging when headers are skipped during serialization/deserialization
- Add edge case tests for cipher mismatches:
  - Save with cipher, load without cipher
  - Save without cipher, load with cipher
  - Save with cipher A, load with cipher B

Co-authored-by: openhands <[email protected]>
@openhands-ai

openhands-ai bot commented Jan 9, 2026

I have successfully implemented all the suggestions from the all-hands-bot PR review. Here's a summary of the changes pushed to PR #1672:

Changes Made

1. Added warning when secrets exist but no cipher is provided (state.py)

When saving conversation state without a cipher but with secrets in the registry, a warning is now logged:

Saving conversation state without cipher - X secret(s) will be redacted and lost on restore. Consider providing a cipher to preserve secrets.
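The guard behind this warning can be sketched as follows. Only the message wording comes from the PR; the function name `warn_if_secrets_unprotected`, the logger name, and passing the secret count and cipher as arguments are hypothetical simplifications of what ConversationState does internally.

```python
import logging
from typing import Optional

logger = logging.getLogger("conversation_state_sketch")


def warn_if_secrets_unprotected(secret_count: int, cipher: Optional[object]) -> bool:
    """Log a warning when secrets would be redacted on save; return True if warned."""
    if cipher is None and secret_count > 0:
        logger.warning(
            "Saving conversation state without cipher - %d secret(s) will be "
            "redacted and lost on restore. Consider providing a cipher to "
            "preserve secrets.",
            secret_count,
        )
        return True
    return False
```

The warning fires only when both conditions hold, so saves with a cipher, or without any secrets, stay quiet.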

2. Added debug logging for header handling (secrets.py)

  • Added logger import and initialization
  • Added debug logging when redacted headers are skipped during deserialization
  • Added debug logging when redacted headers are skipped during serialization
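A simplified sketch of the deserialization side of the header handling, assuming headers arrive as a plain dict of strings. The real LookupSecret validators operate on Pydantic fields; `restore_headers` and the logger name are hypothetical.

```python
import logging
from typing import Dict, Optional

logger = logging.getLogger("lookup_secret_sketch")
REDACTED = "**********"


def restore_headers(raw_headers: Dict[str, Optional[str]]) -> Dict[str, str]:
    """Drop redacted headers (instead of asserting) and keep the rest."""
    restored: Dict[str, str] = {}
    for name, value in raw_headers.items():
        if value is None or value == REDACTED:
            # Before the fix an assertion failed here; now the header is skipped
            logger.debug("Skipping redacted header %r during deserialization", name)
            continue
        restored[name] = value
    return restored
```

Skipping rather than asserting means a single redacted header no longer prevents the whole conversation from loading; the debug log records which header was dropped.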

3. Added edge case tests for cipher mismatches (test_state_serialization.py)

Three new tests covering cipher edge cases:

  • test_conversation_state_save_with_cipher_load_without - Verifies that when state is saved with a cipher but loaded without one, the encrypted values remain as-is (unusable but conversation loads)
  • test_conversation_state_save_without_cipher_load_with - Verifies that when state is saved without a cipher but loaded with one, the redacted secrets deserialize to None
  • test_conversation_state_cipher_mismatch - Verifies that when state is saved with cipher A but loaded with cipher B, decryption fails gracefully (secrets set to None with warning logged)
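The cipher-mismatch case relies on decryption detecting a wrong key. A minimal sketch of that behaviour follows, using a toy HMAC-authenticated XOR cipher built from the Python standard library purely so the example is self-contained. The SDK's actual cipher is not shown in this PR, this toy construction is not secure, and `ToyAuthenticatedCipher` and `load_secret` are hypothetical names.

```python
import base64
import hashlib
import hmac
import logging
import secrets
from typing import Optional

logger = logging.getLogger("cipher_mismatch_sketch")


class ToyAuthenticatedCipher:
    """Toy cipher for illustration only; NOT secure, do not use in production.

    XORs the plaintext with a SHA-256-derived keystream and appends an
    HMAC-SHA256 tag so that decrypting with the wrong key is detected.
    """

    def __init__(self, key: bytes) -> None:
        self._key = key

    def _keystream(self, nonce: bytes, length: int) -> bytes:
        out = b""
        counter = 0
        while len(out) < length:
            block = self._key + nonce + counter.to_bytes(4, "big")
            out += hashlib.sha256(block).digest()
            counter += 1
        return out[:length]

    def encrypt(self, plaintext: str) -> str:
        nonce = secrets.token_bytes(16)
        data = plaintext.encode()
        body = bytes(a ^ b for a, b in zip(data, self._keystream(nonce, len(data))))
        tag = hmac.new(self._key, nonce + body, hashlib.sha256).digest()
        return base64.b64encode(nonce + body + tag).decode()

    def decrypt(self, token: str) -> str:
        raw = base64.b64decode(token.encode())
        nonce, body, tag = raw[:16], raw[16:-32], raw[-32:]
        expected = hmac.new(self._key, nonce + body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise ValueError("authentication failed: wrong key or corrupted data")
        stream = self._keystream(nonce, len(body))
        return bytes(a ^ b for a, b in zip(body, stream)).decode()


def load_secret(token: str, cipher: ToyAuthenticatedCipher) -> Optional[str]:
    # Fail gracefully: a wrong key yields None plus a warning, not a crash
    try:
        return cipher.decrypt(token)
    except ValueError:
        logger.warning("Failed to decrypt secret; setting it to None")
        return None
```

Authenticating the ciphertext is what lets a mismatch be detected at all; without a tag, decrypting with cipher B would silently return garbage instead of `None`.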

Verification

  • All pre-commit hooks pass (ruff format, ruff lint, pycodestyle, pyright, import dependency rules)
  • All 31 related tests pass

The changes have been pushed to the branch fix/ALL-4846-secret-serialization-deserialization and are now part of PR #1672.

Collaborator

@enyst enyst left a comment


@tofarr thank you! This probably fixes #1505?

@enyst
Collaborator

enyst commented Jan 10, 2026

@OpenHands Read the full diff of this PR and replace ALL occurrences of “ALL-4846” with “issue 1505”. Double check the diff when you’re done. Commit and push.

@openhands-ai

openhands-ai bot commented Jan 10, 2026

I'm on it! enyst can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Jan 10, 2026

Final summary (new work since my last summary)

Double-checks performed

  • Confirmed working tree is clean (git status)
  • Confirmed no remaining ALL-4846 in the full PR diff (git diff origin/main...HEAD | grep ALL-4846 returned nothing)
  • Ensured changes are concise and scoped to the request: only docstring text replacements in existing test files; no functional/code changes added

Changes made

  • Replaced all occurrences of ALL-4846 with issue 1505 in the PR diff (4 replacements), affecting only:
    • tests/sdk/conversation/local/test_state_serialization.py
    • tests/sdk/conversation/test_secret_source.py

Validation

  • Ran uv run pre-commit run --files ... on the modified files: passed

Version control

  • Committed only the relevant files
  • Commit message includes Co-authored-by: openhands <[email protected]>
  • Pushed updates to the existing branch fix/ALL-4846-secret-serialization-deserialization (PR updated)


@enyst enyst requested a review from neubig January 10, 2026 15:36
Contributor

@neubig neubig left a comment


Overall, LGTM just one small comment

session_api_key = os.getenv(SESSION_API_KEY_ENV)
if session_api_key:
    return SecretStr(session_api_key)
session_api_key = os.getenv("OH_SESSION_API_KEYS_0")
Contributor


I didn't understand why this is necessary. Could you at least leave a comment here explaining why this two-part solution is needed?

Also,

session_api_key = os.getenv(SESSION_API_KEY_ENV) or os.getenv("OH_SESSION_API_KEYS_0")

may be more concise.

Collaborator


I’d also love to know why more than one is needed. The …KEY_0 suffix suggests that more than one is used.

(I can’t find it right now, but I believe I’ve seen people document a variable OH_SESSION_KEY for LLM key encryption. Does such a key still work?)

Collaborator Author


Allowing more than one session_api_key here enables key rotation. The first key in the list is the default, but the others are also accepted. This is not a workflow we use at present, but it will likely become necessary as we move to an architecture where sandboxes contain multiple conversations, persist longer, and are tied to a user/org.

The actual change here was to support the key format we already use in the Docker container in the OpenHands repo.
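The lookup order discussed in this thread (the single legacy key first, then `OH_SESSION_API_KEYS_0`, with further numbered keys accepted for rotation) can be sketched as below. This is a hypothetical illustration: the value of `SESSION_API_KEY_ENV` is assumed to be `"SESSION_API_KEY"`, the consecutive `_1`, `_2`, … numbering, the inclusion of the legacy key in the permitted list, and the helper names are all assumptions, and plain strings stand in for `SecretStr`.

```python
import os
from typing import List, Mapping, Optional

# Assumption: the legacy single-key variable is named SESSION_API_KEY
SESSION_API_KEY_ENV = "SESSION_API_KEY"
ROTATION_PREFIX = "OH_SESSION_API_KEYS_"


def load_session_api_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return all permitted keys; the first entry is the default key."""
    keys: List[str] = []
    legacy = env.get(SESSION_API_KEY_ENV)
    if legacy:
        keys.append(legacy)
    index = 0
    # Hypothetical: read OH_SESSION_API_KEYS_0, _1, ... until the first gap
    while env.get(f"{ROTATION_PREFIX}{index}"):
        keys.append(env[f"{ROTATION_PREFIX}{index}"])
        index += 1
    return keys


def default_session_api_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    # Same precedence as the reviewed code, written as the suggested one-liner
    return env.get(SESSION_API_KEY_ENV) or env.get(f"{ROTATION_PREFIX}0")
```

Passing a plain dict as `env` keeps the helpers testable without touching `os.environ`.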

Collaborator Author


The single SESSION_API_KEY is still supported for backwards compatibility - this has not changed.

I think I need to add a README in a separate PR describing all this.



6 participants