
Conversation

@enyst
Collaborator

@enyst enyst commented Dec 29, 2025

Summary

This PR removes all reconciliation methods (resolve_diff_from_deserialized) and uses the provided Agent directly when restoring conversations. This is an alternative approach to issue #1451.

What Was Happening on Main

On main, when restoring a conversation, resolve_diff_from_deserialized would:

Override from runtime:

  • agent_context (skills, system_message_suffix, user_message_suffix, secrets)
  • llm secrets (api_key, aws credentials, litellm_extra_body)
  • condenser.llm secrets

Restore from persistence (and require exact match with runtime):

  • tools
  • mcp_config
  • filter_tools_regex
  • system_prompt_filename
  • security_policy_filename
  • system_prompt_kwargs
  • condenser (except its llm secrets)
  • llm config (model, temperature, etc.)

The final equality check meant users effectively couldn't change most Agent configuration between sessions.

What This PR Does

Removes reconciliation. The provided Agent is used directly, subject only to limitations that would otherwise not work at all: it must be the same Agent class and carry the same tools.

Users are now free to change Agent configuration between sessions (illustrated in the sketch after the limitations list below):

  • llm (model, api_key, all settings)
  • mcp_config
  • filter_tools_regex
  • agent_context
  • system_prompt_filename
  • security_policy_filename
  • system_prompt_kwargs
  • condenser

Limitations (these must still match between sessions):

  • tools
  • the Agent's class/type
  • non-Agent state attributes, such as the confirmation policy
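
For illustration, a hypothetical resume sketch; the import path, constructor arguments, and tool specs below are assumptions rather than the exact SDK API:

# Hypothetical sketch: resuming a persisted conversation with a different LLM.
# Import path and all constructor arguments below are assumptions.
from openhands.sdk import LLM, Agent, Conversation

agent = Agent(
    llm=LLM(model="litellm_proxy/claude-sonnet-4-5"),  # changed freely between sessions
    tools=["bash", "file_editor"],  # illustrative; must match the persisted conversation's tools
)
conversation = Conversation(
    agent=agent,
    persistence_dir="./conversations",  # argument name assumed
    conversation_id="existing-id",      # argument name assumed
)
conversation.send_message("Continue where we left off.")  # method name assumed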

Execution Flow

New Conversation:

  1. Create ConversationState with the provided Agent (Pydantic validation happens here)
  2. Initialize EventLog for event storage
  3. Save initial base state to persistence
  4. Return the new state

Restored Conversation:

  1. Load persisted base_state.json (to get conversation metadata)
  2. Verify conversation ID matches
  3. Create ConversationState with the provided Agent (Pydantic validation happens here)
  4. Restore persisted conversation metadata (execution_status, confirmation_policy, etc.)
  5. Attach EventLog to load persisted events
  6. Save updated base state (with the provided Agent)
  7. Return the resumed state
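
A toy, self-contained sketch of that restore flow (class and field names are stand-ins for the SDK's ConversationState; step 5, attaching the EventLog, is omitted):

from pydantic import BaseModel

class State(BaseModel):   # toy stand-in for ConversationState
    id: str
    agent_model: str      # stand-in for the provided Agent
    execution_status: str = "idle"

def restore(agent_model: str, conversation_id: str, store: dict) -> State:
    persisted = State.model_validate_json(store["base_state.json"])  # step 1: load base state
    if persisted.id != conversation_id:                              # step 2: verify the ID
        raise ValueError("conversation ID mismatch")
    state = State(id=conversation_id, agent_model=agent_model)       # step 3: Pydantic validates here
    state.execution_status = persisted.execution_status              # step 4: restore metadata
    store["base_state.json"] = state.model_dump_json()               # step 6: save updated base state
    return state                                                     # step 7: resumed state

store = {"base_state.json": State(id="c1", agent_model="old-model").model_dump_json()}
resumed = restore("new-model", "c1", store)  # agent config changed freely on resume
assert resumed.agent_model == "new-model"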

Validation

Pydantic validation happens when creating instances (LLM, Agent, ConversationState) via the constructor.
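
For example, with a toy model standing in for one of those classes (field names are illustrative), an invalid value fails at construction time:

from pydantic import BaseModel, ValidationError

class LLMConfig(BaseModel):  # toy stand-in, not the SDK's actual LLM class
    model: str
    temperature: float = 0.0

try:
    LLMConfig(model="gpt-4o", temperature="not-a-number")  # fails at construction
except ValidationError as err:
    print(err)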

Note on Tools

Per issue #1533, tools already used in the conversation history remain available, because the tools restriction (see AgentBase.verify() in the commits below) requires the provided Agent to carry the same tools.

Scope: LocalConversation Only

This PR only affects LocalConversation.

For RemoteConversation, the server always creates the Agent from the persisted meta.json - the client's Agent is ignored when restoring. Making RemoteConversation support Agent changes would require:

  1. Client sends new Agent config when attaching to existing conversation
  2. Server accepts and uses the new Agent config instead of persisted one

This is out of scope for this PR but could be a follow-up.

Closes #1451

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant | Architectures | Base Image | Docs / Tags
java | amd64, arm64 | eclipse-temurin:17-jdk | Link
python | amd64, arm64 | nikolaik/python-nodejs:python3.12-nodejs22 | Link
golang | amd64, arm64 | golang:1.21-bookworm | Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:521d39f-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-521d39f-python \
  ghcr.io/openhands/agent-server:521d39f-python

All tags pushed for this build

ghcr.io/openhands/agent-server:521d39f-golang-amd64
ghcr.io/openhands/agent-server:521d39f-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:521d39f-golang-arm64
ghcr.io/openhands/agent-server:521d39f-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:521d39f-java-amd64
ghcr.io/openhands/agent-server:521d39f-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:521d39f-java-arm64
ghcr.io/openhands/agent-server:521d39f-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:521d39f-python-amd64
ghcr.io/openhands/agent-server:521d39f-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:521d39f-python-arm64
ghcr.io/openhands/agent-server:521d39f-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:521d39f-golang
ghcr.io/openhands/agent-server:521d39f-java
ghcr.io/openhands/agent-server:521d39f-python

About Multi-Architecture Support

  • Each variant tag (e.g., 521d39f-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 521d39f-python-amd64) are also available if needed

@github-actions
Contributor

github-actions bot commented Dec 29, 2025

Coverage

Coverage Report

File | Stmts | Miss | Cover | Missing
openhands-sdk/openhands/sdk/agent/base.py | 169 | 24 | 85% | 164, 170, 238–239, 250–252, 265, 273–274, 368–369, 371–375, 377–378, 389, 426–427, 437–438
openhands-sdk/openhands/sdk/conversation/state.py | 159 | 22 | 86% | 225, 255, 300–302, 318–319, 325, 331–334, 338, 344–347, 374, 392, 401, 416, 422
openhands-sdk/openhands/sdk/llm/llm.py | 402 | 153 | 61% | 344, 349, 353, 357–358, 361, 365–366, 377–378, 380–381, 385, 402, 420–423, 500–502, 523, 527, 542, 548–549, 573–574, 584, 609–614, 635–636, 639, 643, 655, 660–663, 672, 680–687, 691–694, 696, 709, 713–714, 716–717, 722–723, 725, 732, 735–740, 797–802, 859–860, 863–866, 908, 925, 979, 982, 985–993, 997–999, 1002, 1005–1007, 1014–1015, 1024, 1031–1033, 1037, 1039–1044, 1046–1063, 1066–1070, 1072–1073, 1079–1088
TOTAL | 14550 | 6912 | 52% |

@enyst enyst force-pushed the openhands/remove-reconciliation-methods branch from 76b5add to 5b3198d on December 29, 2025 19:19
Remove all reconciliation methods (resolve_diff_from_deserialized) and
use the runtime agent directly when restoring conversations.

Key changes:
- LLM: Remove resolve_diff_from_deserialized method entirely
- AgentBase: Remove resolve_diff_from_deserialized method entirely
- ConversationState.create(): Use runtime agent directly, no compatibility
  checking. User is free to change LLM, tools, condenser, agent_context,
  etc. between sessions.

Execution flow for new conversation:
1. Create ConversationState with runtime agent
   (Pydantic validation happens here)
2. Initialize EventLog for event storage
3. Save initial base state to persistence
4. Return the new state

Execution flow for restored conversation:
1. Load persisted base_state.json (only to get conversation metadata)
2. Verify conversation ID matches
3. Create ConversationState with the runtime agent
   (Pydantic validation happens here - runtime agent is always used)
4. Restore persisted conversation metadata (execution_status, etc.)
5. Attach EventLog to load persisted events
6. Save updated base state (with runtime agent)
7. Return the resumed state

NOTE: There's a case for checking that tools already used in the
conversation history are still available - see issue #1533.

Closes #1451

Co-authored-by: openhands <[email protected]>
Reintroduce tools restriction from the original reconcile method:
- Add AgentBase.load(persisted) method that validates tools match
- Tools must match between runtime and persisted agents (they may have
  been used in conversation history)
- All other config (LLM, agent_context, condenser, etc.) can change freely

Update ConversationState.create() to use agent.load() on restore path.

Co-authored-by: openhands <[email protected]>
Collaborator

@xingyaoww xingyaoww left a comment


LGTM! Just need to fix a few minor things!

enyst and others added 4 commits December 31, 2025 17:48
Address review comments:
- Rename AgentBase.load() to AgentBase.verify() since it's a verification
  method, not a load method
- Update docstring to say 'Verify that we can resume...'
- Capture return value: verified_agent = agent.verify(persisted_state.agent)
- Update tests to use verify() instead of load()

Co-authored-by: openhands <[email protected]>
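
As a concrete illustration of the verify() contract described above, here is a minimal self-contained sketch using a toy Pydantic model (everything except the verify name and the tools-must-match rule is illustrative):

from pydantic import BaseModel

class ToyAgent(BaseModel):  # toy stand-in for AgentBase
    llm_model: str
    tools: list[str]

    def verify(self, persisted: "ToyAgent") -> "ToyAgent":
        # Tools may already appear in the conversation history, so they must
        # match; everything else (LLM, agent_context, condenser) may change.
        if sorted(self.tools) != sorted(persisted.tools):
            raise ValueError("tools changed between sessions; cannot resume")
        return self

runtime = ToyAgent(llm_model="new-model", tools=["bash", "editor"])
persisted = ToyAgent(llm_model="old-model", tools=["bash", "editor"])
verified_agent = runtime.verify(persisted)  # ok: same tools, different LLM
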
Address xingyaoww's review comment: instead of creating state from scratch,
load persisted state and update specific fields. This is more future-proof -
new fields will automatically be preserved.

Co-authored-by: openhands <[email protected]>
Load persisted state but override with runtime-provided values:
- agent (verified against persisted)
- workspace
- max_iterations
- stuck_detection

Keep from persisted state:
- id, persistence_dir, execution_status, confirmation_policy
- activated_knowledge_skills, blocked_actions, blocked_messages
- secret_registry

This gives the best of both approaches: future-proof for new fields
while respecting user-provided runtime configuration.

Co-authored-by: openhands <[email protected]>
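
A small self-contained Pydantic example of the load-then-override pattern described in this commit (field names are toy stand-ins, not the SDK's):

from pydantic import BaseModel

class State(BaseModel):  # toy stand-in for ConversationState
    id: str
    execution_status: str = "idle"
    max_iterations: int = 100
    workspace: str = "/tmp"

persisted = State(id="c1", execution_status="paused", max_iterations=50)
# Start from the persisted state (future fields are preserved automatically),
# then override only the runtime-provided values.
resumed = persisted.model_copy(update={"workspace": "/work", "max_iterations": 200})
assert resumed.execution_status == "paused"  # kept from persistence
assert resumed.max_iterations == 200         # runtime override
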
@OpenHands OpenHands deleted a comment from openhands-ai bot Dec 31, 2025
Test coverage for:
- Runtime values used on resume: workspace, max_iterations
- Persisted values preserved: execution_status, stuck_detection,
  blocked_actions, blocked_messages
- Stats reset on resume (fresh session)
- Conversation ID mismatch raises error

Co-authored-by: openhands <[email protected]>
@enyst enyst marked this pull request as ready for review December 31, 2025 18:47
@enyst
Collaborator Author

enyst commented Dec 31, 2025

@OpenHands I looked at the diff of this PR and I don't see everything I expected to see. For example, in llm.py we have a class var named OVERRIDE... something, which is, I think, only used in the reconciliation method we removed from LLM. Please track it down and confirm what I said, and if it's unused let's clean it up.

Verify whether the other files in this PR, e.g. state or agentbase, have something similar, and do the same.

Review the code and clean it up from such redundancies.

@openhands-ai

openhands-ai bot commented Dec 31, 2025

I'm on it! enyst can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Dec 31, 2025

Final summary (new since last summary)

Double-check: request coverage

  • Tracked down LLM.OVERRIDE_ON_SERIALIZE and confirmed it was only used by the removed reconciliation method (LLM.resolve_diff_from_deserialized).
  • Confirmed it was unused after the refactor (no other references in the repo).
  • Searched for similar reconciliation/override leftovers in the other PR-touched files mentioned (agent/base.py, conversation/state.py) and found no comparable unused constants/fields to remove.
  • Removed the redundancy (the now-unused class var + related comments) from llm.py.
  • Ran pre-commit on the edited file and it passed.

Conciseness check

  • Changes are minimal and directly address the redundancy: 13 lines deleted in one file.
  • No unrelated formatting or refactors.

Git / delivery

  • Commit created with the required co-author line.
  • Changes pushed to the existing branch (openhands/remove-reconciliation-methods) to update the PR.

Diff recap (this update only):

  • openhands-sdk/openhands/sdk/llm/llm.py: removed unused OVERRIDE_ON_SERIALIZE class var block (13 deletions).


@enyst enyst added the integration-test label (Runs the integration tests and comments the results) Jan 6, 2026
@github-actions
Contributor

github-actions bot commented Jan 6, 2026

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

github-actions bot commented Jan 6, 2026

🧪 Integration Tests Results

Overall Success Rate: 87.5%
Total Cost: $2.07
Models Tested: 6
Timestamp: 2026-01-06 13:00:50 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Integration (Required) | Behavior (Optional) | Passed | Skipped | Total | Cost | Tokens
litellm_proxy_gpt_5.1_codex_max | 88.9% | 88.9% | N/A | 8/9 | 1 | 10 | $0.15 | 275,647
litellm_proxy_vertex_ai_gemini_3_pro_preview | 90.0% | 90.0% | N/A | 9/10 | 0 | 10 | $0.55 | 329,797
litellm_proxy_mistral_devstral_2512 | 77.8% | 77.8% | N/A | 7/9 | 1 | 10 | $0.22 | 537,515
litellm_proxy_moonshot_kimi_k2_thinking | 88.9% | 88.9% | N/A | 8/9 | 1 | 10 | $0.43 | 650,161
litellm_proxy_deepseek_deepseek_chat | 88.9% | 88.9% | N/A | 8/9 | 1 | 10 | $0.06 | 613,346
litellm_proxy_claude_sonnet_4_5_20250929 | 90.0% | 90.0% | N/A | 9/10 | 0 | 10 | $0.65 | 552,140

📋 Detailed Results

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 88.9% (8/9)
  • Integration Tests (Required): 88.9% (8/10)
  • Total Cost: $0.15
  • Token Usage: prompt: 272,005, completion: 3,642, cache_read: 199,808, reasoning: 1,536
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_8e12188_gpt51_codex_run_N10_20260106_124718
  • Skipped Tests: 1

Skipped Tests:

  • t09_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

Failed Tests:

  • t10_restore_conversation ⚠️ REQUIRED: RESTORE_LLM_CONFIG_2 is required for t10_restore_conversation (Cost: $0.0016)

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 90.0% (9/10)
  • Integration Tests (Required): 90.0% (9/10)
  • Total Cost: $0.55
  • Token Usage: prompt: 311,194, completion: 18,603, cache_read: 165,958, reasoning: 14,712
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_8e12188_gemini_3_pro_run_N10_20260106_124719

Failed Tests:

  • t10_restore_conversation ⚠️ REQUIRED: RESTORE_LLM_CONFIG_2 is required for t10_restore_conversation (Cost: $0.02)

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 77.8% (7/9)
  • Integration Tests (Required): 77.8% (7/10)
  • Total Cost: $0.22
  • Token Usage: prompt: 532,261, completion: 5,254
  • Run Suffix: litellm_proxy_mistral_devstral_2512_8e12188_devstral_2512_run_N10_20260106_124719
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t10_restore_conversation ⚠️ REQUIRED: RESTORE_LLM_CONFIG_2 is required for t10_restore_conversation (Cost: $0.02)
  • t02_add_bash_hello ⚠️ REQUIRED: Shell script is not executable (Cost: $0.01)

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 88.9% (8/9)
  • Integration Tests (Required): 88.9% (8/10)
  • Total Cost: $0.43
  • Token Usage: prompt: 628,665, completion: 21,496, cache_read: 548,352
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_8e12188_kimi_k2_run_N10_20260106_124720
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t10_restore_conversation ⚠️ REQUIRED: RESTORE_LLM_CONFIG_2 is required for t10_restore_conversation (Cost: $0.09)

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 88.9% (8/9)
  • Integration Tests (Required): 88.9% (8/10)
  • Total Cost: $0.06
  • Token Usage: prompt: 600,543, completion: 12,803, cache_read: 563,584
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_8e12188_deepseek_run_N10_20260106_124712
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t10_restore_conversation ⚠️ REQUIRED: RESTORE_LLM_CONFIG_2 is required for t10_restore_conversation (Cost: $0.0059)

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 90.0% (9/10)
  • Integration Tests (Required): 90.0% (9/10)
  • Total Cost: $0.65
  • Token Usage: prompt: 539,005, completion: 13,135, cache_read: 453,159, cache_write: 84,927, reasoning: 3,951
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_8e12188_sonnet_run_N10_20260106_124720

Failed Tests:

  • t10_restore_conversation ⚠️ REQUIRED: RESTORE_LLM_CONFIG_2 is required for t10_restore_conversation (Cost: $0.0057)

@enyst enyst added and removed the integration-test label (Runs the integration tests and comments the results) Jan 6, 2026
@github-actions
Contributor

github-actions bot commented Jan 6, 2026

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

@github-actions
Contributor

github-actions bot commented Jan 6, 2026

🧪 Integration Tests Results

Overall Success Rate: 85.7%
Total Cost: $3.53
Models Tested: 6
Timestamp: 2026-01-06 13:40:32 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model | Overall | Integration (Required) | Behavior (Optional) | Passed | Skipped | Total | Cost | Tokens
litellm_proxy_vertex_ai_gemini_3_pro_preview | 90.0% | 90.0% | N/A | 9/10 | 0 | 10 | $0.58 | 310,459
litellm_proxy_claude_sonnet_4_5_20250929 | 90.0% | 90.0% | N/A | 9/10 | 0 | 10 | $0.68 | 594,933
litellm_proxy_deepseek_deepseek_chat | 88.9% | 88.9% | N/A | 8/9 | 1 | 10 | $0.07 | 709,920
litellm_proxy_gpt_5.1_codex_max | 88.9% | 88.9% | N/A | 8/9 | 1 | 10 | $1.50 | 6,134,922
litellm_proxy_moonshot_kimi_k2_thinking | 88.9% | 88.9% | N/A | 8/9 | 1 | 10 | $0.55 | 858,035
litellm_proxy_mistral_devstral_2512 | 66.7% | 66.7% | N/A | 6/9 | 1 | 10 | $0.14 | 327,177

📋 Detailed Results

litellm_proxy_vertex_ai_gemini_3_pro_preview

  • Overall Success Rate: 90.0% (9/10)
  • Integration Tests (Required): 90.0% (9/10)
  • Total Cost: $0.58
  • Token Usage: prompt: 287,015, completion: 23,444, cache_read: 150,647, reasoning: 18,043
  • Run Suffix: litellm_proxy_vertex_ai_gemini_3_pro_preview_a2aa1b9_gemini_3_pro_run_N10_20260106_131802

Failed Tests:

  • t10_restore_conversation ⚠️ REQUIRED: RESTORE_LLM_CONFIG_2 is required for t10_restore_conversation (Cost: $0.02)

litellm_proxy_claude_sonnet_4_5_20250929

  • Overall Success Rate: 90.0% (9/10)
  • Integration Tests (Required): 90.0% (9/10)
  • Total Cost: $0.68
  • Token Usage: prompt: 582,592, completion: 12,341, cache_read: 489,125, cache_write: 92,523, reasoning: 3,644
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_a2aa1b9_sonnet_run_N10_20260106_131803

Failed Tests:

  • t10_restore_conversation ⚠️ REQUIRED: RESTORE_LLM_CONFIG_2 is required for t10_restore_conversation (Cost: $0.02)

litellm_proxy_deepseek_deepseek_chat

  • Overall Success Rate: 88.9% (8/9)
  • Integration Tests (Required): 88.9% (8/10)
  • Total Cost: $0.07
  • Token Usage: prompt: 694,574, completion: 15,346, cache_read: 655,296
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_a2aa1b9_deepseek_run_N10_20260106_131802
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t10_restore_conversation ⚠️ REQUIRED: RESTORE_LLM_CONFIG_2 is required for t10_restore_conversation (Cost: $0.01)

litellm_proxy_gpt_5.1_codex_max

  • Overall Success Rate: 88.9% (8/9)
  • Integration Tests (Required): 88.9% (8/10)
  • Total Cost: $1.50
  • Token Usage: prompt: 6,091,116, completion: 43,806, cache_read: 5,821,824, reasoning: 34,176
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_a2aa1b9_gpt51_codex_run_N10_20260106_131801
  • Skipped Tests: 1

Skipped Tests:

  • t09_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

Failed Tests:

  • t10_restore_conversation ⚠️ REQUIRED: RESTORE_LLM_CONFIG_2 is required for t10_restore_conversation (Cost: $1.35)

litellm_proxy_moonshot_kimi_k2_thinking

  • Overall Success Rate: 88.9% (8/9)
  • Integration Tests (Required): 88.9% (8/10)
  • Total Cost: $0.55
  • Token Usage: prompt: 839,295, completion: 18,740, cache_read: 759,552
  • Run Suffix: litellm_proxy_moonshot_kimi_k2_thinking_a2aa1b9_kimi_k2_run_N10_20260106_131801
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t10_restore_conversation ⚠️ REQUIRED: RESTORE_LLM_CONFIG_2 is required for t10_restore_conversation (Cost: $0.0028)

litellm_proxy_mistral_devstral_2512

  • Overall Success Rate: 66.7% (6/9)
  • Integration Tests (Required): 66.7% (6/10)
  • Total Cost: $0.14
  • Token Usage: prompt: 324,339, completion: 2,838
  • Run Suffix: litellm_proxy_mistral_devstral_2512_a2aa1b9_devstral_2512_run_N10_20260106_131801
  • Skipped Tests: 1

Skipped Tests:

  • t08_image_file_viewing: This test requires a vision-capable LLM model. Please use a model that supports image input.

Failed Tests:

  • t09_token_condenser ⚠️ REQUIRED: Condensation not triggered. Token counting may not work. (Cost: $0.0019)
  • t10_restore_conversation ⚠️ REQUIRED: RESTORE_LLM_CONFIG_2 is required for t10_restore_conversation (Cost: $0.0016)
  • t02_add_bash_hello ⚠️ REQUIRED: Shell script is not executable (Cost: $0.0084)

@enyst
Collaborator Author

enyst commented Jan 6, 2026

@OpenHands Understand this PR. Then split the integration test changes from it, in a new branch based on this branch, and make a PR to this PR.

We want to get those right, but without running it with the whole LLM matrix, and in the same time, we want this branch ready.

@openhands-ai

openhands-ai bot commented Jan 6, 2026

I'm on it! enyst can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Jan 6, 2026

Final summary (new since last summary)

✅ Verification

  • Working tree is clean on openhands/integration-tests-only (git status shows nothing to commit).
  • No local diff (git diff empty).
  • Branch is pushed and tracking origin/openhands/integration-tests-only.

✅ What was delivered (per your request)

  • Split the integration test changes out of openhands/remove-reconciliation-methods into a separate branch (openhands/integration-tests-only) so they can be reviewed/iterated independently of the larger refactor.
  • Opened a PR targeting the original PR branch (“PR to this PR”), as requested:

🔍 Scope of the split (integration-only files)

This split PR contains only:

  • .github/workflows/integration-runner.yml
  • tests/integration/run_infer.py
  • tests/integration/tests/t10_restore_conversation.py

No other refactor/reconciliation-removal code is included.


@enyst
Collaborator Author

enyst commented Jan 6, 2026

@enyst hmm, I would hope to have two real LLMs :), but I do agree with you that we might not need to run it for the whole model matrix we have.

Maybe, for the sake of getting this PR merged, we can do 3 for now? I'm hoping I can include this PR in the next minor release :)

@xingyaoww

  • I adjusted the cross tests to do more and also test another detail. I do think the code is ready for review.
  • I also tried running the separate "2 LLMs" option in the integration tests, but that didn't work; it needs more testing. I split it into its own branch, in case we want to merge this one.
  • To note, the original integration test with 2 LLMs worked well: it restored for real and continued the conversation with another event. I need to figure out how to run it correctly in GitHub Actions without the matrix.
  • I'm meanwhile trying to test manually with a CLI based on this branch. 🤔

Collaborator

@xingyaoww xingyaoww left a comment


Love it! Really appreciate the effort here @enyst - this will make it much easier to implement the model switching on the fly!

@xingyaoww
Collaborator

@OpenHands please create an issue in OpenHands/OpenHands-CLI: based on the changes in this PR, we should update how we handle agent_settings systematically in the CLI.

We should only keep the non-default fields in agent_settings for the CLI (instead of saving every field via .model_dump() and always overriding them), and keep everything else as defaults.
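
For reference, a toy example of the suggested approach (the AgentSettings model here is a stand-in, not the CLI's actual class):

from pydantic import BaseModel

class AgentSettings(BaseModel):  # toy stand-in for the CLI's agent_settings
    model: str = "default-model"
    temperature: float = 0.0

settings = AgentSettings(temperature=0.7)
print(settings.model_dump())                       # every field, defaults included
print(settings.model_dump(exclude_defaults=True))  # {'temperature': 0.7}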

@openhands-ai

openhands-ai bot commented Jan 6, 2026

I'm on it! xingyaoww can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Jan 6, 2026

I've successfully created an issue in the OpenHands/OpenHands-CLI repository based on the changes in PR #1542.

Issue Created: #302 - Update agent_settings handling to only persist non-default fields

The issue describes:

  • Current Behavior: CLI saves all fields from agent_settings using .model_dump() and always overrides them when restoring conversations
  • Proposed Change: Only keep non-default fields in agent_settings for CLI, using approaches like model_dump(exclude_defaults=True)
  • Benefits: Smaller config files, better forward compatibility, cleaner semantics, and alignment with the SDK's new approach
  • Implementation Notes: Merge saved non-default fields with current defaults when loading

The issue is linked to the SDK PR #1542 for context.


@enyst
Collaborator Author

enyst commented Jan 6, 2026

This is the CLI on this branch, restoring with another LLM, remembering the events, and it also deleted that file:
[screenshot of the CLI session]

@enyst enyst merged commit a918f39 into main Jan 6, 2026
21 checks passed
@enyst enyst deleted the openhands/remove-reconciliation-methods branch January 6, 2026 15:58
3 participants