
csmith49 commented Jan 8, 2026

Summary

This PR is a WIP.

The goal is to refactor the View object so that each consistency check (batch structure, tool loops and thinking blocks, etc.) is represented independently and can be evaluated in parallel.

Why

The View inherits some behavior from v0's ConversationMemory class, which was responsible for making sure that certain properties held in the event stream before it was sent up to the LLM. Example properties:

  • Action/Observation pairs not split
  • Action batches not split
  • Tool use loops not split

We expect the LLM API to preserve these properties. They'll only be violated when something modifies the event stream, as the condenser does.

In recent PRs the View/Condenser interaction was changed to rely on manipulation indices: the condenser now tries to maintain these properties as it drops events. We've kept the "enforcement" side as a safety net, but ideally it never has to act.
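
As a rough illustration of how a condenser can respect these properties while dropping events, here is a minimal Python sketch. It assumes manipulation indices are boundary positions between events (index `i` sits just before event `i`) and that the condenser forgets one contiguous range whose start and end must both be manipulation indices. The function `choose_forgotten_range` and its parameters are hypothetical, not the SDK's actual API.

```python
from bisect import bisect_left, bisect_right
from typing import Optional, Set, Tuple


def choose_forgotten_range(
    manipulation_indices: Set[int], keep_first: int, target_end: int
) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: snap a desired drop range onto safe boundaries.

    Returns (start, end) such that events[start:end] can be forgotten, or None
    if no non-empty range fits between keep_first and target_end.
    """
    ordered = sorted(manipulation_indices)
    # Smallest safe boundary at or after the prefix of events we always keep.
    i = bisect_left(ordered, keep_first)
    if i == len(ordered):
        return None
    start = ordered[i]
    # Largest safe boundary at or before the desired end (never drop more than asked).
    j = bisect_right(ordered, target_end) - 1
    if j < 0 or ordered[j] <= start:
        return None
    return start, ordered[j]
```

For example, if an action/observation pair occupies events 2 and 3 of a five-event view, boundary 3 is unsafe, so the indices are `{0, 1, 2, 4, 5}` and `choose_forgotten_range({0, 1, 2, 4, 5}, keep_first=1, target_end=3)` shrinks the request to `(1, 2)` rather than splitting the pair.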

However, both of those processes are loops that are tightly coupled with respect to the properties they enforce. To make these properties easier to manage, this PR separates each property into its own class so the properties can be managed in parallel.

Design Decisions and Assumptions

View properties are simple classes that implement a two-function API:

  1. enforce, which checks that the property holds, and
  2. manipulation_indices, which produces a set of indices that the condenser can use while preserving the property
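
A minimal Python sketch of that two-function API, using a toy event model. The names `Event`, `ViewProperty`, and `ActionObservationPairs`, the choice that `enforce` returns the positions of violating events, and the boundary-style indices (index `i` sits just before event `i`) are all assumptions for illustration, not the SDK's actual classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass(frozen=True)
class Event:
    # Hypothetical toy event: kind is "action", "observation", or "message".
    kind: str
    tool_call_id: Optional[str] = None


class ViewProperty(ABC):
    """Hypothetical base class for the two-function property API."""

    @abstractmethod
    def enforce(self, events: List[Event]) -> Set[int]:
        """Return positions of events that violate the property (empty if it holds)."""

    @abstractmethod
    def manipulation_indices(self, events: List[Event]) -> Set[int]:
        """Return boundaries (0..len(events)) where the condenser may safely cut."""


class ActionObservationPairs(ViewProperty):
    """Every action must have its observation, and vice versa."""

    def enforce(self, events: List[Event]) -> Set[int]:
        action_ids = {e.tool_call_id for e in events if e.kind == "action"}
        observation_ids = {e.tool_call_id for e in events if e.kind == "observation"}
        return {
            i
            for i, e in enumerate(events)
            if (e.kind == "action" and e.tool_call_id not in observation_ids)
            or (e.kind == "observation" and e.tool_call_id not in action_ids)
        }

    def manipulation_indices(self, events: List[Event]) -> Set[int]:
        # Boundary i sits just before events[i]; len(events) is the end of the view.
        allowed = set(range(len(events) + 1))
        action_pos = {e.tool_call_id: i for i, e in enumerate(events) if e.kind == "action"}
        for j, e in enumerate(events):
            if e.kind == "observation" and e.tool_call_id in action_pos:
                # Any boundary after the action and up to the observation splits the pair.
                allowed -= set(range(action_pos[e.tool_call_id] + 1, j + 1))
        return allowed
```

For the view `[message, action a, observation a, message]`, `manipulation_indices` returns `{0, 1, 3, 4}`: boundary 2 is excluded because it would separate action `a` from its observation.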

In theory only the manipulation indices are necessary, but experience has taught us that unexpected things sometimes happen. By keeping the enforcement function and logging its output with loud warnings, we can recover from these cases while keeping the option to disable enforcement or simplify the codebase in the future.
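
The safety-net idea can be sketched in Python as follows. This is a minimal sketch under assumptions: events are plain strings, each check returns the positions of offending events, and `enforce_all`, `orphaned_actions`, the logger name, and the `enabled` flag are hypothetical names rather than the SDK's API.

```python
import logging
from typing import Callable, Dict, List, Sequence, Set

logger = logging.getLogger("view.enforcement")  # hypothetical logger name

# A check returns the positions of events that violate its property.
Check = Callable[[Sequence[str]], Set[int]]


def orphaned_actions(events: Sequence[str]) -> Set[int]:
    """Example check: an "action:<id>" with no matching "observation:<id>"."""
    observed = {e.split(":", 1)[1] for e in events if e.startswith("observation:")}
    return {
        i
        for i, e in enumerate(events)
        if e.startswith("action:") and e.split(":", 1)[1] not in observed
    }


def enforce_all(
    events: List[str], checks: Dict[str, Check], enabled: bool = True
) -> List[str]:
    """Run every check as a safety net and drop the events it flags."""
    if not enabled:
        # Enforcement can be switched off once manipulation indices are trusted.
        return list(events)
    to_drop: Set[int] = set()
    for name, check in checks.items():
        violations = check(events)
        if violations:
            # Loud warning: with correct manipulation indices this should never fire.
            logger.warning(
                "View property %r violated at positions %s; dropping those events",
                name,
                sorted(violations),
            )
            to_drop |= violations
    return [e for i, e in enumerate(events) if i not in to_drop]
```

With `["message", "action:a", "observation:a", "action:b"]`, the orphaned `action:b` is logged and dropped; with `enabled=False` the events pass through untouched.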

Manipulation indices

This works because the manipulation index calculations can be done independently and combined simply, by taking the intersection of the sets.
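
A minimal Python sketch of that combination step. `combine_manipulation_indices` is a hypothetical helper, and the two input sets are made-up outputs for a four-event view rather than results from the SDK.

```python
from typing import Iterable, Set


def combine_manipulation_indices(
    index_sets: Iterable[Set[int]], view_length: int
) -> Set[int]:
    """Keep only the boundaries that every property independently allows."""
    # With no properties, every boundary 0..view_length is allowed.
    combined = set(range(view_length + 1))
    for indices in index_sets:
        combined &= indices
    return combined


# Hypothetical per-property results for a view of four events.
pair_indices = {0, 1, 3, 4}   # e.g. forbids cutting between an action and its observation
batch_indices = {0, 1, 2, 4}  # e.g. forbids cutting inside an action batch

print(sorted(combine_manipulation_indices([pair_indices, batch_indices], view_length=4)))
# prints [0, 1, 4]
```

Because set intersection is commutative and associative, each property's indices can be computed on its own, in any order or in parallel, and the combined result is the same.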

It's possible that future properties won't be representable within this framework. For example, suppose we're allowed to drop at most one thinking block from the previous message. Whether a given block can be dropped then depends on which other blocks are dropped, and the intersection of manipulation indices can't capture that kind of dependency.
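
To make that limitation concrete, here is a small brute-force check in Python. It assumes a condensation plan can be modeled as the set of positions it drops, and that under the intersection model a plan is valid exactly when every dropped position is in a single allowed set. The helper names and the thinking-block positions are hypothetical.

```python
from itertools import chain, combinations
from typing import Callable, FrozenSet, Iterable, Iterator, Tuple

Plan = FrozenSet[int]


def powerset(items: Iterable[int]) -> Iterator[Tuple[int, ...]]:
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))


def representable(constraint: Callable[[Plan], bool], positions: Iterable[int]) -> bool:
    """True if some allowed-index set accepts exactly the plans the constraint accepts."""
    positions = list(positions)
    plans = [frozenset(p) for p in powerset(positions)]
    for allowed in powerset(positions):
        allowed_set = frozenset(allowed)
        # Intersection model: a plan is valid iff it only touches allowed positions.
        if all((plan <= allowed_set) == constraint(plan) for plan in plans):
            return True
    return False


thinking_blocks = [1, 2, 3]  # hypothetical positions of thinking blocks


def at_most_one(plan: Plan) -> bool:
    return len(plan) <= 1


def never_drop_2(plan: Plan) -> bool:
    return 2 not in plan


print(representable(at_most_one, thinking_blocks))   # False
print(representable(never_drop_2, thinking_blocks))  # True
```

"Never drop block 2" is just the allowed set `{1, 3}`, but "at most one" accepts `{1}` and `{2}` while rejecting `{1, 2}`, so no single set of allowed indices reproduces it.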

That may never become a problem. If it does, we'll need to update the manipulation index representation and how the condensers use it. That's a big implementation lift, but it still fits within this conceptual framework.

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the GitHub CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant   Architectures   Base Image                                    Docs / Tags
java      amd64, arm64    eclipse-temurin:17-jdk                        Link
python    amd64, arm64    nikolaik/python-nodejs:python3.12-nodejs22    Link
golang    amd64, arm64    golang:1.21-bookworm                          Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:7ae4e94-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-7ae4e94-python \
  ghcr.io/openhands/agent-server:7ae4e94-python

All tags pushed for this build

ghcr.io/openhands/agent-server:7ae4e94-golang-amd64
ghcr.io/openhands/agent-server:7ae4e94-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:7ae4e94-golang-arm64
ghcr.io/openhands/agent-server:7ae4e94-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:7ae4e94-java-amd64
ghcr.io/openhands/agent-server:7ae4e94-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:7ae4e94-java-arm64
ghcr.io/openhands/agent-server:7ae4e94-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:7ae4e94-python-amd64
ghcr.io/openhands/agent-server:7ae4e94-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:7ae4e94-python-arm64
ghcr.io/openhands/agent-server:7ae4e94-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:7ae4e94-golang
ghcr.io/openhands/agent-server:7ae4e94-java
ghcr.io/openhands/agent-server:7ae4e94-python

About Multi-Architecture Support

  • Each variant tag (e.g., 7ae4e94-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 7ae4e94-python-amd64) are also available if needed


github-actions bot commented Jan 8, 2026

Coverage

Coverage Report

File                                                   Stmts    Miss   Cover   Missing
openhands-sdk/openhands/sdk/context/view
   manipulation_indices.py                                 9       2     77%   31–32
   view.py                                               109       1     99%   102
openhands-sdk/openhands/sdk/context/view/properties
   base.py                                                35       2     94%   42, 61
   tool_loop_atomicity.py                                 91       2     97%   83, 121
TOTAL                                                  15191    4447     70%

csmith49 mentioned this pull request Jan 12, 2026

openhands-ai bot commented Jan 12, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Pre-commit checks

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1649 at branch `fix/view-cleanup`

Feel free to include any additional details that might help me get this PR into a better state.


csmith49 added the condenser-test label (Triggers a run of all condenser integration tests) Jan 12, 2026
github-actions bot commented

Hi! I started running the condenser tests on your PR. You will receive a comment with the results shortly.

Note: These are non-blocking tests that validate condenser functionality across different LLMs.

github-actions bot commented

Condenser Test Results (Non-Blocking)

These tests validate condenser functionality and do not block PR merges.

🧪 Condenser Tests Results

Overall Success Rate: 57.1%
Total Cost: $0.80
Models Tested: 2
Timestamp: 2026-01-12 20:34:35 UTC

📊 Summary

Model                                              Overall   Tests Passed   Skipped   Total   Cost      Tokens
litellm_proxy_gpt_5.1_codex_max                    0.0%      0/2            3         5       $0.0079   7,990
litellm_proxy_anthropic_claude_opus_4_5_20251101   80.0%     4/5            0         5       $0.79     402,091

📋 Detailed Results

litellm_proxy_gpt_5.1_codex_max

  • Success Rate: 0.0% (0/2)
  • Total Cost: $0.0079
  • Token Usage: prompt: 7,754, completion: 236, cache_read: 3,712, reasoning: 128
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_909bbf8_gpt51_condenser_run_N5_20260112_203022
  • Skipped Tests: 3

Skipped Tests:

  • c01_thinking_block_condenser: Model litellm_proxy/gpt-5.1-codex-max does not support extended thinking or reasoning effort
  • c04_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.
  • c05_size_condenser: This test stresses long repetitive tool loops to trigger size-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

Failed Tests:

  • c02_hard_condensation_requirement: Test execution failed: No manipulation index found >= 3. Available indices: [0, 1, 2] (Cost: $0.00)
  • c03_soft_condensation_requirement: Expected at least one condensation to occur during the test (Cost: $0.0079)

litellm_proxy_anthropic_claude_opus_4_5_20251101

  • Success Rate: 80.0% (4/5)
  • Total Cost: $0.79
  • Token Usage: prompt: 387,071, completion: 15,020, cache_read: 347,085, cache_write: 31,503, reasoning: 1,389
  • Run Suffix: litellm_proxy_anthropic_claude_opus_4_5_20251101_909bbf8_opus_condenser_run_N5_20260112_203050

Failed Tests:

  • c02_hard_condensation_requirement: Test execution failed: No manipulation index found >= 3. Available indices: [0, 1, 2] (Cost: $0.00)
