Condenser integration tests #1652

csmith49 · 2026-01-08T20:05:59Z

Summary

Adds integration-style tests for the condenser system.

Unit tests don't usually capture the oddities of the condenser system -- exercising it requires stressing real APIs and possibly long(ish)-running conversations. The integration test system is perfect for this.

Adds a condenser test type alongside the integration and behavior tests
Adds an initial test (from #1584, Integration test for Opus thinking block constraints)
Adds a runner so that we can trigger these tests from PRs
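A minimal sketch of the kind of gate such a runner might apply. It assumes the runner fires on a PR "labeled" event carrying the `condenser-test` label (the label name appears in this PR's timeline). The function `should_run_condenser_tests` is hypothetical and is not taken from `condenser-runner.yml`:

```python
from typing import Any

# Label name as it appears in this PR's timeline.
CONDENSER_LABEL = "condenser-test"


def should_run_condenser_tests(event_name: str, payload: dict[str, Any]) -> bool:
    """Return True when a PR event was caused by adding the condenser-test label.

    Hypothetical helper: a GitHub Actions workflow would typically express the
    same check as an `if:` condition on the job.
    """
    if event_name not in ("pull_request", "pull_request_target"):
        return False
    if payload.get("action") != "labeled":
        return False
    # For "labeled" events the payload includes the label that was just added.
    return payload.get("label", {}).get("name") == CONDENSER_LABEL
```

Gating on the specific label keeps the expensive, real-API runs opt-in per PR instead of running on every push.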

Checklist

If the PR is changing/adding functionality, are there tests to reflect this?
If there is an example, have you run the example to make sure that it works?
If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
Is the github CI passing?

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Architectures	Base Image	Docs / Tags
java	amd64, arm64	`eclipse-temurin:17-jdk`	Link
python	amd64, arm64	`nikolaik/python-nodejs:python3.12-nodejs22`	Link
golang	amd64, arm64	`golang:1.21-bookworm`	Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:80740c8-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-80740c8-python \
  ghcr.io/openhands/agent-server:80740c8-python

All tags pushed for this build

ghcr.io/openhands/agent-server:80740c8-golang-amd64
ghcr.io/openhands/agent-server:80740c8-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:80740c8-golang-arm64
ghcr.io/openhands/agent-server:80740c8-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:80740c8-java-amd64
ghcr.io/openhands/agent-server:80740c8-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:80740c8-java-arm64
ghcr.io/openhands/agent-server:80740c8-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:80740c8-python-amd64
ghcr.io/openhands/agent-server:80740c8-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:80740c8-python-arm64
ghcr.io/openhands/agent-server:80740c8-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:80740c8-golang
ghcr.io/openhands/agent-server:80740c8-java
ghcr.io/openhands/agent-server:80740c8-python

About Multi-Architecture Support

Each variant tag (e.g., 80740c8-python) is a multi-arch manifest supporting both amd64 and arm64
Docker automatically pulls the correct architecture for your platform
Individual architecture tags (e.g., 80740c8-python-amd64) are also available if needed

openhands-ai · 2026-01-09T17:00:06Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- Agent Server

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1652 at branch `feat/condenser-integration-tests`

Feel free to include any additional details that might help me get this PR into a better state.


The consolidate-results job in condenser-runner.yml was missing the repository parameter in its checkout step. For pull_request_target events from fork PRs, this would cause checkout to fail because it would try to checkout the PR's commit SHA from the base repository where that commit doesn't exist. This fix adds the repository parameter to ensure the correct repository is checked out for both same-repo and fork PRs.
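To illustrate the fields involved: in a `pull_request_target` payload, the PR's head commit is `pull_request.head.sha`. The repository it was pushed to is `pull_request.head.repo.full_name`, which is the fork for fork PRs. The helper below is a hypothetical sketch, not code from this PR. It derives the (repository, ref) pair a checkout step could be given. Note that this commit was later reverted, so treat the sketch only as an illustration of the payload shape:

```python
from typing import Any, Optional


def checkout_target(payload: dict[str, Any]) -> tuple[str, str]:
    """Return (repository, ref) for checking out a PR's head commit.

    Hypothetical helper. For a same-repo PR, head.repo is the base repository;
    for a fork PR, it is the fork the head commit was pushed to.
    """
    head = payload["pull_request"]["head"]
    head_repo: Optional[dict[str, Any]] = head.get("repo")
    if head_repo is None:
        # GitHub reports head.repo as null when the fork has been deleted.
        raise ValueError("PR head repository is unavailable (fork deleted?)")
    return head_repo["full_name"], head["sha"]
```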

…kout" This reverts commit 4a4de2c.

…ort format
- Remove push trigger for feat/condenser-integration-tests branch
- Remove github.event_name == 'push' from job conditions
- Update report title to 'Condenser Tests Results'
- Simplify summary table by removing Integration/Behavior columns
- Remove required marker from failed tests output

Co-authored-by: openhands <[email protected]>
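A sketch of what rendering a simplified summary like this might look like. It assumes each model run yields a mapping of test name to pass/fail. The input shape, the column names, and `render_summary` are hypothetical; only the report title comes from the commit message:

```python
def render_summary(results: dict[str, dict[str, bool]]) -> str:
    """Render a Markdown summary with one row per model.

    `results` maps a model name to {test name: passed?}; this shape is assumed.
    """
    lines = [
        "# Condenser Tests Results",
        "",
        "| Model | Passed | Total | Success rate |",
        "|---|---|---|---|",
    ]
    for model, tests in sorted(results.items()):
        passed = sum(tests.values())  # True counts as 1, False as 0
        total = len(tests)
        rate = f"{passed / total:.0%}" if total else "n/a"
        lines.append(f"| {model} | {passed} | {total} | {rate} |")
    # Failed tests are listed plainly, with no required/optional marker.
    failed = [
        f"- {model}: {name}"
        for model, tests in sorted(results.items())
        for name, ok in sorted(tests.items())
        if not ok
    ]
    if failed:
        lines += ["", "## Failed tests", *failed]
    return "\n".join(lines)
```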

enyst · 2026-01-09T18:42:03Z

tests/integration/tests/c01_thinking_block_condenser.py

+
+This test validates that Claude Opus's thinking blocks are properly handled
+during conversation condensation, preventing malformed signature errors that
+can occur when thinking blocks are included in conversation history.


Does this test run for Claude Opus, or for all LLMs in the matrix?

csmith49 · 2026-01-09

The matrix defined for these tests is smaller -- just GPT 5.1 and Opus 4.5 -- so the skip logic is simpler. Right now the test just runs for Opus.

We don't have the comments showing up yet (that requires the workflow to be on the main branch), but the consolidated reports generated by the workflow runs show that the test only runs for Opus.
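A later commit in this PR is titled "skip based on model patterns". The following is a minimal sketch of that idea. It assumes each test declares shell-style glob patterns for the models it applies to. The `applies_to` helper and the pattern strings are hypothetical, not taken from the test suite:

```python
from fnmatch import fnmatchcase


def applies_to(model: str, patterns: list[str]) -> bool:
    """Return True if `model` matches any of the glob-style `patterns`.

    Both sides are lowercased, so "Opus" and "opus" are treated alike;
    fnmatchcase avoids platform-dependent case handling.
    """
    name = model.lower()
    return any(fnmatchcase(name, p.lower()) for p in patterns)


# Hypothetical: a thinking-block test that should only run against Opus models.
THINKING_BLOCK_PATTERNS = ["*opus*"]
```

A test runner could then skip any (test, model) pair for which `applies_to` returns False.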

enyst · 2026-01-09

Sounds good! This one surely seems LLM-specific, and despite some attempts in the past, we just couldn't generalize this thinking / thinking_blocks handling (and neither does litellm).
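For context on the constraint: Claude's extended thinking attaches a `signature` to each thinking block, and thinking blocks sent back to the API are expected to be unmodified. An edited or partially truncated block can therefore fail verification. The sketch below is a hypothetical check of the "no modification" half of that invariant over a simplified message format (plain dicts with `type`, `thinking` and `signature` keys, loosely modelled on Anthropic content blocks). It is not the assertion used in `c01_thinking_block_condenser.py`:

```python
from typing import Any

Block = dict[str, Any]


def thinking_blocks_intact(original: list[Block], condensed: list[Block]) -> bool:
    """Return True if every thinking block kept after condensation is an exact copy.

    Only checks that no thinking block was modified. It does not check whether a
    block the API still requires (e.g. in the latest tool-use turn) was dropped.
    """
    originals = {
        (b["thinking"], b["signature"])
        for b in original
        if b.get("type") == "thinking"
    }
    return all(
        (b.get("thinking"), b.get("signature")) in originals
        for b in condensed
        if b.get("type") == "thinking"
    )
```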

enyst

I think maybe we are going to see it when we actually merge it, right?

LGTM, and can't wait to see this because it unlocks better understanding of the condensation behavior with tricky reasoning LLM restrictions, and in general, really. Thank you!

csmith49 · 2026-01-12T18:25:31Z

I think maybe we are going to see it when we actually merge it, right?

I think so! I can't actually trigger workflows using labels until they're on main, so I've been hacking this workflow to trigger on pushes to this branch. And apparently, when that happens the event doesn't provide a PR, so the report-writing steps are skipped.
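This matches how GitHub event payloads are shaped. `pull_request` and `pull_request_target` payloads carry a `pull_request` object with a `number`, while a `push` payload has no such field. A hypothetical sketch of the guard a report-posting step needs:

```python
from typing import Any, Optional


def pr_number_for_report(event_name: str, payload: dict[str, Any]) -> Optional[int]:
    """Return the PR number to comment on, or None if the event has no PR.

    Hypothetical helper: on a `push` trigger there is no PR to comment on,
    so report writing has to be skipped.
    """
    if event_name in ("pull_request", "pull_request_target"):
        return payload["pull_request"]["number"]
    return None
```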

csmith49 · 2026-01-12T18:29:12Z

LGTM, and can't wait to see this because it unlocks better understanding of the condensation behavior with tricky reasoning LLM restrictions, and in general, really. Thank you!

Thanks for the review. I'm going to do some very minor cleanup and then get this merged so I can start using it to test some other big condenser changes (#1649)

github-actions · 2026-01-12T19:53:59Z

Coverage Report

File	Stmts	Miss	Cover	Missing
TOTAL	15711	4906	68%

report-only-changed-files is enabled. No files were changed during this commit :)

Calvin Smith and others added 17 commits January 8, 2026 13:01

condenser test type

7c8d305

thinking block test and readme updates

e411af6

condenser workflow

33fca10

Merge branch 'main' into feat/condenser-integration-tests

b3480ec

minor linting fixes

5b750e1

fixing hallucinated code

68cdb50

moving token condenser test from integration

d3dae88

test hard/soft requirements

07bbc3e

size condenser

0790de3

renaming to have token and size tests next to each other

bc1dc72

minor fixes

82fc956

fixing availability test

fe36efb

splitting tests and renumbering

79dfbdb

fixing tests

ac5e060

gpt-4o tests

2549882

linting

a446214

Merge branch 'main' into feat/condenser-integration-tests

19dc44c

csmith49 added the condenser-test label (Triggers a run of all condenser integration tests) Jan 9, 2026

Merge branch 'main' into feat/condenser-integration-tests

417edf7

csmith49 removed the condenser-test label (Triggers a run of all condenser integration tests) Jan 9, 2026

temporary push trigger for workflow

06304c4

csmith49 added the condenser-test label (Triggers a run of all condenser integration tests) Jan 9, 2026

csmith49 and others added 4 commits January 9, 2026 09:41

Merge branch 'main' into feat/condenser-integration-tests

926d790

workflow temp changes for testing

ad4615c

model name change

85a9b63

one more model name change

4877f7f

openhands-agent and others added 3 commits January 9, 2026 17:18

Revert "fix(ci): add repository parameter to consolidate-results checkout"

155c87b

This reverts commit 4a4de2c.

linting

74b1ebb

csmith49 marked this pull request as ready for review January 9, 2026 17:27

enyst reviewed Jan 9, 2026

Merge branch 'main' into feat/condenser-integration-tests

2e7aba2

enyst approved these changes Jan 12, 2026

Calvin Smith and others added 5 commits January 12, 2026 12:35

removing unused setup and duplicated vars

e5e4289

skip based on model patterns

f34d804

llm copy util

2f95cdb

linting

d22c118

Merge branch 'main' into feat/condenser-integration-tests

78e56d7

csmith49 merged commit 3e0b8e5 into main Jan 12, 2026
21 checks passed

csmith49 deleted the feat/condenser-integration-tests branch January 12, 2026 20:00

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Condenser integration tests #1652

Condenser integration tests #1652

Uh oh!

csmith49 commented Jan 8, 2026 •

edited by github-actions bot

Loading

Uh oh!

openhands-ai bot commented Jan 9, 2026

Uh oh!

enyst Jan 9, 2026

Uh oh!

csmith49 Jan 9, 2026

Uh oh!

enyst Jan 9, 2026

Uh oh!

enyst left a comment

Uh oh!

csmith49 commented Jan 12, 2026

Uh oh!

csmith49 commented Jan 12, 2026

Uh oh!

github-actions bot commented Jan 12, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants
