Skip to content

Conversation

@csmith49
Copy link
Collaborator

@csmith49 csmith49 commented Jan 8, 2026

Summary

Adds integration-style tests for the condenser system.

Unit tests don't usually capture the oddities of the condenser system -- it requires stressing real APIs and possibly long(ish)-running conversations. The integration test system is perfect for this.

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:80740c8-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-80740c8-python \
  ghcr.io/openhands/agent-server:80740c8-python

All tags pushed for this build

ghcr.io/openhands/agent-server:80740c8-golang-amd64
ghcr.io/openhands/agent-server:80740c8-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:80740c8-golang-arm64
ghcr.io/openhands/agent-server:80740c8-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:80740c8-java-amd64
ghcr.io/openhands/agent-server:80740c8-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:80740c8-java-arm64
ghcr.io/openhands/agent-server:80740c8-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:80740c8-python-amd64
ghcr.io/openhands/agent-server:80740c8-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:80740c8-python-arm64
ghcr.io/openhands/agent-server:80740c8-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:80740c8-golang
ghcr.io/openhands/agent-server:80740c8-java
ghcr.io/openhands/agent-server:80740c8-python

About Multi-Architecture Support

  • Each variant tag (e.g., 80740c8-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 80740c8-python-amd64) are also available if needed

@csmith49 csmith49 added the condenser-test Triggers a run of all condenser integration tests label Jan 9, 2026
@csmith49 csmith49 removed the condenser-test Triggers a run of all condenser integration tests label Jan 9, 2026
@csmith49 csmith49 added the condenser-test Triggers a run of all condenser integration tests label Jan 9, 2026
@openhands-ai
Copy link

openhands-ai bot commented Jan 9, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Agent Server

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1652 at branch `feat/condenser-integration-tests`

Feel free to include any additional details that might help me get this PR into a better state.

You can manage your notification settings

openhands-agent and others added 3 commits January 9, 2026 17:18
The consolidate-results job in condenser-runner.yml was missing the
repository parameter in its checkout step. For pull_request_target
events from fork PRs, this would cause checkout to fail because it
would try to checkout the PR's commit SHA from the base repository
where that commit doesn't exist.

This fix adds the repository parameter to ensure the correct repository
is checked out for both same-repo and fork PRs.
…ort format

- Remove push trigger for feat/condenser-integration-tests branch
- Remove github.event_name == 'push' from job conditions
- Update report title to 'Condenser Tests Results'
- Simplify summary table by removing Integration/Behavior columns
- Remove required marker from failed tests output

Co-authored-by: openhands <[email protected]>
@csmith49 csmith49 marked this pull request as ready for review January 9, 2026 17:27

This test validates that Claude Opus's thinking blocks are properly handled
during conversation condensation, preventing malformed signature errors that
can occur when thinking blocks are included in conversation history.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test run for Claude Opus, or for all LLMs in the matrix?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The matrix defined for these tests is smaller -- just GPT 5.1 and Opus 4.5 -- so the logic to skip is simpler. It just runs for Opus right now.

We don't have the comments showing up yet (that requires the workflow be on the main branch), but looking at the consolidated reports generated in the workflows show that test only runs for Opus.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good! This one surely seems LLM-specific, and despite some attempts in the past, we just couldn't generalize this thinking / thinking_blocks (and neither does litellm)

Copy link
Collaborator

@enyst enyst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe we are going to see it when we actually merge it, right?

LGTM, and can't wait to see this because it unlocks better understanding of the condensation behavior with tricky reasoning LLM restrictions, and in general, really. Thank you!

@csmith49
Copy link
Collaborator Author

I think maybe we are going to see it when we actually merge it, right?

I think so! Can't actually trigger workflows using labels until they're on main, so I've been hacking this workflow to trigger on pushes to this branch. And apparently when that happens the hook doesn't provide a PR, so the report-writing steps are skipped.

@csmith49
Copy link
Collaborator Author

LGTM, and can't wait to see this because it unlocks better understanding of the condensation behavior with tricky reasoning LLM restrictions, and in general, really. Thank you!

Thanks for the review. I'm going to do some very minor cleanup and then get this merged so I can start using it to test some other big condenser changes (#1649)

@github-actions
Copy link
Contributor

github-actions bot commented Jan 12, 2026

Coverage

Coverage Report •
FileStmtsMissCoverMissing
TOTAL15711490668% 
report-only-changed-files is enabled. No files were changed during this commit :)

@csmith49 csmith49 merged commit 3e0b8e5 into main Jan 12, 2026
21 checks passed
@csmith49 csmith49 deleted the feat/condenser-integration-tests branch January 12, 2026 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

condenser-test Triggers a run of all condenser integration tests

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants