-
Notifications
You must be signed in to change notification settings - Fork 106
Condenser integration tests #1652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
The consolidate-results job in condenser-runner.yml was missing the repository parameter in its checkout step. For pull_request_target events from fork PRs, this would cause checkout to fail because it would try to checkout the PR's commit SHA from the base repository where that commit doesn't exist. This fix adds the repository parameter to ensure the correct repository is checked out for both same-repo and fork PRs.
…kout" This reverts commit 4a4de2c.
…ort format - Remove push trigger for feat/condenser-integration-tests branch - Remove github.event_name == 'push' from job conditions - Update report title to 'Condenser Tests Results' - Simplify summary table by removing Integration/Behavior columns - Remove required marker from failed tests output Co-authored-by: openhands <[email protected]>
|
|
||
| This test validates that Claude Opus's thinking blocks are properly handled | ||
| during conversation condensation, preventing malformed signature errors that | ||
| can occur when thinking blocks are included in conversation history. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this test run for Claude Opus, or for all LLMs in the matrix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The matrix defined for these tests is smaller -- just GPT 5.1 and Opus 4.5 -- so the logic to skip is simpler. It just runs for Opus right now.
We don't have the comments showing up yet (that requires the workflow be on the main branch), but looking at the consolidated reports generated in the workflows show that test only runs for Opus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! This one surely seems LLM-specific, and despite some attempts in the past, we just couldn't generalize this thinking / thinking_blocks (and neither does litellm)
enyst
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maybe we are going to see it when we actually merge it, right?
LGTM, and can't wait to see this because it unlocks better understanding of the condensation behavior with tricky reasoning LLM restrictions, and in general, really. Thank you!
I think so! Can't actually trigger workflows using labels until they're on |
Thanks for the review. I'm going to do some very minor cleanup and then get this merged so I can start using it to test some other big condenser changes (#1649) |
Summary
Adds integration-style tests for the condenser system.
Unit tests don't usually capture the oddities of the condenser system -- it requires stressing real APIs and possibly long(ish)-running conversations. The integration test system is perfect for this.
condensertest type along theintegrationandbehaviortestsChecklist
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:80740c8-pythonRun
All tags pushed for this build
About Multi-Architecture Support
80740c8-python) is a multi-arch manifest supporting both amd64 and arm6480740c8-python-amd64) are also available if needed