Conversation

Collaborator

@csmith49 csmith49 commented Jan 13, 2026

Summary

While working on #1596 we came across a problem: View objects are limited to a single summary event. This limits our condensation abilities and adds some brittle corner-case logic in the condensers.

The fix is to make it so views can have multiple summaries simultaneously.

This immediately introduces some problems:

  • A lot of specialized functions assume there's at most one summary.
  • With multiple summaries, we might not keep/forget all at the same time.
  • Summaries are "synthetic" events introduced between the conversation and the agent, and so can't be referenced by Condensation events because they don't have consistent IDs.

So this PR deletes the specialized functions and adds a standard pattern for summary IDs. The logic for constructing the view is also simplified to process condensations one at a time instead of all at once, and some functions that were enforcing constraints on the view have their signatures reworked.
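The one-at-a-time approach can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual code: `Condensation`, the dict-based events, and all field names here are simplified, hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Condensation:
    """Hypothetical, simplified stand-in for the SDK's Condensation event."""
    id: str
    forgotten_event_ids: list[str]
    summary: str
    summary_offset: int


def apply_condensation(view: list[dict], condensation: Condensation) -> list[dict]:
    """Apply a single condensation: drop forgotten events, insert the summary."""
    kept = [e for e in view if e["id"] not in condensation.forgotten_event_ids]
    summary_event = {
        "id": f"{condensation.id}-summary",  # deterministic ID per condensation
        "content": condensation.summary,
    }
    kept.insert(condensation.summary_offset, summary_event)
    return kept


def build_view(events: list[dict], condensations: list[Condensation]) -> list[dict]:
    """Process condensations one at a time instead of all at once."""
    view = list(events)
    for condensation in condensations:
        view = apply_condensation(view, condensation)
    return view
```

Because each condensation is applied against the view produced by the previous one, multiple summaries can coexist without any single-summary special casing.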

Additional changes include:

  • Updated tests to remove references to unused functions
  • A new test suite for multi-summary views
  • Updates to the LLMSummarizingCondenser to handle the new view structure

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:4d6b5bf-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-4d6b5bf-python \
  ghcr.io/openhands/agent-server:4d6b5bf-python

All tags pushed for this build

ghcr.io/openhands/agent-server:4d6b5bf-golang-amd64
ghcr.io/openhands/agent-server:4d6b5bf-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:4d6b5bf-golang-arm64
ghcr.io/openhands/agent-server:4d6b5bf-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:4d6b5bf-java-amd64
ghcr.io/openhands/agent-server:4d6b5bf-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:4d6b5bf-java-arm64
ghcr.io/openhands/agent-server:4d6b5bf-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:4d6b5bf-python-amd64
ghcr.io/openhands/agent-server:4d6b5bf-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:4d6b5bf-python-arm64
ghcr.io/openhands/agent-server:4d6b5bf-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:4d6b5bf-golang
ghcr.io/openhands/agent-server:4d6b5bf-java
ghcr.io/openhands/agent-server:4d6b5bf-python

About Multi-Architecture Support

  • Each variant tag (e.g., 4d6b5bf-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 4d6b5bf-python-amd64) are also available if needed

Contributor

github-actions bot commented Jan 13, 2026

Coverage

Coverage Report

File                                        Stmts   Miss   Cover   Missing
openhands-sdk/openhands/sdk/context
   view.py                                    187      3     98%   227, 263, 424
openhands-sdk/openhands/sdk/event
   condenser.py                                52      1     98%   63
TOTAL                                        15642   4546    70%

@csmith49 added the condenser-test label (triggers a run of all condenser integration tests) on Jan 14, 2026
@github-actions
Contributor

Hi! I started running the condenser tests on your PR. You will receive a comment with the results shortly.

Note: These are non-blocking tests that validate condenser functionality across different LLMs.

@github-actions
Contributor

Condenser Test Results (Non-Blocking)

These tests validate condenser functionality and do not block PR merges.

🧪 Condenser Test Results

Overall Success Rate: 100.0%
Total Cost: $0.62
Models Tested: 2
Timestamp: 2026-01-14 16:41:03 UTC

📊 Summary

Model Overall Tests Passed Skipped Total Cost Tokens
litellm_proxy_anthropic_claude_opus_4_5_20251101 100.0% 5/5 0 5 $0.60 273,830
litellm_proxy_gpt_5.1_codex_max 100.0% 2/2 3 5 $0.02 21,682

📋 Detailed Results

litellm_proxy_anthropic_claude_opus_4_5_20251101

  • Success Rate: 100.0% (5/5)
  • Total Cost: $0.60
  • Token Usage: prompt: 262,329, completion: 11,501, cache_read: 228,271, cache_write: 26,221, reasoning: 1,403
  • Run Suffix: litellm_proxy_anthropic_claude_opus_4_5_20251101_14adbf3_opus_condenser_run_N5_20260114_163711

litellm_proxy_gpt_5.1_codex_max

  • Success Rate: 100.0% (2/2)
  • Total Cost: $0.02
  • Token Usage: prompt: 21,066, completion: 616, cache_read: 15,104, reasoning: 256
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_14adbf3_gpt51_condenser_run_N5_20260114_163710
  • Skipped Tests: 3

Skipped Tests:

  • c04_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.
  • c01_thinking_block_condenser: Model litellm_proxy/gpt-5.1-codex-max does not support extended thinking or reasoning effort
  • c05_size_condenser: This test stresses long repetitive tool loops to trigger size-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

if Reason.REQUEST in reasons:
    return CondensationRequirement.HARD

def _get_summary_event_content(self, view: View) -> str:
Collaborator Author


No longer needed -- the previous summaries are now captured as regular events in the summary generation.

@csmith49 csmith49 marked this pull request as ready for review January 14, 2026 17:11
Collaborator

@all-hands-bot all-hands-bot left a comment


This PR introduces important functionality for multi-summary views. I found a critical issue with batch atomicity enforcement and some suggestions for improving robustness. Details in inline comments.

@openhands-ai

openhands-ai bot commented Jan 15, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run tests
    • Agent Server

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1721 at branch `feat/multi-summary-views`

Feel free to include any additional details that might help me get this PR into a better state.


@csmith49
Collaborator Author

@enyst While working on this PR I remembered why the manipulation index calculations were strict with respect to the keep_first parameter: because we'd keep keep_first events at the start, and then the summary, and then all the events we cared about.

Never really mattered with the way we filtered summary events, but definitely matters after this PR.

@enyst
Collaborator

enyst commented Jan 16, 2026

Ah I see.

@OpenHands Do a /codereview-roasted on this PR. Investigate deeply! Respond as a comment on github on this PR, so consider that your output will be rendered as markdown.

@openhands-ai

openhands-ai bot commented Jan 16, 2026

I'm on it! enyst can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Jan 16, 2026

You asked for a /codereview-roasted comment (Markdown) on PR #1721.

What I did

  • Located the repo at /workspace/project/software-agent-sdk and verified we’re on branch feat/multi-summary-views.
  • Reviewed the PR diff vs origin/main, focusing on:
    • openhands-sdk/openhands/sdk/context/view.py (multi-summary view construction, condensation application, deterministic summary IDs, batch atomicity/tool-call filtering changes)
    • LLMSummarizingCondenser and its prompt changes.
  • Ran the relevant context/unit tests that this PR touches:
    • pytest tests/sdk/context/test_view_multi_summary.py: 24 passed
    • pytest tests/sdk/context/test_view.py tests/sdk/context/test_view_action_filtering.py tests/sdk/context/test_view_non_exec_filtering.py tests/sdk/context/condenser/test_llm_summarizing_condenser.py: 55 passed
    • Note: full repo pytest collection fails locally due to missing optional workspace components (fastapi/agent_server/tools packages not installed in this sandbox); I did not attempt to fix CI.

Deliverable produced

  • I provided the requested roasted code review as a GitHub-ready Markdown comment, including:
    • Taste rating
    • Critical issues (deterministic summary IDs coupled to persistence regex, underspecified summary_offset semantics, batch atomicity API assumptions)
    • Improvement opportunities (pipeline clarity, redundant scans, prompt semantics regression risk)
    • Minor style notes
    • Merge verdict + key insight

Checklist vs your request

  • Deep investigation of PR diff (core file + condenser + prompt + tests)
  • Output formatted as a GitHub PR comment in Markdown
  • No code modifications (as requested by /codereview)

Repo changes / pushing

  • No changes were made to the branch (no commits), so there is nothing to push.

If you want, I can also turn the “must fix / must justify” items into a smaller, actionable checklist for the PR author.


Collaborator

enyst commented Jan 16, 2026

🟡 Taste rating: Acceptable
You’re solving a real problem (single-summary Views were a straightjacket), but the implementation trades “brittle corner-cases” for “now everything is a corner-case unless you’re very disciplined”. The core idea is fine; the execution has some sharp edges that will absolutely cut the next person who touches this.


[CRITICAL ISSUES] (must fix / at least must justify loudly)

1) Deterministic summary IDs are a hack glued to an unrelated constraint

[openhands-sdk/openhands/sdk/context/view.py, ~451]

summary_id = f"{condensation.id.replace('_', '-')}-summary"

You’re encoding a file persistence regex constraint into the semantic identity of an event. That’s the kind of “works today, explodes later” coupling that makes systems rot.

  • Why this is bad: IDs are now shaped by persistence quirks, not domain meaning. If another component later expects event IDs to be stable across transports, or expects underscore preservation, you’ve just created a silent incompatibility.
  • Concrete improvement: introduce a dedicated summary_event_id_for(condensation_id: EventID) -> EventID helper (or better: a structured ID like CondensationSummaryEventID(condensation_id=...)) and keep normalization/persistence rules in one place. If the regex requires hyphens, encode that rule centrally.
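The suggested helper might look like this (a sketch; the function name and docstring wording are the reviewer's proposal, not existing SDK API):

```python
def summary_event_id_for(condensation_id: str) -> str:
    """Derive the deterministic summary event ID for a condensation.

    The hyphen normalization (a persistence-layer constraint on event IDs
    in the reviewed code) lives here and nowhere else, so any future
    change to the persistence rule is a one-line edit.
    """
    return f"{condensation_id.replace('_', '-')}-summary"
```

Callers then never spell out the ID format inline, which removes the silent coupling the reviewer objects to.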

2) summary_offset semantics are still underspecified and will bite you with multiple condensations

[openhands-sdk/openhands/sdk/context/view.py, ~429-457] apply_condensation()

You apply condensations sequentially, which is good, but what does summary_offset mean?

  • offset into the original conversation?
  • offset into the post-filtered view events at time of condensation?
  • offset into the list after forgetting but before inserting?

Right now it’s “whatever list you happen to have at that moment”. That’s fine internally, but then you also want condensations to be referential and stable over time. Those goals fight each other.

  • Failure mode: a later condensation generated with one interpretation of offsets will be replayed under another interpretation (because the view-building algorithm changed again), and your summaries drift.
  • Concrete improvement: document it as an invariant in the Condensation event model and add assertions/tests that enforce it. If summary_offset is “offset in the view after removing forgotten events from the then-current view”, say that explicitly.
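One way to pin the invariant down in code (a sketch; the chosen interpretation of the offset, and the function name, are assumptions for illustration):

```python
def insert_summary(view_events: list, summary_event, summary_offset: int) -> list:
    """Insert a summary event at summary_offset.

    Assumed invariant: summary_offset indexes into the view AFTER forgotten
    events have been removed, so it must lie within the current bounds.
    """
    if not 0 <= summary_offset <= len(view_events):
        raise ValueError(
            f"summary_offset {summary_offset} out of range for view of "
            f"length {len(view_events)}"
        )
    return (
        view_events[:summary_offset]
        + [summary_event]
        + view_events[summary_offset:]
    )
```

With the bounds check in place, a condensation replayed under a different interpretation of offsets fails loudly instead of silently drifting.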

3) Batch atomicity now depends on having all events available, but the API doesn’t enforce that

[openhands-sdk/openhands/sdk/context/view.py, ~265+] _enforce_batch_atomicity(view_events, all_events)

You changed the signature to require all_events (good), but nothing stops a caller from passing a subset and getting garbage behavior. The function name doesn’t communicate that it’s doing a join against global batch metadata.

  • Concrete improvement: either:
    • move batch construction higher so View.from_events() owns it (and pass a precomputed batch map), or
    • rename params to something more honest like source_events / view_events, and assert that view_events is a subsequence of source_events IDs.
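The proposed subsequence assertion can be implemented compactly (hypothetical names; assumes events expose an `id` key):

```python
def is_id_subsequence(view_events: list[dict], source_events: list[dict]) -> bool:
    """True iff view_events' IDs appear in source_events in the same order."""
    source_ids = iter(e["id"] for e in source_events)
    # `x in iterator` consumes the iterator up to (and including) the match,
    # so successive membership tests enforce ordering, not just membership.
    return all(e["id"] in source_ids for e in view_events)
```

A guard like `assert is_id_subsequence(view_events, source_events)` at the top of the atomicity function would catch callers that pass an unrelated subset.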

[IMPROVEMENT OPPORTUNITIES] (good taste / maintainability)

4) You deleted “specialized functions” and replaced them with “specialized inline conventions”

This PR claims to remove brittle helper logic, but the brittleness didn’t disappear — it moved:

  • “summary IDs are synthetic but deterministic”
  • “condensation requests are unhandled if they appear after the last condensation”
  • “summary insertion uses offsets into a mutable stream”
  • “some events aren’t LLMConvertible, silently skipped”

This is still specialized behavior; it’s just less discoverable now.

  • Concrete improvement: formalize the view-building pipeline as named steps with explicit contracts:
    1. strip requests
    2. stream condensations
    3. apply forgetting
    4. insert summaries
    5. enforce atomicity
    6. enforce tool-call pairing
  Then unit-test those steps independently.
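A sketch of what that decomposition could look like (step bodies are illustrative stubs with assumed event fields, not the SDK's logic):

```python
from functools import reduce


def strip_requests(events: list[dict]) -> list[dict]:
    """Step 1: drop condensation-request events from the stream."""
    return [e for e in events if e.get("type") != "condensation_request"]


def enforce_tool_call_pairing(events: list[dict]) -> list[dict]:
    """Step 6: keep actions only when their matching observation is present."""
    observed = {e["call_id"] for e in events if e.get("type") == "observation"}
    return [
        e for e in events
        if e.get("type") != "action" or e.get("call_id") in observed
    ]


# The remaining steps (stream condensations, apply forgetting, insert
# summaries, enforce atomicity) would slot in here with the same shape.
PIPELINE = [strip_requests, enforce_tool_call_pairing]


def build_view(events: list[dict]) -> list[dict]:
    """Run each named step in order; each step is independently unit-testable."""
    return reduce(lambda evs, step: step(evs), PIPELINE, events)
```

Each step has one contract and one test file, so the "specialized inline conventions" become explicit, named transformations.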

5) unhandled_condensation_request_exists() is linear scan every time — and computed after already scanning

[openhands-sdk/openhands/sdk/context/view.py, ~460+]

You already iterate through events once in from_events(), then you reverse-scan again. Not a performance crisis, but it’s sloppy and reads like “I couldn’t be bothered to track a boolean”.

  • Concrete improvement: compute it during the first pass (or track “seen_condensation” and “seen_request_after_last_condensation”).
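The single-pass version could look like this (hypothetical predicates on an assumed `type` field, standing in for the SDK's event classes):

```python
def unhandled_request_exists(events: list[dict]) -> bool:
    """Detect a condensation request that appears after the last condensation,
    in a single forward pass instead of a reverse scan."""
    unhandled = False
    for event in events:
        if event.get("type") == "condensation":
            unhandled = False  # the latest condensation handles any prior request
        elif event.get("type") == "condensation_request":
            unhandled = True
    return unhandled
```

The same flag can be threaded through the existing loop in from_events(), avoiding the second scan entirely.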

6) _filter_unmatched_tool_calls() docstring lies (and the signature is suspicious)

[openhands-sdk/openhands/sdk/context/view.py, ~319+]
It says all_events is “unused for now” but then you do use it for batch atomicity via ActionBatch.from_events(all_events).

That mismatch makes it harder to reason about correctness.

  • Concrete improvement: fix the docstring and rename batch vs action_batch consistently (you currently use both concepts across edits, which reads like half-refactor).

7) Condenser prompt change removes an explicit “previous summary” channel — hope you like regression roulette

[openhands-sdk/openhands/sdk/context/condenser/prompts/summarizing_prompt.j2]
You dropped <PREVIOUS SUMMARY> and now rely on summaries being present in the events list. That might be correct structurally, but it changes the prompt semantics:

  • models previously got “previous summary” as a privileged signal
  • now it’s just another event in the stream, competing with noise

Your non-blocking condenser tests passed (good), but this is exactly the kind of change that passes a tiny suite and then degrades quality in real conversations.

  • Concrete improvement: keep the explicit previous summary section but derive it from “most recent summary event” (or “all summaries in view”, but clearly separated). Multi-summary doesn’t mean “throw away the one thing that guided the model”.

[STYLE NOTES] (minor, but still worth fixing)

8) Variable name _new_summary_event screams “I fought the linter”

[openhands-sdk/openhands/sdk/context/view.py, ~452-456]
Leading underscore in a local is weird and communicates nothing.

9) Silent skipping of non-LLMConvertible events is risky

[openhands-sdk/openhands/sdk/context/view.py, ~495+]
Logging at debug means most users won’t see “we dropped events on the floor”. If that’s intended, fine — but then codify it as policy, not as a debug log.


VERDICT

Worth merging if you tighten the semantics around summary_offset and stop baking persistence quirks into event identity.
Right now it works, but it’s the kind of “works” that future you will pay interest on.

Key insight: you fixed “Views can only represent one summary” by turning summaries into stateful stream mutations — that’s fine, but only if the mutation semantics (IDs, offsets, replay guarantees) are defined like an API, not implied by whatever from_events() happens to do this month.

@csmith49
Collaborator Author

csmith49 commented Jan 16, 2026

[CRITICAL ISSUES] (must fix / at least must justify loudly)

  1. Deterministic summary IDs are a hack glued to an unrelated constraint
    ...
  2. summary_offset semantics are still underspecified and will bite you with multiple condensations
    ...

The summary offset semantics haven't changed, but documentation could be more verbose. Addressed these two points by moving the condensation semantics to the Condensation event instead of the View.

  3. Batch atomicity now depends on having all events available, but the API doesn’t enforce that
    ...

The semantics of batch atomicity haven't changed, just the way the code is structured. The API is a bit messy, but that's what this PR is for.

[IMPROVEMENT OPPORTUNITIES] (good taste / maintainability)

  4. You deleted “specialized functions” and replaced them with “specialized inline conventions”
    ...
  5. unhandled_condensation_request_exists() is linear scan every time — and computed after already scanning
    [openhands-sdk/openhands/sdk/context/view.py, ~460+]
    ...

The suggested fixes here are incompatible. The pipeline requires multiple linear passes if you actually want to separate the steps in code. We'd like a cleaner pipeline (and this PR improves the structure dramatically) but larger changes are beyond the scope of this PR.

  6. _filter_unmatched_tool_calls() docstring lies (and the signature is suspicious)
    [openhands-sdk/openhands/sdk/context/view.py, ~319+] It says all_events is “unused for now” but then you do use it for batch atomicity via ActionBatch.from_events(all_events).
    ...

Addressed.

  7. Condenser prompt change removes an explicit “previous summary” channel — hope you like regression roulette
    ...

Regressions are a concern, for sure, but the changes suggested here aren't a fix because they ignore the positional nature of summaries. Anecdotally I haven't noticed any performance degradation.

[STYLE NOTES] (minor, but still worth fixing)

  8. Variable name _new_summary_event screams “I fought the linter”
    ...

Says the model that wrote that line of code! (Also removed when addressing other issues).

  9. Silent skipping of non-LLMConvertible events is risky
    ...

This is standard and expected behavior (which is why the log level is DEBUG).


Labels

condenser-test Triggers a run of all condenser integration tests
