Conversation

Collaborator

@csmith49 csmith49 commented Jan 13, 2026

Summary

While working on #1596 we came across a problem: View objects are limited to a single summary event. This limits our condensation abilities and adds some brittle corner-case logic in the condensers.

The fix is to make it so views can have multiple summaries simultaneously.

This immediately introduces some problems:

  • A lot of specialized functions assume there's at most one summary.
  • With multiple summaries, we might not keep/forget all at the same time.
  • Summaries are "synthetic" events introduced between the conversation and the agent, and so can't be referenced by Condensation events because they don't have consistent IDs.

So this PR deletes the specialized functions and adds a standard pattern for summary IDs. The logic for constructing the view is also simplified to process condensations one at a time instead of all at once, and some functions that were enforcing constraints on the view have their signatures reworked.
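The one-at-a-time approach can be sketched roughly as follows. This is a minimal illustration, not the SDK's actual code: `Condensation`, the dict-based events, and all field names here are simplified, hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Condensation:
    """Hypothetical, simplified stand-in for the SDK's Condensation event."""
    id: str
    forgotten_event_ids: list[str]
    summary: str
    summary_offset: int


def apply_condensation(view: list[dict], condensation: Condensation) -> list[dict]:
    """Apply a single condensation: drop forgotten events, insert the summary."""
    kept = [e for e in view if e["id"] not in condensation.forgotten_event_ids]
    summary_event = {
        "id": f"{condensation.id}-summary",  # deterministic ID per condensation
        "content": condensation.summary,
    }
    kept.insert(condensation.summary_offset, summary_event)
    return kept


def build_view(events: list[dict], condensations: list[Condensation]) -> list[dict]:
    """Process condensations one at a time instead of all at once."""
    view = list(events)
    for condensation in condensations:
        view = apply_condensation(view, condensation)
    return view
```

Because each condensation is applied against the view produced by the previous one, multiple summaries can coexist without any single-summary special casing.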

Additional changes include:

  • Updated tests to remove references to unused functions
  • A new test suite for multi-summary views
  • Updates to the LLMSummarizingCondenser to handle the new view structure

Checklist

  • If the PR is changing/adding functionality, are there tests to reflect this?
  • If there is an example, have you run the example to make sure that it works?
  • If there are instructions on how to run the code, have you followed the instructions and made sure that it works?
  • If the feature is significant enough to require documentation, is there a PR open on the OpenHands/docs repository with the same branch name?
  • Is the github CI passing?

Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:4d6b5bf-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-4d6b5bf-python \
  ghcr.io/openhands/agent-server:4d6b5bf-python

All tags pushed for this build

ghcr.io/openhands/agent-server:4d6b5bf-golang-amd64
ghcr.io/openhands/agent-server:4d6b5bf-golang_tag_1.21-bookworm-amd64
ghcr.io/openhands/agent-server:4d6b5bf-golang-arm64
ghcr.io/openhands/agent-server:4d6b5bf-golang_tag_1.21-bookworm-arm64
ghcr.io/openhands/agent-server:4d6b5bf-java-amd64
ghcr.io/openhands/agent-server:4d6b5bf-eclipse-temurin_tag_17-jdk-amd64
ghcr.io/openhands/agent-server:4d6b5bf-java-arm64
ghcr.io/openhands/agent-server:4d6b5bf-eclipse-temurin_tag_17-jdk-arm64
ghcr.io/openhands/agent-server:4d6b5bf-python-amd64
ghcr.io/openhands/agent-server:4d6b5bf-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-amd64
ghcr.io/openhands/agent-server:4d6b5bf-python-arm64
ghcr.io/openhands/agent-server:4d6b5bf-nikolaik_s_python-nodejs_tag_python3.12-nodejs22-arm64
ghcr.io/openhands/agent-server:4d6b5bf-golang
ghcr.io/openhands/agent-server:4d6b5bf-java
ghcr.io/openhands/agent-server:4d6b5bf-python

About Multi-Architecture Support

  • Each variant tag (e.g., 4d6b5bf-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 4d6b5bf-python-amd64) are also available if needed

Contributor

github-actions bot commented Jan 13, 2026

Coverage

Coverage Report

File                                        Stmts   Miss   Cover   Missing
openhands-sdk/openhands/sdk/context
   view.py                                    187      3     98%   227, 263, 424
openhands-sdk/openhands/sdk/event
   condenser.py                                52      1     98%   63
TOTAL                                        15642   4546    70%

@csmith49 added the condenser-test label (triggers a run of all condenser integration tests) on Jan 14, 2026
@github-actions
Contributor

Hi! I started running the condenser tests on your PR. You will receive a comment with the results shortly.

Note: These are non-blocking tests that validate condenser functionality across different LLMs.

@github-actions
Contributor

Condenser Test Results (Non-Blocking)

These tests validate condenser functionality and do not block PR merges.

🧪 Condenser Test Results

Overall Success Rate: 100.0%
Total Cost: $0.62
Models Tested: 2
Timestamp: 2026-01-14 16:41:03 UTC

📊 Summary

Model Overall Tests Passed Skipped Total Cost Tokens
litellm_proxy_anthropic_claude_opus_4_5_20251101 100.0% 5/5 0 5 $0.60 273,830
litellm_proxy_gpt_5.1_codex_max 100.0% 2/2 3 5 $0.02 21,682

📋 Detailed Results

litellm_proxy_anthropic_claude_opus_4_5_20251101

  • Success Rate: 100.0% (5/5)
  • Total Cost: $0.60
  • Token Usage: prompt: 262,329, completion: 11,501, cache_read: 228,271, cache_write: 26,221, reasoning: 1,403
  • Run Suffix: litellm_proxy_anthropic_claude_opus_4_5_20251101_14adbf3_opus_condenser_run_N5_20260114_163711

litellm_proxy_gpt_5.1_codex_max

  • Success Rate: 100.0% (2/2)
  • Total Cost: $0.02
  • Token Usage: prompt: 21,066, completion: 616, cache_read: 15,104, reasoning: 256
  • Run Suffix: litellm_proxy_gpt_5.1_codex_max_14adbf3_gpt51_condenser_run_N5_20260114_163710
  • Skipped Tests: 3

Skipped Tests:

  • c04_token_condenser: This test stresses long repetitive tool loops to trigger token-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.
  • c01_thinking_block_condenser: Model litellm_proxy/gpt-5.1-codex-max does not support extended thinking or reasoning effort
  • c05_size_condenser: This test stresses long repetitive tool loops to trigger size-based condensation. GPT-5.1 Codex Max often declines such requests for efficiency/safety reasons.

if Reason.REQUEST in reasons:
    return CondensationRequirement.HARD

def _get_summary_event_content(self, view: View) -> str:
Collaborator Author


No longer needed -- the previous summaries are now captured as regular events in the summary generation.

@csmith49 csmith49 marked this pull request as ready for review January 14, 2026 17:11
Collaborator

@all-hands-bot all-hands-bot left a comment


This PR introduces important functionality for multi-summary views. I found a critical issue with batch atomicity enforcement and some suggestions for improving robustness. Details in inline comments.

@openhands-ai

openhands-ai bot commented Jan 15, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run tests
    • Agent Server

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1721 at branch `feat/multi-summary-views`

Feel free to include any additional details that might help me get this PR into a better state.


@csmith49
Collaborator Author

@enyst While working on this PR I remembered why the manipulation index calculations were strict with respect to the keep_first parameter: because we'd keep keep_first events at the start, and then the summary, and then all the events we cared about.

Never really mattered with the way we filtered summary events, but definitely matters after this PR.

@enyst
Collaborator

enyst commented Jan 16, 2026

Ah I see.

@OpenHands Do a /codereview-roasted on this PR. Investigate deeply! Respond as a comment on github on this PR, so consider that your output will be rendered as markdown.

@openhands-ai

openhands-ai bot commented Jan 16, 2026

I'm on it! enyst can track my progress at all-hands.dev

@openhands-ai

openhands-ai bot commented Jan 16, 2026

You asked for a /codereview-roasted comment (Markdown) on PR #1721.

What I did

  • Located the repo at /workspace/project/software-agent-sdk and verified we’re on branch feat/multi-summary-views.
  • Reviewed the PR diff vs origin/main, focusing on:
    • openhands-sdk/openhands/sdk/context/view.py (multi-summary view construction, condensation application, deterministic summary IDs, batch atomicity/tool-call filtering changes)
    • LLMSummarizingCondenser and its prompt changes.
  • Ran the relevant context/unit tests that this PR touches:
    • pytest tests/sdk/context/test_view_multi_summary.py: 24 passed
    • pytest tests/sdk/context/test_view.py tests/sdk/context/test_view_action_filtering.py tests/sdk/context/test_view_non_exec_filtering.py tests/sdk/context/condenser/test_llm_summarizing_condenser.py: 55 passed
    • Note: full repo pytest collection fails locally due to missing optional workspace components (fastapi/agent_server/tools packages not installed in this sandbox); I did not attempt to fix CI.

Deliverable produced

  • I provided the requested roasted code review as a GitHub-ready Markdown comment, including:
    • Taste rating
    • Critical issues (deterministic summary IDs coupled to persistence regex, underspecified summary_offset semantics, batch atomicity API assumptions)
    • Improvement opportunities (pipeline clarity, redundant scans, prompt semantics regression risk)
    • Minor style notes
    • Merge verdict + key insight

Checklist vs your request

  • Deep investigation of PR diff (core file + condenser + prompt + tests)
  • Output formatted as a GitHub PR comment in Markdown
  • No code modifications (as requested by /codereview)

Repo changes / pushing

  • No changes were made to the branch (no commits), so there is nothing to push.

If you want, I can also turn the “must fix / must justify” items into a smaller, actionable checklist for the PR author.


Collaborator

enyst commented Jan 16, 2026

🟡 Taste rating: Acceptable
You’re solving a real problem (single-summary Views were a straightjacket), but the implementation trades “brittle corner-cases” for “now everything is a corner-case unless you’re very disciplined”. The core idea is fine; the execution has some sharp edges that will absolutely cut the next person who touches this.


[CRITICAL ISSUES] (must fix / at least must justify loudly)

1) Deterministic summary IDs are a hack glued to an unrelated constraint

[openhands-sdk/openhands/sdk/context/view.py, ~451]

summary_id = f"{condensation.id.replace('_', '-')}-summary"

You’re encoding a file persistence regex constraint into the semantic identity of an event. That’s the kind of “works today, explodes later” coupling that makes systems rot.

  • Why this is bad: IDs are now shaped by persistence quirks, not domain meaning. If another component later expects event IDs to be stable across transports, or expects underscore preservation, you’ve just created a silent incompatibility.
  • Concrete improvement: introduce a dedicated summary_event_id_for(condensation_id: EventID) -> EventID helper (or better: a structured ID like CondensationSummaryEventID(condensation_id=...)) and keep normalization/persistence rules in one place. If the regex requires hyphens, encode that rule centrally.
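The suggested helper might look like this (a sketch; the function name and docstring wording are the reviewer's proposal, not existing SDK API):

```python
def summary_event_id_for(condensation_id: str) -> str:
    """Derive the deterministic summary event ID for a condensation.

    The hyphen normalization (a persistence-layer constraint on event IDs
    in the reviewed code) lives here and nowhere else, so any future
    change to the persistence rule is a one-line edit.
    """
    return f"{condensation_id.replace('_', '-')}-summary"
```

Callers then never spell out the ID format inline, which removes the silent coupling the reviewer objects to.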

2) summary_offset semantics are still underspecified and will bite you with multiple condensations

[openhands-sdk/openhands/sdk/context/view.py, ~429-457] apply_condensation()

You apply condensations sequentially, which is good, but what does summary_offset mean?

  • offset into the original conversation?
  • offset into the post-filtered view events at time of condensation?
  • offset into the list after forgetting but before inserting?

Right now it’s “whatever list you happen to have at that moment”. That’s fine internally, but then you also want condensations to be referential and stable over time. Those goals fight each other.

  • Failure mode: a later condensation generated with one interpretation of offsets will be replayed under another interpretation (because the view-building algorithm changed again), and your summaries drift.
  • Concrete improvement: document it as an invariant in the Condensation event model and add assertions/tests that enforce it. If summary_offset is “offset in the view after removing forgotten events from the then-current view”, say that explicitly.
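One way to pin the invariant down in code (a sketch; the chosen interpretation of the offset, and the function name, are assumptions for illustration):

```python
def insert_summary(view_events: list, summary_event, summary_offset: int) -> list:
    """Insert a summary event at summary_offset.

    Assumed invariant: summary_offset indexes into the view AFTER forgotten
    events have been removed, so it must lie within the current bounds.
    """
    if not 0 <= summary_offset <= len(view_events):
        raise ValueError(
            f"summary_offset {summary_offset} out of range for view of "
            f"length {len(view_events)}"
        )
    return (
        view_events[:summary_offset]
        + [summary_event]
        + view_events[summary_offset:]
    )
```

With the bounds check in place, a condensation replayed under a different interpretation of offsets fails loudly instead of silently drifting.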

3) Batch atomicity now depends on having all events available, but the API doesn’t enforce that

[openhands-sdk/openhands/sdk/context/view.py, ~265+] _enforce_batch_atomicity(view_events, all_events)

You changed the signature to require all_events (good), but nothing stops a caller from passing a subset and getting garbage behavior. The function name doesn’t communicate that it’s doing a join against global batch metadata.

  • Concrete improvement: either:
    • move batch construction higher so View.from_events() owns it (and pass a precomputed batch map), or
    • rename params to something more honest like source_events / view_events, and assert that view_events is a subsequence of source_events IDs.
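The proposed subsequence assertion can be implemented compactly (hypothetical names; assumes events expose an `id` key):

```python
def is_id_subsequence(view_events: list[dict], source_events: list[dict]) -> bool:
    """True iff view_events' IDs appear in source_events in the same order."""
    source_ids = iter(e["id"] for e in source_events)
    # `x in iterator` consumes the iterator up to (and including) the match,
    # so successive membership tests enforce ordering, not just membership.
    return all(e["id"] in source_ids for e in view_events)
```

A guard like `assert is_id_subsequence(view_events, source_events)` at the top of the atomicity function would catch callers that pass an unrelated subset.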

[IMPROVEMENT OPPORTUNITIES] (good taste / maintainability)

4) You deleted “specialized functions” and replaced them with “specialized inline conventions”

This PR claims to remove brittle helper logic, but the brittleness didn’t disappear — it moved:

  • “summary IDs are synthetic but deterministic”
  • “condensation requests are unhandled if they appear after the last condensation”
  • “summary insertion uses offsets into a mutable stream”
  • “some events aren’t LLMConvertible, silently skipped”

This is still specialized behavior; it’s just less discoverable now.

  • Concrete improvement: formalize the view-building pipeline as named steps with explicit contracts:
    1. strip requests
    2. stream condensations
    3. apply forgetting
    4. insert summaries
    5. enforce atomicity
    6. enforce tool-call pairing
  Then unit-test those steps independently.
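A sketch of what that decomposition could look like (step bodies are illustrative stubs with assumed event fields, not the SDK's logic):

```python
from functools import reduce


def strip_requests(events: list[dict]) -> list[dict]:
    """Step 1: drop condensation-request events from the stream."""
    return [e for e in events if e.get("type") != "condensation_request"]


def enforce_tool_call_pairing(events: list[dict]) -> list[dict]:
    """Step 6: keep actions only when their matching observation is present."""
    observed = {e["call_id"] for e in events if e.get("type") == "observation"}
    return [
        e for e in events
        if e.get("type") != "action" or e.get("call_id") in observed
    ]


# The remaining steps (stream condensations, apply forgetting, insert
# summaries, enforce atomicity) would slot in here with the same shape.
PIPELINE = [strip_requests, enforce_tool_call_pairing]


def build_view(events: list[dict]) -> list[dict]:
    """Run each named step in order; each step is independently unit-testable."""
    return reduce(lambda evs, step: step(evs), PIPELINE, events)
```

Each step has one contract and one test file, so the "specialized inline conventions" become explicit, named transformations.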

5) unhandled_condensation_request_exists() is linear scan every time — and computed after already scanning

[openhands-sdk/openhands/sdk/context/view.py, ~460+]

You already iterate through events once in from_events(), then you reverse-scan again. Not a performance crisis, but it’s sloppy and reads like “I couldn’t be bothered to track a boolean”.

  • Concrete improvement: compute it during the first pass (or track “seen_condensation” and “seen_request_after_last_condensation”).
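The single-pass version could look like this (hypothetical predicates on an assumed `type` field, standing in for the SDK's event classes):

```python
def unhandled_request_exists(events: list[dict]) -> bool:
    """Detect a condensation request that appears after the last condensation,
    in a single forward pass instead of a reverse scan."""
    unhandled = False
    for event in events:
        if event.get("type") == "condensation":
            unhandled = False  # the latest condensation handles any prior request
        elif event.get("type") == "condensation_request":
            unhandled = True
    return unhandled
```

The same flag can be threaded through the existing loop in from_events(), avoiding the second scan entirely.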

6) _filter_unmatched_tool_calls() docstring lies (and the signature is suspicious)

[openhands-sdk/openhands/sdk/context/view.py, ~319+]
It says all_events is “unused for now” but then you do use it for batch atomicity via ActionBatch.from_events(all_events).

That mismatch makes it harder to reason about correctness.

  • Concrete improvement: fix the docstring and rename batch vs action_batch consistently (you currently use both concepts across edits, which reads like half-refactor).

7) Condenser prompt change removes an explicit “previous summary” channel — hope you like regression roulette

[openhands-sdk/openhands/sdk/context/condenser/prompts/summarizing_prompt.j2]
You dropped <PREVIOUS SUMMARY> and now rely on summaries being present in the events list. That might be correct structurally, but it changes the prompt semantics:

  • models previously got “previous summary” as a privileged signal
  • now it’s just another event in the stream, competing with noise

Your non-blocking condenser tests passed (good), but this is exactly the kind of change that passes a tiny suite and then degrades quality in real conversations.

  • Concrete improvement: keep the explicit previous summary section but derive it from “most recent summary event” (or “all summaries in view”, but clearly separated). Multi-summary doesn’t mean “throw away the one thing that guided the model”.

[STYLE NOTES] (minor, but still worth fixing)

8) Variable name _new_summary_event screams “I fought the linter”

[openhands-sdk/openhands/sdk/context/view.py, ~452-456]
Leading underscore in a local is weird and communicates nothing.

9) Silent skipping of non-LLMConvertible events is risky

[openhands-sdk/openhands/sdk/context/view.py, ~495+]
Logging at debug means most users won’t see “we dropped events on the floor”. If that’s intended, fine — but then codify it as policy, not as a debug log.


VERDICT

Worth merging if you tighten the semantics around summary_offset and stop baking persistence quirks into event identity.
Right now it works, but it’s the kind of “works” that future you will pay interest on.

Key insight: you fixed “Views can only represent one summary” by turning summaries into stateful stream mutations — that’s fine, but only if the mutation semantics (IDs, offsets, replay guarantees) are defined like an API, not implied by whatever from_events() happens to do this month.

@csmith49
Collaborator Author

csmith49 commented Jan 16, 2026

[CRITICAL ISSUES] (must fix / at least must justify loudly)

  1. Deterministic summary IDs are a hack glued to an unrelated constraint
    ...
  2. summary_offset semantics are still underspecified and will bite you with multiple condensations
    ...

The summary offset semantics haven't changed, but documentation could be more verbose. Addressed these two points by moving the condensation semantics to the Condensation event instead of the View.

  3. Batch atomicity now depends on having all events available, but the API doesn’t enforce that
    ...

The semantics of batch atomicity haven't changed, just the way the code is structured. The API is a bit messy, but that's what this PR is for.

[IMPROVEMENT OPPORTUNITIES] (good taste / maintainability)

  4. You deleted “specialized functions” and replaced them with “specialized inline conventions”
    ...
  5. unhandled_condensation_request_exists() is linear scan every time — and computed after already scanning
    [openhands-sdk/openhands/sdk/context/view.py, ~460+]
    ...

The suggested fixes here are incompatible. The pipeline requires multiple linear passes if you actually want to separate the steps in code. We'd like a cleaner pipeline (and this PR improves the structure dramatically) but larger changes are beyond the scope of this PR.

  6. _filter_unmatched_tool_calls() docstring lies (and the signature is suspicious)
    [openhands-sdk/openhands/sdk/context/view.py, ~319+] It says all_events is “unused for now” but then you do use it for batch atomicity via ActionBatch.from_events(all_events).
    ...

Addressed.

  7. Condenser prompt change removes an explicit “previous summary” channel — hope you like regression roulette
    ...

Regressions are a concern, for sure, but the changes suggested here aren't a fix because they ignore the positional nature of summaries. Anecdotally I haven't noticed any performance degradation.

[STYLE NOTES] (minor, but still worth fixing)

  8. Variable name _new_summary_event screams “I fought the linter”
    ...

Says the model that wrote that line of code! (Also removed when addressing other issues).

  9. Silent skipping of non-LLMConvertible events is risky
    ...

This is standard and expected behavior (which is why the log level is DEBUG).


Labels

condenser-test Triggers a run of all condenser integration tests
