Fix Browser action deserialization by using OpenHandsModel #136

simonrosenberg · 2025-12-06T18:51:20Z

Problem

When running GAIA evaluations with Browser actions enabled (enable_browser=True), the evaluation phase completes successfully, but the aggregation phase fails with a pydantic validation error:

ValidationError: Unexpected kind BrowserGetContentAction

This error occurs when attempting to deserialize output.jsonl files that contain Browser events (actions and observations).
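To confirm that a given output.jsonl actually contains Browser events before debugging the schema, a small stdlib-only scan can help. This is a rough sketch, not part of the PR: it assumes each line is a JSON object with a `history` list whose entries carry a `kind` key (inferred from the error message above), and `instance_id` and `MessageAction` are made-up names for the sample data.

```python
import json
from collections import Counter
from typing import Iterable


def count_event_kinds(lines: Iterable[str]) -> Counter:
    """Count the `kind` of every event found in the `history` of each JSONL record."""
    kinds: Counter = Counter()
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # skip blank lines
        record = json.loads(raw)
        # Assumption: events are dicts tagged with a "kind" discriminator.
        for event in record.get("history", []):
            kinds[event.get("kind", "<missing>")] += 1
    return kinds


# In-memory sample standing in for the lines of an output.jsonl file.
sample = [
    json.dumps({"instance_id": "demo-1", "history": [
        {"kind": "MessageAction"},
        {"kind": "BrowserGetContentAction"},
    ]}),
    json.dumps({"instance_id": "demo-2", "history": [
        {"kind": "BrowserObservation"},
    ]}),
]
kinds = count_event_kinds(sample)
```

For a real file, an open file handle can be passed in place of `sample`, since iterating over it yields one line at a time.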

Root Cause

The issue stems from how pydantic handles discriminated unions with dynamically registered types:

1. EvalOutput contains a history: list[Event] field.
2. Event is a discriminated union that can contain different action and observation types.
3. Browser event types (BrowserGetContentAction, BrowserObservation, etc.) are registered dynamically when get_default_tools(enable_browser=True) is called.
4. Pydantic builds and caches the discriminated union schema when the model class is defined, which happens at import time, before the Browser types are registered.
5. Because EvalOutput extends BaseModel, its schema is frozen with only the event types that existed at import.
6. During deserialization, pydantic encounters Browser event types that are not in its cached schema and raises a validation error.
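The failure mode can be imitated without pydantic. The sketch below is a stdlib-only analogy, not the SDK's code: `EVENT_TYPES`, `register`, `FrozenParser`, and `MessageAction` are hypothetical names, and the parser's one-time snapshot of the registry stands in for a schema built at class-definition time.

```python
import json
from dataclasses import dataclass

# Hypothetical registry standing in for dynamically registered event types.
EVENT_TYPES: dict[str, type] = {}


def register(cls: type) -> type:
    """Add a class to the registry under its class name."""
    EVENT_TYPES[cls.__name__] = cls
    return cls


@register
@dataclass
class MessageAction:
    kind: str
    text: str


class FrozenParser:
    """Snapshots the registry once, so later registrations are invisible to it."""

    def __init__(self) -> None:
        self._known = dict(EVENT_TYPES)  # copy taken at construction time

    def parse(self, line: str):
        data = json.loads(line)
        cls = self._known.get(data["kind"])
        if cls is None:
            raise ValueError(f"Unexpected kind {data['kind']}")
        return cls(**data)


frozen = FrozenParser()  # built before any Browser type exists


@register
@dataclass
class BrowserGetContentAction:  # registered too late for `frozen` to see it
    kind: str
    url: str


line = json.dumps({"kind": "BrowserGetContentAction", "url": "https://example.com"})
try:
    frozen.parse(line)
    stale_error = None
except ValueError as exc:
    stale_error = str(exc)
```

Types registered before the snapshot still parse; only the late arrival is rejected, which matches the pattern of evaluation succeeding and aggregation failing.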

Solution

Change EvalOutput to extend OpenHandsModel instead of BaseModel.

OpenHandsModel is a custom base class in the OpenHands SDK that automatically calls model_rebuild() before validation. This regenerates the discriminated union schema to include all dynamically registered event types, ensuring Browser actions can be properly deserialized.
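The rebuild-before-validation idea can be illustrated with a stdlib-only analogy. This is a sketch under stated assumptions and not how OpenHandsModel is implemented: `EVENT_TYPES`, `register`, and `RebuildingParser` are hypothetical names, and refreshing a lookup table stands in for calling model_rebuild().

```python
import json
from dataclasses import dataclass

# Hypothetical registry standing in for dynamically registered event types.
EVENT_TYPES: dict[str, type] = {}


def register(cls: type) -> type:
    """Add a class to the registry under its class name."""
    EVENT_TYPES[cls.__name__] = cls
    return cls


class RebuildingParser:
    """Refreshes its lookup table before each parse if the registry has changed."""

    def __init__(self) -> None:
        self._known: dict[str, type] = {}
        self.rebuilds = 0

    def _rebuild_if_stale(self) -> None:
        # Compare the known names with the registry; rebuild only on a mismatch.
        if self._known.keys() != EVENT_TYPES.keys():
            self._known = dict(EVENT_TYPES)
            self.rebuilds += 1

    def parse(self, line: str):
        self._rebuild_if_stale()  # analogous to rebuilding before validation
        data = json.loads(line)
        cls = self._known.get(data["kind"])
        if cls is None:
            raise ValueError(f"Unexpected kind {data['kind']}")
        return cls(**data)


parser = RebuildingParser()  # created before the Browser type is registered


@register
@dataclass
class BrowserGetContentAction:
    kind: str
    url: str


line = json.dumps({"kind": "BrowserGetContentAction", "url": "https://example.com"})
action = parser.parse(line)
parser.parse(line)  # registry unchanged, so no second rebuild happens
```

Checking for staleness on every parse costs one key comparison, while the rebuild itself runs only when a new type has been registered.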

Changes

File: benchmarks/utils/models.py
Change: class EvalOutput(BaseModel) → class EvalOutput(OpenHandsModel)
Added import: from openhands.sdk.utils.models import OpenHandsModel
Added comprehensive docstring explaining why OpenHandsModel is necessary

Impact

✅ Minimal and safe change: Only the parent class is modified
✅ Backward compatible: OpenHandsModel extends BaseModel
✅ No API changes: All existing code continues to work
✅ Fixes: GAIA evaluations with Browser tools
✅ Future-proof: Also covers other tool types that are registered dynamically after import, not only Browser tools

Testing

Verified with GAIA evaluation runs:

Before: Evaluation succeeded but aggregation failed with deserialization errors
After: Complete end-to-end success with Browser actions in output.jsonl properly deserialized

Why This PR is Necessary

The main branch already has GAIA support (added in #129) and uses Browser tools by default. Without this fix, all GAIA evaluations on the main branch fail during aggregation when they load results containing Browser events.

This is a critical bug fix that should be merged to prevent evaluation failures.

This fix resolves a critical deserialization error that occurs when GAIA evaluations use Browser actions and then attempt to load the results.

Problem:
--------
When running GAIA evaluations with enable_browser=True, the evaluation phase completes successfully but the aggregation phase fails with:

ValidationError: Unexpected kind BrowserGetContentAction

This happens during deserialization of output.jsonl files that contain Browser events (actions and observations).

Root Cause:
-----------
1. EvalOutput contains a 'history: list[Event]' field
2. Event is a discriminated union that can contain different action types
3. Browser action types (BrowserGetContentAction, etc.) are registered dynamically when get_default_tools(enable_browser=True) is called
4. Pydantic caches discriminated union schemas at import time
5. When EvalOutput extends BaseModel, the schema is frozen before Browser types are registered
6. During deserialization, pydantic doesn't recognize Browser action types because they weren't in the cached schema

Solution:
---------
Change EvalOutput to extend OpenHandsModel instead of BaseModel. OpenHandsModel is a custom base class that automatically calls model_rebuild() before validation, which regenerates the discriminated union schema to include all dynamically registered types.

Impact:
-------
- Minimal change: Only the parent class is modified
- Backward compatible: OpenHandsModel extends BaseModel
- No API changes: All existing code continues to work
- Fixes: GAIA evaluations and any future benchmarks using dynamic tools

Testing:
--------
Verified with GAIA evaluation runs that previously failed with deserialization errors now complete successfully end-to-end.

openhands-ai · 2025-12-06T18:52:35Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- Pre-commit checks
- .github/workflows/build-gaia-image.yml

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #136 at branch `fix/browser-action-deserialization`

Feel free to include any additional details that might help me get this PR into a better state.

neubig

Happy to take a look at this once the failing CI and merge conflicts are fixed, if it's ready.

neubig marked this pull request as draft December 31, 2025 02:07

neubig reviewed Dec 31, 2025

4 participants
