Pull requests: OpenHands/benchmarks
Pull requests list
Make multi-swe-bench an optional dependency to fix corrupted wheel
#290
opened Jan 9, 2026 by
greynewell
Fix cost report generation for JSONL files with null entries
#288
opened Jan 9, 2026 by
juanmichelini
build(deps): bump the version-all group across 1 directory with 15 updates
dependencies
Pull requests that update a dependency file
python:uv
Pull requests that update python:uv code
#278
opened Jan 7, 2026 by
dependabot
bot
refactor(gaia): use evaluation framework for max_retries parameter
#268
opened Jan 6, 2026 by
juanmichelini
Add configurable conversation timeout to all benchmarks
#250
opened Jan 5, 2026 by
simonrosenberg
Draft
[DRAFT] latest main
build-swebench
Build 500 SWE-Bench Verified Image based on SDK version on this PR.
SWT-bench: cache preload, latency instrumentation, pydantic-core bump
#245
opened Jan 5, 2026 by
simonrosenberg
Draft
Add add_resolve_rate_to_predictions function to output_utils
#199
opened Dec 23, 2025 by
juanmichelini
Draft
Fix Browser action deserialization by using OpenHandsModel
#136
opened Dec 6, 2025 by
simonrosenberg
Draft
API-based Critic implementation
build-swebench-200
Build 200 SWE-Bench Verified Image based on SDK version on this PR.