Add configurable conversation timeout to all benchmarks #250

simonrosenberg · 2026-01-05T19:02:07Z

Summary

Extends the timeout configuration from PR #235 (commit0) to all other benchmarks:

gaia
multiswebench
openagentsafety
swebench
swebenchmultimodal
swtbench

Changes

Each benchmark's run_infer.py now reads the CONVERSATION_TIMEOUT environment variable (default: 3600 seconds / 1 hour) and passes it to conversation.run(timeout=run_timeout).
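
As an illustration of the pattern described above, here is a minimal sketch of how a run_infer.py could resolve the timeout. It is not the code from this PR: the helper name get_run_timeout, the constant DEFAULT_CONVERSATION_TIMEOUT, and the optional env parameter are hypothetical and exist only to make the example self-contained. Only the CONVERSATION_TIMEOUT variable, the 3600-second default, and the conversation.run(timeout=run_timeout) call come from the description.

```python
import os

# Default stated in the PR description: 3600 seconds (1 hour).
# The constant name is hypothetical.
DEFAULT_CONVERSATION_TIMEOUT = 3600


def get_run_timeout(env=None) -> int:
    """Return the conversation timeout in seconds.

    Reads CONVERSATION_TIMEOUT from the given mapping (os.environ by
    default) and falls back to the default when the variable is unset.
    Hypothetical helper; the PR may inline this logic instead.
    """
    env = os.environ if env is None else env
    return int(env.get("CONVERSATION_TIMEOUT", DEFAULT_CONVERSATION_TIMEOUT))


# Intended use inside a benchmark's run_infer.py, per the description:
#   run_timeout = get_run_timeout()
#   conversation.run(timeout=run_timeout)
```

With this shape, launching a benchmark with CONVERSATION_TIMEOUT=7200 in the environment would give a two-hour limit, and leaving it unset keeps the one-hour default. A non-numeric value would raise ValueError in this sketch; whether the PR validates the value is not stated.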

Notes

This is a follow-up to #235 ("commit0: extend run timeout and continue on timeout"), as requested in the PR comments.
Default timeout is 1 hour (3600 seconds), configurable via CONVERSATION_TIMEOUT env var

Testing

Pre-commit checks pass (ruff format, ruff lint, pycodestyle, pyright)


Apply the same timeout configuration from commit0 to all other benchmarks:
- gaia
- multiswebench
- openagentsafety
- swebench
- swebenchmultimodal
- swtbench

Default timeout is 3600 seconds (1 hour), configurable via CONVERSATION_TIMEOUT env var.

Co-authored-by: openhands <[email protected]>

openhands-ai bot mentioned this pull request Jan 5, 2026

commit0: extend run timeout and continue on timeout #235

Merged
