@simonrosenberg (Collaborator) commented Jan 2, 2026

Summary

  • Default the commit0 conversation wait timeout to 4 hours (override via the CONVERSATION_TIMEOUT environment variable).
  • Continue evaluation on ConversationRunError(TimeoutError) and mark output as timed-out without retrying.
  • Best-effort pause on timeout so tests run against the current state.

Notes

  • This keeps error=None so outputs are aggregated, but records conversation_timed_out in test_result.
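For readers skimming the thread, the behavior described in the Summary and Notes can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `FakeConversation`, `run_with_timeout_handling`, `run_tests`, and the use of `__cause__` to detect the wrapped `TimeoutError` are all assumptions made for the example. Only `ConversationRunError`, `TimeoutError`, `error=None`, `conversation_timed_out`, and `test_result` come from the description above.

```python
class ConversationRunError(Exception):
    """Hypothetical stand-in for the error type named in the summary."""


class FakeConversation:
    """Hypothetical conversation object used only to exercise the sketch."""

    def __init__(self, should_time_out):
        self.should_time_out = should_time_out
        self.paused = False

    def run(self, timeout):
        if self.should_time_out:
            try:
                raise TimeoutError(f"no result after {timeout}s")
            except TimeoutError as exc:
                # Assumption: the timeout is wrapped and chained as the cause.
                raise ConversationRunError("conversation run failed") from exc

    def pause(self):
        self.paused = True


def run_with_timeout_handling(conversation, timeout, run_tests):
    """Run once, tolerate a timeout, then evaluate the current state."""
    timed_out = False
    try:
        conversation.run(timeout=timeout)
    except ConversationRunError as exc:
        if not isinstance(exc.__cause__, TimeoutError):
            raise  # non-timeout failures keep their normal handling
        timed_out = True
        try:
            conversation.pause()  # best effort; failure here is ignored
        except Exception:
            pass

    # Tests run against whatever state exists, with no retry.
    test_result = dict(run_tests())
    test_result["conversation_timed_out"] = timed_out
    # error stays None so the output is still aggregated.
    return {"error": None, "test_result": test_result}


conversation = FakeConversation(should_time_out=True)
output = run_with_timeout_handling(conversation, 4 * 3600, lambda: {"passed": 3})
```

In this sketch a timed-out instance still produces a normal output record, and the flag inside `test_result` is the only trace that the run was cut short.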

Testing

  • Not run (config-only change).

@simonrosenberg enabled auto-merge (squash) January 5, 2026 17:15
@juanmichelini self-requested a review January 5, 2026 18:54

@juanmichelini (Collaborator) left a comment

LGTM!

@simonrosenberg merged commit 07e82a4 into main Jan 5, 2026
2 checks passed
@simonrosenberg (Collaborator, Author)

@OpenHands this should have been done for all benchmarks, not just commit0
Make a PR to add this update to other benchmarks

@simonrosenberg deleted the commit0-timeout-4h branch January 5, 2026 18:58
@openhands-ai (bot) commented Jan 5, 2026

I'm on it! simonrosenberg can track my progress at all-hands.dev

@openhands-ai (bot) commented Jan 5, 2026

I've created a PR to add the configurable conversation timeout to all other benchmarks, as requested in the comment on PR #235.

PR Created: #250

Changes Made:

  • ✅ Updated gaia/run_infer.py with timeout parameter
  • ✅ Updated multiswebench/run_infer.py with timeout parameter
  • ✅ Updated openagentsafety/run_infer.py with timeout parameter
  • ✅ Updated swebench/run_infer.py with timeout parameter
  • ✅ Updated swebenchmultimodal/run_infer.py with timeout parameter
  • ✅ Updated swtbench/run_infer.py with timeout parameter

Each benchmark now reads the CONVERSATION_TIMEOUT environment variable (default: 3600 seconds / 1 hour) and passes it to conversation.run(timeout=run_timeout), matching the pattern established in PR #235 for commit0.
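A rough sketch of the pattern described above, for illustration only. The helper `get_run_timeout`, the validation it performs, and the `FakeConversation` stand-in are assumptions made for this example and are not taken from the PR. Only the `CONVERSATION_TIMEOUT` variable, the 3600-second default, and the `conversation.run(timeout=run_timeout)` call shape come from the comment.

```python
import os

DEFAULT_CONVERSATION_TIMEOUT = 3600  # seconds (1 hour), as stated in the comment


def get_run_timeout(env=None):
    """Return the timeout in seconds, read from CONVERSATION_TIMEOUT.

    Hypothetical helper: the actual benchmarks may parse the variable
    differently or skip validation entirely.
    """
    env = os.environ if env is None else env
    raw = env.get("CONVERSATION_TIMEOUT")
    if raw is None or raw.strip() == "":
        return DEFAULT_CONVERSATION_TIMEOUT
    timeout = int(raw)
    if timeout <= 0:
        raise ValueError("CONVERSATION_TIMEOUT must be a positive integer")
    return timeout


class FakeConversation:
    """Hypothetical stand-in that records the timeout it was given."""

    def __init__(self):
        self.received_timeout = None

    def run(self, timeout=None):
        self.received_timeout = timeout


# Example: a 4-hour override passed in as if it were the environment.
run_timeout = get_run_timeout({"CONVERSATION_TIMEOUT": "14400"})
conversation = FakeConversation()
conversation.run(timeout=run_timeout)
```

Passing the mapping explicitly keeps the example independent of the caller's shell. Calling `get_run_timeout()` with no argument reads `os.environ` instead.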

The PR is currently in draft status and all pre-commit checks pass.

3 participants