@simonrosenberg (Collaborator) commented Jan 3, 2026

Summary

  • Add an output_jsonl_gcs input to the run-eval workflow.
  • Forward that input through the dispatch payload to software-agent-sdk.
  • Bump pydantic to >=2.12.0 to avoid a serialization error during swebench inference.
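
The forwarding described in the summary can be pictured with a minimal Python sketch. It assumes a repository_dispatch-style payload (an event_type plus a client_payload object); the helper name build_dispatch_payload, the event type "run-eval", and the gs:// prefix check are illustrative assumptions, not taken from the actual workflow.

```python
from typing import Optional

GCS_PREFIX = "gs://"


def build_dispatch_payload(
    benchmark: str,
    eval_limit: int,
    output_jsonl_gcs: Optional[str] = None,
) -> dict:
    """Hypothetical helper: build a dispatch payload for an eval run.

    The optional output_jsonl_gcs value is forwarded only when it is set,
    so runs that omit it keep their existing payload shape.
    """
    client_payload = {"benchmark": benchmark, "eval_limit": eval_limit}

    if output_jsonl_gcs:
        # Assumed sanity check: the value should point at a GCS object.
        if not output_jsonl_gcs.startswith(GCS_PREFIX):
            raise ValueError(f"expected a {GCS_PREFIX} URI, got {output_jsonl_gcs!r}")
        client_payload["output_jsonl_gcs"] = output_jsonl_gcs

    # "run-eval" is a placeholder event type, not the real one.
    return {"event_type": "run-eval", "client_payload": client_payload}
```

Leaving the key out entirely when the input is empty, rather than sending an empty string, means the receiving side can treat "key present" as "reuse existing output" without extra checks.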

Testing

  • gh workflow run run-eval.yml --ref issue-236-output-jsonl -f benchmark=gaia -f eval_limit=1 -f sdk_ref=issue-236-output-jsonl -f benchmarks_branch=issue-236-output-jsonl -f eval_branch=issue-236-output-jsonl -f output_jsonl_gcs=gs://openhands-evaluation-results/eval-20622411875-claude-son_litellm_proxy-claude-sonnet-4-5-20250929_25-12-31-16-36.tar.gz -f reason="rerun latest gaia 1-image"
  • gh workflow run run-eval.yml --ref issue-236-output-jsonl -f benchmark=commit0 -f eval_limit=1 -f sdk_ref=issue-236-output-jsonl -f benchmarks_branch=issue-236-output-jsonl -f eval_branch=issue-236-output-jsonl -f output_jsonl_gcs=gs://openhands-evaluation-results/eval-20622412428-claude-son_litellm_proxy-claude-sonnet-4-5-20250929_25-12-31-18-13.tar.gz -f reason="rerun latest commit0 1-image"
  • gh workflow run run-eval.yml --ref issue-236-output-jsonl -f benchmark=swebench -f eval_limit=1 -f sdk_ref=issue-236-output-jsonl -f benchmarks_branch=issue-236-output-jsonl -f eval_branch=issue-236-output-jsonl -f output_jsonl_gcs=gs://openhands-evaluation-results/eval-20600678832-claude-son_litellm_proxy-claude-sonnet-4-5-20250929_25-12-30-16-22.tar.gz -f reason="eval-only swebench reuse eval-20600678832 (fix)"
  • gh workflow run run-eval.yml --ref issue-236-output-jsonl -f benchmark=swebench -f eval_limit=1 -f sdk_ref=issue-236-output-jsonl -f benchmarks_branch=issue-236-output-jsonl -f eval_branch=issue-236-output-jsonl -f reason="infer+eval swebench smoke (pydantic 2.12)"
  • Evaluation Job run 20683422108 (success)
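
The test commands above share one shape: gh workflow run run-eval.yml --ref <ref> followed by repeated -f key=value fields. As a convenience sketch only, a small Python helper (gh_run_eval_command is a hypothetical name) can assemble that command and skip unset fields, which is how the eval-only and infer+eval variants differ.

```python
import shlex


def gh_run_eval_command(ref: str, fields: dict) -> str:
    """Hypothetical helper: build a `gh workflow run` command line.

    Fields whose value is None or an empty string are skipped, so
    output_jsonl_gcs can be left out for a full infer+eval run.
    """
    args = ["gh", "workflow", "run", "run-eval.yml", "--ref", ref]
    for key, value in fields.items():
        if value is None or value == "":
            continue
        args += ["-f", f"{key}={value}"]
    # shlex.join quotes any argument containing spaces or shell metacharacters.
    return shlex.join(args)


if __name__ == "__main__":
    print(
        gh_run_eval_command(
            "issue-236-output-jsonl",
            {
                "benchmark": "swebench",
                "eval_limit": 1,
                "output_jsonl_gcs": None,  # omitted: run inference, then eval
                "reason": "infer+eval swebench smoke (pydantic 2.12)",
            },
        )
    )
```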

openhands-ai bot commented Jan 3, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run Eval (SDK)

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #237 at branch `issue-236-output-jsonl`

Feel free to include any additional details that might help me get this PR into a better state.
