refactor(gaia): use evaluation framework for max_retries parameter #268
Summary
This PR fixes issue #171 by making GAIA's `run_infer.py` use the evaluation framework's parameter handling, consistent with `swebench/run_infer.py` and `commit0/run_infer.py`.

Changes
- Added `max_retries=args.max_retries` to EvalMetadata: Previously, GAIA did not pass the `max_retries` argument to the metadata, so the evaluation framework's retry logic for error handling did not use the user-provided value.
- Renamed a confusing local variable: `max_retries` is now `max_event_sync_retries` in `_extract_answer_from_history`, to clarify that it is for WebSocket event synchronization (waiting for events to arrive), not for error retries (which are handled by the Evaluation base class).

Why this matters
The `max_retries` parameter controls how many times the evaluation framework retries an instance when it throws an exception. Unless the value is passed to EvalMetadata, the framework's retry logic does not see the user-provided setting.

Fixes #171
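
To illustrate the two kinds of retry that this PR separates, here is a minimal, self-contained sketch. It is not the repository's code: `EvalMetadataSketch`, `build_metadata`, `run_with_retries`, the `--max-retries` flag spelling, and the default values are all hypothetical stand-ins, and the sketch assumes that "retries" means extra attempts after the first failure.

```python
import argparse
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Tuple


@dataclass
class EvalMetadataSketch:
    """Hypothetical stand-in for EvalMetadata; only the field discussed here."""
    max_retries: int = 3


def build_metadata(argv: List[str]) -> EvalMetadataSketch:
    """Forward the user-provided value into the metadata (flag name is assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--max-retries", type=int, default=3)
    args = parser.parse_args(argv)
    # The point of the first change: pass the CLI value through explicitly.
    return EvalMetadataSketch(max_retries=args.max_retries)


def run_with_retries(
    process_instance: Callable[[], Any], metadata: EvalMetadataSketch
) -> Tuple[Any, int]:
    """Error retries: re-run an instance that raised, up to max_retries extra times."""
    attempts = 0
    last_error: Optional[Exception] = None
    for _ in range(metadata.max_retries + 1):
        attempts += 1
        try:
            return process_instance(), attempts
        except Exception as exc:  # broad on purpose: any failure triggers a retry
            last_error = exc
    raise RuntimeError(f"instance failed after {attempts} attempts") from last_error


def extract_answer_from_history(
    get_events: Callable[[], List[str]], max_event_sync_retries: int = 3
) -> Optional[str]:
    """Event-sync retries: poll until events arrive. Nothing has failed here,
    which is why a separate name from max_retries avoids confusion."""
    for _ in range(max_event_sync_retries):
        events = get_events()
        if events:
            return events[-1]
    return None
```

In this sketch, `build_metadata(["--max-retries", "5"])` yields metadata whose `max_retries` is 5, and only `run_with_retries` consults it. The polling limit in `extract_answer_from_history` is an independent local setting.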