Significant updates to agbench. #5313

Merged on Feb 7, 2025 (34 commits)
Commits
6b7cc8e
Significant updates to agbench.
afourney Feb 1, 2025
cd065f6
Fixed some plumbing issues.
afourney Feb 1, 2025
d15af5b
Fixed GAIA support.
afourney Feb 1, 2025
db6964b
Updated config to allow different models to be used with M1
afourney Feb 3, 2025
3dd0f5f
Use javascript to determine what text is visible in WebSurfer's viewp…
afourney Feb 3, 2025
f11c4ed
Merge remote-tracking branch 'origin/viewport_text' into update_agbench
afourney Feb 3, 2025
13c5445
Merge remote-tracking branch 'origin/main' into update_agbench
afourney Feb 4, 2025
bfede70
Added official GAIA scorer.
afourney Feb 5, 2025
3d73b0d
Prompting changes to better support smaller models.
afourney Feb 5, 2025
ae2c4b3
Merge branch 'main' into llama_surfer
afourney Feb 6, 2025
4fd3b17
Fixed summarization.
afourney Feb 6, 2025
b208068
Merge main.
afourney Feb 6, 2025
da3dcff
Merge remote-tracking branch 'origin/llama_surfer' into update_agbench
afourney Feb 6, 2025
f55c66b
Ensure descriptions appear each on one line. Fix web_surfer's descript…
afourney Feb 6, 2025
de13d57
Merge remote-tracking branch 'origin/fix_m1_descriptions' into llama_…
afourney Feb 6, 2025
946d845
Remove erroneous print
afourney Feb 6, 2025
6dd88ac
Various web surfer fixes.
afourney Feb 6, 2025
63ef5b3
Merge fixes.
afourney Feb 6, 2025
e32e6f7
Merge remote-tracking branch 'origin/llama_surfer' into update_agbench
afourney Feb 6, 2025
aa8f820
applied more patches
afourney Feb 6, 2025
3bf9c6f
Merged main.
afourney Feb 6, 2025
c3c4092
Merge remote-tracking branch 'origin/llama_surfer' into update_agbench
afourney Feb 6, 2025
5a09868
Merge fixes.
afourney Feb 6, 2025
6613b12
Merge main.
afourney Feb 7, 2025
fcd57c0
Added GroupChatSelector template.
afourney Feb 7, 2025
0caf62f
Removed older benchmarks that were missing Templates. These will retu…
afourney Feb 7, 2025
a710b7a
Missed a file.
afourney Feb 7, 2025
f022d7f
Merge branch 'main' into update_agbench
afourney Feb 7, 2025
c45d701
Fixed mypy.
afourney Feb 7, 2025
289b166
Updated READMEs.
afourney Feb 7, 2025
cc64309
Fixed pyright.
afourney Feb 7, 2025
83ac692
Trying to fix types again.
afourney Feb 7, 2025
462a975
Fixed formatting.
afourney Feb 7, 2025
7125e29
Merge branch 'main' into update_agbench
afourney Feb 7, 2025


78 changes: 0 additions & 78 deletions python/packages/agbench/benchmarks/AssistantBench/README.md

This file was deleted.
