Nicholas Edwards¹*, Yukyung Lee²*, Yujun (Audrey) Mao², Yulu Qin², Sebastian Schuster¹³†, Najoung Kim²†
¹University College London, ²Boston University, ³University of Vienna
*, † Equal contribution
Paper | Website | Dataset 🤗
Submit your agent here: Go to the submission page

```
.
├── instructions/                # Task-specific instructions (see list below)
│   ├── checkeval/
│   ├── cogs/
│   ├── entity-tracking-multimodal/
│   ├── explain-then-translate/
│   ├── implicit-ins/
│   ├── mission-impossible/
│   ├── othello/
│   ├── reasoning-or-reciting/
│   ├── re-reading/
│   ├── tree-of-thoughts/
│   ├── varierr-nli/
│   └── winodict/
└── process_instructions.py      # Script for processing instructions
```

Each subdirectory inside `instructions/` contains an `instructions.md` file that describes the task setting. The tasks are:
- checkeval
- cogs
- entity-tracking-multimodal
- explain-then-translate
- implicit-ins
- mission-impossible
- othello
- reasoning-or-reciting
- re-reading
- tree-of-thoughts
- varierr-nli
- winodict
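As a rough illustration of the layout above, the following Python sketch collects each task's `instructions.md` into a dictionary keyed by task directory name. It is not `process_instructions.py` and makes no claim about what that script does; the helper name `load_task_instructions` is hypothetical, and the sketch assumes only the directory structure described in this README.

```python
from __future__ import annotations

from pathlib import Path


def load_task_instructions(root: str | Path = "instructions") -> dict[str, str]:
    """Hypothetical helper: map each task directory name to its instructions.md text.

    Assumes one subdirectory per task under `root`, each holding an
    `instructions.md` file. Entries without that file are skipped, and a
    missing `root` yields an empty dict.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return {}
    tasks: dict[str, str] = {}
    # Sort so the output order is deterministic across filesystems.
    for task_dir in sorted(root_path.iterdir()):
        md_file = task_dir / "instructions.md"
        if task_dir.is_dir() and md_file.is_file():
            tasks[task_dir.name] = md_file.read_text(encoding="utf-8")
    return tasks


if __name__ == "__main__":
    # Print each task name with the first line of its instructions.
    for name, text in load_task_instructions().items():
        first_line = text.splitlines()[0] if text else ""
        print(f"{name}: {first_line}")
```

Skipping entries that lack an `instructions.md` keeps the helper tolerant of stray files or folders under `instructions/`; whether the repository's own tooling behaves the same way is not stated here.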

```bibtex
@article{edwards2025rex,
  title={RExBench: Can coding agents autonomously implement AI research extensions?},
  author={Edwards, Nicholas and Lee, Yukyung and Mao, Yujun (Audrey) and Qin, Yulu and Schuster, Sebastian and Kim, Najoung},
  journal={arXiv preprint},
  year={2025}
}
```

Team RExBench ([email protected])