tinlaboratory/RExBench - RExBench: Can coding agents autonomously implement AI research extensions?

Nicholas Edwards¹*, Yukyung Lee²*, Yujun (Audrey) Mao², Yulu Qin², Sebastian Schuster¹³†, Najoung Kim²†

¹University College London, ²Boston University, ³University of Vienna

*, † Equal contribution

Paper | Website | Dataset 🤗

📊 Submission Page

Submit your agent here: go to the submission page 🚀

📂 Repository Structure

.
├── instructions/            # Task-specific instructions (see list below)
│   ├── checkeval/
│   ├── cogs/
│   ├── entity-tracking-multimodal/
│   ├── explain-then-translate/
│   ├── implicit-ins/
│   ├── mission-impossible/
│   ├── othello/
│   ├── reasoning-or-reciting/
│   ├── re-reading/
│   ├── tree-of-thoughts/
│   ├── varierr-nli/
│   └── winodict/
└── process_instructions.py     # Script for processing instructions

Each subdirectory inside instructions/ contains an instructions.md file that describes the task setting.
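
The layout above can be walked programmatically. The following is a minimal sketch, assuming only the directory structure described here (one `instructions.md` per task subdirectory); the helper name `find_task_instructions` is hypothetical and is not part of `process_instructions.py`.

```python
from pathlib import Path


def find_task_instructions(instructions_dir):
    """Map each task subdirectory name to its instructions.md path.

    Hypothetical helper: it relies only on the layout described in this
    README, not on anything defined in process_instructions.py.
    """
    root = Path(instructions_dir)
    found = {}
    # Visit task directories in a stable, alphabetical order.
    for task_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        md_file = task_dir / "instructions.md"
        # Skip directories that do not contain an instructions file.
        if md_file.is_file():
            found[task_dir.name] = md_file
    return found


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    for name, path in find_task_instructions("instructions").items():
        print(f"{name}: {path}")
```

Run from the repository root, this prints one line per task directory that contains an `instructions.md` file.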

✅ Included Tasks

checkeval
cogs
entity-tracking-multimodal
implicit-ins
mission-impossible
othello
reasoning-or-reciting
re-reading
tree-of-thoughts
varierr-nli
winodict

🧠 Baseline Agents

Agent 1: aider (GitHub)
Agent 2: OpenHands (GitHub)
Agent 3: Claude Code

Citation

@article{edwards2025rex,
  title={RExBench: Can coding agents autonomously implement AI research extensions?},
  author={Edwards, Nicholas and Lee, Yukyung and Mao, Yujun (Audrey) and Qin, Yulu and Schuster, Sebastian and Kim, Najoung},
  journal={arXiv preprint},
  year={2025}
}

Contact

Team RExBench ([email protected])
