tinlaboratory/RExBench


RExBench Title

Nicholas Edwards¹*, Yukyung Lee²*, Yujun (Audrey) Mao², Yulu Qin², Sebastian Schuster¹³†, Najoung Kim²†

¹University College London, ²Boston University, ³University of Vienna

*, † Equal contribution

Paper | Website | Dataset 🤗

📊 Submission Page

Submit your agent here: go to the submission page 🚀

📂 Repository Structure

.
├── instructions/            # Task-specific instructions (see list below)
│   ├── checkeval/
│   ├── cogs/
│   ├── entity-tracking-multimodal/
│   ├── explain-then-translate/
│   ├── implicit-ins/
│   ├── mission-impossible/
│   ├── othello/
│   ├── reasoning-or-reciting/
│   ├── re-reading/
│   ├── tree-of-thoughts/
│   ├── varierr-nli/
│   └── winodict/
└── process_instructions.py     # Script for processing instructions

Each subdirectory inside instructions/ contains an instructions.md file that describes the task setting.
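As a quick way to check that layout locally, the following is a minimal sketch that walks instructions/ and collects each task's instructions.md. The function name `find_instruction_files` is hypothetical and is not part of this repository; it is independent of process_instructions.py, whose behavior is not described here.

```python
from pathlib import Path


def find_instruction_files(root="instructions"):
    """Map each task directory name to the path of its instructions.md.

    Hypothetical helper for illustration only; not part of the repository.
    Directories without an instructions.md are skipped.
    """
    root_path = Path(root)
    found = {}
    if not root_path.is_dir():
        # Nothing to scan, e.g. when run outside the repository root.
        return found
    for task_dir in sorted(root_path.iterdir()):
        candidate = task_dir / "instructions.md"
        if task_dir.is_dir() and candidate.is_file():
            found[task_dir.name] = candidate
    return found


if __name__ == "__main__":
    # Print each task name next to the location of its instructions file.
    for task, path in find_instruction_files().items():
        print(f"{task}: {path}")
```

Run from the repository root, this should list one entry per task directory that contains an instructions.md file.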

✅ Included Tasks

  • checkeval
  • cogs
  • entity-tracking-multimodal
  • implicit-ins
  • mission-impossible
  • othello
  • reasoning-or-reciting
  • re-reading
  • tree-of-thoughts
  • varierr-nli
  • winodict

🧠 Baseline Agents

  • Agent 1: aider (GitHub)
  • Agent 2: OpenHands (GitHub)
  • Agent 3: Claude Code

Citation

@article{edwards2025rex,
  title={RExBench: Can coding agents autonomously implement AI research extensions?},
  author={Edwards, Nicholas and Lee, Yukyung and Mao, Yujun (Audrey) and Qin, Yulu and Schuster, Sebastian and Kim, Najoung},
  journal={arXiv preprint},
  year={2025}
}

Contact

Team RExBench ([email protected])
