Official codebase for the ACL 2025 paper: REPRO-Bench: Can Agentic AI Systems Assess the Reproducibility of Social Science Papers?
The REPRO-Bench dataset is hosted on Hugging Face:
git clone https://huggingface.co/datasets/chuxuan/REPRO-Bench
cd REPRO-Bench
git lfs pull

It includes:
- 112 task instances (PDFs + code + data)
- Gold-standard reproducibility annotations
- Public reproduction reports

bash SWE-Agent/run_all.sh
bash AutoGPT/classic/original_autogpt/run_all.sh
bash CORE-Agent/classic/original_autogpt/run_all.sh
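
If you want to launch the agents one after another and see at a glance which runs failed, a small wrapper along these lines may help. This is only a sketch: the `run_agents` function name, the log prefixes, and the skip-if-missing behavior are illustrative assumptions and are not part of this repository. The script paths are the ones listed above.

```bash
#!/usr/bin/env bash
# Sketch (assumption, not shipped with the repo): run each given script with
# bash, report its outcome, and return the number of scripts that failed or
# were missing.

run_agents() {
  local failed=0
  local script
  for script in "$@"; do
    # Skip paths that do not exist instead of aborting the whole batch.
    if [ ! -f "$script" ]; then
      echo "skip: $script (not found)"
      failed=$((failed + 1))
      continue
    fi
    echo "run: $script"
    if bash "$script"; then
      echo "ok: $script"
    else
      echo "fail: $script"
      failed=$((failed + 1))
    fi
  done
  # Exit status is the count of failed or missing scripts (0 means all passed).
  return "$failed"
}
```

From the repository root, after sourcing the file that defines the function, it could be called as `run_agents SWE-Agent/run_all.sh AutoGPT/classic/original_autogpt/run_all.sh CORE-Agent/classic/original_autogpt/run_all.sh`. Running the scripts sequentially keeps each agent's output separate and avoids the agents competing for the same resources; whether the scripts can safely run in parallel is not stated here.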