
IndiMathBench is a fully human-verified benchmark for evaluating automated theorem provers on 312 competition-level Olympiad mathematics problems formalized in Lean 4. All problems are produced and verified through a structured Human–AI collaborative pipeline. The dataset is sourced from the Indian National Mathematical Olympiad (INMO) and the Regional Mathematical Olympiad (RMO), two long-running annual mathematics competitions in India.
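To illustrate, a benchmark entry consists of a formal Lean 4 theorem statement whose proof the evaluated prover must supply. The following is a hypothetical sketch in that style (not an actual IndiMathBench problem; the statement and names are illustrative):

```lean
import Mathlib

-- Hypothetical sketch of a benchmark-style entry (not an actual
-- IndiMathBench problem): the formal statement is given, and the
-- prover under evaluation must replace `sorry` with a valid proof.
theorem example_problem (n : ℕ) : 3 ∣ n ^ 3 + 2 * n := by
  sorry
```

A prover's output is counted as resolved only if its completed proof compiles without `sorry` or other errors.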

The benchmark contributes 98 new Geometry problems and 45 new Combinatorics problems, two domains that are underrepresented in existing formal-reasoning benchmarks.

For submissions or questions, please contact parambiyani8@gmail.com. Feedback is welcome; please open an issue if you find errors or would like to suggest improvements.

## Problem Domains

| Category | Count |
| --- | --- |
| Geometry | 98 |
| Algebra | 92 |
| Set Theory & Combinatorics | 45 |
| Number Theory | 77 |

## Evaluation Results

| Model | Scaffold | Resolution |
| --- | --- | --- |
| Claude Sonnet 4 | ReAct, 10 turns | 4/312 |
| Gemini 2.5 Pro | ReAct, 10 turns | 12/312 |
| GPT-5 | ReAct, 10 turns | 36/312 |
| DeepSeekProver V2 7B | ReAct, 10 turns | 24/312 |
| GodelProver V2 8B | ReAct, 10 turns | 36/312 |
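For a rough comparison, the resolution counts above can be converted into pass rates over the 312 problems; a minimal sketch (model names and counts taken directly from the table):

```python
# Convert resolution counts from the table above into percentage pass rates.
results = {
    "Claude Sonnet 4": 4,
    "Gemini 2.5 Pro": 12,
    "GPT-5": 36,
    "DeepSeekProver V2 7B": 24,
    "GodelProver V2 8B": 36,
}
TOTAL = 312  # number of problems in IndiMathBench

for model, solved in results.items():
    rate = 100 * solved / TOTAL
    print(f"{model}: {solved}/{TOTAL} = {rate:.1f}%")
```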

A detailed description of the 10-turn ReAct scaffold can be found in our paper.

## Citation

The associated paper for IndiMathBench is available on arXiv. Please consider citing it if you find IndiMathBench useful:

```bibtex
@misc{biyani2025indimathbenchautoformalizingmathematicalreasoning,
      title={IndiMathBench: Autoformalizing Mathematical Reasoning Problems with a Human Touch},
      author={Param Biyani and Shashank Kirtania and Yasharth Bajpai and Sumit Gulwani and Ashish Tiwari},
      year={2025},
      eprint={2512.00997},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2512.00997},
}
```

## License

This dataset is released under the Creative Commons Attribution 4.0 International (CC BY 4.0) license.