Dataset: https://huggingface.co/datasets/aialt/MuBench
MuBench is a meta-dataset for evaluating the multilingual capabilities of large language models (LLMs) across 61 languages and 3.9M aligned samples.
It provides a unified framework to assess understanding, reasoning, factual knowledge, and truthfulness in both single-language and code-switched settings.
- 61 languages covering over 60% of the world's native speakers
- 12 core benchmarks across 6 ability dimensions
- Cross-lingual alignment ensuring one-to-one comparability across languages
- Code-switched variants for mixed-language evaluation
- Rigorous data pipeline including translation, back-translation, semantic and cultural validation
- Human evaluation of 34k samples across 17 languages
- New metric, Multilingual Consistency (MLC), for analyzing cross-lingual performance stability
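The exact MLC formula is defined in the MuBench paper, not on this card; as a rough illustration only, the sketch below assumes a common consistency formulation: the fraction of aligned samples for which a model gives the same answer in every language. The function name and data layout are hypothetical.

```python
# Hedged sketch of a Multilingual Consistency (MLC)-style score.
# Assumption: MLC ~ fraction of aligned samples answered identically
# across all languages. Names and data layout are illustrative only.
def consistency(predictions: dict) -> float:
    """predictions maps language code -> per-sample answers (aligned order)."""
    per_sample = zip(*predictions.values())  # group the i-th answer of each language
    agree = sum(1 for answers in per_sample if len(set(answers)) == 1)
    total = len(next(iter(predictions.values())))
    return agree / total

preds = {
    "en": ["A", "B", "C", "D"],
    "fr": ["A", "B", "D", "D"],
    "de": ["A", "C", "C", "D"],
}
print(consistency(preds))  # 0.5 -> all three languages agree on samples 0 and 3
```

Cross-lingual alignment (one-to-one sample correspondence across languages) is what makes this kind of per-sample agreement measure well defined.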
| Category | Representative Datasets |
|---|---|
| Natural Language Understanding | SNLI, MultiNLI, WinoGrande |
| Commonsense Reasoning | HellaSwag, StoryCloze |
| Knowledge-based QA | MMLU, MMLU-Pro |
| Academic & Technical Reasoning | ARC-Easy, ARC-Challenge, GPQA |
| Factual Recall | BMLAMA |
| Truthfulness | TruthfulQA |
Each dataset file in MuBench follows the naming format:
{dataset}_{mode}_{lang}
where:
- `dataset` is one of: SNLIDataset, MNLIDataset, StoryClozeDataset, WinoGrandeDataset, MMLUDataset, MMLUProDataset, BMLAMADataset, HellaswagDataset, ARCEasyDataset, ARCChallengeDataset, GPQADataset
- `mode` specifies the evaluation variant:
  - `en_template`: English instruction prompt with localized content (improves model instruction-following consistency)
  - `local_template`: fully localized prompt and content in the target language
  - `lighteval`: reformatted for cloze-style evaluation harnesses
  - `mix`: code-switched version mixing components from other languages
  - `mix_lighteval`: code-switched version in cloze format
For `mix` and `mix_lighteval`, the suffix `_[int]` denotes the maximum number of non-English languages introduced in each sample:
- Typically `_2` for all datasets
- `_8` for `bmlama`, reflecting its multi-fact and high-entropy composition
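As a sketch, file identifiers can be assembled from the scheme above; note that placing the `_[int]` suffix after the language code is an assumption on our part, and the helper name is hypothetical:

```python
# Illustrative sketch: build MuBench file identifiers from the
# {dataset}_{mode}_{lang} naming scheme described above.
# Assumption: the _[int] suffix for mix variants is appended after
# the language code; verify against the actual file listing.
def mubench_name(dataset: str, mode: str, lang: str, max_langs: int = None) -> str:
    name = f"{dataset}_{mode}_{lang}"
    if mode in ("mix", "mix_lighteval") and max_langs is not None:
        name += f"_{max_langs}"
    return name

print(mubench_name("MMLUDataset", "en_template", "fr"))  # MMLUDataset_en_template_fr
print(mubench_name("BMLAMADataset", "mix", "de", 8))     # BMLAMADataset_mix_de_8
```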