Hi, thanks for your interesting work, but I cannot reproduce the model's performance from your report.
Here are the MMLU evaluation results I got with LLaMA-Factory:
Average: 64.98
STEM: 54.97
Social Sciences: 74.88
Humanities: 59.87
Other: 72.33
The eval parameters are as follows:
### model
model_name_or_path: /mnt/ai/open-o1-llama3.1-8B
### method
finetuning_type: full
### dataset
task: mmlu_test # choices: [mmlu_test, ceval_validation, cmmlu_test]
template: llama3
lang: en
### output
save_dir: saves/open-o1-llama3-1-8B/eval
### eval
batch_size: 1
Could you share the details of your evaluation setup?