Cannot reproduce the performance

Hi, thanks for the interesting work, but I cannot reproduce the model's performance given in your report.
Here are the MMLU evaluation results from LLaMA-Factory:
```code
        Average: 64.98
           STEM: 54.97
Social Sciences: 74.88
     Humanities: 59.87
          Other: 72.33
```
The eval parameters are as follows:
```code
### model
model_name_or_path: /mnt/ai/open-o1-llama3.1-8B

### method
finetuning_type: full

### dataset
task: mmlu_test  # choices: [mmlu_test, ceval_validation, cmmlu_test]
template: llama3
lang: en

### output
save_dir: saves/open-o1-llama3-1-8B/eval

### eval
batch_size: 1
```
Could you provide your evaluation details?
