Cannot reproduce the reported performance #12

@shiokoo

Hi, thanks for your interesting work, but I cannot reproduce the model's performance as given in your report.
Here are the MMLU evaluation results I obtained with LLaMA-Factory:

        Average: 64.98
           STEM: 54.97
Social Sciences: 74.88
     Humanities: 59.87
          Other: 72.33
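
As a quick sanity check when comparing these numbers with the report, the sketch below takes the plain mean of the four category scores and compares it with the overall average. The scores are copied from the output above; the variable names are only illustrative. The two values differ, which suggests the overall average is weighted by the number of questions per category rather than being a simple mean. That weighting is an assumption here, not something confirmed from the evaluation code.

```python
# Category scores copied from the evaluation output above.
category_scores = {
    "STEM": 54.97,
    "Social Sciences": 74.88,
    "Humanities": 59.87,
    "Other": 72.33,
}
reported_average = 64.98

# Plain (unweighted) mean of the four category scores.
macro_average = sum(category_scores.values()) / len(category_scores)

# Gap between the unweighted mean and the reported overall average.
gap = macro_average - reported_average

print(f"macro average:    {macro_average:.2f}")
print(f"reported average: {reported_average:.2f}")
print(f"gap:              {gap:.2f}")
```

When comparing against numbers from another report, it therefore helps to check which kind of average that report uses.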

The evaluation parameters are as follows:

### model
model_name_or_path: /mnt/ai/open-o1-llama3.1-8B

### method
finetuning_type: full

### dataset
task: mmlu_test  # choices: [mmlu_test, ceval_validation, cmmlu_test]
template: llama3
lang: en

### output
save_dir: saves/open-o1-llama3-1-8B/eval

### eval
batch_size: 1

Could you provide your evaluation details?
