Update README.md (#11)
uralik authored Sep 27, 2024
1 parent 5b0497e · commit e4f4f79
Showing 1 changed file with 1 addition and 1 deletion.
projects/self_taught_evaluator/README.md (1 addition, 1 deletion)
@@ -10,7 +10,7 @@ Instructions and materials presented here correspond to the [Self-Taught Evaluat

We release the Self-Taught Evaluator model via the Hugging-Face model repo: https://huggingface.co/facebook/Self-taught-evaluator-llama3.1-70B. This model was trained using supervised fine-tuning (SFT) and direct preference optimization (DPO).

-First, the model was trained on data comprising responses and evalulation plans generated by the seed model (see Section 3 in the paper). Next, the selected SFT model was used to generate higher quality evaluation plans for preference finetuning dataset (see [section below](./README.md#synthetic-preference-data)). Finally, the released model was trained on preference finetuning data using the combination of DPO and NLL losses. The checkpoint selection was done using the pairwise judge accuracy computed over the Helpsteer2 validation set.
+First, the model was trained on data comprising responses and evaluation plans generated by the seed model (see Section 3 in the paper). Next, the selected SFT model was used to generate higher quality evaluation plans for the preference finetuning dataset (see [section below](./README.md#synthetic-preference-data)). Finally, the released model was trained on preference finetuning data using the combination of DPO and NLL losses. The checkpoint selection was done using the pairwise judge accuracy computed over the Helpsteer2 validation set.

## Inference and Evaluation

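The paragraph edited by this commit states that the released model was trained with a combination of DPO and NLL losses. For orientation, below is a minimal sketch of what such a combined objective can look like in PyTorch. This is not the repository's training code: the function name, the `beta` and `nll_weight` defaults, and the length normalization of the NLL term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_nll_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps,
                 chosen_lengths, beta=0.1, nll_weight=1.0):
    # Log-ratios of the policy vs. the frozen reference (SFT) model,
    # computed per preference pair from summed sequence log-probs.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps

    # Standard DPO term: increase the margin between chosen and rejected.
    dpo_term = -F.logsigmoid(beta * (chosen_logratio - rejected_logratio))

    # Auxiliary NLL term: length-normalized negative log-likelihood of the
    # chosen response, anchoring the policy to the preferred outputs.
    nll_term = -policy_chosen_logps / chosen_lengths

    return (dpo_term + nll_weight * nll_term).mean()

# Toy batch of two preference pairs; the numbers are made up.
loss = dpo_nll_loss(
    policy_chosen_logps=torch.tensor([-50.0, -42.0]),
    policy_rejected_logps=torch.tensor([-61.0, -55.0]),
    ref_chosen_logps=torch.tensor([-52.0, -43.0]),
    ref_rejected_logps=torch.tensor([-59.0, -53.0]),
    chosen_lengths=torch.tensor([120.0, 95.0]),
)
```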
