Update README.md (#11)
uralik authored Sep 27, 2024
1 parent 5b0497e · commit e4f4f79
Showing 1 changed file with 1 addition and 1 deletion.
projects/self_taught_evaluator/README.md (1 addition, 1 deletion)
@@ -10,7 +10,7 @@ Instructions and materials presented here correspond to the [Self-Taught Evaluat

We release the Self-Taught Evaluator model via the Hugging-Face model repo: https://huggingface.co/facebook/Self-taught-evaluator-llama3.1-70B. This model was trained using supervised fine-tuning (SFT) and direct preference optimization (DPO).

-First, the model was trained on data comprising responses and evalulation plans generated by the seed model (see Section 3 in the paper). Next, the selected SFT model was used to generate higher quality evaluation plans for preference finetuning dataset (see [section below](./README.md#synthetic-preference-data)). Finally, the released model was trained on preference finetuning data using the combination of DPO and NLL losses. The checkpoint selection was done using the pairwise judge accuracy computed over the Helpsteer2 validation set.
+First, the model was trained on data comprising responses and evaluation plans generated by the seed model (see Section 3 in the paper). Next, the selected SFT model was used to generate higher quality evaluation plans for the preference finetuning dataset (see [section below](./README.md#synthetic-preference-data)). Finally, the released model was trained on preference finetuning data using the combination of DPO and NLL losses. The checkpoint selection was done using the pairwise judge accuracy computed over the Helpsteer2 validation set.

## Inference and Evaluation

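The paragraph edited by this commit states that the released model was trained with a combination of DPO and NLL losses. For orientation, below is a minimal sketch of what such a combined objective can look like in PyTorch. This is not the repository's training code: the function name, the `beta` and `nll_weight` defaults, and the length normalization of the NLL term are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def dpo_nll_loss(policy_chosen_logps, policy_rejected_logps,
                 ref_chosen_logps, ref_rejected_logps,
                 chosen_lengths, beta=0.1, nll_weight=1.0):
    # Log-ratios of the policy vs. the frozen reference (SFT) model,
    # computed per preference pair from summed sequence log-probs.
    chosen_logratio = policy_chosen_logps - ref_chosen_logps
    rejected_logratio = policy_rejected_logps - ref_rejected_logps

    # Standard DPO term: increase the margin between chosen and rejected.
    dpo_term = -F.logsigmoid(beta * (chosen_logratio - rejected_logratio))

    # Auxiliary NLL term: length-normalized negative log-likelihood of the
    # chosen response, anchoring the policy to the preferred outputs.
    nll_term = -policy_chosen_logps / chosen_lengths

    return (dpo_term + nll_weight * nll_term).mean()

# Toy batch of two preference pairs; the numbers are made up.
loss = dpo_nll_loss(
    policy_chosen_logps=torch.tensor([-50.0, -42.0]),
    policy_rejected_logps=torch.tensor([-61.0, -55.0]),
    ref_chosen_logps=torch.tensor([-52.0, -43.0]),
    ref_rejected_logps=torch.tensor([-59.0, -53.0]),
    chosen_lengths=torch.tensor([120.0, 95.0]),
)
```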
