
Conversation

@qbc2016 (Collaborator) commented Apr 10, 2024

To assess the performance of our fine-tuned model, we use the ROUGE-L metric and run experiments with a large number of clients, with the Dolly-15K dataset as the training corpus.
The Dolly-15K dataset contains 15,015 data points spread across eight distinct tasks. For evaluation, we hold out the final task and use the remaining seven tasks for training.
Our experiments involve 200 clients, with the data partitioned via a Dirichlet distribution to emulate non-IID conditions across the client base.
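For reference, the sketch below shows one common way to build a Dirichlet non-IID split over task labels. It is illustrative only: the `dirichlet_partition` helper, the `alpha` value, and the seed are assumptions for this sketch, not the splitter actually shipped in FederatedScope.

```python
# Illustrative sketch of Dirichlet-based non-IID partitioning over tasks.
# All names and defaults here are hypothetical, not FederatedScope's own splitter.
import numpy as np

def dirichlet_partition(task_ids, num_clients=200, alpha=0.5, seed=0):
    """Assign sample indices to clients so that each task's samples are
    spread across clients according to a Dirichlet(alpha) distribution."""
    rng = np.random.default_rng(seed)
    task_ids = np.asarray(task_ids)
    client_indices = [[] for _ in range(num_clients)]
    for task in np.unique(task_ids):
        idx = rng.permutation(np.where(task_ids == task)[0])
        # Fraction of this task's samples that each client receives.
        proportions = rng.dirichlet(alpha * np.ones(num_clients))
        cuts = (np.cumsum(proportions) * len(idx)).astype(int)[:-1]
        for client, shard in enumerate(np.split(idx, cuts)):
            client_indices[client].extend(shard.tolist())
    return client_indices

# Example: partition a list of per-sample task labels across 200 clients.
# parts = dirichlet_partition(task_labels, num_clients=200, alpha=0.5)
```

Smaller `alpha` values produce more skewed (more non-IID) client datasets; larger values approach a uniform split.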

To run the evaluation:

python federatedscope/eval/eval_for_rougel/eval.py --cfg federatedscope/llm/baseline/xxx.yaml
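For readers unfamiliar with the metric, the snippet below is a minimal sketch of computing ROUGE-L F1 with the `rouge_score` package (`pip install rouge-score`). The `average_rouge_l` helper is hypothetical; the actual eval.py may aggregate scores differently.

```python
# Minimal ROUGE-L sketch using the rouge_score package; names are illustrative.
from rouge_score import rouge_scorer

def average_rouge_l(predictions, references):
    """Return the mean ROUGE-L F1 over paired predictions and references."""
    scorer = rouge_scorer.RougeScorer(["rougeL"], use_stemmer=True)
    scores = [
        scorer.score(ref, pred)["rougeL"].fmeasure
        for pred, ref in zip(predictions, references)
    ]
    return sum(scores) / len(scores)

# Example usage with dummy generations:
print(average_rouge_l(["the cat sat on the mat"], ["a cat sat on a mat"]))
```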

@qbc2016 qbc2016 requested a review from rayrayraykk April 10, 2024 02:39
@rayrayraykk rayrayraykk changed the title Add rougel for dolly Add rougel for cross-device evaluation Apr 10, 2024
@qbc2016 qbc2016 added the ready_for_review and FS-LLM (Federated learning in LLM) labels Apr 12, 2024