datajuicer/data-juicer-hub

Data-Juicer-Hub

Community-driven data-juicer recipes and best practices for various pre-training/fine-tuning tasks.

Documentation

Detailed documentation about the recipes can be found here.

Quick Start

There are plenty of prepared recipes for data processing on different tasks. You can use them by cloning this repo and setting `--config` to the local path of the target recipe file:

```shell
# clone this repo to somewhere on your local machine
git clone https://github.com/datajuicer/data-juicer-hub.git
# run with the actual local path to the target recipe
dj-process --config <root-of-data-juicer-hub>/demo/process.yaml --dataset_path <your-dataset-path>
```
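If you run recipes often, a small wrapper can catch a mistyped recipe path before `dj-process` starts. The sketch below assumes a POSIX-compatible shell. `run_recipe`, `list_recipes` and the `DJ_PROCESS` override are hypothetical helpers for illustration, not part of this repo or of Data-Juicer. The sketch only passes the `--config` and `--dataset_path` options shown above.

```shell
# Hypothetical helpers (not part of data-juicer-hub or Data-Juicer).

# List every YAML recipe file under a local clone of the hub.
list_recipes() {
  find "$1" -name '*.yaml' -type f | sort
}

# Resolve a recipe inside the clone and run it with dj-process.
# Set DJ_PROCESS to another command (e.g. echo) for a dry run.
run_recipe() {
  if [ "$#" -ne 3 ]; then
    echo "usage: run_recipe <hub-root> <recipe> <dataset-path>" >&2
    return 2
  fi

  hub_root="$1"   # local path of the cloned data-juicer-hub repo
  recipe="$2"     # recipe path relative to the repo root, e.g. demo/process.yaml
  dataset="$3"    # path to your dataset

  config="$hub_root/$recipe"
  if [ ! -f "$config" ]; then
    echo "recipe not found: $config" >&2
    return 1
  fi

  ${DJ_PROCESS:-dj-process} --config "$config" --dataset_path "$dataset"
}
```

For example, `run_recipe ./data-juicer-hub demo/process.yaml <your-dataset-path>` fails with a clear message if the recipe file does not exist, instead of handing a bad path to `dj-process`.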

Contributing

This is a community-driven repo, so feel free to upload your own recipes to this repo! 😄
