|
| 1 | +documentation: | |
| 2 | + HiFiTTS-2 22kHz |
| 3 | + ############### |
| 4 | + |
| 5 | + This config can be used to download the audio data for |
| 6 | + `HiFiTTS-2 22kHz <https://huggingface.co/datasets/nvidia/hifitts-2>`_. |
| 7 | + |
| 8 | + 1. Downloads HiFiTTS-2 audio from LibriVox. |
| 9 | + 2. Outputs a new manifest that excludes utterances from LibriVox audiobook chapters that could not be |
| 10 | + downloaded (e.g. because they were removed from the website). |
| 11 | +
|
| 12 | + **Required arguments**. |
| 13 | +
|
| 14 | + * **workspace_dir**: specify the workspace folder where all audio files and manifests will be stored. |
| 15 | +
|
| 16 | + Note that you can customize any part of this config either directly or from the command line. |
| 17 | + |
| 18 | + **Output format**. |
| 19 | +
|
| 20 | + This config outputs 2 manifest files: |
| 21 | +
|
| 22 | + * ``${workspace_dir}/errors_22khz.json`` - entries from the input chapters file that failed to download from LibriVox. |
| 23 | + * ``${workspace_dir}/manifest_filtered_22khz.json`` - input manifest file without utterances from failed chapters. |
| 24 | +
|
| 25 | +processors_to_run: all |
| 26 | +workspace_dir: ??? |
| 27 | +manifest_filename: manifest_22khz.json |
| 28 | +output_filename: manifest_filtered_22khz.json |
| 29 | +chapter_filename: chapters_22khz.json |
| 30 | +error_filename: errors_22khz.json |
| 31 | +audio_dir_name: audio_22khz |
| 32 | +chapter_audio_dir_name: chapters |
| 33 | +sample_rate: 22050 |
| 34 | +delete_chapter_files: true |
| 35 | +exit_on_error: false |
| 36 | +use_dask: false |
| 37 | +max_workers: 8 |
| 38 | +chunksize: 50 |
| 39 | + |
| 40 | +input_manifest_file: ${workspace_dir}/${manifest_filename} |
| 41 | +chapter_file: ${workspace_dir}/${chapter_filename} |
| 42 | +error_file: ${workspace_dir}/${error_filename} |
| 43 | +audio_dir: ${workspace_dir}/${audio_dir_name} |
| 44 | +chapter_dir: ${workspace_dir}/${chapter_audio_dir_name} |
| 45 | +final_manifest: ${workspace_dir}/${output_filename} |
| 46 | + |
| 47 | +processors: |
| 48 | + - _target_: sdp.processors.DownloadHiFiTTS2 |
| 49 | + audio_dir: ${audio_dir} |
| 50 | + chapter_dir: ${chapter_dir} |
| 51 | + sample_rate: ${sample_rate} |
| 52 | + delete_chapter_files: ${delete_chapter_files} |
| 53 | + exit_on_error: ${exit_on_error} |
| 54 | + input_manifest_file: ${chapter_file} |
| 55 | + output_manifest_file: ${error_file} |
| 56 | + use_dask: ${use_dask} |
| 57 | + max_workers: ${max_workers} |
| 58 | + chunksize: ${chunksize} |
| 59 | + |
| 60 | + - _target_: sdp.processors.RemovedFailedChapters |
| 61 | + input_manifest_file: ${input_manifest_file} |
| 62 | + output_manifest_file: ${final_manifest} |
| 63 | + error_file: ${error_file} |
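
For reviewers who want a feel for what the second processor does, the filtering step can be sketched roughly as below. This is an illustration only, not the actual `RemovedFailedChapters` implementation: it assumes both manifests are JSON-lines files, that each failed chapter entry carries a hypothetical `utterances` list, and that utterances are matched on a hypothetical `audio_filepath` key. The function names are also made up for this sketch.

```python
import json
from typing import Iterable, List, Set


def failed_audio_paths(error_lines: Iterable[str]) -> Set[str]:
    """Collect the audio path of every utterance listed under a failed chapter.

    Assumes (hypothetically) that each error entry looks like
    {"utterances": [{"audio_filepath": "..."}, ...], ...}.
    """
    failed: Set[str] = set()
    for line in error_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the JSON-lines file
        chapter = json.loads(line)
        for utterance in chapter.get("utterances", []):
            failed.add(utterance["audio_filepath"])
    return failed


def filter_manifest(manifest_lines: Iterable[str], failed: Set[str]) -> List[dict]:
    """Keep only manifest entries whose audio is not part of a failed chapter."""
    kept: List[dict] = []
    for line in manifest_lines:
        line = line.strip()
        if not line:
            continue
        entry = json.loads(line)
        if entry["audio_filepath"] not in failed:
            kept.append(entry)
    return kept
```

In this sketch the error file written by the download step is read once into a set, so each lookup while streaming the (much larger) utterance manifest is a constant-time membership check.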