- Install the WaveNet dependencies.
- Download and unzip this repo.
- Download and unzip tensorflow-wavenet, then move the tensorflow-wavenet-master folder into this repo's folder and rename it 'tensorflow-wavenet'.
- Make a new folder for a sound in this repo's folder.
- Make a folder in that folder called 'corpus'.
- Put 16kHz mono wav files of the sound in the corpus folder.
- Open a shell in this repo's directory.
- Run the following command to train indefinitely, generating three 10-second examples every 10000 steps in sound-folder-name-here/gen/training-step-here/:
$ python gen.py sound-folder-name-here
- Interrupt whenever you want to stop, then run the command again to pick up where you left off or repeat the steps for a different sound.
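The folder layout and the 16 kHz mono requirement from the steps above can be checked before training starts. The following is a minimal sketch using only the Python standard library; the helper names `make_corpus_folder` and `find_bad_wavs` are hypothetical and are not part of this repo or of tensorflow-wavenet.

```python
import wave
from pathlib import Path


def make_corpus_folder(repo_dir, sound_name):
    """Create <repo_dir>/<sound_name>/corpus and return its path (hypothetical helper)."""
    corpus = Path(repo_dir) / sound_name / "corpus"
    corpus.mkdir(parents=True, exist_ok=True)
    return corpus


def find_bad_wavs(corpus_dir, rate=16000, channels=1):
    """Return (file name, reason) pairs for .wav files that are not 16 kHz mono."""
    problems = []
    for path in sorted(Path(corpus_dir).glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                if wav.getframerate() != rate:
                    problems.append((path.name, "sample rate is %d Hz" % wav.getframerate()))
                elif wav.getnchannels() != channels:
                    problems.append((path.name, "has %d channels" % wav.getnchannels()))
        except (wave.Error, EOFError) as err:
            # The wave module only reads uncompressed PCM wav files.
            problems.append((path.name, "unreadable: %s" % err))
    return problems


# Example use, run from this repo's directory:
#   corpus = make_corpus_folder(".", "sound-folder-name-here")
#   for name, reason in find_bad_wavs(corpus):
#       print(name, reason)
```

An empty result only means every wav file found reports 16 kHz and one channel; it does not say anything about whether the recordings are suitable for training.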
If you aren't using Windows with an NVIDIA GPU, see here, and then install librosa:
$ pip install librosa
Otherwise, follow the instructions below.
This guide worked for Windows 10 64-bit as of May 2017.
- Make sure your GPU drivers are up to date.
- Download and run the CUDA installer.
- Download cuDNN v5.1 for CUDA 8.0 (you have to make an account).
- Move cuda\bin\cudnn64_5.dll to where-you-installed-CUDA\v8.0\bin\.
- Move cuda\include\cudnn.h to where-you-installed-CUDA\v8.0\include\.
- Move cuda\lib\x64\cudnn.lib to where-you-installed-CUDA\v8.0\lib\x64\.
- Add where-you-installed-CUDA\v8.0\bin\ to the PATH environment variable.
- Install Python 3.5.
- Download the latest prebuilt numpy+mkl and scipy wheels for Windows 64-bit CPython 3.5.
- With a shell open to the folder where you downloaded those wheels, install them with the following command:
$ pip install numpy-1.13.0rc2+mkl-cp35-cp35m-win_amd64.whl scipy-0.19.0-cp35-cp35m-win_amd64.whl
- Install tensorflow-gpu.
$ pip install tensorflow-gpu
- Make sure you have the Visual Studio Visual C++ Build Tools installed.
- Install resampy from source following the instructions under "Advanced users and developers [...]".
- Install librosa.
$ pip install librosa
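After working through the list above, a quick check can catch a misplaced cuDNN file or a package that did not install. This is a rough sketch, not an official verification tool: the helper names `missing_cudnn_files` and `missing_modules` are hypothetical, and the module list assumes the usual import names for the packages installed above. It only tests that files and packages are present, not that TensorFlow can actually use the GPU.

```python
import importlib.util
from pathlib import Path

# Relative locations of the three cuDNN files after the "Move" steps above.
CUDNN_FILES = [
    Path("bin") / "cudnn64_5.dll",
    Path("include") / "cudnn.h",
    Path("lib") / "x64" / "cudnn.lib",
]

# Assumed import names of the packages installed in the steps above.
REQUIRED_MODULES = ["numpy", "scipy", "tensorflow", "resampy", "librosa"]


def missing_cudnn_files(cuda_v8_dir):
    """Return the cuDNN files not found under the CUDA v8.0 folder."""
    root = Path(cuda_v8_dir)
    return [str(rel) for rel in CUDNN_FILES if not (root / rel).is_file()]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that Python cannot locate (nothing is imported)."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Example use (replace the path with your own CUDA v8.0 folder):
#   print(missing_cudnn_files(r"where-you-installed-CUDA\v8.0"))
#   print(missing_modules())
```

Both functions return an empty list when nothing is missing.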
- File -> Import -> Audio..., then choose your original file.
- Project Rate (Hz) (in the bottom left corner): 16000
- If stereo, click on the down arrow by the name of the track, then Split Stereo to Mono.
- File -> Export...
- Format: WAV (Microsoft) signed 16 bit PCM
$ ffmpeg -i input-file-here.mp3 -ar 16000 -ac 1 output-file-here.wav
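When there are many source files, the ffmpeg command above can be run once per file from Python. The sketch below assumes ffmpeg is on the PATH and that the sources are .mp3 files in a single folder; `ffmpeg_command` and `convert_folder` are hypothetical helper names, and the arguments are exactly those of the command above.

```python
import subprocess
from pathlib import Path


def ffmpeg_command(src, dst):
    """Build the ffmpeg call shown above: 16 kHz sample rate (-ar), one channel (-ac)."""
    return ["ffmpeg", "-i", str(src), "-ar", "16000", "-ac", "1", str(dst)]


def convert_folder(src_dir, corpus_dir):
    """Convert every .mp3 in src_dir to a .wav of the same name in corpus_dir."""
    corpus = Path(corpus_dir)
    corpus.mkdir(parents=True, exist_ok=True)
    outputs = []
    for src in sorted(Path(src_dir).glob("*.mp3")):
        dst = corpus / (src.stem + ".wav")
        # check_call raises CalledProcessError if ffmpeg exits with an error.
        subprocess.check_call(ffmpeg_command(src, dst))
        outputs.append(dst)
    return outputs


# Example use, run from this repo's directory:
#   convert_folder("my-mp3s", "sound-folder-name-here/corpus")
```

ffmpeg asks for confirmation before overwriting an existing output file, so a rerun over the same folder will pause at each file that was already converted.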