@@ -43,9 +43,9 @@ The current version supports importing the following expression parameters:
* OpenUtau Beta (or other versions with DiffSinger support)
* Python 3.10 \*

-
By default, this application uses [swift-f0](https://github.com/lars76/swift-f0) (based on ONNX Runtime) as the pitch extraction backend, which runs on CPU only and satisfies basic usage scenarios.
+
By default, this application uses [rmvpe-onnx](https://github.com/newcomer00/rmvpe-onnx) as the pitch extraction backend, which runs on CPU only. [RMVPE](https://arxiv.org/abs/2306.15412v2) is currently the best-performing publicly available pitch extraction algorithm, and its inference speed is fast enough to satisfy the vast majority of use cases.

-
The classic [CREPE](https://github.com/marl/crepe) pitch extraction backend (depends on TensorFlow) is also available for scenarios with higher accuracy requirements. If your computer is equipped with an NVIDIA GPU and supports [CUDA 11.x](https://docs.nvidia.com/deploy/cuda-compatibility/minor-version-compatibility.html) (i.e., GPU driver version >= 450), the CREPE backend will automatically enable GPU acceleration.
+
The [swift-f0](https://github.com/lars76/swift-f0) and [CREPE](https://github.com/marl/crepe) pitch extraction backends are also available. The former runs on CPU only and is the fastest option, though its accuracy is modest. The latter is a classic algorithm in the field and runs more slowly. In a CUDA environment, the CREPE backend will automatically enable GPU acceleration.

> \* On Windows, TensorFlow 2.10 is the last version that supports GPU acceleration, and Python 3.10 is the highest Python version supported by its `.whl` files.
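Whether the CREPE backend is GPU-accelerated depends on TensorFlow being able to see a CUDA device. The following is a minimal sketch for checking that, assuming TensorFlow is installed in the same environment as the application. The helper name `visible_gpus` is hypothetical and not part of this project.

```python
def visible_gpus():
    """Return the names of GPUs TensorFlow can see, or None if TensorFlow is missing."""
    try:
        import tensorflow as tf  # CREPE depends on TensorFlow
    except ImportError:
        return None
    return [device.name for device in tf.config.list_physical_devices("GPU")]


gpus = visible_gpus()
if gpus is None:
    print("TensorFlow is not installed in this environment.")
elif gpus:
    print("GPUs visible to TensorFlow:", gpus)
else:
    print("No GPU visible to TensorFlow; CREPE would likely run on CPU.")
```

An empty list usually points to a driver or CUDA runtime mismatch rather than a problem in the application itself.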
@@ -79,6 +79,7 @@ A new USTX file with expression parameters added. The original project will not
* [x] Linux support
* [x] NVIDIA GPU acceleration
* [x] Parameter config import/export
+
* [x] Expression curve visualization
* [x] `Pitch Deviation` generation
* [x] `Dynamics` generation
* [x] `Tension` generation
@@ -87,15 +88,15 @@ A new USTX file with expression parameters added. The original project will not

You can download pre-compiled executable files directly from the [Releases](https://github.com/NewComer00/expressive/releases) page:
GUI installer for Windows x64 architecture with GPU support.
+
Expressive CLI / GUI / Viewer installer for Windows x64 architecture with GPU support.

Includes CUDA runtime libraries. When used on a computer with an NVIDIA GPU (driver version >= 450), it significantly improves CREPE backend inference speed.
@@ -127,6 +128,8 @@ pip install -e ".[gpu,gui]"

After installation, you can use the `expressive` and `expressive-gui` entry points to run the **command-line interface** and **graphical user interface**.

+
You can also launch a standalone expression curve visualization tool via the `expressive-viewer` command to view and analyze expression curves extracted by `expressive` and `expressive-gui` in real time.
+

## 📖 Usage

> [!TIP]
@@ -143,6 +146,16 @@ After installation, you can use the `expressive` and `expressive-gui` entry poin
> LANGUAGE="en_US" expressive-gui
> ```

+
> [!IMPORTANT]
+
> For users who installed from source, when using the [rmvpe-onnx](https://github.com/newcomer00/rmvpe-onnx) backend, the application will automatically download the model file [rmvpe.onnx (Copyright (c) 2022 lj1995, MIT License)](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/rmvpe.onnx) from Hugging Face.
+
>
+
> If you wish to download the model file in advance, you can run the following command after installation:
+
> ```bash
+
> rmvpe-onnx download
+
> ```
+
>
+
> If you installed the application via the installer, the model file is already included in the installation package, and no additional download is required.
+

### Command Line Interface (CLI)

Display help:
@@ -212,7 +225,7 @@ You can inspect the details of the expression curves in `expressive-viewer`, ana

The [`examples/` directory](examples/) contains several sample projects. You can import the `expressive_config.json` file from any example into the GUI to automatically populate all parameters with the preset values.

-
If you installed the application from the installer, a shortcut named `Expressive-examples` pointing to the examples directory will appear on your desktop after installation; you can import the config files directly from there.
+
If you installed the application from the installer, a shortcut named `Expressive Examples` pointing to the examples directory will appear on your desktop after installation; you can import the config files directly from there.

## 🔬 Algorithm Workflow
```mermaid
@@ -293,10 +306,7 @@ The extracted PITD expression curve is too flat, with almost no significant vari
The two confidence thresholds in the PITD extractor are set **too high**, causing many pitch changes to be discarded.

#### Solution
-
Try lowering both confidence thresholds. In general, the **Utau vocal** is relatively clean, so it is advisable to first adjust the confidence threshold for the **Reference vocal**.
-

-
#### Future Plan
-
Introduce a better PITD backend (e.g., [RMVPE](https://github.com/Dream-High/RMVPE)).
+
First try using the best-performing rmvpe-onnx backend (with default confidence thresholds). If the issue persists, try lowering both confidence thresholds. In general, the **Utau vocal** is relatively clean, so it is advisable to first adjust the confidence threshold for the **Reference vocal**.

### PITD expression curve has sudden jumps or spikes at certain positions
@@ -307,7 +317,4 @@ The PITD expression curve changes too rapidly at certain positions, with very la
The two confidence thresholds in the PITD extractor are set **too low**, causing erroneous detection results to be accepted.

#### Solution
-
Try increasing both confidence thresholds. In general, the **Utau vocal** is relatively clean, so it is advisable to first adjust the confidence threshold for the **Reference vocal**.
-

-
#### Future Plan
-
Introduce a better PITD backend (e.g., [RMVPE](https://github.com/Dream-High/RMVPE)).
+
First try using the best-performing rmvpe-onnx backend (with default confidence thresholds). If the issue persists, try increasing both confidence thresholds. In general, the **Utau vocal** is relatively clean, so it is advisable to first adjust the confidence threshold for the **Reference vocal**.
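The two troubleshooting cases above come down to how a confidence threshold masks pitch frames. The sketch below illustrates the trade-off with made-up numbers. It is not the application's actual PITD code: the function names, the frame values, and the use of NaN for discarded frames are all assumptions made for illustration.

```python
import math


def mask_low_confidence(f0_hz, confidence, threshold):
    """Replace frames whose confidence is below the threshold with NaN."""
    return [f if c >= threshold else math.nan for f, c in zip(f0_hz, confidence)]


def kept_ratio(masked):
    """Fraction of frames that survived the mask."""
    return sum(not math.isnan(f) for f in masked) / len(masked)


# Hypothetical frames: a slowly rising pitch with one octave error at index 3.
f0_hz = [220.0, 222.0, 226.0, 452.0, 231.0, 235.0]
confidence = [0.95, 0.80, 0.70, 0.30, 0.75, 0.90]

for threshold in (0.2, 0.5, 0.9):
    masked = mask_low_confidence(f0_hz, confidence, threshold)
    print(f"threshold={threshold}: kept={kept_ratio(masked):.2f} frames={masked}")
```

With the threshold at 0.2 the 452 Hz outlier is accepted, which is the kind of frame that shows up as a spike. At 0.5 only the outlier is dropped. At 0.9 just two of the six frames survive, so most of the genuine pitch movement is discarded and the resulting curve tends to look flat.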
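By default, this application uses [swift-f0](https://github.com/lars76/swift-f0) (based on ONNX Runtime) as the pitch extraction backend, which runs on CPU only and satisfies basic usage scenarios.
+
By default, this application uses [rmvpe-onnx](https://github.com/newcomer00/rmvpe-onnx) as the pitch extraction backend, which runs on CPU only. [RMVPE](https://arxiv.org/abs/2306.15412v2) is currently the best-performing publicly available pitch extraction algorithm, and its inference speed is fast enough to satisfy the vast majority of use cases.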
backend=Args(name="backend", type=str, default="swift-f0", choices=list(backend_choices.keys()), help=_lf("**F0 detection backend** for extracting pitch from WAV files. Available options:\n\n%s\n\n", lambda: "\n".join([f"- `{k}`: {v}" for k, v in PitdLoader.backend_choices.items()]))),  # noqa: E501
+
backend=Args(name="backend", type=str, default="rmvpe-onnx", choices=list(backend_choices.keys()), help=_lf("**F0 detection backend** for extracting pitch from WAV files. Available options:\n\n%s\n\n", lambda: "\n".join([f"- `{k}`: {v}" for k, v in PitdLoader.backend_choices.items()]))),  # noqa: E501
confidence_utau=Args(name="confidence_utau", type=float, default=None, help=_lf("Minimum **confidence level** for keeping detected pitch values in the **UTAU** WAV. Lower values retain more frames but may include errors. Omit to use the recommended value for the selected backend:\n\n%s\n\n", lambda: "\n".join([f"- `{k}`: {v}" for k, v in PitdLoader.confidence_utau_recommended.items()]))),  # noqa: E501
confidence_ref=Args(name="confidence_ref", type=float, default=None, help=_lf("Minimum **confidence level** for keeping detected pitch values in the **reference** WAV. Lower values retain more frames but may include errors. Omit to use the recommended value for the selected backend:\n\n%s\n\n", lambda: "\n".join([f"- `{k}`: {v}" for k, v in PitdLoader.confidence_ref_recommended.items()]))),  # noqa: E501
align_radius=Args(name="align_radius", type=int, default=1, help=_l("**Radius** for the FastDTW alignment algorithm; larger values allow more flexible alignment but increase computation time")),  # noqa: E501
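Both confidence arguments default to `None` and fall back to a per-backend recommended value. The sketch below shows one way such a fallback could be resolved. The table values, the `crepe` key, and the `resolve_confidence` helper are hypothetical; the application keeps its own tables in `PitdLoader.confidence_utau_recommended` and `PitdLoader.confidence_ref_recommended`.

```python
from typing import Mapping, Optional

# Hypothetical recommended thresholds, for illustration only.
CONFIDENCE_REF_RECOMMENDED = {"rmvpe-onnx": 0.5, "swift-f0": 0.9, "crepe": 0.8}


def resolve_confidence(
    backend: str,
    user_value: Optional[float],
    recommended: Mapping[str, float],
) -> float:
    """Return the user's threshold if given, else the backend's recommended one."""
    # Compare against None explicitly so that a user-supplied 0.0 is respected.
    if user_value is not None:
        return user_value
    return recommended[backend]


print(resolve_confidence("rmvpe-onnx", None, CONFIDENCE_REF_RECOMMENDED))
print(resolve_confidence("rmvpe-onnx", 0.0, CONFIDENCE_REF_RECOMMENDED))
```

Checking `is not None` rather than truthiness matters here because `0.0` is a legitimate threshold that keeps every frame.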