Commit 86a152b

feat: add rmvpe-onnx as default pitch extraction backend
1 parent c088ce2 commit 86a152b

13 files changed

Lines changed: 245 additions & 154 deletions

.github/workflows/packaging.yml

Lines changed: 4 additions & 0 deletions
@@ -53,6 +53,10 @@ jobs:
           pip install -e ".[${{ matrix.extras }}]"
           pip install pyinstaller

+      - name: Download rmvpe-onnx model
+        run: |
+          rmvpe-onnx download
+
       - name: Build executable with PyInstaller
         run: |
           pyinstaller --noconfirm build/expressive.spec
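
The added workflow step fetches the model before PyInstaller runs, so the packaged build already contains it. A minimal Python sketch of the same pre-download step follows. The helper names `build_download_command` and `predownload_model` are hypothetical and not part of the project; the only thing taken from the commit is the `rmvpe-onnx download` command itself.

```python
import shutil
import subprocess


def build_download_command(executable="rmvpe-onnx"):
    """Return the pre-download command as an argument list (hypothetical helper)."""
    return [executable, "download"]


def predownload_model(executable="rmvpe-onnx"):
    """Run the download command if the CLI is on PATH.

    Returns True when the command exits with status 0, and False when the
    executable is missing or the command fails.
    """
    if shutil.which(executable) is None:
        return False  # CLI is not installed in this environment
    result = subprocess.run(build_download_command(executable), check=False)
    return result.returncode == 0
```

Checking `shutil.which` first keeps the helper from raising `FileNotFoundError` on machines where the package has not been installed yet.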

README.en.md

Lines changed: 23 additions & 16 deletions
@@ -21,7 +21,7 @@ The current version supports importing the following expression parameters:

 | **Working with OpenUtau** | **Data Viewer** |
 |:---:|:---:|
-| <img src="https://github.com/user-attachments/assets/268b44d4-528d-481e-acfb-3f7da7261c80" width="100%" /> | <img src="https://github.com/user-attachments/assets/ef97aa6a-5938-42f1-bd4a-78f268109db8" width="100%" /> |
+| <img src="https://github.com/user-attachments/assets/268b44d4-528d-481e-acfb-3f7da7261c80" width="100%" /> | <img src="https://github.com/user-attachments/assets/91ddadee-62cd-4420-abf0-dd9177e8f935" width="100%" /> |

 </div>

@@ -43,9 +43,9 @@ The current version supports importing the following expression parameters:
 * OpenUtau Beta (or other versions with DiffSinger support)
 * Python 3.10 \*

-By default, this application uses [swift-f0](https://github.com/lars76/swift-f0) (based on ONNX Runtime) as the pitch extraction backend, which runs on CPU only and satisfies basic usage scenarios.
+By default, this application uses [rmvpe-onnx](https://github.com/newcomer00/rmvpe-onnx) as the pitch extraction backend, which runs on CPU only. [RMVPE](https://arxiv.org/abs/2306.15412v2) is currently the best-performing publicly available pitch extraction algorithm, and its inference speed is fast enough to satisfy the vast majority of use cases.

-The classic [CREPE](https://github.com/marl/crepe) pitch extraction backend (depends on TensorFlow) is also available for scenarios with higher accuracy requirements. If your computer is equipped with an NVIDIA GPU and supports [CUDA 11.x](https://docs.nvidia.com/deploy/cuda-compatibility/minor-version-compatibility.html) (i.e., GPU driver version >= 450), the CREPE backend will automatically enable GPU acceleration.
+The [swift-f0](https://github.com/lars76/swift-f0) and [CREPE](https://github.com/marl/crepe) pitch extraction backends are also available. The former runs on CPU only and is the fastest option, though its accuracy is modest. The latter is a classic algorithm in the field and runs more slowly. In a CUDA environment, the CREPE backend will automatically enable GPU acceleration.

 > \* On Windows, TensorFlow 2.10 is the last version that supports GPU acceleration, and Python 3.10 is the highest Python version supported by its `.whl` files.

@@ -79,6 +79,7 @@ A new USTX file with expression parameters added. The original project will not
 * [x] Linux support
 * [x] NVIDIA GPU acceleration
 * [x] Parameter config import/export
+* [x] Expression curve visualization
 * [x] `Pitch Deviation` generation
 * [x] `Dynamics` generation
 * [x] `Tension` generation
@@ -87,15 +88,15 @@ A new USTX file with expression parameters added. The original project will not

 You can download pre-compiled executable files directly from the [Releases](https://github.com/NewComer00/expressive/releases) page:

-### `Expressive-GUI-<version>-Windows-x64-CPU.exe`
+### `Expressive-<version>-Windows-x64-CPU.exe`

-GUI installer for Windows x64 architecture.
+Expressive CLI / GUI / Viewer installer for Windows x64 architecture.

 CPU-only, no CUDA runtime libraries included. Small installation size, but slower when using the CREPE backend for pitch extraction.

-### `Expressive-GUI-<version>-Windows-x64-GPU.exe`
+### `Expressive-<version>-Windows-x64-GPU.exe`

-GUI installer for Windows x64 architecture with GPU support.
+Expressive CLI / GUI / Viewer installer for Windows x64 architecture with GPU support.

 Includes CUDA runtime libraries. When used on a computer with an NVIDIA GPU (driver version >= 450), it significantly improves CREPE backend inference speed.

@@ -127,6 +128,8 @@ pip install -e ".[gpu,gui]"

 After installation, you can use the `expressive` and `expressive-gui` entry points to run the **command-line interface** and **graphical user interface**.

+You can also launch a standalone expression curve visualization tool via the `expressive-viewer` command to view and analyze expression curves extracted by `expressive` and `expressive-gui` in real time.
+
 ## 📖 Usage

 > [!TIP]
@@ -143,6 +146,16 @@ After installation, you can use the `expressive` and `expressive-gui` entry poin
 > LANGUAGE="en_US" expressive-gui
 > ```

+> [!IMPORTANT]
+> For users who installed from source, when using the [rmvpe-onnx](https://github.com/newcomer00/rmvpe-onnx) backend, the application will automatically download the model file [rmvpe.onnx (Copyright (c) 2022 lj1995 — MIT License)](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/rmvpe.onnx) from Hugging Face.
+>
+> If you wish to download the model file in advance, you can run the following command after installation:
+> ```bash
+> rmvpe-onnx download
+> ```
+>
+> If you installed the application via the installer, the model file is already included in the installation package, and no additional download is required.
+
 ### Command Line Interface (CLI)

 Display help:
@@ -212,7 +225,7 @@ You can inspect the details of the expression curves in `expressive-viewer`, ana

 The [`examples/` directory](examples/) contains several sample projects. You can import the `expressive_config.json` file from any example into the GUI to automatically populate all parameters with the preset values.

-If you installed the application from the installer, a shortcut named `Expressive-examples` pointing to the examples directory will appear on your desktop after installation — you can import the config files directly from there.
+If you installed the application from the installer, a shortcut named `Expressive Examples` pointing to the examples directory will appear on your desktop after installation — you can import the config files directly from there.

 ## 🔬 Algorithm Workflow
 ```mermaid
@@ -293,10 +306,7 @@ The extracted PITD expression curve is too flat, with almost no significant vari
 The two confidence thresholds in the PITD extractor are set **too high**, causing many pitch changes to be discarded.

 #### Solution
-Try lowering both confidence thresholds. In general, the **Utau vocal** is relatively clean, so it is advisable to first adjust the confidence threshold for the **Reference vocal**.
-
-#### Future Plan
-Introduce a better PITD backend (e.g., [RMVPE](https://github.com/Dream-High/RMVPE)).
+First try using the best-performing rmvpe-onnx backend (with default confidence thresholds). If the issue persists, try lowering both confidence thresholds. In general, the **Utau vocal** is relatively clean, so it is advisable to first adjust the confidence threshold for the **Reference vocal**.

 ### PITD expression curve has sudden jumps or spikes at certain positions

@@ -307,7 +317,4 @@ The PITD expression curve changes too rapidly at certain positions, with very la
 The two confidence thresholds in the PITD extractor are set **too low**, causing erroneous detection results to be accepted.

 #### Solution
-Try increasing both confidence thresholds. In general, the **Utau vocal** is relatively clean, so it is advisable to first adjust the confidence threshold for the **Reference vocal**.
-
-#### Future Plan
-Introduce a better PITD backend (e.g., [RMVPE](https://github.com/Dream-High/RMVPE)).
+First try using the best-performing rmvpe-onnx backend (with default confidence thresholds). If the issue persists, try increasing both confidence thresholds. In general, the **Utau vocal** is relatively clean, so it is advisable to first adjust the confidence threshold for the **Reference vocal**.

README.md

Lines changed: 16 additions & 12 deletions
@@ -21,7 +21,7 @@

 | **Workflow** | **Data Visualization** |
 |:---:|:---:|
-| <img src="https://github.com/user-attachments/assets/268b44d4-528d-481e-acfb-3f7da7261c80" width="100%" /> | <img src="https://github.com/user-attachments/assets/ef97aa6a-5938-42f1-bd4a-78f268109db8" width="100%" /> |
+| <img src="https://github.com/user-attachments/assets/268b44d4-528d-481e-acfb-3f7da7261c80" width="100%" /> | <img src="https://github.com/user-attachments/assets/91ddadee-62cd-4420-abf0-dd9177e8f935" width="100%" /> |

 </div>

@@ -43,9 +43,9 @@
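 * OpenUtau Beta (or other versions with DiffSinger support)
 * Python 3.10 \*

-By default, this application uses [swift-f0](https://github.com/lars76/swift-f0) (based on ONNX Runtime) as the pitch extraction backend. It runs on CPU only and covers basic usage scenarios.
+By default, this application uses [rmvpe-onnx](https://github.com/newcomer00/rmvpe-onnx) as the pitch extraction backend, which runs on CPU only. [RMVPE](https://arxiv.org/abs/2306.15412v2) is currently the best-performing publicly available pitch extraction algorithm, and its inference is fast enough for the vast majority of use cases.

-The classic [CREPE](https://github.com/marl/crepe) pitch extraction backend (depends on TensorFlow) is also provided for more demanding scenarios. If your computer has an NVIDIA GPU that supports [CUDA 11.x](https://docs.nvidia.com/deploy/cuda-compatibility/minor-version-compatibility.html) (i.e., GPU driver version >= 450), GPU acceleration is enabled automatically when the CREPE backend is used.
+The application also provides the [swift-f0](https://github.com/lars76/swift-f0) and [CREPE](https://github.com/marl/crepe) pitch extraction backends. The former depends only on the CPU and is the fastest, though its accuracy is average. The latter is a classic algorithm in the field and is slower. In a CUDA environment, the CREPE backend enables GPU acceleration automatically.

 > \* On Windows, TensorFlow 2.10 is the last version that supports GPU acceleration, and Python 3.10 is the highest Python version supported by its `.whl` files.
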
@@ -146,6 +146,16 @@ pip install -e ".[gpu,gui]"
 > LANGUAGE="en_US" expressive-gui
 > ```

+> [!IMPORTANT]
+> For users who installed from source, the application automatically downloads the model file [rmvpe.onnx (Copyright (c) 2022 lj1995, MIT License)](https://huggingface.co/lj1995/VoiceConversionWebUI/blob/main/rmvpe.onnx) from Hugging Face when the [rmvpe-onnx](https://github.com/newcomer00/rmvpe-onnx) backend runs.
+>
+> If you want to download the model file in advance, run the following command after installation:
+> ```bash
+> rmvpe-onnx download
+> ```
+>
+> If you obtained the application through the installer, the model file is already included in the installation package and no additional download is needed.
+
 ### Command Line Interface (CLI)

 Display help
@@ -220,7 +230,7 @@ expressive-viewer
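
 The project's [`examples/` directory](examples/) contains several examples. In the GUI, you can import an example's `expressive_config.json` file to fill in all preset parameters at once.

-If you obtained the application from the installer, a shortcut to the examples directory named `Expressive-examples` appears on your desktop after installation, and you can import the config files directly from there.
+If you obtained the application from the installer, a shortcut to the examples directory named `Expressive Examples` appears on your desktop after installation, and you can import the config files directly from there.

 ## 🔬 Algorithm Workflow
 ```mermaid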
@@ -301,10 +311,7 @@ NiceGUI 框架已经开始着手改进文件拖拽支持,应该在未来的版
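 The two confidence thresholds in the PITD extractor are set **too high**, so many pitch changes are discarded.

 #### Solution
-Try lowering both confidence thresholds. In general, the **Utau vocal** is relatively clean, so adjust the confidence threshold for the **Reference vocal** first.
-
-#### Future Plan
-Introduce a better PITD backend (e.g., [RMVPE](https://github.com/Dream-High/RMVPE)).
+First try the best-performing rmvpe-onnx backend (with the default confidence thresholds). If the problem persists, try lowering both confidence thresholds. In general, the **Utau vocal** is relatively clean, so adjust the confidence threshold for the **Reference vocal** first.

 ### PITD expression curve changes too rapidly at certain positions, with jumps or spikes
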
@@ -315,7 +322,4 @@ PITD 表情曲线在某些位置变化过快,出现非常大的跳跃或毛刺
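 The two confidence thresholds in the PITD extractor are set **too low**, so erroneous detection results are accepted.

 #### Solution
-Try increasing both confidence thresholds. In general, the **Utau vocal** is relatively clean, so adjust the confidence threshold for the **Reference vocal** first.
-
-#### Future Plan
-Introduce a better PITD backend (e.g., [RMVPE](https://github.com/Dream-High/RMVPE)).
+First try the best-performing rmvpe-onnx backend (with the default confidence thresholds). If the problem persists, try increasing both confidence thresholds. In general, the **Utau vocal** is relatively clean, so adjust the confidence threshold for the **Reference vocal** first.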

examples/Прекрасное Далеко/expressive_config.json

Lines changed: 4 additions & 4 deletions
@@ -18,9 +18,9 @@
         },
         "pitd": {
             "selected": true,
-            "backend": "crepe",
-            "confidence_utau": 0.8,
-            "confidence_ref": 0.6,
+            "backend": "rmvpe-onnx",
+            "confidence_utau": null,
+            "confidence_ref": null,
             "align_radius": 1,
             "semitone_shift": 0,
             "smoothness": 4,
@@ -35,4 +35,4 @@
             "bias": 10
         }
     }
-}
+}

examples/テトリス/expressive_config.json

Lines changed: 4 additions & 4 deletions
@@ -18,9 +18,9 @@
         },
         "pitd": {
             "selected": true,
-            "backend": "swift-f0",
-            "confidence_utau": 0.85,
-            "confidence_ref": 0.9,
+            "backend": "rmvpe-onnx",
+            "confidence_utau": null,
+            "confidence_ref": null,
             "align_radius": 1,
             "semitone_shift": 0,
             "smoothness": 2,
@@ -35,4 +35,4 @@
             "bias": 10
         }
     }
-}
+}

examples/明天会更好/expressive_config.json

Lines changed: 4 additions & 4 deletions
@@ -18,9 +18,9 @@
         },
         "pitd": {
             "selected": true,
-            "backend": "swift-f0",
-            "confidence_utau": 0.9,
-            "confidence_ref": 0.93,
+            "backend": "rmvpe-onnx",
+            "confidence_utau": null,
+            "confidence_ref": null,
             "align_radius": 1,
             "semitone_shift": 0,
             "smoothness": 2,
@@ -35,4 +35,4 @@
             "bias": 10
         }
     }
-}
+}

expressions/pitd.py

Lines changed: 8 additions & 7 deletions
@@ -26,13 +26,14 @@ class PitdLoader(ExpressionLoader):
     expression_name = "pitd"
     expression_info = _l("Pitch Deviation (curve)")
     backend_choices = {
-        "swift-f0": _l("fast, CPU-based (ONNX Runtime)"),
-        "crepe": _l("classic but slow, CPU & NVIDIA GPU (TensorFlow)"),
+        "rmvpe-onnx": _l("finest accuracy, fast, CPU only (ONNX Runtime)"),
+        "swift-f0": _l("fair accuracy, fastest, CPU only (ONNX Runtime)"),
+        "crepe": _l("good accuracy, slow, CPU & NVIDIA GPU (TensorFlow)"),
     }
-    confidence_utau_recommended = {"swift-f0": 0.95, "crepe": 0.8}
-    confidence_ref_recommended = {"swift-f0": 0.93, "crepe": 0.6}
+    confidence_utau_recommended = {"rmvpe-onnx": 0.03, "swift-f0": 0.95, "crepe": 0.80}
+    confidence_ref_recommended = {"rmvpe-onnx": 0.03, "swift-f0": 0.93, "crepe": 0.60}
     args = SimpleNamespace(
-        backend = Args(name="backend" , type=str , default="swift-f0", choices=list(backend_choices.keys()), help=_lf("**F0 detection backend** for extracting pitch from WAV files. Available options:\n\n%s\n\n", lambda: "\n".join([f"- `{k}`: {v}" for k, v in PitdLoader.backend_choices.items()]))), # noqa: E501
+        backend = Args(name="backend" , type=str , default="rmvpe-onnx", choices=list(backend_choices.keys()), help=_lf("**F0 detection backend** for extracting pitch from WAV files. Available options:\n\n%s\n\n", lambda: "\n".join([f"- `{k}`: {v}" for k, v in PitdLoader.backend_choices.items()]))), # noqa: E501
         confidence_utau = Args(name="confidence_utau", type=float, default=None, help=_lf("Minimum **confidence level** for keeping detected pitch values in the **UTAU** WAV. Lower values retain more frames but may include errors. Omit to use the recommended value for the selected backend:\n\n%s\n\n", lambda: "\n".join([f"- `{k}`: {v}" for k, v in PitdLoader.confidence_utau_recommended.items()]))), # noqa: E501
         confidence_ref = Args(name="confidence_ref" , type=float, default=None, help=_lf("Minimum **confidence level** for keeping detected pitch values in the **reference** WAV. Lower values retain more frames but may include errors. Omit to use the recommended value for the selected backend:\n\n%s\n\n", lambda: "\n".join([f"- `{k}`: {v}" for k, v in PitdLoader.confidence_ref_recommended.items()]))), # noqa: E501
         align_radius = Args(name="align_radius" , type=int , default=1 , help=_l("**Radius** for the FastDTW alignment algorithm; larger values allow more flexible alignment but increase computation time")), # noqa: E501
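
The example configs in this commit set both thresholds to `null`, and the help text says an omitted threshold falls back to the recommended value for the selected backend. The sketch below shows one way that fallback could work. The two dictionaries are copied from the hunk above; `resolve_confidence` is a hypothetical helper, and the loader's real resolution code is not shown in this diff.

```python
import json

# Recommended thresholds, copied from the new PitdLoader class attributes.
CONFIDENCE_UTAU_RECOMMENDED = {"rmvpe-onnx": 0.03, "swift-f0": 0.95, "crepe": 0.80}
CONFIDENCE_REF_RECOMMENDED = {"rmvpe-onnx": 0.03, "swift-f0": 0.93, "crepe": 0.60}


def resolve_confidence(backend, value, recommended):
    """Hypothetical helper: keep an explicit value, else use the backend default."""
    if value is not None:
        return value
    if backend not in recommended:
        raise ValueError(f"unknown backend: {backend!r}")
    return recommended[backend]


# A "pitd" fragment shaped like the updated example configs (JSON null -> None).
config_text = '{"backend": "rmvpe-onnx", "confidence_utau": null, "confidence_ref": null}'
pitd = json.loads(config_text)

utau = resolve_confidence(pitd["backend"], pitd["confidence_utau"], CONFIDENCE_UTAU_RECOMMENDED)
ref = resolve_confidence(pitd["backend"], pitd["confidence_ref"], CONFIDENCE_REF_RECOMMENDED)
print(utau, ref)
```

Keeping the defaults keyed by backend matters here because the scales differ widely: 0.03 for rmvpe-onnx against 0.93 to 0.95 for swift-f0, so a threshold tuned for one backend is not meaningful for another.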
@@ -114,12 +115,12 @@ def get_expression(
114115
return self.expression_tick, self.expression_val
115116

116117

117-
def get_wav_features(wav_path, backend="swift-f0", confidence_threshold=0.8, confidence_filter_size=9):
118+
def get_wav_features(wav_path, backend="rmvpe-onnx", confidence_threshold=0.8, confidence_filter_size=9):
118119
"""Extract features from a WAV file.
119120
120121
Args:
121122
wav_path (str): Path to the WAV file.
122-
backend (str, optional): F0 detection backend ("crepe" or "swift-f0"). Defaults to "swift-f0".
123+
backend (str, optional): F0 detection backend ("crepe" or "swift-f0" or "rmvpe-onnx"). Defaults to "rmvpe-onnx".
123124
confidence_threshold (float, optional): Confidence threshold for pitch detection. Defaults to 0.8.
124125
confidence_filter_size (int, optional): Size of the median filter for confidence. Defaults to 9.
125126
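
The docstring mentions a confidence threshold and a median filter applied to the confidence, but the body of `get_wav_features` is outside this hunk. The following is a minimal pure-Python sketch of that idea under stated assumptions: confidence is smoothed with a sliding median (edges padded by repeating the end values), and frames whose smoothed confidence falls below the threshold are dropped. Both function names are hypothetical, and the project's actual filtering may differ.

```python
from statistics import median


def median_filter(values, size):
    """Sliding-window median; `size` should be odd. Edges repeat the end values."""
    if not values:
        return []
    half = size // 2
    padded = [values[0]] * half + list(values) + [values[-1]] * half
    return [median(padded[i:i + size]) for i in range(len(values))]


def mask_low_confidence(f0, confidence, threshold=0.8, filter_size=9):
    """Replace pitch frames whose smoothed confidence is below `threshold` with None."""
    smoothed = median_filter(confidence, filter_size)
    return [pitch if conf >= threshold else None for pitch, conf in zip(f0, smoothed)]


f0 = [220.0, 221.0, 880.0, 222.0, 223.0]
confidence = [0.9, 0.9, 0.1, 0.9, 0.9]

# Without smoothing (window of 1), the single low-confidence frame is dropped.
unsmoothed = mask_low_confidence(f0, confidence, threshold=0.8, filter_size=1)
# With a window of 3, the isolated dip is smoothed away and every frame is kept.
smoothed = mask_low_confidence(f0, confidence, threshold=0.8, filter_size=3)
```

In this sketch a larger window makes the mask less sensitive to single-frame confidence dips, which is the trade-off the `confidence_filter_size` parameter appears to control.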
