> **It is strongly recommended to use v0.9.0 or later**. Versions earlier than v0.9.0 contain a [critical flaw](https://github.com/NewComer00/expressive/releases/tag/v0.9.0) in the processing algorithm for the **PITD** expression parameter that may result in **incorrect pitch curve generation**. To download the latest version, please visit the [Releases page](https://github.com/NewComer00/expressive/releases).
>
> For users migrating from an older version to `v0.9.0` or later, note that the default value of the **PITD Scaler** is now `1.0` (previously `2.0`). If you are using an old configuration file, please set the **PITD Scaler** to `1.0`.
>
> **🎵 Thank you for using Expressive 🎵**

# Expressive

**Expressive** is a [DiffSinger](https://github.com/openvpi/diffsinger) expression parameter importer developed for [OpenUtau](https://github.com/stakira/OpenUtau). It aims to extract expression parameters from real human vocals and import them into the appropriate tracks of your project.
> -*OpenUtau version from [keirokeer/OpenUtau-DiffSinger-Lunai](https://github.com/keirokeer/OpenUtau-DiffSinger-Lunai)*
> -*Singer model from [yousa-ling-official-production/yousa-ling-diffsinger-v1](https://github.com/yousa-ling-official-production/yousa-ling-diffsinger-v1)*
> -*Example from [`examples/明天会更好`](examples/明天会更好). Click to view details.*

> [!TIP]
> <details>
> <summary><b>👉 Click to expand the full voiced demo video 👈</b></summary>

The [swift-f0](https://github.com/lars76/swift-f0) and [CREPE](https://github.com/marl/crepe) pitch extraction backends are also available. swift-f0 runs on the CPU only and is the fastest option, though its accuracy is modest. CREPE is a classic algorithm in the field but runs more slowly; in a CUDA environment, the CREPE backend automatically enables GPU acceleration\*.

A newly added experimental **hybrid** backend is also available. It combines the predictions of rmvpe-onnx and swift-f0, relying primarily on the rmvpe-onnx result. In voiced segments of the audio, when rmvpe-onnx has low confidence and swift-f0 has high confidence, the swift-f0 result is used as a correction, which improves the overall accuracy of pitch extraction.

> \* On Windows, TensorFlow 2.10 is the last version that supports GPU acceleration, and Python 3.10 is the highest Python version supported by its `.whl` files.
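
The correction rule of the hybrid backend described above can be sketched as a per-frame selection. This is a minimal illustration under stated assumptions, not Expressive's implementation: the function name `fuse_pitch`, the voiced/unvoiced mask, and the two thresholds `low_conf` and `high_conf` (0.5 and 0.8) are hypothetical, and both backends' outputs are assumed to be already aligned on a common frame grid.

```python
from typing import List, Sequence


def fuse_pitch(
    rmvpe_f0: Sequence[float],
    rmvpe_conf: Sequence[float],
    swift_f0: Sequence[float],
    swift_conf: Sequence[float],
    voiced: Sequence[bool],
    low_conf: float = 0.5,   # hypothetical: below this, rmvpe-onnx counts as "uncertain"
    high_conf: float = 0.8,  # hypothetical: at or above this, swift-f0 counts as "confident"
) -> List[float]:
    """Keep rmvpe-onnx per frame unless it is unsure and swift-f0 is sure.

    All sequences are assumed to share the same frame grid (same hop size and length).
    """
    fused: List[float] = []
    for r_f0, r_c, s_f0, s_c, is_voiced in zip(
        rmvpe_f0, rmvpe_conf, swift_f0, swift_conf, voiced
    ):
        if is_voiced and r_c < low_conf and s_c >= high_conf:
            fused.append(s_f0)  # correct this voiced frame with swift-f0
        else:
            fused.append(r_f0)  # default: trust rmvpe-onnx
    return fused


# Four frames on a hypothetical common grid (f0 in Hz, 0.0 = unvoiced).
result = fuse_pitch(
    rmvpe_f0=[220.0, 0.0, 230.0, 231.0],
    rmvpe_conf=[0.9, 0.2, 0.3, 0.4],
    swift_f0=[221.0, 225.0, 229.0, 232.0],
    swift_conf=[0.9, 0.9, 0.95, 0.6],
    voiced=[True, False, True, True],
)
print(result)  # [220.0, 0.0, 229.0, 231.0]
```

Only frame 2 is replaced: frame 1 is unvoiced, and in frame 3 swift-f0 is not confident enough to override rmvpe-onnx.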
## 📌 Use Case
#### Future Plan
The NiceGUI framework has begun improving its drag-and-drop support and should resolve this in a future release.

---

### PITD expression curve is overly flat

#### Symptom

The extracted PITD expression curve is too flat, showing almost no significant variation. Pitch changes in the reference vocal are not properly reflected in the curve.

#### Possible Causes

1. In versions earlier than **v0.9.0**, there is a flaw in the conversion between pitch and PITD values, which can make the curve appear overly flat.
2. The two confidence thresholds in the PITD extractor are set **too high**, causing many pitch changes to be discarded.
   You can observe the missing segments in the original pitch curve in [`expressive-viewer`](#viewer).
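
For reference when reading cause 1, the standard relation between two frequencies and their distance in cents is shown below. It assumes, without confirmation from Expressive's source, that a PITD value is a pitch deviation in cents between the reference vocal and the Utau vocal; `pitch_deviation_cents` is a hypothetical helper, not part of Expressive.

```python
import math


def pitch_deviation_cents(ref_hz: float, utau_hz: float) -> float:
    """Signed distance in cents from the Utau vocal's pitch to the reference vocal's pitch.

    100 cents = 1 semitone, 1200 cents = 1 octave. Both inputs must be positive (voiced).
    """
    if ref_hz <= 0 or utau_hz <= 0:
        raise ValueError("both frequencies must be positive (voiced frames only)")
    return 1200.0 * math.log2(ref_hz / utau_hz)


print(pitch_deviation_cents(880.0, 440.0))          # 1200.0: one octave up
print(round(pitch_deviation_cents(466.16, 440.0)))  # 100: about one semitone up
print(round(pitch_deviation_cents(415.30, 440.0)))  # -100: about one semitone down
```

Because the relation is logarithmic, the same 100-cent step corresponds to a larger difference in Hz at higher pitches, so deviations should be computed on a log scale rather than from raw Hz differences.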

#### Solutions

1. Please upgrade to **v0.9.0 or later**.
2. First try the best-performing **rmvpe-onnx** or **hybrid** backend (with the default confidence thresholds).
   If the issue persists, try lowering both confidence thresholds, using the pitch confidence curve in [`expressive-viewer`](#viewer) as a reference.
   In general, the **Utau vocal** is relatively clean, so it is recommended to adjust the confidence threshold for the **reference vocal** first.
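
The effect of the two confidence thresholds can be illustrated with a small sketch. It assumes, hypothetically, that a frame contributes a deviation only when both the reference-vocal and the Utau-vocal detections meet their respective thresholds, and that rejected frames carry no pitch change; `gate_frames` and all sample values are illustrative, not Expressive's code or defaults.

```python
import math
from typing import List, Optional, Sequence


def gate_frames(
    ref_hz: Sequence[float],
    ref_conf: Sequence[float],
    utau_hz: Sequence[float],
    utau_conf: Sequence[float],
    ref_threshold: float,
    utau_threshold: float,
) -> List[Optional[float]]:
    """Per-frame deviation in cents, or None where either detection is rejected."""
    out: List[Optional[float]] = []
    for r_hz, r_c, u_hz, u_c in zip(ref_hz, ref_conf, utau_hz, utau_conf):
        trusted = r_c >= ref_threshold and u_c >= utau_threshold
        if trusted and r_hz > 0 and u_hz > 0:
            out.append(1200.0 * math.log2(r_hz / u_hz))
        else:
            out.append(None)  # rejected frame: contributes no pitch change
    return out


# Hypothetical frames: the reference vocal bends upward mid-phrase with lower confidence.
ref_hz = [440.0, 452.9, 466.16, 452.9]
ref_conf = [0.95, 0.7, 0.6, 0.9]
utau_hz = [440.0] * 4
utau_conf = [0.99] * 4

strict = gate_frames(ref_hz, ref_conf, utau_hz, utau_conf, 0.9, 0.9)
relaxed = gate_frames(ref_hz, ref_conf, utau_hz, utau_conf, 0.5, 0.9)
print(sum(v is not None for v in strict))   # 2 frames kept
print(sum(v is not None for v in relaxed))  # 4 frames kept
```

Here only the reference-vocal threshold is lowered, which recovers the two mid-phrase frames that carry the pitch bend. Lowering it too far has the opposite risk, described under the jumps-and-spikes symptom.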

#### Future Plans
Incorporate semantic information into the PITD expression extraction algorithm.

---

### PITD expression curve has sudden jumps or spikes

#### Symptom

The PITD expression curve changes too abruptly at certain positions, with large jumps or spikes that do not match natural vocal behavior.

#### Possible Causes

1. In versions earlier than **v0.9.0**, there is a flaw in the conversion between pitch and PITD values.
2. The reference audio contains noise around the corresponding timestamps.
   You can observe abnormal spikes in the original pitch curve in [`expressive-viewer`](#viewer).
3. The two confidence thresholds in the PITD extractor are set **too low**, causing incorrect detections to be accepted.
   This may also appear as spikes in the pitch curve in [`expressive-viewer`](#viewer).
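
To find the positions worth checking in `expressive-viewer` or in the reference audio, a hypothetical helper like the one below can flag frame-to-frame jumps. It assumes you have the curve as a list of per-frame values in cents, with `None` for gaps; the `find_jumps` name and the 300-cent limit are illustrative choices, not Expressive settings. Multiplying a returned index by the frame hop size gives an approximate timestamp.

```python
from typing import List, Optional, Sequence


def find_jumps(cents: Sequence[Optional[float]], max_step: float = 300.0) -> List[int]:
    """Indices i where the curve moves more than max_step cents from frame i-1.

    Frames set to None (gaps) are skipped, so a gap never counts as a jump.
    """
    jumps: List[int] = []
    for i in range(1, len(cents)):
        prev, cur = cents[i - 1], cents[i]
        if prev is not None and cur is not None and abs(cur - prev) > max_step:
            jumps.append(i)
    return jumps


curve = [0.0, 10.0, 15.0, 1200.0, 20.0, None, 25.0]
print(find_jumps(curve))  # [3, 4]: an isolated spike produces a jump up and a jump down
```

A pair of adjacent jumps in opposite directions, as here, is the typical signature of a single-frame spike rather than a genuine pitch change.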

#### Solutions

1. Please upgrade to **v0.9.0 or later**.
2. Try denoising the reference audio with tools such as [UVR](https://github.com/Anjok07/ultimatevocalremovergui) or [MSST](https://github.com/SUC-DriverOld/MSST-WebUI).
3. First try the best-performing **rmvpe-onnx** or **hybrid** backend (with the default confidence thresholds).
   If the issue persists, try increasing both confidence thresholds, using the pitch confidence curve in [`expressive-viewer`](#viewer) to guide your adjustments.
   In general, the **Utau vocal** is relatively clean, so it is recommended to adjust the confidence threshold for the **reference vocal** first.

#### Future Plans
Incorporate semantic information into the PITD expression extraction algorithm.