NewComer00
diff --git a/README.en.md b/README.en.md
Lines changed: 55 additions & 16 deletions
diff --git a/README.md b/README.md
Lines changed: 34 additions & 8 deletions
diff --git a/examples/Прекрасное Далеко/expressive_config.json b/examples/Прекрасное Далеко/expressive_config.json
Lines changed: 4 additions & 4 deletions
diff --git a/examples/テトリス/expressive_config.json b/examples/テトリス/expressive_config.json
Lines changed: 2 additions & 2 deletions
diff --git a/examples/明天会更好/expressive_config.json b/examples/明天会更好/expressive_config.json
Lines changed: 3 additions & 3 deletions
@@ -7,6 +7,15 @@
   <a href="README.en.md"><img src="https://img.shields.io/badge/lang-English-blue.svg"></a>
 </p>
 
+> [!WARNING]
+> 🚨 **Please Read Before Downloading** 🚨
+>
+> **It is strongly recommended to use v0.9.0 or later**. Earlier versions of the **PITD** expression parameter processing algorithm contain a [critical flaw](https://github.com/NewComer00/expressive/releases/tag/v0.9.0) that may result in **incorrect pitch curve generation**. To download the latest version, please visit the [Releases page](https://github.com/NewComer00/expressive/releases).
+>
+> For users migrating from an older version to `v0.9.0` or later, note that the default value of the **PITD Scaler** is now `1.0` (previously `2.0`). If you have an old configuration file, please set the **PITD Scaler** to `1.0`.
+>
+> **🎵 Thank you for using Expressive 🎵**
+
 # Expressive
 
 **Expressive** is a [DiffSinger](https://github.com/openvpi/diffsinger) expression parameter importer developed for [OpenUtau](https://github.com/stakira/OpenUtau). It aims to extract expression parameters from real human vocals and import them into the appropriate tracks of your project.
@@ -21,18 +30,17 @@ The current version supports importing the following expression parameters:
 
 | **Working with OpenUtau** | **Data Viewer** |
 |:---:|:---:|
-| <img src="https://github.com/user-attachments/assets/268b44d4-528d-481e-acfb-3f7da7261c80" width="100%" /> | <img src="https://github.com/user-attachments/assets/91ddadee-62cd-4420-abf0-dd9177e8f935" width="100%" /> |
+| <img src="https://github.com/user-attachments/assets/d4e37337-50df-4d7d-8552-c4505dc73f20" width="100%" /> | <img src="https://github.com/user-attachments/assets/91ddadee-62cd-4420-abf0-dd9177e8f935" width="100%" /> |
 
 </div>
 
-> - *OpenUtau version from [keirokeer/OpenUtau-DiffSinger-Lunai](https://github.com/keirokeer/OpenUtau-DiffSinger-Lunai)*
-> - *Singer model from [yousa-ling-official-production/yousa-ling-diffsinger-v1](https://github.com/yousa-ling-official-production/yousa-ling-diffsinger-v1)*
+> - *Example from [`examples/明天会更好`](examples/明天会更好). Click to view details.*
 
 > [!TIP]
 > <details>
 >   <summary><b>👉 Click to expand the full voiced demo video 👈</b></summary>
 >
->   <p align="center"><video src="https://github.com/user-attachments/assets/4b5b7c15-947a-4f54-b80e-a14a9eefc86b"></video></p>
+>   <p align="center"><video src="https://github.com/user-attachments/assets/89706eec-63f6-44f6-8ed7-1f1c73cb341e"></video></p>
 >   <p align="center"><video src="https://github.com/user-attachments/assets/4076eb8b-07eb-48e6-bdec-4abeac6258c7"></video></p>
 >
 > </details>
@@ -47,6 +55,8 @@ By default, this application uses [rmvpe-onnx](https://github.com/newcomer00/rmv
 
 The [swift-f0](https://github.com/lars76/swift-f0) and [CREPE](https://github.com/marl/crepe) pitch extraction backends are also available. The former runs on CPU only and is the fastest option, though its accuracy is modest. The latter is a classic algorithm in the field and runs more slowly. In a CUDA environment, the CREPE backend will automatically enable GPU acceleration.
 
+An experimental **hybrid** backend is also available. It combines the predictions of rmvpe-onnx and swift-f0, using the rmvpe-onnx pitch as the primary result. In voiced segments of the audio, where the confidence of rmvpe-onnx is low and the confidence of swift-f0 is high, the swift-f0 result is used as a correction, which is intended to improve the overall accuracy of pitch extraction.
+
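The fusion rule described for the hybrid backend can be sketched as follows. This is a minimal illustration, not the project's implementation: the function name `fuse_pitch`, the per-frame list layout, and the `low`/`high` confidence thresholds are all assumptions made for the example.

```python
from typing import List, Sequence


def fuse_pitch(
    f0_main: Sequence[float],    # primary pitch track (rmvpe-onnx in the README's description)
    conf_main: Sequence[float],  # per-frame confidence of the primary track
    f0_aux: Sequence[float],     # secondary pitch track (swift-f0 in the README's description)
    conf_aux: Sequence[float],   # per-frame confidence of the secondary track
    voiced: Sequence[bool],      # per-frame voiced flag
    low: float = 0.5,            # hypothetical "low confidence" bound for the primary track
    high: float = 0.9,           # hypothetical "high confidence" bound for the secondary track
) -> List[float]:
    """Keep the primary track, except in voiced frames where the primary
    confidence is low and the secondary confidence is high."""
    fused = []
    for fm, cm, fa, ca, is_voiced in zip(f0_main, conf_main, f0_aux, conf_aux, voiced):
        use_aux = is_voiced and cm < low and ca >= high
        fused.append(fa if use_aux else fm)
    return fused


# Frame 1 is the only frame that satisfies all three conditions.
fused = fuse_pitch(
    f0_main=[220.0, 300.0, 225.0, 0.0],
    conf_main=[0.95, 0.2, 0.3, 0.1],
    f0_aux=[221.0, 222.0, 224.0, 110.0],
    conf_aux=[0.9, 0.95, 0.4, 0.99],
    voiced=[True, True, True, False],
)
print(fused)
```

In this sketch the secondary track only ever overrides individual frames, so the primary track remains the default wherever either condition fails.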
 > \* On Windows, TensorFlow 2.10 is the last version that supports GPU acceleration, and Python 3.10 is the highest Python version supported by its `.whl` files.
 
 ## 📌 Use Case
@@ -297,24 +307,53 @@ Relaunching the application should restore normal functionality, and this issue
 #### Future Plan
 The NiceGUI framework has begun improving its drag-and-drop support and should resolve this in a future release.
 
-### PITD expression curve is overall too flat
+---
+
+### PITD expression curve is overly flat
 
 #### Symptom
-The extracted PITD expression curve is too flat, with almost no significant variation overall. Pitch changes in the reference vocal are not reflected in the expression curve.
 
-#### Possible Cause
-The two confidence thresholds in the PITD extractor are set **too high**, causing many pitch changes to be discarded.
+The extracted PITD expression curve is too flat, with almost no significant variation. Pitch changes in the reference vocal are not properly reflected in the curve.
 
-#### Solution
-First try using the best-performing rmvpe-onnx backend (with default confidence thresholds). If the issue persists, try lowering both confidence thresholds. In general, the **Utau vocal** is relatively clean, so it is advisable to first adjust the confidence threshold for the **Reference vocal**.
+#### Possible Causes
+
+1. In versions earlier than **v0.9.0**, there is an issue in the conversion between pitch and PITD values, which can cause the curve to appear overly flat.
+2. The two confidence thresholds in the PITD extractor are set **too high**, causing many pitch changes to be discarded.
+   You can observe missing segments in the original pitch curve in [`expressive-viewer`](#viewer).
+
+#### Solutions
 
-### PITD expression curve has sudden jumps or spikes at certain positions
+1. Please upgrade to **v0.9.0 or later**.
+2. First try using the best-performing **rmvpe-onnx** or **hybrid** backend (with default confidence thresholds).
+   If the issue persists, try lowering both confidence thresholds. You can use the pitch confidence curve in [`expressive-viewer`](#viewer) as a reference when tuning.
+   In general, the **Utau vocal** is relatively clean, so it is recommended to adjust the confidence threshold for the **reference vocal** first.
+
+#### Future Plans
+Incorporate semantic information into the PITD expression extraction algorithm.
+
+---
+
+### PITD expression curve has sudden jumps or spikes
 
 #### Symptom
-The PITD expression curve changes too rapidly at certain positions, with very large jumps or spikes that clearly do not match natural vocal behavior.
 
-#### Possible Cause
-The two confidence thresholds in the PITD extractor are set **too low**, causing erroneous detection results to be accepted.
+The PITD expression curve changes too abruptly at certain positions, with large jumps or spikes that do not match natural vocal behavior.
 
-#### Solution
-First try using the best-performing rmvpe-onnx backend (with default confidence thresholds). If the issue persists, try increasing both confidence thresholds. In general, the **Utau vocal** is relatively clean, so it is advisable to first adjust the confidence threshold for the **Reference vocal**.
+#### Possible Causes
+
+1. In versions earlier than **v0.9.0**, there is an issue in the conversion between pitch and PITD values.
+2. There is noise in the reference audio around the corresponding timestamps.
+   You can observe abnormal spikes in the original pitch curve in [`expressive-viewer`](#viewer).
+3. The two confidence thresholds in the PITD extractor are set **too low**, causing incorrect detections to be accepted.
+   This may also appear as spikes in the pitch curve in [`expressive-viewer`](#viewer).
+
+#### Solutions
+
+1. Please upgrade to **v0.9.0 or later**.
+2. Try denoising the reference audio using tools such as [UVR](https://github.com/Anjok07/ultimatevocalremovergui) or [MSST](https://github.com/SUC-DriverOld/MSST-WebUI).
+3. First try using the best-performing **rmvpe-onnx** or **hybrid** backend (with default confidence thresholds).
+   If the issue persists, try increasing both confidence thresholds. You can use the pitch confidence curve in [`expressive-viewer`](#viewer) to guide your adjustments.
+   In general, the **Utau vocal** is relatively clean, so it is recommended to adjust the confidence threshold for the **reference vocal** first.
+
+#### Future Plans
+Incorporate semantic information into the PITD expression extraction algorithm.
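
The threshold-related causes in the two troubleshooting entries above can be illustrated with a small sketch. It assumes, purely for illustration, that frames whose confidence falls below the threshold are treated as missing; the helper `mask_by_confidence` and the sample values are hypothetical and do not reflect how the PITD extractor is actually written.

```python
from typing import List, Optional, Sequence


def mask_by_confidence(
    f0: Sequence[float], conf: Sequence[float], threshold: float
) -> List[Optional[float]]:
    """Return the pitch track with low-confidence frames replaced by None."""
    return [f if c >= threshold else None for f, c in zip(f0, conf)]


# A short pitch track with one obviously wrong frame (440.0) that has low confidence.
f0 = [220.0, 221.0, 440.0, 222.0, 223.0]
conf = [0.9, 0.6, 0.2, 0.7, 0.95]

strict = mask_by_confidence(f0, conf, 0.8)    # too high: valid frames are discarded
loose = mask_by_confidence(f0, conf, 0.1)     # too low: the 440.0 spike is accepted
balanced = mask_by_confidence(f0, conf, 0.5)  # only the spike is discarded

print(strict)
print(loose)
print(balanced)
```

A threshold that is too high leaves gaps where real pitch changes were, which matches the flat-curve symptom, while a threshold that is too low keeps the erroneous frame, which matches the spike symptom.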
@@ -7,6 +7,15 @@
   <a href="README.en.md"><img src="https://img.shields.io/badge/lang-English-blue.svg"></a>
 </p>
 
+> [!WARNING]
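+> 🚨 **Please Read Before Downloading** 🚨
+>
+> **It is strongly recommended to use `v0.9.0` or later**. Earlier versions of the **PITD** expression parameter processing algorithm contain a [critical flaw](https://github.com/NewComer00/expressive/releases/tag/v0.9.0) that results in **incorrectly drawn pitch curves**. To download the latest version, please visit the [Releases page](https://github.com/NewComer00/expressive/releases).
+>
+> For users migrating from an older version to `v0.9.0` or later, **the default value of the PITD Scaler is now `1.0`** instead of the previous `2.0`. If you have a configuration file from an older version, please **set the PITD Scaler to `1.0`**.
+>
+> **🎵 Thank you for using Expressive 🎵**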
+
 # Expressive
 
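 **Expressive** is a [DiffSinger](https://github.com/openvpi/diffsinger) expression parameter import tool developed for [OpenUtau](https://github.com/stakira/OpenUtau). It aims to extract expression parameters from real human vocals and import them into the corresponding tracks of a project.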
@@ -21,18 +30,17 @@
 
 | **Workflow** | **Data Visualization** |
 |:---:|:---:|
-| <img src="https://github.com/user-attachments/assets/268b44d4-528d-481e-acfb-3f7da7261c80" width="100%" /> | <img src="https://github.com/user-attachments/assets/91ddadee-62cd-4420-abf0-dd9177e8f935" width="100%" /> |
+| <img src="https://github.com/user-attachments/assets/d4e37337-50df-4d7d-8552-c4505dc73f20" width="100%" /> | <img src="https://github.com/user-attachments/assets/91ddadee-62cd-4420-abf0-dd9177e8f935" width="100%" /> |
 
 </div>
 
-> - *OpenUtau version from [keirokeer/OpenUtau-DiffSinger-Lunai](https://github.com/keirokeer/OpenUtau-DiffSinger-Lunai)*
-> - *Singer model from [yousa-ling-official-production/yousa-ling-diffsinger-v1](https://github.com/yousa-ling-official-production/yousa-ling-diffsinger-v1)*
+> - *Example from [`examples/明天会更好`](examples/明天会更好). Click to view details.*
 
 > [!TIP]
 > <details>
 >   <summary><b>👉 Click to expand the full voiced demo video 👈</b></summary>
 >
->   <p align="center"><video src="https://github.com/user-attachments/assets/4b5b7c15-947a-4f54-b80e-a14a9eefc86b"></video></p>
+>   <p align="center"><video src="https://github.com/user-attachments/assets/89706eec-63f6-44f6-8ed7-1f1c73cb341e"></video></p>
 >   <p align="center"><video src="https://github.com/user-attachments/assets/4076eb8b-07eb-48e6-bdec-4abeac6258c7"></video></p>
 > 
 > </details>
@@ -47,6 +55,8 @@
 
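 The application also provides the [swift-f0](https://github.com/lars76/swift-f0) and [CREPE](https://github.com/marl/crepe) pitch extraction backends. The former relies on the CPU only and gives modest results, but is the fastest. The latter is a classic algorithm in the field and is slower. In a CUDA environment, the CREPE backend automatically enables GPU acceleration.
 
+An experimental **hybrid** backend has also been added. It combines the predictions of rmvpe-onnx and swift-f0, using the rmvpe-onnx pitch as the primary result. In voiced segments of the audio, where the confidence of rmvpe-onnx is low and the confidence of swift-f0 is high, the swift-f0 result is used as a correction, which is intended to improve the overall accuracy of pitch extraction.
+
 > \* On Windows, TensorFlow 2.10 is the last version that supports GPU acceleration, and Python 3.10 is the highest Python version supported by its `.whl` files.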
 
 ## 📌 Use Case
@@ -302,24 +312,40 @@ graph TB;
 #### Future Plan
 The NiceGUI framework has begun improving its file drag-and-drop support, which should resolve this issue in a future release.
 
+---
+
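 ### PITD expression curve is overall too flat
 
 #### Symptom
 The extracted PITD expression curve is too flat, with almost no significant variation overall. Pitch changes in the reference vocal are not reflected in the expression curve.
 
 #### Possible Cause
-The two confidence thresholds in the PITD extractor are set **too high**, causing many pitch changes to be discarded.
+1. In versions earlier than `v0.9.0`, the conversion between PITD expression values and pitch is faulty, which makes the PITD expression curve very flat overall.
+2. The two confidence thresholds in the PITD extractor are set **too high**, causing many pitch changes to be discarded. In [`expressive-viewer`](#可视化工具viewer), you can see many unexpected gaps in the original pitch curve.
 
 #### Solution
-First try using the best-performing rmvpe-onnx backend (with default confidence thresholds). If the issue persists, try lowering both confidence thresholds. In general, the **Utau vocal** is relatively clean, so it is advisable to first adjust the confidence threshold for the **reference vocal**.
+1. Please download and install `v0.9.0` or later.
+2. First try using the best-performing rmvpe-onnx or hybrid backend (with default confidence thresholds). If the issue persists, try lowering both confidence thresholds. You can use the pitch confidence curve in [`expressive-viewer`](#可视化工具viewer) as a reference when tuning. In general, the **Utau vocal** is relatively clean, so it is advisable to first adjust the confidence threshold for the **reference vocal**.
+
+#### Future Plan
+Incorporate semantic information into the PITD expression extraction algorithm.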
+
+---
 
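 ### PITD expression curve changes too quickly at certain positions, with jumps or spikes
 
 #### Symptom
 The PITD expression curve changes too quickly at certain positions, with very large jumps or spikes that clearly do not match natural vocal behavior.
 
 #### Possible Cause
-The two confidence thresholds in the PITD extractor are set **too low**, causing erroneous detection results to be accepted.
+1. In versions earlier than `v0.9.0`, the conversion between PITD expression values and pitch is faulty.
+2. There is noise in the reference audio around the corresponding timestamps. In [`expressive-viewer`](#可视化工具viewer), you can see many unexpected spikes in the original pitch curve.
+3. The two confidence thresholds in the PITD extractor are set **too low**, causing erroneous detection results to be accepted. In [`expressive-viewer`](#可视化工具viewer), you can see many unexpected spikes in the original pitch curve.
 
 #### Solution
-First try using the best-performing rmvpe-onnx backend (with default confidence thresholds). If the issue persists, try increasing both confidence thresholds. In general, the **Utau vocal** is relatively clean, so it is advisable to first adjust the confidence threshold for the **reference vocal**.
+1. Please download and install `v0.9.0` or later.
+2. Denoise the reference audio using tools such as [UVR](https://github.com/Anjok07/ultimatevocalremovergui) or [MSST](https://github.com/SUC-DriverOld/MSST-WebUI).
+3. First try using the best-performing rmvpe-onnx or hybrid backend (with default confidence thresholds). If the issue persists, try increasing both confidence thresholds. You can use the pitch confidence curve in [`expressive-viewer`](#可视化工具viewer) as a reference when tuning. In general, the **Utau vocal** is relatively clean, so it is advisable to first adjust the confidence threshold for the **reference vocal**.
+
+#### Future Plan
+Incorporate semantic information into the PITD expression extraction algorithm.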
@@ -18,21 +18,21 @@
         },
         "pitd": {
             "selected": true,
-            "backend": "rmvpe-onnx",
+            "backend": "hybrid",
             "confidence_utau": null,
             "confidence_ref": null,
             "align_radius": 1,
             "semitone_shift": 0,
-            "smoothness": 4,
-            "scaler": 2.2
+            "smoothness": 2,
+            "scaler": 1.0
         },
         "tenc": {
             "selected": true,
             "trim_silence": true,
             "align_radius": 1,
             "smoothness": 6,
             "scaler": 1.0,
-            "bias": 10
+            "bias": 15
         }
     }
 }
@@ -18,13 +18,13 @@
         },
         "pitd": {
             "selected": true,
-            "backend": "rmvpe-onnx",
+            "backend": "hybrid",
             "confidence_utau": null,
             "confidence_ref": null,
             "align_radius": 1,
             "semitone_shift": 0,
             "smoothness": 2,
-            "scaler": 2.0
+            "scaler": 1.0
         },
         "tenc": {
             "selected": true,
 
@@ -18,21 +18,21 @@
         },
         "pitd": {
             "selected": true,
-            "backend": "rmvpe-onnx",
+            "backend": "hybrid",
             "confidence_utau": null,
             "confidence_ref": null,
             "align_radius": 1,
             "semitone_shift": 0,
             "smoothness": 2,
-            "scaler": 2.0
+            "scaler": 1.0
         },
         "tenc": {
             "selected": true,
             "trim_silence": true,
             "align_radius": 1,
             "smoothness": 6,
             "scaler": 1.2,
-            "bias": 10
+            "bias": 15
         }
     }
 }
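
The config hunks above change `scaler` under `pitd` to `1.0`, which is the same change the README asks users to make to old configuration files. The following is a minimal sketch of automating that edit. The parent key of `pitd` is not visible in these hunks, so the sketch searches the JSON tree recursively; the function names and the file path in the usage note are hypothetical.

```python
import json
from pathlib import Path
from typing import Any


def reset_pitd_scaler(node: Any, value: float = 1.0) -> int:
    """Set "scaler" to `value` inside every "pitd" object found in the tree.

    Returns the number of values that were actually changed.
    """
    changed = 0
    if isinstance(node, dict):
        for key, child in node.items():
            if key == "pitd" and isinstance(child, dict) and "scaler" in child:
                if child["scaler"] != value:
                    child["scaler"] = value
                    changed += 1
            else:
                changed += reset_pitd_scaler(child, value)
    elif isinstance(node, list):
        for child in node:
            changed += reset_pitd_scaler(child, value)
    return changed


def migrate_config(path: str, value: float = 1.0) -> int:
    """Rewrite the config file in place only if a PITD scaler needed changing."""
    config_path = Path(path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    changed = reset_pitd_scaler(config, value)
    if changed:
        text = json.dumps(config, indent=4, ensure_ascii=False) + "\n"
        config_path.write_text(text, encoding="utf-8")
    return changed
```

For example, `migrate_config("expressive_config.json")` would update a file in the current directory and leave other settings, such as the `tenc` scaler, untouched. Keeping a backup of the original file before running it is advisable.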