Description
I have been using your open-source code to perform 16 kHz to 48 kHz speech reconstruction. I used the default 8-step inference process and tested it on the untrimmed test set with your provided checkpoint.
However, I've encountered some issues with the reconstructed speech quality. Specifically, there appears to be a significant amount of noise in the high-frequency components of the reconstructed speech. The SNR I obtained is 19.472, and the LSD is 1.212. In contrast, the results in the research paper show SNR as 24.0 and LSD as 0.92.
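For reference, here is a minimal sketch of how these two metrics are commonly computed. It may not match the repository's evaluation script: the function names, the STFT settings (`n_fft`, `hop`, Hann window, no padding), and the `eps` floor are all assumptions made for illustration, and differences in these choices can shift the reported numbers.

```python
import numpy as np


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB, treating (reference - estimate) as noise."""
    reference = np.asarray(reference, dtype=np.float64)
    estimate = np.asarray(estimate, dtype=np.float64)
    noise = reference - estimate
    # Small constant avoids division by zero when the signals are identical.
    return float(10.0 * np.log10(np.sum(reference ** 2) / (np.sum(noise ** 2) + 1e-12)))


def power_spectrogram(signal, n_fft=2048, hop=512):
    """Power spectrogram from Hann-windowed frames (assumed settings, no padding)."""
    signal = np.asarray(signal, dtype=np.float64)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def lsd(reference, estimate, n_fft=2048, hop=512, eps=1e-10):
    """Log-spectral distance: RMS log-power difference over frequency,
    averaged over frames."""
    ref_log = np.log10(power_spectrogram(reference, n_fft, hop) + eps)
    est_log = np.log10(power_spectrogram(estimate, n_fft, hop) + eps)
    per_frame = np.sqrt(np.mean((ref_log - est_log) ** 2, axis=1))
    return float(np.mean(per_frame))
```

Both functions expect time-aligned, equal-length 48 kHz waveforms as 1-D arrays. If the evaluation trims, pads, or normalizes the audio differently, the resulting SNR and LSD will not be directly comparable to the paper's.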
I suspect the issue is that the number of inference steps is too low or the noise schedule is not well tuned. Could you please explain how to configure infer_steps and infer_schedule to improve the quality of the reconstructed speech and get closer to the results reported in the paper?