Infer_step selection

I have been using your open-source code to perform 16k to 48k speech reconstruction. I utilized the default 8-step inference process and tested it on the untrimmed test set using your provided checkpoint. 

However, I've encountered some issues with the reconstructed speech quality. Specifically, there appears to be a significant amount of noise in the high-frequency components of the reconstructed speech. The SNR I obtained is 19.472, and the LSD is 1.212. In contrast, the results in the research paper show SNR as 24.0 and LSD as 0.92.

I suspect that the issue might be related to the inadequacy of the inference steps. Therefore, I would like to understand how to better configure the infer_steps and infer_schedule to improve the quality of the reconstructed speech. Could you please provide guidance on how to adjust these parameters to get closer to the results mentioned in the research paper?

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Infer_step selection #18

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Infer_step selection #18

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions