The result of Tacotron 2 is only 12 seconds when inferenced #1325

brianadit24 · 2022-03-02T08:59:44Z

I ran inference with the Tacotron 2 model using both Griffin-Lim and WaveGrad, but it only produces 12 seconds of audio regardless of the length of the input text.

For example, a short text input produces 12 seconds of audio. A long text input also produces 12 seconds of audio, and the text is not fully spoken.

How do I get audio output whose duration depends on the text input?

tekinek · 2022-03-03T17:05:07Z

Tacotron has a parameter named max_decoder_steps, which terminates synthesis when the number of decoder prediction steps reaches that limit. If this is the case, you will see a message like "Decoder stopped with maximum decoder steps". You can change this value with model_instance.max_decoder_steps = 5000 before the inference call. A good value is about 3 * the length of your sentences.

If you see this message even for short sentences (a few words, for example), then your model isn't well trained. If there is no such message on your console right after synthesis completes, then it might be a different problem.
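As a rough sketch of the max_decoder_steps suggestion above: the helper name, the choice to measure sentence length in characters, and the floor of 100 steps are all assumptions made for illustration, and DummyModel is only a stand-in for whatever model object your inference code loads.

```python
def suggested_max_decoder_steps(text: str, factor: int = 3, minimum: int = 100) -> int:
    """Heuristic cap of `factor` decoder steps per input character.

    Assumption: "length" means number of characters. The `minimum` floor is
    an illustrative safety margin, not something the library defines.
    """
    return max(minimum, factor * len(text))


class DummyModel:
    """Hypothetical stand-in for a loaded Tacotron 2 model instance."""
    max_decoder_steps = 1000


model_instance = DummyModel()
text = "This is a fairly long sentence that should not be cut off in the middle."

# Raise (or lower) the cap before calling inference on this text.
model_instance.max_decoder_steps = suggested_max_decoder_steps(text)
print(model_instance.max_decoder_steps)
```

This only moves the upper bound. A well-trained model should stop on its own long before reaching it.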

1 reply

brianadit24 Mar 7, 2022
Author

Yes, I still see the message "Decoder stopped with max_decoder_steps 1000" even for short sentences, which also results in fixed-length audio. Changing max_decoder_steps does handle long sentences, but the output duration is still not flexible at inference time.

Do you have any suggestions for getting audio whose duration adapts to the input?
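For what it's worth, a cap of 1000 decoder steps would explain the fixed duration of roughly 12 seconds. The arithmetic below is a sketch that assumes one mel frame per decoder step, a hop length of 256 samples, and a 22050 Hz sample rate. None of these values are stated in this thread, so check them against your own config.

```python
def max_audio_seconds(max_decoder_steps: int,
                      frames_per_step: int = 1,
                      hop_length: int = 256,
                      sample_rate: int = 22050) -> float:
    """Upper bound on audio length when decoding runs until the step limit.

    All default values are assumptions; substitute those from your config.
    """
    total_frames = max_decoder_steps * frames_per_step
    return total_frames * hop_length / sample_rate


print(round(max_audio_seconds(1000), 2))  # 11.61
print(round(max_audio_seconds(5000), 2))  # 58.05
```

If every output lands on this bound, the decoder is never stopping by itself, and raising the limit only makes the padded output longer.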

oronkam12 · 2025-02-14T10:01:51Z

It's just that the model isn't trained enough. There is also a stop-decoding config parameter in hparams.py that you can lower, but most of the time the cause is simply that the model was not trained enough.
Don't even try to generate before reaching a loss of 0.3. For my use case it wasn't good until then.
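To illustrate why an undertrained model always runs to the limit, here is a toy simulation of a stop gate. This is not the library's decoder: run_decoder, stop_threshold, and the made-up logit functions are hypothetical names and values chosen only to show the mechanism, assuming decoding stops once the sigmoid of a predicted stop logit exceeds a threshold.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def run_decoder(stop_logit_at, stop_threshold: float = 0.5,
                max_decoder_steps: int = 1000):
    """Return (steps_run, hit_limit) for a toy stop-gate decoder loop."""
    for step in range(1, max_decoder_steps + 1):
        # Stop as soon as the predicted stop probability crosses the threshold.
        if sigmoid(stop_logit_at(step)) > stop_threshold:
            return step, False
    # The gate never fired, so decoding was cut off at the step limit.
    return max_decoder_steps, True


def trained(step: int) -> float:
    """Made-up gate that fires clearly once the utterance ends at step 120."""
    return -5.0 if step < 120 else 5.0


def undertrained(step: int) -> float:
    """Made-up gate whose stop signal rises too slowly to reach 0.5."""
    return -3.0 + 0.002 * step


print(run_decoder(trained))                           # (120, False)
print(run_decoder(undertrained))                      # (1000, True)
print(run_decoder(undertrained, stop_threshold=0.2))  # stops before the limit
```

Lowering the threshold makes the weak gate fire, but the stopping point is then set by the threshold rather than by the text, which fits the advice that more training is the real fix.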

0 replies
