XTTS_v2 model: How to reduce end-of-sentence hallucinations? How to achieve consistent results? #4146
Unanswered
Hexatona asked this question in General Q&A
Replies: 1 comment, 1 reply
-
Minor breakthrough! You can experiment with this too. Repeatedly trying the sentence "Let me up" will result in a lot of hallucinations. Sentences ending in a bare letter, or in letters followed by spaces, tend to mess up the worst: "Let me up." or "Let me up ". However, if you end the sentence with an underscore, it seems to smarten up: "Let me up_". I tried the same experiment with other kinds of punctuation like ! or ? or !?, and the hallucinations don't seem to happen then.
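The workaround described above can be sketched as a small preprocessing pass run before synthesis. This is a hypothetical helper, not part of XTTS itself; it simply rewrites the sentence endings reported to hallucinate (bare letters, trailing spaces, a plain period) into the underscore ending reported to behave:

```python
def harden_ending(sentence: str) -> str:
    """Rewrite sentence endings observed to trigger end-of-sentence
    hallucinations (bare letters, trailing spaces, a lone period)
    into an underscore terminator, per the experiment above."""
    trimmed = sentence.rstrip()      # "Let me up " behaves like "Let me up"
    if not trimmed:
        return trimmed
    last = trimmed[-1]
    if last.isalpha():               # "Let me up"  -> "Let me up_"
        return trimmed + "_"
    if last == ".":                  # "Let me up." -> "Let me up_"
        return trimmed[:-1] + "_"
    return trimmed                   # "!", "?", "_" endings are left alone
```

For example, `harden_ending("Let me up")` returns `"Let me up_"`, while `harden_ending("Let me up!")` is unchanged.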
-
Hey there. I've written a program that turns properly tagged text documents into multi-chapter audiobook files, and while I've gotten used to the phenomenon, I feel like the random end-of-sentence hallucinations that pop up from time to time are the last big hurdle before it's a real success.
I have found a few bugs in how text is parsed and spoken and have worked around them, but I can't seem to find a pattern for the random hallucinations.
For example, if you send in text like this: " , and why not, I should know, shouldn't I?" — when generation starts with a comma and the text contains further commas, the text between the second and third commas is completely ignored.
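A defensive cleanup for that parsing bug might look like the following sketch (a hypothetical pre-pass, assuming text is sent to the model chunk by chunk): strip any leading comma so generation never starts on one.

```python
import re

def strip_leading_comma(chunk: str) -> str:
    """Drop a leading comma (and surrounding whitespace) from a text
    chunk, since chunks that begin with a comma were observed to have
    a later clause silently skipped during synthesis."""
    return re.sub(r"^\s*,\s*", "", chunk)
```

So `strip_leading_comma(" , and why not, I should know, shouldn't I?")` yields `"and why not, I should know, shouldn't I?"` before the chunk is handed to the model.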
I'm sure I'm not the only one who has noticed the phenomenon of end-of-sentence hallucinations, though — does anyone have any tips for getting more consistent output?
And while we're on the subject of consistent output: every once in a while you'll find a word the model pronounces differently on random occasions. One of the worst offenders I've found is "Cave" — sometimes it sounds perfect, other times it sounds like "Cavey" or "Cavave". Is there any way to pin the model down so it speaks the same line in exactly the same way every time?
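On repeatability: XTTS inference involves sampling, so fixing the random seed before each generation call (e.g. `torch.manual_seed(...)` if you are driving the model from PyTorch — an assumption about your setup) should make the same line render the same way each time. The stdlib sketch below only illustrates the principle — re-seeding before every stochastic choice makes that choice deterministic:

```python
import random

def sample_with_seed(choices, seed):
    """Create a fresh generator with a fixed seed before sampling, so the
    'random' pick is the same on every call — the same idea as seeding
    the framework RNG before each TTS inference (assumption: your
    pipeline lets you set the seed)."""
    rng = random.Random(seed)    # fresh generator, fixed seed
    return rng.choice(choices)

# Illustrative only: the variant pronunciations reported above.
pronunciations = ["cave", "cavey", "cavave"]
```

With a fixed seed, repeated calls like `sample_with_seed(pronunciations, 42)` always return the same element, whereas an unseeded `random.choice` would drift between runs.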