fix Chinese and English mixing #237

Open
wants to merge 1 commit into base: master

Conversation


@chai51 chai51 commented Mar 13, 2025

The original code handles pure Chinese text very well, but with mixed Chinese and English text the English pronunciation is lost.
This PR fixes that. However, the corresponding change to download_model.py is missing, because the release has no download address for Kokoro-82M-v1.1-zh.
#214

@fireblade2534
Collaborator

@chai51 I'm not sure I understand what exactly this is trying to fix, or how it fixes it.

@RBEmerson970

@fireblade2534

Purely a guess on my part, but I think this is a request to support "Kokoro-82M-v1.1-zh", which is supposed to be better at Chinese at least, and which only handles Chinese and English.

@chai51
Author

chai51 commented Mar 27, 2025

With the Voice set to zf_xiaoyi and the Language set to Chinese, the following two cases illustrate the problem:
"该模型是经过短期训练的结果,从专业数据集中添加了 100 名中文使用者。" ("This model is the result of a short training run that added 100 Chinese speakers from a professional dataset.") The synthesized pronunciation of this text is completely accurate.
"Kokoro 是一系列体积虽小但功能强大的 TTS 模型。" ("Kokoro is a series of small but powerful TTS models.") In this sentence, "Kokoro" and "TTS" are pronounced incorrectly.
Previously, mixed Chinese and English text lost the pronunciation of its English parts. I used the new Kokoro module to process English and Chinese separately, which fixes the lost English pronunciation. Since mixing Chinese and English is very common, this improvement makes Chinese synthesis usable in more scenarios.
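
As a rough illustration of what "dealing with English and Chinese separately" could look like, here is a minimal sketch that splits mixed text into English and Chinese runs before each run is handed to its own phonemizer. It is not the code in this PR: `split_zh_en` and the regex are hypothetical names introduced only for this example, and the rule "a run of ASCII words is English, everything else is Chinese" is a simplifying assumption.

```python
import re
from typing import List, Tuple

# Hypothetical helper for illustration only; not part of this PR or of the kokoro package.
# Assumption: a run of ASCII words (letters, digits, apostrophes, hyphens, separated
# by single spaces) is English; everything else is treated as Chinese.
_EN_RUN = re.compile(r"[A-Za-z][A-Za-z0-9'\-]*(?: [A-Za-z][A-Za-z0-9'\-]*)*")


def split_zh_en(text: str) -> List[Tuple[str, str]]:
    """Split text into ("en", run) and ("zh", run) segments, preserving order."""
    segments: List[Tuple[str, str]] = []

    def add(lang: str, chunk: str) -> None:
        # Drop surrounding whitespace and skip segments that become empty.
        chunk = chunk.strip()
        if chunk:
            segments.append((lang, chunk))

    pos = 0
    for match in _EN_RUN.finditer(text):
        add("zh", text[pos:match.start()])  # text before the English run
        add("en", match.group())
        pos = match.end()
    add("zh", text[pos:])  # trailing non-English text
    return segments


# The second test sentence from this comment splits into four segments:
# [("en", "Kokoro"), ("zh", "是一系列体积虽小但功能强大的"), ("en", "TTS"), ("zh", "模型。")]
print(split_zh_en("Kokoro 是一系列体积虽小但功能强大的 TTS 模型。"))
```

Each `"en"` segment would then go to an English phonemizer and each `"zh"` segment to a Chinese one, so that words like "Kokoro" and "TTS" are not dropped.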

@fireblade2534
Collaborator

fireblade2534 commented Mar 28, 2025

@chai51 So what you're saying is that if the language is Chinese, it loads kokoro v1.1 instead of the normal kokoro v1 model? Some questions:

  • What converts the Chinese and English text to phonemes in such a way that both the English and the Chinese pronunciation are preserved?
  • Does it load v1.1 and v1 at the same time?
  • Does it use the text normalization system for the English parts of the text, or is that skipped? (Text normalization only works for English right now, so it is automatically disabled if the lang code requests a different language.)


3 participants