feat: add audio step chat LM #122
Merged
feat:
PS:
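After speech-text alignment, the model (LM) can support several text and speech modes, but the public code examples only cover A1-T2 and the original T1->T2 mode. The other capabilities will be explored later when time permits.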
⭐️ Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction | paper code
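The paper is only a technical report and presents no novel contribution (the overall architecture is similar to glm-4-voice). It takes an existing text base model, adds multimodal training data, and focuses on engineering for deployment.
For the TTS part, see: feat: add step1 audio tts #121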
achatbot + glm-4-voice: feat: add daily_asr_glm_voice_bot daily_glm_voice_bot and deploy modal #95
CosyVoice walkthrough: https://weedge.github.io/post/multimoding/voices/cosyvoice/
By comparison, the models the paper benchmarks against have a similar architecture.
deploy: