🚀 FastSparkTTS – Based on the SparkTTS model, this platform provides high-quality Chinese speech synthesis and voice cloning services. With an easy-to-use web interface, you can effortlessly create natural and realistic human voices to suit various scenarios.
- 🚀 Multiple Backend Acceleration Options: Supports acceleration strategies such as `vllm`, `sglang`, and `llama-cpp`
- 🎯 High Concurrency: Utilizes dynamic batching to significantly boost concurrent processing
- 🎛️ Full Parameter Control: Offers comprehensive adjustments for pitch, speech rate, timbre, temperature, and more
- 📱 Lightweight Deployment: Minimal dependencies, with rapid startup based on Flask and FastAPI
- 🎨 Clean Interface: Features a modern, standardized UI
- 🔊 Long Text Speech Synthesis: Capable of synthesizing extended texts while maintaining consistent voice timbre
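Long-text synthesis generally works by splitting the input into sentence-sized chunks and synthesizing each chunk with the same voice prompt, which is what keeps the timbre consistent across a long passage. Below is a minimal chunker sketch; the splitting rules and the `max_len` default are illustrative assumptions, not FastSparkTTS's actual implementation:

```python
import re

def chunk_text(text: str, max_len: int = 120) -> list[str]:
    """Split text into chunks of at most max_len characters,
    cutting only at sentence boundaries (., !, ?, and CJK equivalents).
    A single sentence longer than max_len becomes its own oversized chunk."""
    # Grab each sentence together with its trailing punctuation and whitespace.
    sentences = re.findall(r"[^.!?。！？]+[.!?。！？]?\s*", text)
    chunks: list[str] = []
    current = ""
    for sent in sentences:
        if current and len(current) + len(sent) > max_len:
            chunks.append(current)
            current = ""
        current += sent
    if current:
        chunks.append(current)
    return chunks
```

Each chunk is then fed to the model with the same reference audio, and the resulting waveforms are concatenated.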
- Python 3.10+
- Flask 2.0+
- fastapi
- `vllm`, `sglang`, or `llama-cpp`
```bash
pip install -r requirements.txt
```
(Install one as needed; if using torch for inference, you can skip this step)
- vLLM

  The vllm version must be greater than `0.7.2`:

  ```bash
  pip install vllm
  ```

  For more details, please refer to: https://github.com/vllm-project/vllm
- llama-cpp

  ```bash
  pip install llama-cpp-python
  ```

  Convert the LLM weights to gguf format, save the file as `model.gguf`, and place it in the `LLM` directory. You can refer to the following method for weight conversion; if quantization is needed, configure the parameters accordingly.

  ```bash
  git clone https://github.com/ggml-org/llama.cpp.git
  cd llama.cpp
  python convert_hf_to_gguf.py Spark-TTS-0.5B/LLM --outfile Spark-TTS-0.5B/LLM/model.gguf
  ```
- sglang

  ```bash
  pip install sglang
  ```

  For more details, please refer to: https://github.com/sgl-project/sglang
Weight download links: huggingface, modelscope
- Clone the project repository

  ```bash
  git clone https://github.com/HuiResearch/Fast-Spark-TTS.git
  cd Fast-Spark-TTS
  ```
- Start the SparkTTS API Service

  The engine can be chosen according to your environment; currently supported options are `torch`, `vllm`, `sglang`, and `llama-cpp`.

  ```bash
  python server.py \
    --model_path Spark-TTS-0.5B \
    --engine vllm \
    --llm_device cuda \
    --tokenizer_device cuda \
    --detokenizer_device cuda \
    --wav2vec_attn_implementation sdpa \
    --max_length 32768 \
    --llm_gpu_memory_utilization 0.6 \
    --host 0.0.0.0 \
    --port 8000
  ```
- Start the Web Interface

  ```bash
  python frontend.py
  ```
- Access via your browser

  http://localhost:8001
- Switch to the Speech Synthesis tab.
- Enter the text you wish to convert to speech.
- Adjust parameters such as gender, pitch, and speech rate.
- Click the Generate Speech button.
- Once generation is complete, play or download the audio.
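If you prefer to script the synthesis step rather than use the web UI, the request body can be assembled programmatically. The sketch below is purely illustrative: the field names and the endpoint path in the comment are assumptions, not the documented FastSparkTTS API; check `server.py` for the actual schema.

```python
import json

def build_tts_request(text: str, gender: str = "female",
                      pitch: str = "moderate", speed: str = "moderate") -> str:
    """Assemble a JSON body for a synthesis request.
    NOTE: field names here are hypothetical; consult server.py for the real API."""
    payload = {
        "text": text,       # the text to convert to speech
        "gender": gender,   # e.g. "male" / "female"
        "pitch": pitch,     # e.g. "low" / "moderate" / "high"
        "speed": speed,     # speech rate
    }
    return json.dumps(payload, ensure_ascii=False)

# Posting it (assumed endpoint, shown for illustration only):
# import urllib.request
# req = urllib.request.Request("http://localhost:8000/generate_voice",
#                              data=build_tts_request("你好").encode("utf-8"),
#                              headers={"Content-Type": "application/json"})
```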
- Switch to the Voice Cloning tab.
- Enter the target text.
- Upload the reference audio.
- Enter the corresponding text for the reference audio.
- Adjust the parameters.
- Click the Clone Voice button.
- Once cloning is complete, play or download the audio.
- Switch to the Character Cloning tab.
- Enter the target text.
- Choose your desired character.
- Adjust the parameters.
- Click the Character Cloning button.
- Once cloning is complete, play or download the audio.
Graphics Card: A800
Cloning speed is tested with `prompt_audio.wav`; inference is run in a loop five times and the average inference time (in seconds) is reported.
Test code reference: `speed_test.py`
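The averaging procedure described above can be sketched as a generic timing harness (this is not the actual `speed_test.py`; the `fn` argument stands in for whatever inference call is being measured):

```python
import time

def benchmark(fn, n_runs: int = 5, warmup: bool = True) -> float:
    """Run fn n_runs times and return the mean wall-clock time in seconds.
    With warmup=True, one extra untimed call absorbs first-call overhead,
    which matters for backends like vllm/sglang whose first inference is slow."""
    if warmup:
        fn()  # untimed warm-up call
    total = 0.0
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        total += time.perf_counter() - start
    return total / n_runs

# Example with a stand-in for the real inference call:
avg = benchmark(lambda: time.sleep(0.01), n_runs=5)
```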
After using vllm, most of the processing time is spent on the audio tokenizer and vocoder rather than the LLM. Optimization using ONNX might further improve performance.
| engine    | device | Avg Time (s) | Avg Time, warmed up (s) |
|-----------|--------|--------------|-------------------------|
| Official  | CPU    | 27.20        | 27.30                   |
| Official  | GPU    | 5.95         | 4.97                    |
| llama-cpp | CPU    | 11.32        | 11.09                   |
| vllm      | GPU    | 1.95         | 1.22                    |
| sglang    | GPU    | 3.41         | 0.76                    |
Usage instructions can be found in `inference.py`.
For API deployment and repeated inference calls, it is recommended to use asynchronous (async) methods.
Note: For backends like vllm and sglang, the first inference call might take longer, but subsequent calls will perform normally. For benchmarking, it is advised to warm up using the first data entry.
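The asynchronous pattern recommended above looks roughly like this. `fake_tts_call` is a placeholder that simulates a network round-trip; in practice it would be replaced by a real async HTTP request (e.g. via aiohttp or httpx) against the running service:

```python
import asyncio

async def fake_tts_call(text: str) -> bytes:
    """Stand-in for an async HTTP request to the TTS service."""
    await asyncio.sleep(0.01)          # simulated network/inference latency
    return f"audio:{text}".encode()

async def synthesize_batch(texts: list[str]) -> list[bytes]:
    # Dispatch all requests concurrently instead of one after another,
    # letting the server's dynamic batching group them into a single batch.
    return await asyncio.gather(*(fake_tts_call(t) for t in texts))

results = asyncio.run(synthesize_batch(["a", "b", "c"]))
```

Issuing the calls concurrently is what lets dynamic batching on the server side pay off; sequential blocking calls would serialize the work and forfeit most of the throughput gain.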
This project provides a zero-shot voice cloning TTS model intended for academic research, educational purposes, and lawful applications such as personalized speech synthesis, assistive technologies, and linguistic studies.
Please note:
- Do not use this model for unauthorized voice cloning, impersonation, fraud, scams, deepfakes, or any illegal activities.
- Ensure compliance with local laws, regulations, and ethical standards when using this model.
- The developers assume no responsibility for any misuse of this model.
This project advocates the responsible development and use of artificial intelligence and encourages the community to adhere to safety and ethical principles in AI research and applications.
This project is built upon Spark-TTS and is distributed under the same open-source license as SparkTTS. For details, please refer to the original SparkTTS License.