Local Qwen3 VoiceDesign TTS service built with FastAPI.
- Working end-to-end via curl
- Working end-to-end from a separate Swift CLI client
- Current output format: audio/wav (PCM16 mono 24kHz)
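
To confirm that a file really is PCM16 mono at 24 kHz, the standard-library `wave` module is enough. The sketch below is not part of this repo; `describe_wav`, `is_pcm16_mono_24khz`, and `make_silence` are hypothetical helper names, and the silence generator exists only so the check can be run without the service.

```python
import io
import wave


def describe_wav(data: bytes) -> dict:
    """Return the basic format fields read from a WAV payload."""
    with wave.open(io.BytesIO(data), "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "frames": wav.getnframes(),
        }


def is_pcm16_mono_24khz(data: bytes) -> bool:
    """True when the payload is mono, 2 bytes per sample, 24000 Hz."""
    info = describe_wav(data)
    fields = (info["channels"], info["sample_width_bytes"], info["sample_rate_hz"])
    return fields == (1, 2, 24000)


def make_silence(seconds: float = 0.1) -> bytes:
    """Build an in-memory WAV of silence in the documented format (demo only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(24000)
        wav.writeframes(b"\x00\x00" * int(24000 * seconds))
    return buf.getvalue()


if __name__ == "__main__":
    print(describe_wav(make_silence()))
```

To check a real response, pass the bytes of a generated file such as outputs/demo.wav to `is_pcm16_mono_24khz`.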
uv run python main.py

In another terminal:
mkdir -p outputs
curl -X POST http://127.0.0.1:8000/model/load
curl -X POST http://127.0.0.1:8000/synthesize \
-H "Content-Type: application/json" \
-d '{"text":"Hello from TalkToMePy demo.","instruct":"Warm and clear narrator voice.","language":"English","format":"wav"}' \
--output outputs/demo.wav
afplay outputs/demo.wav

- Python >=3.13
- uv
- sox on PATH (macOS: brew install sox)

./scripts/setup.sh

This script:
- checks for uv and sox
- runs uv sync
- creates outputs/
- creates .env.launchd from .env.example if missing
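
The first step, checking for uv and sox, can be reproduced from Python when debugging a PATH problem. This is a minimal sketch of the idea, not the contents of scripts/setup.sh; `missing_tools` is a hypothetical helper name.

```python
import shutil


def missing_tools(names: list[str]) -> list[str]:
    """Return the subset of `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["uv", "sox"])
    if missing:
        print("Missing required tools:", ", ".join(missing))
    else:
        print("uv and sox are available")
```

Running it from the same environment as the service (for example under launchd) shows whether that environment's PATH resolves both tools.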

Copy and edit:
cp .env.example .env.launchd

scripts/run_service.sh will load .env.launchd when running under launchd.

uv run python main.py

Service URL: http://127.0.0.1:8000

FastAPI exposes live docs/spec automatically:
- OpenAPI JSON: http://127.0.0.1:8000/openapi.json
- Swagger UI: http://127.0.0.1:8000/docs
- ReDoc: http://127.0.0.1:8000/redoc
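
The OpenAPI JSON is also handy for scripting, for example to list every route the running service exposes. The sketch below assumes the service is reachable at the URL above; `list_operations` and `fetch_spec` are hypothetical helper names and not part of this repo.

```python
import json
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


def list_operations(spec: dict) -> list[str]:
    """Flatten the `paths` section of an OpenAPI document into 'METHOD /path' strings."""
    operations = []
    for path, item in sorted(spec.get("paths", {}).items()):
        for method in sorted(item):
            # Path items may also carry non-method keys such as "parameters".
            if method in HTTP_METHODS:
                operations.append(f"{method.upper()} {path}")
    return operations


def fetch_spec(base_url: str = "http://127.0.0.1:8000") -> dict:
    """Download the live OpenAPI document from a running service."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)


if __name__ == "__main__":
    for line in list_operations(fetch_spec()):
        print(line)
```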

This repo also includes a committed YAML spec: openapi/openapi.yaml
Regenerate it after API changes:
uv run python scripts/export_openapi.py

This repo includes:
- LaunchAgent template: launchd/com.talktomepy.plist
- Runner script: scripts/run_service.sh
Install and start (user agent):
REPO_DIR="$(pwd)"
mkdir -p ~/Library/LaunchAgents
cp launchd/com.talktomepy.plist ~/Library/LaunchAgents/com.talktomepy.plist
sed -i '' "s|__REPO_DIR__|$REPO_DIR|g; s|__HOME__|$HOME|g" ~/Library/LaunchAgents/com.talktomepy.plist
launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/com.talktomepy.plist
launchctl kickstart -k gui/$(id -u)/com.talktomepy

Status and logs:
launchctl print gui/$(id -u)/com.talktomepy
tail -f ~/Library/Logs/talktomepy.stdout.log ~/Library/Logs/talktomepy.stderr.log

Stop and remove:
launchctl bootout gui/$(id -u) ~/Library/LaunchAgents/com.talktomepy.plist
rm ~/Library/LaunchAgents/com.talktomepy.plist

Notes for modern macOS (including macOS 26):
- Prefer bootstrap/bootout/kickstart over legacy load/unload.
- launchd has a minimal environment; keep required env vars in scripts/run_service.sh.
- scripts/run_service.sh sets a Homebrew-friendly default PATH so sox is resolvable under launchd.

- GET /health returns service status
- GET /version returns API/service version metadata
- GET /adapters lists available runtime adapters
- GET /adapters/{adapter_id}/status returns adapter-specific status
- GET /model/status returns model runtime readiness (SoX, qwen-tts, load state)
- POST /model/load lazily loads the configured model into memory
- POST /model/unload unloads the model from memory
- POST /synthesize returns generated audio bytes as audio/wav
- POST /synthesize/stream streams generated audio bytes as audio/wav
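
For callers that prefer Python over curl, a POST /synthesize request can be built with the standard library alone. This is a sketch rather than a client shipped with this repo: the JSON field names are the ones used in the curl examples in this README, while `build_synthesize_request`, `synthesize_to_file`, and the 300-second timeout are assumptions made for illustration.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8000"


def build_synthesize_request(
    text: str,
    instruct: str,
    language: str = "English",
    base_url: str = BASE_URL,
) -> Request:
    """Build (but do not send) a POST /synthesize request."""
    payload = {"text": text, "instruct": instruct, "language": language, "format": "wav"}
    return Request(
        f"{base_url}/synthesize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize_to_file(text: str, instruct: str, out_path: str) -> None:
    """Send the request and write the returned audio bytes to `out_path`."""
    request = build_synthesize_request(text, instruct)
    # Generation can be slow; the timeout value here is an arbitrary assumption.
    with urlopen(request, timeout=300) as response, open(out_path, "wb") as out:
        out.write(response.read())


if __name__ == "__main__":
    synthesize_to_file("Hello from Python.", "Warm and clear narrator voice.", "outputs/from_python.wav")
```

Keeping request construction separate from sending makes the payload easy to inspect or test without a running service.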

Notes:
- POST /model/load may return 202 Accepted while loading is in progress.
- POST /synthesize returns 503 with Retry-After if the model is still loading.
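
A client can treat the 503 as a signal to wait and try again. The sketch below shows one way to do that, assuming Retry-After carries a whole number of seconds (the HTTP-date form is not handled and falls back to a default delay). `call_with_retry` and its `send` callback are hypothetical; `send` stands in for whatever function performs the HTTP call and returns the status, headers, and body.

```python
import time
from typing import Callable, Mapping, Tuple

Response = Tuple[int, Mapping[str, str], bytes]


def call_with_retry(
    send: Callable[[], Response],
    max_attempts: int = 5,
    default_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Call `send` until it returns 200, waiting on each 503 as Retry-After suggests."""
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status == 200:
            return body
        # Anything other than 503 is not retried; neither is the final attempt.
        if status != 503 or attempt == max_attempts:
            raise RuntimeError(f"request failed with HTTP {status}")
        retry_after = headers.get("Retry-After", "")
        delay = float(retry_after) if retry_after.isdigit() else default_delay
        sleep(delay)
    raise RuntimeError("max_attempts must be at least 1")


if __name__ == "__main__":
    # Simulated responses: the model is loading twice, then audio is returned.
    fake = iter([(503, {"Retry-After": "1"}, b""), (503, {}, b""), (200, {}, b"RIFF")])
    print(call_with_retry(lambda: next(fake), sleep=lambda seconds: None))
```

Injecting `sleep` keeps the helper testable without real waiting.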

curl http://127.0.0.1:8000/health

curl http://127.0.0.1:8000/version

curl http://127.0.0.1:8000/adapters

curl http://127.0.0.1:8000/adapters/qwen3-tts/status

curl http://127.0.0.1:8000/model/status

curl -X POST http://127.0.0.1:8000/model/load

curl -X POST http://127.0.0.1:8000/model/unload

curl -X POST http://127.0.0.1:8000/synthesize \
-H "Content-Type: application/json" \
-d '{"text":"Hello from Swift bridge!","instruct":"Warm and friendly voice with steady pace.","language":"English","format":"wav"}' \
--output outputs/from_service.wav

curl -N -X POST http://127.0.0.1:8000/synthesize/stream \
-H "Content-Type: application/json" \
-d '{"text":"Streaming endpoint test.","instruct":"Warm and friendly voice with steady pace.","language":"English","format":"wav"}' \
--output outputs/from_stream.wav

Play the generated file on macOS:
afplay outputs/from_service.wav

uv run python scripts/voice_design_smoke.py \
--text "Hello from my Swift CLI bridge." \
--instruct "Energetic, friendly, and slightly brisk pacing with bright tone." \
--output outputs/swift_bridge_demo.wav

- qwen-tts currently requires transformers==4.57.3 (pinned in this repo).
- /synthesize currently supports format: "wav" only.
- Model id can be overridden with env var QWEN_TTS_MODEL_ID.
- Optional idle auto-unload can be enabled with env var QWEN_TTS_IDLE_UNLOAD_SECONDS.
- Optional startup warm-load can be enabled with env var QWEN_TTS_WARM_LOAD_ON_START=true.
- Optional load settings: QWEN_TTS_DEVICE_MAP, QWEN_TTS_TORCH_DTYPE.
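
One way to picture how these variables fit together is a small settings object read from the environment. The sketch below is illustrative only and is not the service's actual configuration code: the `RuntimeSettings` class, the `load_settings` function, the choice of `None` for unset values, and the set of accepted truthy strings are all assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RuntimeSettings:
    model_id: Optional[str]
    idle_unload_seconds: Optional[float]
    warm_load_on_start: bool
    device_map: Optional[str]
    torch_dtype: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> RuntimeSettings:
    """Read the QWEN_TTS_* variables; unset or empty values become None or False."""
    idle = env.get("QWEN_TTS_IDLE_UNLOAD_SECONDS", "").strip()
    warm = env.get("QWEN_TTS_WARM_LOAD_ON_START", "").strip().lower()
    return RuntimeSettings(
        model_id=env.get("QWEN_TTS_MODEL_ID") or None,
        idle_unload_seconds=float(idle) if idle else None,
        # Only "true" is documented; accepting "1" and "yes" is an assumption.
        warm_load_on_start=warm in {"1", "true", "yes"},
        device_map=env.get("QWEN_TTS_DEVICE_MAP") or None,
        torch_dtype=env.get("QWEN_TTS_TORCH_DTYPE") or None,
    )


if __name__ == "__main__":
    print(load_settings())
```

Passing a plain dict instead of `os.environ` makes it easy to try different combinations, for example the values placed in .env.launchd.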
- Add optional on-disk audio caching
- Add structured request/response logging and timing metrics
- Add Docker setup for self-hosting on a local machine (for example Mac mini)
- Add small auth layer for non-local deployments
- Add unit tests for /model/load, /synthesize, and /synthesize/stream error paths
- Add integration test that writes and validates returned WAV header
- Add graceful startup warm-load option (env-controlled)
- Add response metadata headers for generation latency
- Add GET /adapters/{id}/voices for discoverable voice/speaker options
- Add generalized POST /adapters/{id}/load and POST /adapters/{id}/unload endpoints
- Add async synthesis job APIs:
  - POST /synthesize/jobs
  - GET /synthesize/jobs/{job_id}
  - GET /synthesize/jobs/{job_id}/audio
- Add example Swift client snippet directly in this repo