# AI Speech Translation App
**Real-Time Open-Source Voice Translator for the Web**
> Record speech, transcribe, translate, and play back in any supported language using open-source models and free cloud platforms.
---
## Features
- **Multilingual Speech-to-Text:** Fast, accurate transcription (Faster-Whisper).
- **Real-Time Translation:** Supports interchangeable translation models such as MarianMT and NLLB.
- **AI Text-to-Speech:** Natural speech synthesis in dozens of languages (Piper TTS).
- **Modular Microservices:** Frontend + ASR + Translate + TTS, easy to deploy.
- **Free Cloud Ready:** Deploys to the free tiers of Vercel, Render, or Railway, with no vendor lock-in.
- **100% Open-Source:** No paid APIs, privacy by design.
---
## Tech Stack
- **Frontend:** Next.js (TypeScript, TailwindCSS)
- **ASR:** Python FastAPI + Faster-Whisper
- **Translation:** Python FastAPI + Hugging Face Transformers (e.g., MarianMT)
- **TTS:** Python FastAPI + Piper
- **Containerization:** Docker (per service)
- **Cloud Hosting:** Vercel (Web), Render/Railway (Services)
---
## Quickstart
### Prerequisites
- Node.js (v18+)
- Python (3.8+ recommended)
- Docker (optional, for cloud deploy)
- [Piper voice models](https://github.com/rhasspy/piper-voices)
- [Hugging Face model weights](https://huggingface.co/models)
### Clone & Structure
```bash
git clone https://github.com//ai-speech-translation-app.git
cd ai-speech-translation-app
```

```text
ai-speech-translation-app/
  apps/web/
  services/asr/
  services/translate/
  services/tts/
```
### Setup Frontend
```bash
cd apps/web
npx create-next-app@latest .
npm install
npm run dev
```
### Setup ASR Service
```bash
cd ../../services/asr
pip install -r requirements.txt
uvicorn main:app --reload --port 8000
```
### Setup Translation Service
```bash
cd ../translate
pip install -r requirements.txt
uvicorn main:app --reload --port 8001
```
### Setup TTS Service
- Download a Piper voice model and its config file, and save both in `services/tts`.
- Update `main.py` with the correct paths to both files.

```bash
cd ../tts
pip install -r requirements.txt
uvicorn main:app --reload --port 8002
```
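
A missing or mismatched voice file is easier to catch before the TTS service starts than after. The following is a minimal pre-flight sketch, assuming the usual Piper layout in which a voice ships as `<voice>.onnx` together with `<voice>.onnx.json`. The function name `check_piper_voice` and the example path `services/tts/voice.onnx` are hypothetical and not part of this repo. Calling `check_piper_voice("services/tts/voice.onnx")` returns a list of problems, and an empty list means both files were found and the config parses as JSON.

```python
import json
from pathlib import Path


def check_piper_voice(model_path):
    """Return a list of problems with a Piper voice; an empty list means it looks usable.

    Assumption: the config sits next to the model and is named <model>.json,
    for example voice.onnx and voice.onnx.json.
    """
    model = Path(model_path)
    config = Path(str(model) + ".json")
    problems = []

    if model.suffix != ".onnx":
        problems.append(f"{model.name}: expected an .onnx model file")
    if not model.is_file():
        problems.append(f"{model}: model file not found")

    if not config.is_file():
        problems.append(f"{config}: config file not found")
    else:
        try:
            # Only checks that the config is valid JSON, not that it matches the model.
            json.loads(config.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{config}: invalid JSON ({exc})")

    return problems
```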
---
## Example Usage
To synthesize Hindi speech from text via TTS:
```bash
curl -X POST -F "text=नमस्ते, आप कैसे हैं?" http://localhost:8002/speak --output hindi_output.wav
```
To translate text:
```bash
curl -X POST -F "text=Hello" http://localhost:8001/translate
```
To transcribe audio:
```bash
curl -X POST -F "[email protected]" http://localhost:8000/transcribe
```
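
The same requests can be sent from Python without extra dependencies. The sketch below mirrors what `curl -F` does by encoding fields as `multipart/form-data` using only the standard library. The helper names `encode_multipart` and `post_form` are hypothetical, and the response format of each endpoint is not specified here, so `post_form` simply returns the raw response bytes. Under those assumptions, `post_form("http://localhost:8002/speak", {"text": "Hello"})` should return the audio bytes that the curl example writes to a file, and a file upload would be passed as `files={"<field>": ("<filename>", data)}` using whatever field name the ASR service expects.

```python
import urllib.request
import uuid


def encode_multipart(fields, files=None, boundary=None):
    """Encode form data as multipart/form-data.

    fields: {name: text}; files: {name: (filename, bytes)}.
    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    for name, (filename, data) in (files or {}).items():
        header = (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        )
        parts.append(header.encode("utf-8") + data + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def post_form(url, fields, files=None, timeout=60):
    """POST a form to one of the services and return the raw response body."""
    body, content_type = encode_multipart(fields, files)
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```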
---
## Deployment
- **Frontend:** Deploy to Vercel (GitHub integration, env vars for service URLs)
- **APIs:** Deploy ASR, Translate, and TTS to Render/Railway (one-click Docker deploy, free tier)
- **Environment Variables:** Set API endpoints in Vercel project settings
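
Before pointing the frontend at deployed services, it can help to validate the URLs that will go into the environment variables. The sketch below is one way to do that from a small Python check script. The variable names `ASR_URL`, `TRANSLATE_URL`, and `TTS_URL` are assumptions chosen for illustration (use whatever names the frontend actually reads), and the defaults are the local ports from the Quickstart.

```python
import os

# Hypothetical variable names; the defaults match the local Quickstart ports.
DEFAULT_URLS = {
    "ASR_URL": "http://localhost:8000",
    "TRANSLATE_URL": "http://localhost:8001",
    "TTS_URL": "http://localhost:8002",
}


def resolve_service_urls(env=None):
    """Read service base URLs from the environment, falling back to local defaults.

    Raises ValueError if a configured value is not an http(s) URL.
    Trailing slashes are stripped so paths like /translate can be appended safely.
    """
    env = os.environ if env is None else env
    urls = {}
    for name, default in DEFAULT_URLS.items():
        value = env.get(name, "").strip() or default
        if not value.startswith(("http://", "https://")):
            raise ValueError(f"{name} must be an http(s) URL, got {value!r}")
        urls[name] = value.rstrip("/")
    return urls
```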
---
## Customization
- **Languages:** Change the translation model in `services/translate/main.py`
- **Voices:** Download new Piper voices/configs for TTS
- **UI:** Edit `apps/web/src/app/page.tsx` for more features or language support
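
Switching languages with MarianMT usually means switching checkpoints, since each checkpoint covers one direction. The sketch below assumes the Helsinki-NLP naming scheme on the Hugging Face Hub (`Helsinki-NLP/opus-mt-<source>-<target>`). Not every pair is published, so check the Hub before relying on a name. Both function names are hypothetical and this is not the code in this repo's `main.py`. For brevity the model is loaded on every call, whereas a real service would load it once at startup.

```python
def marian_model_name(source_lang, target_lang):
    """Build a MarianMT checkpoint name, e.g. ("en", "hi") -> "Helsinki-NLP/opus-mt-en-hi"."""
    source, target = source_lang.strip(), target_lang.strip()
    if not source or not target:
        raise ValueError("both language codes are required")
    if source == target:
        raise ValueError("source and target languages must differ")
    return f"Helsinki-NLP/opus-mt-{source}-{target}"


def translate(text, source_lang, target_lang):
    """Translate one string; the checkpoint is downloaded on first use."""
    # Imported here so marian_model_name works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    name = marian_model_name(source_lang, target_lang)
    tokenizer = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name)
    batch = tokenizer([text], return_tensors="pt", padding=True)
    generated = model.generate(**batch)
    return tokenizer.decode(generated[0], skip_special_tokens=True)
```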
---
## Contributing
1. Fork the repo, clone your fork, and make changes in a feature branch
2. Update the documentation and verify that the tests pass
3. Create a PR describing your improvement or fix
---
## Troubleshooting
- **CORS errors:** Add FastAPI CORS middleware per service.
- **Model errors:** Verify file paths and model/config version match.
- **No audio/chunks:** Check that the Piper voice and its config match, then test with a short, simple sentence.
- **Cloud limits:** Use the `tiny` or `base` Whisper models to keep latency and memory use low on free hosting.
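
For the CORS case, FastAPI ships `CORSMiddleware`, which each service can register at startup. The sketch below is one possible setup, not this repo's code: the helper names `cors_options` and `enable_cors` are hypothetical, and `http://localhost:3000` (the default Next.js dev server address) is only an example origin. Origins are compared as exact strings against the browser's `Origin` header, which carries no trailing slash, so the helper strips one if present. In a service's `main.py` this would be used as `enable_cors(app, ["http://localhost:3000"])`, with the deployed frontend URL added for production.

```python
def cors_options(frontend_origins):
    """Build keyword arguments for FastAPI's CORSMiddleware."""
    # A trailing slash would never match the browser's Origin header.
    origins = [origin.rstrip("/") for origin in frontend_origins]
    return {
        "allow_origins": origins,
        "allow_methods": ["GET", "POST"],
        "allow_headers": ["*"],
    }


def enable_cors(app, frontend_origins):
    """Register CORS middleware on a FastAPI app for the given frontend origins."""
    # Imported here so cors_options can be used and tested without FastAPI installed.
    from fastapi.middleware.cors import CORSMiddleware

    app.add_middleware(CORSMiddleware, **cors_options(frontend_origins))
```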
---
## License
Distributed under the MIT License. See [LICENSE](LICENSE) for details.

---
## Acknowledgements
- [OpenAI Whisper](https://github.com/openai/whisper), [CTranslate2](https://github.com/OpenNMT/CTranslate2)
- [Piper TTS](https://github.com/rhasspy/piper)
- [Hugging Face Transformers](https://github.com/huggingface/transformers)