A Flask web application that converts video files into translated transcripts using AI-powered speech recognition and translation.
- 🎤 Extract audio from MP4 videos
- 🔉 Convert audio to 16kHz WAV format (optimal for speech recognition)
- 🗣️ Transcribe audio using Groq's Whisper model
- 🌍 Translate transcripts to multiple languages using Gemma3 AI
- 💾 Download transcripts as JSON files
- 🎨 Modern, responsive UI with progress tracking
Before you begin, ensure you have:
- Python 3.8+
- Ollama running locally with Gemma3 model
- Groq API key (for Whisper transcription)
- FFmpeg installed (for audio processing)
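The 16 kHz conversion mentioned in the features can be done with FFmpeg from Python. A minimal sketch, assuming a subprocess call; the helper names are illustrative, not the project's actual functions:

```python
import subprocess

def build_resample_cmd(src_path, dst_path):
    # -ar 16000 resamples to 16 kHz and -ac 1 downmixes to mono,
    # a common input format for Whisper-style speech recognition.
    return ["ffmpeg", "-y", "-i", src_path, "-ar", "16000", "-ac", "1", dst_path]

def resample_to_16k(src_path, dst_path):
    # Raises CalledProcessError if ffmpeg exits nonzero.
    subprocess.run(build_resample_cmd(src_path, dst_path), check=True)
```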
- Clone the repository:

  ```bash
  git clone https://github.com/yourusername/video-to-transcript.git
  cd video-to-transcript
  ```
- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```
- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Set up your Groq API key:

  Set your Groq API key in the `functions/convertedwav_to_transcript.py` file.
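Rather than hardcoding the key in the file, a common alternative is to read it from an environment variable. A minimal sketch, assuming a `GROQ_API_KEY` variable; the helper name is hypothetical:

```python
import os

def get_groq_api_key():
    # Read the Groq API key from the environment instead of hardcoding
    # it in functions/convertedwav_to_transcript.py (hypothetical helper).
    key = os.environ.get("GROQ_API_KEY")
    if not key:
        raise RuntimeError("GROQ_API_KEY environment variable is not set")
    return key
```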
- Install FFmpeg (on Debian/Ubuntu):

  ```bash
  sudo apt-get install ffmpeg
  ```
- Download the Gemma3 model: install Ollama on your system, then run:

  ```bash
  ollama pull gemma3:12b
  ```
- Start the Flask server:

  ```bash
  python app.py
  ```
- Access the application: open your browser and navigate to:

  ```
  http://localhost:5000
  ```
- Upload an MP4 video file
- Select target language for translation
- Click "Process Video"
- Wait for processing to complete
- Download the JSON transcript file
```
video-to-transcript/
├── app.py                             # Main Flask application
├── main.py                            # Core processing logic
├── functions/
│   ├── video_to_wav.py                # Video to WAV conversion
│   ├── wav_to_16kwav.py               # Audio format conversion
│   ├── convertedwav_to_transcript.py  # Speech recognition
│   └── transcript_lan_covert.py       # Translation
├── templates/
│   └── index.html                     # Frontend interface
├── static/
│   ├── script.js                      # Client-side JavaScript
│   └── style.css                      # Styling
└── outputs/                           # Generated transcripts
```
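The layout above implies a four-stage pipeline: video to WAV, WAV to 16 kHz WAV, audio to transcript, transcript to translation. A minimal sketch of how such stages can be chained, assuming each module under `functions/` exposes one callable (the exact names are assumptions):

```python
def run_pipeline(stages, initial_input):
    # Thread each stage's output into the next stage's input; in this
    # project the stages would be the four callables under functions/.
    result = initial_input
    for stage in stages:
        result = stage(result)
    return result
```

For example, `run_pipeline([video_to_wav, wav_to_16kwav, convertedwav_to_transcript, translate], "input.mp4")` would run the full conversion on one file.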
The application supports translation to:
- Bengali (default)
- English
- Hindi
- Spanish
- French
- Portuguese
- German
- Russian
- Italian
- Dutch
- Chinese (Simplified)
- Japanese
- Korean
- Arabic
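Translation runs through the locally hosted Gemma3 model. A minimal sketch using Ollama's HTTP `/api/generate` endpoint; the prompt wording and helper names are illustrative, not the project's actual code:

```python
import json
import urllib.request

def build_translation_prompt(text, target_language="Bengali"):
    # Simple one-shot prompt; the project's actual prompt may differ.
    return (f"Translate the following transcript into {target_language}. "
            f"Return only the translation.\n\n{text}")

def translate(text, target_language="Bengali"):
    payload = json.dumps({
        "model": "gemma3:12b",
        "prompt": build_translation_prompt(text, target_language),
        "stream": False,  # ask for one complete JSON response
    }).encode("utf-8")
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```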
Contributions are welcome! Please follow these steps:
- Fork the repository
- Create a new branch (`git checkout -b feature-branch`)
- Commit your changes (`git commit -m 'Add new feature'`)
- Push to the branch (`git push origin feature-branch`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Audio processing fails: Ensure FFmpeg is installed and in your PATH
- Translation errors: Verify Ollama is running and Gemma3 model is downloaded
- API errors: Check your Groq API key in the `functions/convertedwav_to_transcript.py` file
- File permission issues: Ensure the `uploads` and `outputs` directories are writable
For support or questions, please contact sbose3739@gmail.com
