A dockerized deployment of NVIDIA's Canary-Qwen-2.5B speech recognition model for RunPod, featuring a Gradio interface for audio transcription and LLM-powered text analysis.
This container provides:
- ASR Mode: Speech-to-text transcription with punctuation and capitalization
- LLM Mode: Question-answering and analysis of transcribed text
- Gradio Interface: User-friendly web interface for audio upload and processing
- RunPod Optimized: Slim container with host-based model and dependency management
To deploy, use the following command, which mounts a local directory for caching:
```bash
# Use the built container image from Docker Hub
docker run -d \
  --name canary-qwen \
  --gpus all \
  -p 7860:7860 \
  -v /workspace/cache:/root/.cache:rw \
  -e GRADIO_SHARE=true \
  gemneye/canary-qwen-2.5b-runpod:latest
```
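Once the container is running, the Gradio UI is reachable on port 7860 (or via a share link when `GRADIO_SHARE=true`). For scripted access, a sketch like the following can work, assuming a recent `gradio_client`; the endpoint name `/transcribe` is hypothetical and depends on how `app.py` registers its functions, so inspect the real API first:

```python
# Sketch of programmatic access via gradio_client (assumes a recent version
# that provides handle_file). The api_name below is hypothetical -- list the
# real endpoints with view_api() before calling predict().
from gradio_client import Client, handle_file

client = Client("http://localhost:7860")
client.view_api()  # prints the actual endpoint names and signatures

# Hypothetical call; substitute the api_name reported by view_api().
result = client.predict(handle_file("sample.wav"), api_name="/transcribe")
print(result)
```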
The container lays out its working directory as follows:

```
/workspace/
├── models/          # Model files (2.5B parameters)
│   └── canary-qwen-2.5b/
├── cache/           # HuggingFace cache
├── data/            # Temporary audio files
├── logs/            # Application logs
├── venv/            # Python virtual environment
├── activate.sh      # Environment activation script
├── launch.sh        # Application launch script
└── env_vars.sh      # Environment variables
```
Audio input requirements:

- Format: WAV, FLAC, MP3, or other common audio formats
- Sample Rate: automatically resampled to 16 kHz (see the preprocessing sketch after this list)
- Channels: automatically converted to mono
- Duration: best accuracy with clips under 40 seconds
- Language: English only
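The resampling and downmixing happen inside the app; a minimal sketch of equivalent preprocessing, assuming `librosa` and `soundfile` (not necessarily what `app.py` uses), looks like this:

```python
# Minimal preprocessing sketch: resample to 16 kHz mono, matching the
# requirements above. Assumes librosa and soundfile are installed; app.py
# may implement this differently.
import librosa
import soundfile as sf

def prepare_audio(in_path: str, out_path: str = "prepared.wav") -> str:
    # librosa.load resamples to the requested rate and downmixes to mono
    audio, sr = librosa.load(in_path, sr=16000, mono=True)
    if len(audio) / sr > 40:
        print("warning: clips over ~40 s may reduce transcription accuracy")
    sf.write(out_path, audio, sr)
    return out_path
```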
The interface supports:

- Drag-and-drop audio file upload
- Direct recording from the microphone
- Automatic transcription on upload

ASR mode provides:

- High-accuracy speech-to-text (see the inference sketch after this list)
- Automatic punctuation and capitalization
- Real-time processing
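A minimal ASR-mode sketch, adapted from the NVIDIA model card for `nvidia/canary-qwen-2.5b`; treat the exact signatures as version-dependent across NeMo releases:

```python
# ASR mode: pass audio plus a transcription prompt to the SALM model.
# Adapted from the model card; exact signatures may vary by NeMo version.
from nemo.collections.speechlm2.models import SALM

model = SALM.from_pretrained("nvidia/canary-qwen-2.5b")

answer_ids = model.generate(
    prompts=[[{
        "role": "user",
        "content": f"Transcribe the following: {model.audio_locator_tag}",
        "audio": ["speech.wav"],
    }]],
    max_new_tokens=128,
)
print(model.tokenizer.ids_to_text(answer_ids[0].cpu()))
```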
LLM mode supports:

- Question-answering about the transcribed content
- Summarization and analysis
- Content extraction and insights

Example prompts (an LLM-mode inference sketch follows this list):

- "Summarize the main points discussed in the audio."
- "What is the speaker's tone and emotion?"
- "Extract any important dates, names, or numbers mentioned."
- "What questions does the speaker ask?"
- "Identify the key topics covered in this audio."
```bash
# Clone and build
git clone https://github.com/sruckh/canary-qwen-2.5b-RunPod.git
cd canary-qwen-2.5b-RunPod
docker build -t canary-qwen-2.5b .
```

For development without GPU access:
```bash
# Edit docker-compose.yml to comment out the GPU sections
# Then run
docker-compose up --build
```

Edit app.py to modify the following (a hypothetical skeleton appears after the list):
- UI layout and styling
- Processing parameters
- Model configuration
- Response formatting
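As a reference point, here is a hypothetical skeleton of the app structure; the real `app.py` differs, but these are the places where each customization lands:

```python
# Hypothetical app.py skeleton (the real file differs); comments mark where
# each of the customizations listed above would go.
import gradio as gr

def transcribe(audio_path):
    # processing parameters (e.g. max_new_tokens) and model configuration
    # would be applied here
    return "transcript goes here"  # response formatting happens here

with gr.Blocks(title="Canary-Qwen-2.5B") as demo:  # UI layout and styling
    audio = gr.Audio(type="filepath", label="Upload or record audio")
    transcript = gr.Textbox(label="Transcript")
    audio.change(transcribe, inputs=audio, outputs=transcript)

demo.launch(server_name="0.0.0.0", server_port=7860)
```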
Model specifications:

- Parameters: 2.5 billion
- RTFx: 418 (inverse real-time factor: roughly 418 seconds of audio processed per second of compute)
- WER: 5.63% average on benchmarks
- Languages: English only
Hardware requirements:

- GPU: NVIDIA GPU with CUDA support
- Memory: 8GB+ GPU memory recommended
- Storage: 10GB+ for model and cache
- Network: required for model download and Gradio share links
Benchmark WER by dataset:

| Dataset | WER |
|---|---|
| AMI | 10.18% |
| GigaSpeech | 9.41% |
| LibriSpeech Clean | 1.60% |
| LibriSpeech Other | 3.10% |
| Earnings22 | 10.42% |
| Variable | Default | Description |
|---|---|---|
| `GRADIO_SHARE` | `false` | Enable Gradio share links |
| `GRADIO_SERVER_NAME` | `0.0.0.0` | Server bind address |
| `GRADIO_SERVER_PORT` | `7860` | Server port |
| `MODEL_PATH` | `/models/canary-qwen-2.5b` | Model directory |
| `HF_HOME` | `/root/.cache` | HuggingFace cache |
| `CUDA_VISIBLE_DEVICES` | `0` | GPU selection |
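Inside the app these would typically be read with `os.environ` defaults matching the table (a sketch; the names follow the table, not necessarily `app.py` verbatim):

```python
# Reading the configuration table above with the documented defaults.
import os

GRADIO_SHARE = os.environ.get("GRADIO_SHARE", "false").lower() == "true"
SERVER_NAME = os.environ.get("GRADIO_SERVER_NAME", "0.0.0.0")
SERVER_PORT = int(os.environ.get("GRADIO_SERVER_PORT", "7860"))
MODEL_PATH = os.environ.get("MODEL_PATH", "/models/canary-qwen-2.5b")
HF_HOME = os.environ.get("HF_HOME", "/root/.cache")
```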
The model supports two modes:
- ASR Mode: Direct speech-to-text transcription
- LLM Mode: Text analysis using the underlying language model
If the model fails to load:

```bash
# Check model path
ls -la /models/canary-qwen-2.5b/

# Check GPU availability
nvidia-smi

# Check logs
docker logs canary-qwen
```

If transcription fails:

- Ensure the audio file is in a supported format
- Check file size (<100MB recommended)
- Verify audio duration (<40 seconds optimal)
If the container runs out of GPU memory:

- Reduce the batch size in the model configuration
- Ensure sufficient GPU memory (8GB+)
- Monitor memory usage with `nvidia-smi` (or the programmatic check below)
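A quick in-container check that PyTorch can see the GPU and how much memory is free (torch ships with the NeMo stack):

```python
# Verify CUDA visibility and free memory from inside the container.
import torch

assert torch.cuda.is_available(), "CUDA not visible -- was --gpus all passed?"
free, total = torch.cuda.mem_get_info()
print(f"GPU memory: {free / 1e9:.1f} GB free of {total / 1e9:.1f} GB")
```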
For best performance:

- Use 16 kHz mono audio
- Keep audio segments under 40 seconds
- Enable GPU acceleration
- Use SSD storage for the model files
This project uses the NVIDIA Canary-Qwen-2.5B model under the CC-BY-4.0 license.
To contribute:

- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
For issues and questions:
- Check the troubleshooting section
- Review application logs
- Verify RunPod environment configuration
- Ensure proper model installation
Note: This container is optimized for RunPod deployment with GPU acceleration. Local development without GPU is possible but will have limited functionality.