Transform video and audio recordings into structured, actionable documentation. Supports multiple transcription providers (Deepgram, ElevenLabs, Whisper, Parakeet) with speaker diarization, topic detection, and sentiment analysis.
- Multi-Provider Support — Cloud (Deepgram, ElevenLabs) and local (Whisper, Parakeet) transcription
- URL Ingestion — Direct processing from YouTube, Vimeo, and other platforms
- Interactive TUI — Terminal interface with live progress and health monitoring
- Intelligent Analysis — Speaker separation, topic extraction, sentiment analysis
- Production Ready — Circuit breaker pattern, health checks, path sanitization
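The circuit breaker mentioned under "Production Ready" is a resilience pattern: after repeated failures, calls to a provider are suspended for a cool-down period instead of being retried immediately. The sketch below is illustrative only; the class name, thresholds, and API are hypothetical and do not describe this project's actual implementation.

```python
import time


class CircuitBreaker:
    """Illustrative only: suspend calls to a failing provider, retry after a cool-down."""

    def __init__(self, max_failures: int = 3, reset_after: float = 60.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at: float | None = None

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            # Breaker is open: fail fast until the cool-down has elapsed.
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: provider temporarily disabled")
            # Half-open: allow one trial call through.
            self.opened_at = None
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result
```

A breaker like this lets a pipeline fail fast on an unhealthy cloud provider and fall back to another provider or a local model, rather than hammering a failing endpoint.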
- Python 3.11+ (3.12 recommended)
- FFmpeg
- API key for cloud providers OR local models (Whisper/Parakeet)
# Clone and install
git clone https://github.com/edlsh/audio-extraction-analysis.git
cd audio-extraction-analysis
uv sync
# Install FFmpeg
brew install ffmpeg # macOS
sudo apt install ffmpeg # Ubuntu
choco install ffmpeg    # Windows

uv sync --extra tui                          # Terminal UI
uv add openai-whisper torch # Local Whisper
uv sync --extra parakeet # NVIDIA Parakeet
uv sync --dev --extra tui --extra parakeet   # All features

# Cloud: Set API key
export DEEPGRAM_API_KEY='your-key' # From console.deepgram.com
# Or local: Whisper works without API keys after installation

audio-extraction-analysis --version

# Process a video file
audio-extraction-analysis process meeting.mp4
# Process from URL
audio-extraction-analysis process --url "https://youtube.com/watch?v=..."
# With custom output
audio-extraction-analysis process video.mp4 --output-dir ./results
# Full analysis (5 files)
audio-extraction-analysis process video.mp4 --analysis-style full

| Command | Description |
|---|---|
| `process` | Full pipeline: extract → transcribe → analyze |
| `extract` | Audio extraction only |
| `transcribe` | Transcription only |
| `export-markdown` | Export transcript as Markdown |
| `tui` | Launch interactive terminal UI |
| Option | Values | Description |
|---|---|---|
| `--quality` | `speech`, `standard`, `high`, `compressed` | Audio quality preset |
| `--language` | `en`, `es`, `fr`, `de`, `auto` | Transcription language |
| `--provider` | `auto`, `deepgram`, `elevenlabs`, `whisper`, `parakeet` | Provider selection |
| `--output-dir` | Path | Output directory |
| `--analysis-style` | `concise`, `full` | Single file vs. 5-file output |
| `--verbose` | Flag | Detailed logging |
# High-quality transcription
audio-extraction-analysis process interview.mp4 --quality high --language en
# Extract audio only
audio-extraction-analysis extract presentation.mp4 --quality speech
# Transcribe existing audio
audio-extraction-analysis transcribe recording.mp3 --provider whisper
# Batch processing
for video in *.mp4; do
audio-extraction-analysis process "$video" --output-dir "./results/${video%.*}"
done
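If a shell loop is not convenient, the same batch run can be driven from Python. This is only a sketch: it uses the `process` command and `--output-dir` option shown above, and the `videos/` and `results/` directory names are placeholders.

```python
import subprocess
from pathlib import Path

# Process every .mp4 under ./videos, write each result to its own folder
# under ./results, and keep going when an individual file fails.
for video in sorted(Path("videos").glob("*.mp4")):
    out_dir = Path("results") / video.stem
    out_dir.mkdir(parents=True, exist_ok=True)
    result = subprocess.run(
        ["audio-extraction-analysis", "process", str(video), "--output-dir", str(out_dir)]
    )
    if result.returncode != 0:
        print(f"FAILED: {video}")
```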
Launch a guided interface with real-time progress monitoring:

audio-extraction-analysis tui
audio-extraction-analysis tui --input video.mp4 --output-dir ./results

- Live progress bars with ETAs
- Color-coded log streaming
- Provider health monitoring
- File browser with recent files
- Dark/light theme toggle
- Auto-saved configuration
| Key | Action |
|---|---|
| `q` | Quit |
| `d` | Toggle dark mode |
| `?` / `h` | Help |
| `c` | Cancel pipeline (run screen) |
| `o` | Open output folder (when complete) |
Concise analysis (default):

./output/
├── meeting.mp3 # Extracted audio
├── meeting_analysis.md # Single comprehensive analysis
└── meeting_transcript.txt # Provider-formatted transcript
Full analysis (`--analysis-style full`):

./output/
├── meeting.mp3
├── 01_executive_summary.md # High-level overview
├── 02_chapter_overview.md # Content breakdown by topic
├── 03_key_topics_and_intents.md # Technical analysis
├── 04_full_transcript_with_timestamps.md
└── 05_key_insights_and_takeaways.md
# Cloud providers
export DEEPGRAM_API_KEY='...' # console.deepgram.com
export ELEVENLABS_API_KEY='...' # elevenlabs.io/api
# Local providers (optional)
export WHISPER_MODEL='base' # tiny, base, small, medium, large
export WHISPER_DEVICE='cuda' # cuda or cpu
# General
export LOG_LEVEL='INFO'          # DEBUG, INFO, WARNING, ERROR

For detailed provider configuration, see docs/PROVIDERS.md.
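The troubleshooting section below also mentions a `.env` file as an alternative to exporting variables in the shell. Assuming the tool reads it from the project root, a minimal example using the same variables (values are placeholders) might look like:

```
DEEPGRAM_API_KEY=your-deepgram-key
ELEVENLABS_API_KEY=your-elevenlabs-key
WHISPER_MODEL=base
WHISPER_DEVICE=cpu
LOG_LEVEL=INFO
```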
Supported languages: en, es, fr, de, it, pt, auto (auto-detect).
Whisper supports 100+ languages.
| Issue | Solution |
|---|---|
| Input file not found | Use absolute path: /full/path/to/video.mp4 |
| API key not configured | export DEEPGRAM_API_KEY='...' or create .env |
| FFmpeg not found | Install: brew install ffmpeg (macOS) |
| TUI not working | uv sync --extra tui |
For detailed troubleshooting, see docs/TROUBLESHOOTING.md.
| Scenario | Input | Output | Time |
|---|---|---|---|
| Business meetings | 2-hour recording | Executive summary, action items | ~5-7 min |
| Training sessions | Multi-hour video | Searchable reference, key concepts | ~10-15 min |
| Customer interviews | Interview recordings | Insights, pain points, feature requests | ~3-5 min |
| Podcasts/Webinars | Long-form content | Chapter breakdown, topics, quotes | ~5-10 min |
- Accuracy: 95%+ (Deepgram Nova 3), 85%+ (Whisper large)
- Speed: Real-time (cloud), 0.5-5x real-time (local)
- Languages: 10+ (cloud), 100+ (Whisper)
- Provider Configuration — Whisper, Parakeet, cloud provider setup
- Templates Guide — Customize Markdown output
- Troubleshooting — Common issues and solutions
- HTML Dashboard — Interactive dashboard rendering
- Examples — Sample outputs and scripts
# Development setup
git clone https://github.com/edlsh/audio-extraction-analysis.git
cd audio-extraction-analysis
uv sync --dev --extra tui --extra parakeet
# Run tests
pytest # Unit tests
./scripts/run_tests.sh --profile all # Full suite
# Code quality
black src/ tests/ && ruff check src/   # Format + lint

This project is provided as-is for professional use.
Transform recordings into structured, actionable documentation.