feat: Assembly ai plugin #103
Closed
Changes from 4 commits
Commits (14), all by d3xvn:
2765cdc  feat: Add AssemblyAI STT plugin and example
3bcdf1b  Fix AssemblyAI plugin tests with comprehensive mocking
0db9a6f  Refactor AssemblyAI STT tests: comprehensive improvements
f02c80e  Update AssemblyAI example: cleanup and documentation
85e4d33  Optimize AssemblyAI tests: eliminate repeated imports
8cd26e4  Merge branch 'webrtc' into assembly-ai-plugin
2e25c93  Merge branch 'webrtc' into assembly-ai-plugin
414703f  fix example user metadata
dbd5022  fixed aai tests
310bfa5  Merge branch 'webrtc' into assembly-ai-plugin
dca020b  Merge branch 'webrtc' into assembly-ai-plugin
7c053a8  fix: apply ruff formatting and import sorting across project
0af2158  merge: integrate remote changes with local ruff formatting
28b3894  Revert "merge: integrate remote changes with local ruff formatting"
164 changes: 164 additions & 0 deletions
examples/stt_assemblyai_transcription/EXAMPLE_SUMMARY.md
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,164 @@ | ||
| # AssemblyAI STT Example Summary | ||
|
|
||
| ## Overview | ||
|
|
||
| This example demonstrates real-time speech-to-text transcription in Stream video calls using the AssemblyAI plugin. It is designed as a drop-in replacement for the Deepgram example and shows how to switch between STT providers in the GetStream ecosystem. | ||
|
|
||
| ## What This Example Provides | ||
|
|
||
| ### 🎯 **Core Functionality** | ||
| - **Real-time transcription bot** that joins Stream video calls | ||
| - **Live audio processing** with AssemblyAI's streaming API | ||
| - **Browser interface** for users to join calls | ||
| - **Terminal output** showing transcripts with timestamps | ||
|
|
||
| ### 🔧 **Technical Features** | ||
| - **AssemblyAI integration** using the custom plugin | ||
| - **WebRTC audio capture** from Stream calls | ||
| - **Event-driven architecture** for real-time processing | ||
| - **Error handling** and graceful cleanup | ||
| - **User management** with automatic cleanup | ||
|
|
||
| ### 📊 **Transcription Features** | ||
| - **Partial transcripts** for immediate feedback | ||
| - **Final transcripts** with confidence scores | ||
| - **Automatic punctuation** for readability | ||
| - **Utterance detection** for natural speech segmentation | ||
| - **Multi-language support** (configurable) | ||
|
|
||
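The difference between partial and final transcripts listed above can be illustrated with a small buffer. This is a minimal sketch and not the plugin's API: `TranscriptBuffer` and its method names are hypothetical, and it assumes that each partial replaces the previous one while a final transcript arrives with a confidence score between 0 and 1.

```python
from dataclasses import dataclass, field


@dataclass
class TranscriptBuffer:
    """Hypothetical helper: keeps committed finals plus the latest partial."""

    finals: list[str] = field(default_factory=list)
    partial: str = ""

    def on_partial(self, text: str) -> None:
        # A newer partial supersedes the previous one.
        self.partial = text

    def on_final(self, text: str, confidence: float) -> None:
        # A final transcript commits the utterance and clears the partial.
        self.finals.append(f"{text} ({confidence:.0%})")
        self.partial = ""

    def render(self) -> str:
        # Show committed text followed by the in-progress partial, if any.
        parts = list(self.finals)
        if self.partial:
            parts.append(self.partial + "...")
        return " ".join(parts)


buf = TranscriptBuffer()
buf.on_partial("hello wor")
print(buf.render())
buf.on_final("Hello world.", 0.9)
print(buf.render())
```

Keeping partials separate from finals lets a UI overwrite in-progress text without losing anything that was already committed.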
| ## Comparison with Deepgram Example | ||
|
|
||
| | Feature | Deepgram Example | AssemblyAI Example | | ||
| |---------|------------------|-------------------| | ||
| | **STT Provider** | Deepgram | AssemblyAI | | ||
| | **API Integration** | Deepgram SDK | AssemblyAI SDK | | ||
| | **Audio Format** | PCM 16-bit | PCM 16-bit | | ||
| | **Sample Rate** | Configurable | Configurable (default: 48 kHz) | | ||
| | **Language Support** | Multi-language | Multi-language | | ||
| | **Real-time** | ✅ Yes | ✅ Yes | | ||
| | **Partial Results** | ✅ Yes | ✅ Yes | | ||
| | **Confidence Scores** | ✅ Yes | ✅ Yes | | ||
| | **Error Handling** | ✅ Yes | ✅ Yes | | ||
|
|
||
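The "PCM 16-bit" and "48kHz" rows in the table above imply a fixed byte rate, which is useful when sizing audio buffers. The following is a sketch of that arithmetic only; mono audio and a 20 ms frame are illustrative assumptions, not values confirmed by this PR.

```python
def pcm_frame_bytes(
    sample_rate_hz: int,
    frame_ms: int,
    channels: int = 1,
    bits_per_sample: int = 16,
) -> int:
    """Return the size in bytes of one PCM frame of the given duration."""
    samples_per_channel = sample_rate_hz * frame_ms // 1000
    return samples_per_channel * channels * (bits_per_sample // 8)


# 48 kHz, 16-bit mono, 20 ms: 960 samples * 2 bytes
print(pcm_frame_bytes(48_000, 20))
```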
| ## Architecture | ||
|
|
||
| ``` | ||
| ┌─────────────────┐ ┌──────────────────┐ ┌─────────────────┐ | ||
| │ Stream Call │───▶│ AssemblyAI STT │───▶│ Terminal │ | ||
| │ │ │ Plugin │ │ Output │ | ||
| └─────────────────┘ └──────────────────┘ └─────────────────┘ | ||
| │ │ | ||
| ▼ ▼ | ||
| ┌─────────────────┐ ┌──────────────────┐ | ||
| │ Browser UI │ │ Audio Stream │ | ||
| │ (User Join) │ │ Processing │ | ||
| └─────────────────┘ └──────────────────┘ | ||
| ``` | ||
|
|
||
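The flow in the diagram above (call audio goes into the STT plugin, transcripts come out to the terminal) can be sketched with a tiny event emitter. Everything here is a hypothetical stand-in: `TinyEmitter`, `FakeSTT`, `process_audio`, and the `transcript` event name are illustrative and are not taken from the plugin's source.

```python
from collections import defaultdict
from typing import Callable


class TinyEmitter:
    """Minimal event emitter: register handlers, then emit events to them."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable]] = defaultdict(list)

    def on(self, event: str, handler: Callable) -> None:
        self._handlers[event].append(handler)

    def emit(self, event: str, *args) -> None:
        for handler in self._handlers[event]:
            handler(*args)


class FakeSTT(TinyEmitter):
    """Stand-in for an STT plugin: emits one 'transcript' event per chunk."""

    def process_audio(self, pcm: bytes, user_id: str) -> None:
        # A real plugin would stream the bytes to a speech service here.
        self.emit("transcript", f"<{len(pcm)} bytes of audio>", user_id)


received = []
stt = FakeSTT()
stt.on("transcript", lambda text, user: received.append((user, text)))
stt.process_audio(b"\x00\x00" * 960, "alice")
print(received)
```

The caller only registers a handler and feeds audio in, so the same handler code could be reused if the STT provider behind it is swapped.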
| ## Key Components | ||
|
|
||
| ### 1. **Main Application (`main.py`)** | ||
| - Call creation and management | ||
| - User authentication and tokens | ||
| - Browser interface setup | ||
| - Event handler registration | ||
|
|
||
| ### 2. **AssemblyAI STT Plugin** | ||
| - Real-time audio processing | ||
| - Streaming API integration | ||
| - Event emission for transcripts | ||
| - Error handling and recovery | ||
|
|
||
| ### 3. **Stream Integration** | ||
| - WebRTC connection management | ||
| - Audio track capture | ||
| - Participant management | ||
| - Call lifecycle handling | ||
|
|
||
| ## Usage Scenarios | ||
|
|
||
| ### 🎙️ **Live Meeting Transcription** | ||
| - Real-time captions during video calls | ||
| - Meeting minutes generation | ||
| - Accessibility support for hearing-impaired users | ||
|
|
||
| ### 📝 **Content Creation** | ||
| - Podcast transcription | ||
| - Interview recording | ||
| - Educational content processing | ||
|
|
||
| ### 🔍 **Quality Assurance** | ||
| - Call center monitoring | ||
| - Training session review | ||
| - Compliance documentation | ||
|
|
||
| ## Configuration Options | ||
|
|
||
| The example is configured through the `AssemblyAISTT` constructor: | ||
|
|
||
| ```python | ||
| stt = AssemblyAISTT( | ||
| sample_rate=48000, # Audio quality | ||
| language="en", # Language selection | ||
| interim_results=True, # Real-time feedback | ||
| enable_partials=True, # Partial transcripts | ||
| enable_automatic_punctuation=True, # Auto-punctuation | ||
| enable_utterance_end_detection=True, # Speech segmentation | ||
| ) | ||
| ``` | ||
|
|
||
| ## Performance Characteristics | ||
|
|
||
| - **Latency**: Low-latency real-time processing | ||
| - **Accuracy**: High-quality transcription with confidence scoring | ||
| - **Scalability**: Handles multiple participants simultaneously | ||
| - **Reliability**: Automatic error recovery and connection management | ||
|
|
||
| ## Extensibility | ||
|
|
||
| This example serves as a foundation for building more complex applications: | ||
|
|
||
| - **Multi-language support** for international teams | ||
| - **Custom vocabulary** for domain-specific terms | ||
| - **Speaker identification** for multi-participant calls | ||
| - **Analytics integration** for usage metrics | ||
| - **Webhook integration** for external systems | ||
|
|
||
| ## Getting Started | ||
|
|
||
| 1. **Install dependencies**: `uv sync` | ||
| 2. **Configure environment**: Copy `env.example` to `.env` | ||
| 3. **Add API keys**: Stream and AssemblyAI credentials | ||
| 4. **Run the example**: `uv run main.py` | ||
| 5. **Join the call**: Browser will open automatically | ||
| 6. **Start speaking**: Watch real-time transcription | ||
|
|
||
| ## Troubleshooting | ||
|
|
||
| ### Common Issues | ||
| - **API key errors**: Verify AssemblyAI credentials | ||
| - **Audio not detected**: Check microphone permissions | ||
| - **Connection failures**: Verify internet and Stream credentials | ||
| - **Import errors**: Ensure all dependencies are installed | ||
|
|
||
| ### Debug Mode | ||
| Enable verbose logging by modifying the logging level in `main.py`: | ||
| ```python | ||
| import logging | ||
| logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(levelname)s %(message)s") | ||
| ``` | ||
|
|
||
| ## Next Steps | ||
|
|
||
| After running this example successfully: | ||
|
|
||
| 1. **Customize the configuration** for your use case | ||
| 2. **Integrate with your application** using the plugin directly | ||
| 3. **Explore advanced features** like custom models and vocabulary | ||
| 4. **Build production applications** with proper error handling and monitoring | ||
|
|
||
| ## Support Resources | ||
|
|
||
| - **AssemblyAI Documentation**: https://www.assemblyai.com/docs | ||
| - **GetStream Documentation**: https://getstream.io/docs | ||
| - **Plugin Source**: `getstream/plugins/assemblyai/` | ||
| - **Example Source**: `examples/stt_assemblyai_transcription/` | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,89 @@ | ||
| # Stream + AssemblyAI STT Example | ||
|
|
||
| This example demonstrates how to build a real-time transcription bot that joins a Stream video call and transcribes speech using AssemblyAI's Speech-to-Text API. | ||
|
|
||
| ## What it does | ||
|
|
||
| - 🤖 Creates a transcription bot that joins a Stream video call | ||
| - 🌐 Opens a browser interface for users to join the call | ||
| - 🎙️ Transcribes speech in real-time using AssemblyAI STT | ||
| - 📝 Displays transcriptions with timestamps and confidence scores in the terminal | ||
|
|
||
| ## Prerequisites | ||
|
|
||
| 1. **Stream Account**: Get your API credentials from [Stream Dashboard](https://dashboard.getstream.io) | ||
| 2. **AssemblyAI Account**: Get your API key from [AssemblyAI Console](https://www.assemblyai.com/) | ||
| 3. **Python 3.10+**: Required for running the example | ||
|
|
||
| ## Installation | ||
|
|
||
| You can use your preferred package manager, but we recommend [`uv`](https://docs.astral.sh/uv/). | ||
|
|
||
| 1. **Navigate to this directory:** | ||
| ```bash | ||
| cd examples/stt_assemblyai_transcription | ||
| ``` | ||
|
|
||
| 2. **Install dependencies:** | ||
| ```bash | ||
| uv sync | ||
| ``` | ||
|
|
||
| 3. **Set up environment variables:** | ||
| Rename `env.example` to `.env` and fill in your actual credentials. | ||
|
|
||
| ## Usage | ||
|
|
||
| Run the example: | ||
| ```bash | ||
| uv run main.py | ||
| ``` | ||
|
|
||
| ## Configuration Options | ||
|
|
||
| You can customize the AssemblyAI STT settings in the `main.py` file: | ||
|
|
||
| ```python | ||
| stt = AssemblyAISTT( | ||
| sample_rate=48000, # Audio sample rate | ||
| language="en", # Language code | ||
| interim_results=True, # Enable interim results | ||
| enable_partials=True, # Enable partial transcripts | ||
| enable_automatic_punctuation=True, # Auto-punctuation | ||
| enable_utterance_end_detection=True, # Utterance detection | ||
| ) | ||
| ``` | ||
|
|
||
| ## Features | ||
|
|
||
| - **Real-time transcription** with low latency | ||
| - **Partial transcripts** for immediate feedback | ||
| - **Automatic punctuation** for better readability | ||
| - **Utterance end detection** for natural speech segmentation | ||
| - **Multi-language support** (change the `language` parameter) | ||
| - **Confidence scoring** for transcription quality | ||
|
|
||
| ## How it works | ||
|
|
||
| 1. **Call Setup**: Creates a Stream video call with unique IDs | ||
| 2. **Bot Joins**: A transcription bot joins the call as a participant | ||
| 3. **Audio Processing**: Captures audio from all participants | ||
| 4. **Real-time Transcription**: Sends audio to AssemblyAI for processing | ||
| 5. **Results Display**: Shows transcripts in the terminal with timestamps | ||
|
|
||
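Steps 3 to 5 above (capture audio, send it for transcription, display results with timestamps) can be sketched as a producer/consumer pipeline. This is an assumption-laden illustration: `fake_transcribe` and `run_pipeline` are hypothetical names, and the transcription step is a local stub rather than a call to AssemblyAI.

```python
import asyncio
from datetime import datetime, timezone


async def fake_transcribe(pcm: bytes) -> str:
    """Stub for the remote STT call; yields control like real I/O would."""
    await asyncio.sleep(0)
    return f"{len(pcm)} bytes transcribed"


async def run_pipeline(chunks: list[tuple[str, bytes]]) -> list[str]:
    queue: asyncio.Queue = asyncio.Queue()
    lines: list[str] = []

    async def producer() -> None:
        # Step 3: audio captured from participants is queued.
        for user_id, pcm in chunks:
            await queue.put((user_id, pcm))
        await queue.put(None)  # sentinel: no more audio

    async def consumer() -> None:
        # Steps 4 and 5: transcribe each chunk and format a timestamped line.
        while (item := await queue.get()) is not None:
            user_id, pcm = item
            text = await fake_transcribe(pcm)
            ts = datetime.now(timezone.utc).strftime("%H:%M:%S")
            lines.append(f"[{ts}] {user_id}: {text}")

    await asyncio.gather(producer(), consumer())
    return lines


lines = asyncio.run(run_pipeline([("alice", b"\x00" * 4), ("bob", b"\x00" * 8)]))
for line in lines:
    print(line)
```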
| ## Troubleshooting | ||
|
|
||
| - **No audio detected**: Ensure your microphone is working and permissions are granted | ||
| - **API errors**: Check your AssemblyAI API key and account status | ||
| - **Connection issues**: Verify your internet connection and Stream credentials | ||
|
|
||
| ## AssemblyAI Features | ||
|
|
||
| AssemblyAI provides high-quality transcription with: | ||
| - **Universal speech recognition models** for high accuracy | ||
| - **Real-time streaming** for low latency | ||
| - **Automatic language detection** support | ||
| - **Speaker diarization** capabilities | ||
| - **Custom vocabulary** support | ||
|
|
||
| For more information, visit [AssemblyAI Documentation](https://www.assemblyai.com/docs). |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| # AssemblyAI STT Transcription Example |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,7 @@ | ||
| # Stream API credentials | ||
| STREAM_API_KEY=your_stream_api_key_here | ||
| STREAM_API_SECRET=your_stream_api_secret_here | ||
| EXAMPLE_BASE_URL=https://pronto.getstream.io | ||
|
|
||
| # AssemblyAI API credentials | ||
| ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here |