164 changes: 164 additions & 0 deletions examples/stt_assemblyai_transcription/EXAMPLE_SUMMARY.md
@@ -0,0 +1,164 @@
# AssemblyAI STT Example Summary

## Overview

This example demonstrates real-time speech-to-text transcription in Stream video calls using the AssemblyAI plugin. It is designed as a drop-in replacement for the Deepgram example and shows how little changes when you switch STT providers in the GetStream ecosystem.

## What This Example Provides

### 🎯 **Core Functionality**
- **Real-time transcription bot** that joins Stream video calls
- **Live audio processing** with AssemblyAI's streaming API
- **Browser interface** for users to join calls
- **Terminal output** showing transcripts with timestamps

### 🔧 **Technical Features**
- **AssemblyAI integration** through the GetStream AssemblyAI plugin (`getstream/plugins/assemblyai/`)
- **WebRTC audio capture** from Stream calls
- **Event-driven architecture** for real-time processing
- **Error handling** and graceful cleanup
- **User management** with automatic cleanup

### 📊 **Transcription Features**
- **Partial transcripts** for immediate feedback
- **Final transcripts** with confidence scores
- **Automatic punctuation** for readability
- **Utterance detection** for natural speech segmentation
- **Multi-language support** (configurable)
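The difference between partial and final transcripts matters mostly for display. Below is a minimal sketch of how both kinds of result could be rendered as terminal lines with timestamps and confidence scores. The `TranscriptEvent` class and its fields (`text`, `is_final`, `timestamp`, `confidence`) are hypothetical stand-ins, not the plugin's actual event objects.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class TranscriptEvent:
    """Hypothetical transcript event; the real plugin's event type may differ."""
    text: str
    is_final: bool
    timestamp: datetime
    confidence: Optional[float] = None  # assumed to be present on final results only


def format_event(event: TranscriptEvent) -> str:
    """Render one transcript event as a single terminal line."""
    clock = event.timestamp.strftime("%H:%M:%S")
    if not event.is_final:
        # Partial results are provisional and will be superseded, so label them.
        return f"[{clock}] (partial) {event.text}"
    if event.confidence is None:
        return f"[{clock}] {event.text}"
    return f"[{clock}] {event.text} (confidence: {event.confidence:.2f})"


if __name__ == "__main__":
    t = datetime(2024, 1, 1, 12, 0, 5, tzinfo=timezone.utc)
    print(format_event(TranscriptEvent("hello wor", is_final=False, timestamp=t)))
    print(format_event(TranscriptEvent("Hello world.", is_final=True, timestamp=t, confidence=0.934)))
```

This sketch keeps partial lines visually distinct, so a reader does not mistake a provisional result for the final text.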

## Comparison with Deepgram Example

| Feature | Deepgram Example | AssemblyAI Example |
|---------|------------------|-------------------|
| **STT Provider** | Deepgram | AssemblyAI |
| **API Integration** | Deepgram SDK | AssemblyAI SDK |
| **Audio Format** | PCM 16-bit | PCM 16-bit |
| **Sample Rate** | Configurable | Configurable (default: 48 kHz) |
| **Language Support** | Multi-language | Multi-language |
| **Real-time** | ✅ Yes | ✅ Yes |
| **Partial Results** | ✅ Yes | ✅ Yes |
| **Confidence Scores** | ✅ Yes | ✅ Yes |
| **Error Handling** | ✅ Yes | ✅ Yes |
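Both examples send 16-bit PCM audio. As a worked example of what that means on the wire, the sketch below does two things. First, it converts float samples in [-1.0, 1.0] to little-endian signed 16-bit bytes. Second, it computes the size of one audio frame: at the 48 kHz default, a 20 ms mono frame is 48,000 × 0.02 × 2 bytes = 1,920 bytes. The 20 ms frame length, the mono channel count, and the helper names are assumptions chosen for illustration.

```python
import struct


def float_to_pcm16(samples: list[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] to little-endian signed 16-bit PCM."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))     # clip to the valid range
        ints.append(round(s * 32767))  # scale to the int16 range
    return struct.pack(f"<{len(ints)}h", *ints)


def frame_bytes(sample_rate: int, frame_ms: int, channels: int = 1) -> int:
    """Bytes in one PCM16 frame: samples per frame * channels * 2 bytes per sample."""
    samples_per_frame = sample_rate * frame_ms // 1000
    return samples_per_frame * channels * 2


if __name__ == "__main__":
    print(frame_bytes(48000, 20))  # 1920
    print(float_to_pcm16([0.0, 1.0, -1.0]).hex())
```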

## Architecture

```
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Stream Call   │───▶│  AssemblyAI STT  │───▶│    Terminal     │
│                 │    │      Plugin      │    │     Output      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                      │
         ▼                      ▼
┌─────────────────┐    ┌──────────────────┐
│   Browser UI    │    │   Audio Stream   │
│   (User Join)   │    │    Processing    │
└─────────────────┘    └──────────────────┘
```

## Key Components

### 1. **Main Application (`main.py`)**
- Call creation and management
- User authentication and tokens
- Browser interface setup
- Event handler registration

### 2. **AssemblyAI STT Plugin**
- Real-time audio processing
- Streaming API integration
- Event emission for transcripts
- Error handling and recovery

### 3. **Stream Integration**
- WebRTC connection management
- Audio track capture
- Participant management
- Call lifecycle handling
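The plugin emits transcript events, and `main.py` registers handlers for them. The sketch below illustrates that register/emit pattern with a minimal, self-contained emitter. The `EventEmitter` class and the event names `"transcript"` and `"partial_transcript"` are hypothetical; check the plugin source for the actual class and event names.

```python
from collections import defaultdict
from typing import Any, Callable

Handler = Callable[..., Any]


class EventEmitter:
    """Tiny synchronous event emitter (illustrative, not the plugin's class)."""

    def __init__(self) -> None:
        self._handlers: defaultdict[str, list[Handler]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for `event`."""
        def register(handler: Handler) -> Handler:
            self._handlers[event].append(handler)
            return handler
        return register

    def emit(self, event: str, *args: Any) -> int:
        """Call every handler registered for `event`; return how many ran."""
        handlers = self._handlers.get(event, [])
        for handler in handlers:
            handler(*args)
        return len(handlers)


if __name__ == "__main__":
    stt = EventEmitter()  # stands in for the STT plugin instance

    @stt.on("partial_transcript")
    def on_partial(text: str) -> None:
        print(f"... {text}")

    @stt.on("transcript")
    def on_final(text: str) -> None:
        print(f">>> {text}")

    stt.emit("partial_transcript", "hello wor")
    stt.emit("transcript", "Hello world.")
```

Keeping handlers decoupled from the emitter is what lets the same `main.py` structure work with a different STT provider.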

## Usage Scenarios

### 🎙️ **Live Meeting Transcription**
- Real-time captions during video calls
- Meeting minutes generation
- Accessibility support for hearing-impaired users

### 📝 **Content Creation**
- Podcast transcription
- Interview recording
- Educational content processing

### 🔍 **Quality Assurance**
- Call center monitoring
- Training session review
- Compliance documentation

## Configuration Options

You configure the example's transcription behavior through the `AssemblyAISTT` constructor:

```python
stt = AssemblyAISTT(
    sample_rate=48000,                    # Input audio sample rate (Hz)
    language="en",                        # Language code
    interim_results=True,                 # Real-time feedback
    enable_partials=True,                 # Partial transcripts
    enable_automatic_punctuation=True,    # Auto-punctuation
    enable_utterance_end_detection=True,  # Speech segmentation
)
```

## Performance Characteristics

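- **Latency**: Partial transcripts arrive while the speaker is still talking; final transcripts follow at the end of each utterance
- **Accuracy**: Final transcripts include confidence scores for judging transcription quality
- **Scalability**: Handles multiple participants in the same call simultaneously
- **Reliability**: Automatic error recovery, connection management, and graceful cleanup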

## Extensibility

This example serves as a foundation for building more complex applications:

- **Multi-language support** for international teams
- **Custom vocabulary** for domain-specific terms
- **Speaker identification** for multi-participant calls
- **Analytics integration** for usage metrics
- **Webhook integration** for external systems

## Getting Started

1. **Install dependencies**: `uv sync`
2. **Configure environment**: Copy `env.example` to `.env`
3. **Add API keys**: Stream and AssemblyAI credentials
4. **Run the example**: `uv run main.py`
5. **Join the call**: A browser window opens automatically
6. **Start speaking**: Watch transcripts appear in the terminal in real time

## Troubleshooting

### Common Issues
- **API key errors**: Verify AssemblyAI credentials
- **Audio not detected**: Check microphone permissions
- **Connection failures**: Verify internet and Stream credentials
- **Import errors**: Ensure all dependencies are installed
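Many API key errors come from a `.env` file that still contains the placeholder values from `env.example`. Below is a minimal preflight sketch that reports which required variables are unset, empty, or still placeholders. The variable names come from `env.example`. The `missing_env_vars` helper is hypothetical, and it treats any value ending in `_here` as a placeholder. The check reads the process environment only and does not parse `.env` itself, so run it after the variables have been loaded or exported.

```python
import os
import sys
from typing import Mapping, Optional

# Variable names taken from env.example.
REQUIRED_VARS = ("STREAM_API_KEY", "STREAM_API_SECRET", "ASSEMBLYAI_API_KEY")
# Placeholder values in env.example end with this suffix (assumption used as a heuristic).
PLACEHOLDER_SUFFIX = "_here"


def missing_env_vars(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return required variables that are unset, empty, or still placeholders."""
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        if not value or value.endswith(PLACEHOLDER_SUFFIX):
            missing.append(name)
    return missing


if __name__ == "__main__":
    problems = missing_env_vars()
    if problems:
        print("Missing or placeholder credentials: " + ", ".join(problems))
        sys.exit(1)
    print("All required credentials are set.")
```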

### Debug Mode
Enable verbose logging by modifying the logging level in `main.py`:
```python
import logging

logging.basicConfig(level=logging.DEBUG, format="%(asctime)s %(levelname)s %(message)s")
```
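Setting the root logger to DEBUG also enables debug output from every third-party library that logs through the standard `logging` module, which can be noisy. A narrower alternative is sketched below. It keeps the root logger at INFO and raises only one package's logger to DEBUG. The logger name `"getstream"` is an assumption: it works only if the SDK and plugin create their loggers with `logging.getLogger(__name__)` under a `getstream` package.

```python
import logging

# Keep everything else at INFO so DEBUG output stays readable.
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s %(message)s")

# Hypothetical logger name: assumes SDK/plugin modules log under the "getstream" package.
# Child loggers such as "getstream.plugins.assemblyai" inherit this level.
logging.getLogger("getstream").setLevel(logging.DEBUG)
```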

## Next Steps

After running this example successfully:

1. **Customize the configuration** for your use case
2. **Integrate with your application** using the plugin directly
3. **Explore advanced features** like custom models and vocabulary
4. **Build production applications** with proper error handling and monitoring

## Support Resources

- **AssemblyAI Documentation**: https://www.assemblyai.com/docs
- **GetStream Documentation**: https://getstream.io/docs
- **Plugin Source**: `getstream/plugins/assemblyai/`
- **Example Source**: `examples/stt_assemblyai_transcription/`
89 changes: 89 additions & 0 deletions examples/stt_assemblyai_transcription/README.md
@@ -0,0 +1,89 @@
# Stream + AssemblyAI STT Example

This example demonstrates how to build a real-time transcription bot that joins a Stream video call and transcribes speech using AssemblyAI's Speech-to-Text API.

## What it does

- 🤖 Creates a transcription bot that joins a Stream video call
- 🌐 Opens a browser interface for users to join the call
- 🎙️ Transcribes speech in real-time using AssemblyAI STT
- 📝 Displays transcriptions with timestamps and confidence scores in the terminal

## Prerequisites

1. **Stream Account**: Get your API credentials from [Stream Dashboard](https://dashboard.getstream.io)
2. **AssemblyAI Account**: Get your API key from [AssemblyAI Console](https://www.assemblyai.com/)
3. **Python 3.10+**: Required for running the example

## Installation

You can use your preferred package manager, but we recommend [`uv`](https://docs.astral.sh/uv/).

1. **Navigate to this directory:**
   ```bash
   cd examples/stt_assemblyai_transcription
   ```

2. **Install dependencies:**
   ```bash
   uv sync
   ```

3. **Set up environment variables:**
   Copy `env.example` to `.env` and fill in your actual credentials.

## Usage

Run the example:
```bash
uv run main.py
```

## Configuration Options

You can customize the AssemblyAI STT settings in the `main.py` file:

```python
stt = AssemblyAISTT(
    sample_rate=48000,                    # Audio sample rate
    language="en",                        # Language code
    interim_results=True,                 # Enable interim results
    enable_partials=True,                 # Enable partial transcripts
    enable_automatic_punctuation=True,    # Auto-punctuation
    enable_utterance_end_detection=True,  # Utterance detection
)
```

## Features

- **Real-time transcription** with low latency
- **Partial transcripts** for immediate feedback
- **Automatic punctuation** for better readability
- **Utterance end detection** for natural speech segmentation
- **Multi-language support** (change the `language` parameter)
- **Confidence scoring** for transcription quality

## How it works

1. **Call Setup**: Creates a Stream video call with unique IDs
2. **Bot Joins**: A transcription bot joins the call as a participant
3. **Audio Processing**: Captures audio from all participants
4. **Real-time Transcription**: Sends audio to AssemblyAI for processing
5. **Results Display**: Shows transcripts in the terminal with timestamps
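Steps 3 to 5 above can be pictured as a producer/consumer pipeline. The sketch below models them with `asyncio.Queue`. A producer pushes captured PCM frames, a consumer "transcribes" each frame, and each result is printed with a timestamp. `fake_transcribe` is a hypothetical stand-in for the AssemblyAI call, and the sketch makes no network or Stream SDK calls. The real streaming integration keeps a persistent connection open and returns transcripts asynchronously, so the per-frame request/response here is a simplification.

```python
import asyncio
from datetime import datetime
from typing import Optional


async def capture_audio(queue: "asyncio.Queue[Optional[bytes]]", frames: list[bytes]) -> None:
    """Step 3 stand-in: push captured PCM frames, then a None sentinel."""
    for frame in frames:
        await queue.put(frame)
    await queue.put(None)  # signals end of stream


async def fake_transcribe(frame: bytes) -> str:
    """Hypothetical stand-in for sending audio to AssemblyAI (step 4)."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return f"<{len(frame)} bytes of speech>"


async def transcribe_loop(queue: "asyncio.Queue[Optional[bytes]]") -> list[str]:
    """Steps 4-5: transcribe each frame and print it with a timestamp."""
    transcripts = []
    while (frame := await queue.get()) is not None:
        text = await fake_transcribe(frame)
        print(f"[{datetime.now():%H:%M:%S}] {text}")
        transcripts.append(text)
    return transcripts


async def main() -> list[str]:
    queue: "asyncio.Queue[Optional[bytes]]" = asyncio.Queue()
    frames = [b"\x00" * 1920, b"\x00" * 1920]  # two 20 ms frames of 48 kHz mono PCM16
    _, transcripts = await asyncio.gather(capture_audio(queue, frames), transcribe_loop(queue))
    return transcripts


if __name__ == "__main__":
    asyncio.run(main())
```

The queue decouples audio capture from transcription, so a slow network response does not block the capture side.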

## Troubleshooting

- **No audio detected**: Ensure your microphone is working and permissions are granted
- **API errors**: Check your AssemblyAI API key and account status
- **Connection issues**: Verify your internet connection and Stream credentials

## AssemblyAI Features

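AssemblyAI provides high-quality transcription with the following capabilities (availability can differ between streaming and pre-recorded transcription):
- **Universal speech models** for high accuracy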
- **Real-time streaming** for low latency
- **Automatic language detection** support
- **Speaker diarization** capabilities
- **Custom vocabulary** support

For more information, visit [AssemblyAI Documentation](https://www.assemblyai.com/docs).
1 change: 1 addition & 0 deletions examples/stt_assemblyai_transcription/__init__.py
@@ -0,0 +1 @@
# AssemblyAI STT Transcription Example
7 changes: 7 additions & 0 deletions examples/stt_assemblyai_transcription/env.example
@@ -0,0 +1,7 @@
# Stream API credentials
STREAM_API_KEY=your_stream_api_key_here
STREAM_API_SECRET=your_stream_api_secret_here
EXAMPLE_BASE_URL=https://pronto.getstream.io

# AssemblyAI API credentials
ASSEMBLYAI_API_KEY=your_assemblyai_api_key_here