# VoiceOwl Transcription API

A Node.js + JavaScript API service for audio transcription with MongoDB Atlas storage and Azure Speech-to-Text integration.

## Table of Contents
- Features
- Tech Stack
- Project Structure
- Setup & Installation
- API Endpoints
- MongoDB Indexing Strategy
- Scalability Design
- Assumptions
- Production Improvements
- Testing
## Features

- ✅ `POST /transcription` - Basic transcription with mock audio download
- ✅ `POST /azure-transcription` - Azure Speech-to-Text integration with retry logic
- ✅ `GET /transcriptions` - Fetch transcriptions from the last 30 days
- ✅ MongoDB Atlas storage with optimized indexing
- ✅ Native MongoDB driver (no ODM)
- ✅ Environment variable configuration
- ✅ Error handling and retry mechanisms
- ✅ Exponential backoff for failed requests
- ✅ Multi-language support (Azure)
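The retry behavior listed above (exponential backoff for failed requests) can be sketched as follows. The attempt count and delays are illustrative defaults, not the service's actual configuration:

```javascript
// Sketch of retry with exponential backoff, as used around audio downloads
// and Azure calls. Attempt count and base delay are illustrative.
async function withRetry(fn, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Exponential backoff: 500ms, 1000ms, 2000ms, ...
        const delay = baseDelayMs * 2 ** i;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // all attempts exhausted: surface the last error
}
```

A caller would wrap a flaky operation, e.g. `await withRetry(() => downloadAudio(url))`.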
## Tech Stack

- **Runtime:** Node.js
- **Language:** JavaScript (ES6+)
- **Framework:** Express.js
- **Database:** MongoDB (Native MongoDB Driver with Atlas)
- **External API:** Azure Cognitive Services Speech SDK
- **Environment:** dotenv
## Project Structure

```
VoiceOwlAssignment1/
├── src/
│   ├── config/
│   │   └── database.js                 # MongoDB connection (Native Driver)
│   ├── controllers/
│   │   └── transcriptionController.js  # Request handlers
│   ├── routes/
│   │   └── transcriptionRoutes.js      # API routes
│   ├── services/
│   │   ├── audioService.js             # Audio download & basic transcription
│   │   ├── azureSpeechService.js       # Azure Speech integration
│   │   └── transcriptionService.js     # Business logic
│   └── server.js                       # Express app setup
├── frontend/                           # React frontend (optional)
│   ├── src/
│   │   ├── components/
│   │   │   ├── TranscriptionForm.jsx
│   │   │   └── TranscriptionList.jsx
│   │   ├── App.jsx
│   │   └── main.jsx
│   └── package.json
├── package.json
└── README.md
```
## Setup & Installation

### Prerequisites

- Node.js (v18 or higher)
- MongoDB (local installation or MongoDB Atlas account)
- (Optional) Azure Speech Service credentials

### Steps

1. Install dependencies:

   ```bash
   npm install
   ```

2. Set up environment variables:

   ```bash
   cp .env.example .env
   ```

   Edit `.env` and configure:

   ```env
   PORT=3000
   MONGODB_URI=mongodb://localhost:27017/voiceowl
   AZURE_SPEECH_KEY=your_key_here
   AZURE_SPEECH_REGION=your_region_here
   ```

3. Start the server:

   ```bash
   npm start
   ```

   Or, for development with auto-reload:

   ```bash
   npm run dev
   ```

4. Verify the server is running:

   ```bash
   curl http://localhost:3000/health
   ```
## API Endpoints

### POST /transcription

Creates a basic transcription from an audio URL.

**Request:**

```json
{
  "audioUrl": "https://example.com/sample.mp3"
}
```

**Response:**

```json
{
  "id": "507f1f77bcf86cd799439011",
  "message": "Transcription created successfully"
}
```

### POST /azure-transcription

Creates a transcription using Azure Speech-to-Text.

**Request:**

```json
{
  "audioUrl": "https://example.com/sample.mp3",
  "language": "en-US"
}
```

**Supported languages:** en-US, fr-FR, es-ES, de-DE, etc.

**Response:**

```json
{
  "id": "507f1f77bcf86cd799439011",
  "message": "Azure transcription created successfully"
}
```

### GET /transcriptions

Fetches all transcriptions created in the last 30 days.

**Response:**

```json
{
  "count": 2,
  "transcriptions": [
    {
      "id": "507f1f77bcf86cd799439011",
      "audioUrl": "https://example.com/audio.mp3",
      "transcription": "transcribed text",
      "source": "basic",
      "createdAt": "2024-01-15T10:30:00.000Z"
    }
  ]
}
```

### GET /health

Health check endpoint.

**Response:**

```json
{
  "status": "ok",
  "message": "VoiceOwl Transcription API is running"
}
```
## MongoDB Indexing Strategy

### Current Indexes

- `createdAt` (descending): single-field index for efficient date-based queries
- `audioUrl` (ascending): index for faster lookups by URL
- Compound index `{ source: 1, createdAt: -1 }` for filtering by source and date

Because the service uses the native MongoDB driver (no ODM), these are created with `createIndex` at startup:

```js
await db.collection('transcriptions').createIndex({ createdAt: -1 });
await db.collection('transcriptions').createIndex({ audioUrl: 1 });
await db.collection('transcriptions').createIndex({ source: 1, createdAt: -1 });
```
### Indexing at Scale (100M+ Records)

For a dataset with 100M+ records, the following indexing strategy is recommended:

```js
// Descending single-field index optimized for date range queries
db.transcriptions.createIndex(
  { createdAt: -1 },
  { name: "createdAt_desc_idx", background: true }
);
```

**Why this index:**

- The query filters by `createdAt >= thirtyDaysAgo` and sorts by `createdAt: -1`
- A descending index on `createdAt` allows MongoDB to:
  - Quickly find documents within the date range
  - Return results in sorted order without an additional sort operation
  - Use covered (index-only) queries when possible
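As a concrete illustration, the 30-day query this index serves might be built like this. The collection name follows the examples above; the actual service code may differ:

```javascript
// Builds the filter and sort for the "last 30 days" query. The sort
// direction matches the index ({ createdAt: -1 }), so MongoDB can walk
// the index instead of sorting in memory.
function recentTranscriptionsQuery(now = Date.now()) {
  const thirtyDaysAgo = new Date(now - 30 * 24 * 60 * 60 * 1000);
  return {
    filter: { createdAt: { $gte: thirtyDaysAgo } },
    sort: { createdAt: -1 },
  };
}

// Usage with the native driver (requires a connected `db` handle):
// const { filter, sort } = recentTranscriptionsQuery();
// const docs = await db.collection('transcriptions')
//   .find(filter).sort(sort).toArray();
```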
**Additional options:**

1. Partial index (if most queries target recent data):

   ```js
   db.transcriptions.createIndex(
     { createdAt: -1 },
     {
       partialFilterExpression: {
         createdAt: { $gte: new Date(Date.now() - 90 * 24 * 60 * 60 * 1000) }
       },
       name: "recent_transcriptions_idx"
     }
   );
   ```

2. TTL index (for automatic cleanup of old data):

   ```js
   db.transcriptions.createIndex(
     { createdAt: 1 },
     { expireAfterSeconds: 2592000 } // 30 days in seconds
   );
   ```

3. Compound index for source filtering:

   ```js
   db.transcriptions.createIndex(
     { source: 1, createdAt: -1 },
     { name: "source_createdAt_idx" }
   );
   ```
**Performance impact:**

- Without an index: full collection scan (O(n)), which could take minutes
- With an index: index scan (O(log n)), milliseconds to seconds
- Estimated query time for 100M records: < 100ms with the proper index
## Scalability Design

To handle 10k+ concurrent requests, the following architectural changes are recommended:

### 1. Horizontal Scaling

- Deploy multiple instances behind a load balancer (e.g., AWS ALB, NGINX)
- Use containerization (Docker) for consistent deployments
- Implement auto-scaling based on CPU/memory metrics
- **Impact:** Distributes load across multiple servers

### 2. Asynchronous Processing

- Implement a queue system (RabbitMQ, AWS SQS, or Redis Queue)
- Move transcription processing to background workers
- The API returns immediately with a job ID
- Workers process transcriptions asynchronously
- **Impact:** Prevents request timeouts, improves user experience
**Example flow:**

```
Client → API → Queue → Worker → MongoDB
         │
         └─ Return job_id
```
### 3. Caching Layer

- Add Redis for caching frequently accessed transcriptions
- Cache recent transcriptions (last 30 days) with a TTL
- Cache Azure API responses to reduce external API calls
- **Impact:** Reduces database load and API latency
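As a rough illustration of the cache-aside-with-TTL idea: in production this would be Redis (e.g. `SET key value EX <ttl>`), and the Map-based class below is only a sketch:

```javascript
// Minimal TTL cache illustrating the expiry behavior Redis would provide.
class TtlCache {
  constructor() {
    this.entries = new Map();
  }

  set(key, value, ttlMs) {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlMs });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }
}
```

A handler for `GET /transcriptions` would check the cache first and fall back to MongoDB on a miss, repopulating the cache with a short TTL.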
### 4. Database Scaling

- Use MongoDB replica sets for read scaling
- Route GET requests to read replicas
- Tune connection pooling (provided by the native driver's built-in pool)
- **Impact:** Distributes read load, improves availability
### 5. Rate Limiting

- Implement rate limiting per client/IP
- Use middleware like `express-rate-limit`
- **Impact:** Prevents abuse, ensures fair resource usage
### 6. Monitoring & Observability

- Add logging (Winston, Pino)
- Implement APM (Application Performance Monitoring)
- Set up alerts for error rates and latency
- **Impact:** Early detection of bottlenecks
### Recommended Rollout

- **Phase 1 (Immediate):** Load balancer + auto-scaling + connection pooling
- **Phase 2 (Short-term):** Message queue + background workers
- **Phase 3 (Medium-term):** Caching layer + read replicas
- **Phase 4 (Long-term):** Advanced monitoring + optimization

**Expected capacity:**

- Current: ~100-500 concurrent requests
- With Phase 1: ~2k-5k concurrent requests
- With Phase 2: ~10k+ concurrent requests
- With Phase 3: ~50k+ concurrent requests
## Assumptions

- **Audio download:** Currently mocked. In production, files would be downloaded with `axios` or `node-fetch`.
- **Audio format:** Assumes audio files are in a format supported by the Azure Speech SDK (WAV, MP3, etc.).
- **File size:** No explicit size limits implemented. Production should enforce limits (e.g., 100MB max).
- **Authentication:** No authentication/authorization implemented. Production should add JWT/OAuth.
- **MongoDB:** Assumes MongoDB is accessible and properly configured.
- **Azure credentials:** The service gracefully degrades to a mock transcription if credentials are missing.
- **Error handling:** Basic error handling implemented. Production should have more granular error types.
## Production Improvements

### Security

- Add authentication/authorization (JWT, OAuth2)
- Implement rate limiting per user/IP
- Add input validation and sanitization
- Use HTTPS only
- Implement CORS policies
- Add request size limits

### Performance

- Implement Redis caching
- Tune database connection pooling
- Implement a CDN for static assets (if any)
- Add compression middleware (gzip)
- Optimize MongoDB queries with explain plans

### Reliability

- Add comprehensive error logging (Winston, Sentry)
- Implement health checks for dependencies
- Add circuit breakers for external APIs
- Implement graceful shutdown
- Add database transaction support
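A circuit breaker for the external Azure calls could be sketched as follows. The thresholds and the class/method names are illustrative, not an existing library's API:

```javascript
// Minimal circuit breaker: after `failureThreshold` consecutive failures,
// calls are rejected immediately until `resetTimeoutMs` has elapsed, at
// which point one trial ("half-open") call is allowed through.
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 30000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.failures = 0;
    this.openedAt = null; // non-null while the circuit is open
  }

  async call(fn) {
    if (this.openedAt !== null) {
      if (Date.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('Circuit open: skipping call');
      }
      this.openedAt = null; // half-open: allow one trial call
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.failureThreshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```

Wrapping the Azure request in `breaker.call(...)` stops a failing dependency from tying up every incoming request with doomed, slow calls.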
### Monitoring

- Add APM (New Relic, Datadog)
- Implement structured logging
- Add metrics collection (Prometheus)
- Set up alerting for errors and latency

### Testing & Quality

- Increase test coverage (>80%)
- Add integration tests
- Implement a CI/CD pipeline
- Add code linting (ESLint)
- Add pre-commit hooks

### Features

- Support for batch transcription
- Webhook notifications for completed transcriptions
- Support for audio streaming
- Add transcription status tracking
- Implement file upload in addition to URL-only input
## Frontend (Optional)

A React + JavaScript frontend is included for testing the API.

1. Navigate to the frontend directory:

   ```bash
   cd frontend
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Start the frontend dev server:

   ```bash
   npm run dev
   ```

4. Open a browser and navigate to `http://localhost:3001`.

The frontend provides:

- A form to create basic and Azure transcriptions
- A language selector for Azure transcriptions
- A list view of all transcriptions from the last 30 days
- Real-time updates after creating transcriptions

See `frontend/README.md` for more details.
## Testing

### With curl

```bash
# Create basic transcription
curl -X POST http://localhost:3000/transcription \
  -H "Content-Type: application/json" \
  -d '{"audioUrl": "https://example.com/audio.mp3"}'

# Create Azure transcription
curl -X POST http://localhost:3000/azure-transcription \
  -H "Content-Type: application/json" \
  -d '{"audioUrl": "https://example.com/audio.mp3", "language": "en-US"}'

# Get recent transcriptions
curl http://localhost:3000/transcriptions
```

### With fetch

```js
// Basic transcription
const response = await fetch('http://localhost:3000/transcription', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ audioUrl: 'https://example.com/audio.mp3' })
});
const data = await response.json();
console.log('Transcription ID:', data.id);
```

## License

ISC
VoiceOwl Developer Evaluation Task
**Note:** This is a demonstration project. For production use, implement the improvements listed in the "Production Improvements" section.