# VoiceOwl Transcription API

A Node.js + JavaScript API service for audio transcription with MongoDB Atlas storage and Azure Speech-to-Text integration.

## Table of Contents
- Features
- Tech Stack
- Project Structure
- Setup & Installation
- API Endpoints
- MongoDB Indexing Strategy
- Scalability Design
- Assumptions
- Production Improvements
- Testing
## Features

- ✅ `POST /transcription` - Basic transcription with mock audio download
- ✅ `POST /azure-transcription` - Azure Speech-to-Text integration with retry logic
- ✅ `GET /transcriptions` - Fetch transcriptions from the last 30 days
- ✅ MongoDB Atlas storage with optimized indexing
- ✅ Native MongoDB driver (no ODM)
- ✅ Environment variable configuration
- ✅ Error handling and retry mechanisms
- ✅ Exponential backoff for failed requests
- ✅ Multi-language support (Azure)
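The retry behavior listed above (exponential backoff for failed requests) can be sketched as follows. The attempt count and delays are illustrative defaults, not the service's actual configuration:

```javascript
// Sketch of retry with exponential backoff, as used around audio downloads
// and Azure calls. Attempt count and base delay are illustrative.
async function withRetry(fn, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Exponential backoff: 500ms, 1000ms, 2000ms, ...
        const delay = baseDelayMs * 2 ** i;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // all attempts exhausted: surface the last error
}
```

A caller would wrap a flaky operation, e.g. `await withRetry(() => downloadAudio(url))`.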
## Tech Stack

- **Runtime:** Node.js
- **Language:** JavaScript (ES6+)
- **Framework:** Express.js
- **Database:** MongoDB (Native MongoDB Driver with Atlas)
- **External API:** Azure Cognitive Services Speech SDK
- **Environment:** dotenv
## Project Structure

```
VoiceOwlAssignment1/
├── src/
│   ├── config/
│   │   └── database.js                 # MongoDB connection (Native Driver)
│   ├── controllers/
│   │   └── transcriptionController.js  # Request handlers
│   ├── routes/
│   │   └── transcriptionRoutes.js      # API routes
│   ├── services/
│   │   ├── audioService.js             # Audio download & basic transcription
│   │   ├── azureSpeechService.js       # Azure Speech integration
│   │   └── transcriptionService.js     # Business logic
│   └── server.js                       # Express app setup
├── frontend/                           # React frontend (optional)
│   ├── src/
│   │   ├── components/
│   │   │   ├── TranscriptionForm.jsx
│   │   │   └── TranscriptionList.jsx
│   │   ├── App.jsx
│   │   └── main.jsx
│   └── package.json
├── package.json
└── README.md
```
## Setup & Installation

### Prerequisites

- Node.js (v18 or higher)
- MongoDB (local installation or MongoDB Atlas account)
- (Optional) Azure Speech Service credentials

### Steps

1. Install dependencies:

   ```bash
   npm install
   ```

2. Set up environment variables:

   ```bash
   cp .env.example .env
   ```

   Edit `.env` and configure:

   ```env
   PORT=3000
   MONGODB_URI=mongodb://localhost:27017/voiceowl
   AZURE_SPEECH_KEY=your_key_here
   AZURE_SPEECH_REGION=your_region_here
   ```

3. Start the server:

   ```bash
   npm start
   ```

   Or, for development with auto-reload:

   ```bash
   npm run dev
   ```

4. Verify the server is running:

   ```bash
   curl http://localhost:3000/health
   ```
## API Endpoints

### POST /transcription

Creates a basic transcription from an audio URL.

**Request:**

```json
{
  "audioUrl": "https://example.com/sample.mp3"
}
```

**Response:**

```json
{
  "id": "507f1f77bcf86cd799439011",
  "message": "Transcription created successfully"
}
```

### POST /azure-transcription

Creates a transcription using Azure Speech-to-Text.

**Request:**

```json
{
  "audioUrl": "https://example.com/sample.mp3",
  "language": "en-US"
}
```

**Supported languages:** en-US, fr-FR, es-ES, de-DE, etc.

**Response:**

```json
{
  "id": "507f1f77bcf86cd799439011",
  "message": "Azure transcription created successfully"
}
```

### GET /transcriptions

Fetches all transcriptions created in the last 30 days.

**Response:**

```json
{
  "count": 2,
  "transcriptions": [
    {
      "id": "507f1f77bcf86cd799439011",
      "audioUrl": "https://example.com/audio.mp3",
      "transcription": "transcribed text",
      "source": "basic",
      "createdAt": "2024-01-15T10:30:00.000Z"
    }
  ]
}
```

### GET /health

Health check endpoint.

**Response:**

```json
{
  "status": "ok",
  "message": "VoiceOwl Transcription API is running"
}
```
## MongoDB Indexing Strategy

### Current Indexes

- `createdAt` (descending): single-field index for efficient date-based queries
- `audioUrl` (ascending): index for faster lookups by URL
- Compound index `{ source: 1, createdAt: -1 }` for filtering by source and date

Because the service uses the native MongoDB driver (no ODM), these are created with `createIndex` at startup:

```js
await db.collection('transcriptions').createIndex({ createdAt: -1 });
await db.collection('transcriptions').createIndex({ audioUrl: 1 });
await db.collection('transcriptions').createIndex({ source: 1, createdAt: -1 });
```
### Indexing at Scale (100M+ Records)

For a dataset with 100M+ records, the following indexing strategy is recommended:

```js
// Descending single-field index optimized for date range queries
db.transcriptions.createIndex(
  { createdAt: -1 },
  { name: "createdAt_desc_idx", background: true }
);
```

**Why this index:**

- The query filters by `createdAt >= thirtyDaysAgo` and sorts by `createdAt: -1`
- A descending index on `createdAt` allows MongoDB to:
  - Quickly find documents within the date range
  - Return results in sorted order without an additional sort operation
  - Use covered (index-only) queries when possible
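As a concrete illustration, the 30-day query this index serves might be built like this. The collection name follows the examples above; the actual service code may differ:

```javascript
// Builds the filter and sort for the "last 30 days" query. The sort
// direction matches the index ({ createdAt: -1 }), so MongoDB can walk
// the index instead of sorting in memory.
function recentTranscriptionsQuery(now = Date.now()) {
  const thirtyDaysAgo = new Date(now - 30 * 24 * 60 * 60 * 1000);
  return {
    filter: { createdAt: { $gte: thirtyDaysAgo } },
    sort: { createdAt: -1 },
  };
}

// Usage with the native driver (requires a connected `db` handle):
// const { filter, sort } = recentTranscriptionsQuery();
// const docs = await db.collection('transcriptions')
//   .find(filter).sort(sort).toArray();
```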
**Additional options:**

1. Partial index (if most queries target recent data):

   ```js
   db.transcriptions.createIndex(
     { createdAt: -1 },
     {
       partialFilterExpression: {
         createdAt: { $gte: new Date(Date.now() - 90 * 24 * 60 * 60 * 1000) }
       },
       name: "recent_transcriptions_idx"
     }
   );
   ```

2. TTL index (for automatic cleanup of old data):

   ```js
   db.transcriptions.createIndex(
     { createdAt: 1 },
     { expireAfterSeconds: 2592000 } // 30 days in seconds
   );
   ```

3. Compound index for source filtering:

   ```js
   db.transcriptions.createIndex(
     { source: 1, createdAt: -1 },
     { name: "source_createdAt_idx" }
   );
   ```
**Performance impact:**

- Without an index: full collection scan (O(n)), which could take minutes
- With an index: index scan (O(log n)), milliseconds to seconds
- Estimated query time for 100M records: < 100ms with the proper index
## Scalability Design

To handle 10k+ concurrent requests, the following architectural changes are recommended:

### 1. Horizontal Scaling

- Deploy multiple instances behind a load balancer (e.g., AWS ALB, NGINX)
- Use containerization (Docker) for consistent deployments
- Implement auto-scaling based on CPU/memory metrics
- **Impact:** Distributes load across multiple servers

### 2. Asynchronous Processing

- Implement a queue system (RabbitMQ, AWS SQS, or Redis Queue)
- Move transcription processing to background workers
- The API returns immediately with a job ID
- Workers process transcriptions asynchronously
- **Impact:** Prevents request timeouts, improves user experience
**Example flow:**

```
Client → API → Queue → Worker → MongoDB
         │
         └─ Return job_id
```
### 3. Caching Layer

- Add Redis for caching frequently accessed transcriptions
- Cache recent transcriptions (last 30 days) with a TTL
- Cache Azure API responses to reduce external API calls
- **Impact:** Reduces database load and API latency
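As a rough illustration of the cache-aside-with-TTL idea: in production this would be Redis (e.g. `SET key value EX <ttl>`), and the Map-based class below is only a sketch:

```javascript
// Minimal TTL cache illustrating the expiry behavior Redis would provide.
class TtlCache {
  constructor() {
    this.entries = new Map();
  }

  set(key, value, ttlMs) {
    this.entries.set(key, { value, expiresAt: Date.now() + ttlMs });
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (Date.now() > entry.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }
}
```

A handler for `GET /transcriptions` would check the cache first and fall back to MongoDB on a miss, repopulating the cache with a short TTL.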
### 4. Database Scaling

- Use MongoDB replica sets for read scaling
- Route GET requests to read replicas
- Tune connection pooling (provided by the native driver's built-in pool)
- **Impact:** Distributes read load, improves availability
### 5. Rate Limiting

- Implement rate limiting per client/IP
- Use middleware like `express-rate-limit`
- **Impact:** Prevents abuse, ensures fair resource usage
### 6. Monitoring & Observability

- Add logging (Winston, Pino)
- Implement APM (Application Performance Monitoring)
- Set up alerts for error rates and latency
- **Impact:** Early detection of bottlenecks
### Recommended Rollout

- **Phase 1 (Immediate):** Load balancer + auto-scaling + connection pooling
- **Phase 2 (Short-term):** Message queue + background workers
- **Phase 3 (Medium-term):** Caching layer + read replicas
- **Phase 4 (Long-term):** Advanced monitoring + optimization

**Expected capacity:**

- Current: ~100-500 concurrent requests
- With Phase 1: ~2k-5k concurrent requests
- With Phase 2: ~10k+ concurrent requests
- With Phase 3: ~50k+ concurrent requests
## Assumptions

- **Audio download:** Currently mocked. In production, files would be downloaded with `axios` or `node-fetch`.
- **Audio format:** Assumes audio files are in a format supported by the Azure Speech SDK (WAV, MP3, etc.).
- **File size:** No explicit size limits implemented. Production should enforce limits (e.g., 100MB max).
- **Authentication:** No authentication/authorization implemented. Production should add JWT/OAuth.
- **MongoDB:** Assumes MongoDB is accessible and properly configured.
- **Azure credentials:** The service gracefully degrades to a mock transcription if credentials are missing.
- **Error handling:** Basic error handling implemented. Production should have more granular error types.
## Production Improvements

### Security

- Add authentication/authorization (JWT, OAuth2)
- Implement rate limiting per user/IP
- Add input validation and sanitization
- Use HTTPS only
- Implement CORS policies
- Add request size limits

### Performance

- Implement Redis caching
- Tune database connection pooling
- Implement a CDN for static assets (if any)
- Add compression middleware (gzip)
- Optimize MongoDB queries with explain plans

### Reliability

- Add comprehensive error logging (Winston, Sentry)
- Implement health checks for dependencies
- Add circuit breakers for external APIs
- Implement graceful shutdown
- Add database transaction support
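A circuit breaker for the external Azure calls could be sketched as follows. The thresholds and the class/method names are illustrative, not an existing library's API:

```javascript
// Minimal circuit breaker: after `failureThreshold` consecutive failures,
// calls are rejected immediately until `resetTimeoutMs` has elapsed, at
// which point one trial ("half-open") call is allowed through.
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 30000 } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.failures = 0;
    this.openedAt = null; // non-null while the circuit is open
  }

  async call(fn) {
    if (this.openedAt !== null) {
      if (Date.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('Circuit open: skipping call');
      }
      this.openedAt = null; // half-open: allow one trial call
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.failureThreshold) this.openedAt = Date.now();
      throw err;
    }
  }
}
```

Wrapping the Azure request in `breaker.call(...)` stops a failing dependency from tying up every incoming request with doomed, slow calls.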
### Monitoring

- Add APM (New Relic, Datadog)
- Implement structured logging
- Add metrics collection (Prometheus)
- Set up alerting for errors and latency

### Testing & Quality

- Increase test coverage (>80%)
- Add integration tests
- Implement a CI/CD pipeline
- Add code linting (ESLint)
- Add pre-commit hooks

### Features

- Support for batch transcription
- Webhook notifications for completed transcriptions
- Support for audio streaming
- Add transcription status tracking
- Implement file upload in addition to URL-only input
## Frontend (Optional)

A React + JavaScript frontend is included for testing the API.

1. Navigate to the frontend directory:

   ```bash
   cd frontend
   ```

2. Install dependencies:

   ```bash
   npm install
   ```

3. Start the frontend dev server:

   ```bash
   npm run dev
   ```

4. Open a browser and navigate to `http://localhost:3001`.

The frontend provides:

- A form to create basic and Azure transcriptions
- A language selector for Azure transcriptions
- A list view of all transcriptions from the last 30 days
- Real-time updates after creating transcriptions

See `frontend/README.md` for more details.
## Testing

### With curl

```bash
# Create basic transcription
curl -X POST http://localhost:3000/transcription \
  -H "Content-Type: application/json" \
  -d '{"audioUrl": "https://example.com/audio.mp3"}'

# Create Azure transcription
curl -X POST http://localhost:3000/azure-transcription \
  -H "Content-Type: application/json" \
  -d '{"audioUrl": "https://example.com/audio.mp3", "language": "en-US"}'

# Get recent transcriptions
curl http://localhost:3000/transcriptions
```

### With fetch

```js
// Basic transcription
const response = await fetch('http://localhost:3000/transcription', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ audioUrl: 'https://example.com/audio.mp3' })
});
const data = await response.json();
console.log('Transcription ID:', data.id);
```

## License

ISC
VoiceOwl Developer Evaluation Task
**Note:** This is a demonstration project. For production use, implement the improvements listed in the "Production Improvements" section.