RoopanshAilusion/VoiceOwlAssignment


VoiceOwl Transcription API Service

A Node.js + JavaScript API service for audio transcription with MongoDB Atlas storage and Azure Speech-to-Text integration.


✨ Features

  • ✅ POST /transcription - Basic transcription with mock audio download
  • ✅ POST /azure-transcription - Azure Speech-to-Text integration with retry logic
  • ✅ GET /transcriptions - Fetch transcriptions from last 30 days
  • ✅ MongoDB Atlas storage with optimized indexing
  • ✅ Native MongoDB driver (no ODM)
  • ✅ Environment variable configuration
  • ✅ Error handling and retry mechanisms
  • ✅ Exponential backoff for failed requests
  • ✅ Multi-language support (Azure)
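
The retry and exponential backoff features listed above could look roughly like the following. This is a minimal sketch, not the project's actual code: the helper name `retryWithBackoff`, the injectable `sleep`, and the defaults (3 attempts, 200 ms base delay) are all assumptions made for illustration.

```javascript
// Hypothetical helper: retries an async operation with exponential backoff.
// Defaults are illustrative and not taken from the project.
async function retryWithBackoff(
  operation,
  { maxAttempts = 3, baseDelayMs = 200, sleep = defaultSleep } = {}
) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await operation(attempt);
    } catch (error) {
      lastError = error;
      if (attempt === maxAttempts) break;
      // The delay doubles after each failure: base, 2*base, 4*base, ...
      await sleep(baseDelayMs * 2 ** (attempt - 1));
    }
  }
  throw lastError;
}

// Default sleep based on setTimeout; tests can inject a fake instead.
function defaultSleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

A caller would wrap the flaky call, for example `await retryWithBackoff(() => callAzure(audioUrl))`, where `callAzure` stands for whatever function performs the external request.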

🛠 Tech Stack

  • Runtime: Node.js
  • Language: JavaScript (ES6+)
  • Framework: Express.js
  • Database: MongoDB (Native MongoDB Driver with Atlas)
  • External API: Azure Cognitive Services Speech SDK
  • Environment: dotenv

📁 Project Structure

VoiceOwlAssignment1/
├── src/
│   ├── config/
│   │   └── database.js          # MongoDB connection (Native Driver)
│   ├── controllers/
│   │   └── transcriptionController.js  # Request handlers
│   ├── routes/
│   │   └── transcriptionRoutes.js      # API routes
│   ├── services/
│   │   ├── audioService.js      # Audio download & basic transcription
│   │   ├── azureSpeechService.js # Azure Speech integration
│   │   └── transcriptionService.js     # Business logic
│   └── server.js                # Express app setup
├── frontend/                    # React frontend (optional)
│   ├── src/
│   │   ├── components/
│   │   │   ├── TranscriptionForm.jsx
│   │   │   └── TranscriptionList.jsx
│   │   ├── App.jsx
│   │   └── main.jsx
│   └── package.json
├── package.json
└── README.md

🚀 Setup & Installation

Prerequisites

  • Node.js (v18 or higher)
  • MongoDB (local installation or MongoDB Atlas account)
  • (Optional) Azure Speech Service credentials

Installation Steps

  1. Install dependencies:

    npm install
  2. Set up environment variables:

    cp .env.example .env

    Edit .env and configure:

    PORT=3000
    MONGODB_URI=mongodb://localhost:27017/voiceowl
    AZURE_SPEECH_KEY=your_key_here
    AZURE_SPEECH_REGION=your_region_here
  3. Start the server:

    npm start

    Or for development with auto-reload:

    npm run dev
  4. Verify the server is running:

    curl http://localhost:3000/health
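
As a rough illustration of how the `.env` values above might be read at startup, here is a minimal sketch. The function name `loadConfig`, the choice to treat `MONGODB_URI` as required, and the shape of the returned object are assumptions; the variable names match the keys listed in the installation steps.

```javascript
// Hypothetical config loader; not the project's actual startup code.
function loadConfig(env = process.env) {
  const port = Number.parseInt(env.PORT ?? '3000', 10);
  if (Number.isNaN(port) || port <= 0 || port > 65535) {
    throw new Error(`Invalid PORT: ${env.PORT}`);
  }
  // Assumption: the database URI is mandatory.
  if (!env.MONGODB_URI) {
    throw new Error('MONGODB_URI is required');
  }
  return {
    port,
    mongoUri: env.MONGODB_URI,
    // Azure is optional: without both values the caller can fall back to a mock.
    azure:
      env.AZURE_SPEECH_KEY && env.AZURE_SPEECH_REGION
        ? { key: env.AZURE_SPEECH_KEY, region: env.AZURE_SPEECH_REGION }
        : null,
  };
}
```

Failing fast on a missing or malformed value at startup tends to be easier to debug than a connection error on the first request.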

📡 API Endpoints

POST /transcription

Creates a basic transcription from an audio URL.

Request:

{
  "audioUrl": "https://example.com/sample.mp3"
}

Response:

{
  "id": "507f1f77bcf86cd799439011",
  "message": "Transcription created successfully"
}

POST /azure-transcription

Creates a transcription using Azure Speech-to-Text.

Request:

{
  "audioUrl": "https://example.com/sample.mp3",
  "language": "en-US"
}

Supported Languages: en-US, fr-FR, es-ES, de-DE, etc.

Response:

{
  "id": "507f1f77bcf86cd799439011",
  "message": "Azure transcription created successfully"
}
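
Both POST endpoints take a JSON body with `audioUrl`, and the Azure endpoint also takes `language`. A request validator along these lines could reject malformed bodies before any download or Azure call. This is a sketch under assumptions: the function name, the error messages, and the loose `xx-XX` language pattern are illustrative and do not reflect the list of locales Azure actually accepts.

```javascript
// Hypothetical validator for the POST bodies shown above.
function validateTranscriptionRequest(body, { requireLanguage = false } = {}) {
  const errors = [];

  const audioUrl = body?.audioUrl;
  if (typeof audioUrl !== 'string' || audioUrl.length === 0) {
    errors.push('audioUrl is required');
  } else {
    try {
      // The URL constructor throws on strings that are not absolute URLs.
      const { protocol } = new URL(audioUrl);
      if (protocol !== 'http:' && protocol !== 'https:') {
        errors.push('audioUrl must use http or https');
      }
    } catch {
      errors.push('audioUrl must be a valid URL');
    }
  }

  const language = body?.language;
  if (language === undefined) {
    if (requireLanguage) errors.push('language is required');
  } else if (typeof language !== 'string' || !/^[a-z]{2,3}-[A-Z]{2}$/.test(language)) {
    // Loose shape check only (e.g. "en-US"); not a lookup against supported locales.
    errors.push('language must look like "en-US"');
  }

  return { valid: errors.length === 0, errors };
}
```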

GET /transcriptions

Fetches all transcriptions created in the last 30 days.

Response:

{
  "count": 2,
  "transcriptions": [
    {
      "id": "507f1f77bcf86cd799439011",
      "audioUrl": "https://example.com/audio.mp3",
      "transcription": "transcribed text",
      "source": "basic",
      "createdAt": "2024-01-15T10:30:00.000Z"
    }
  ]
}

GET /health

Health check endpoint.

Response:

{
  "status": "ok",
  "message": "VoiceOwl Transcription API is running"
}

🗄️ MongoDB Indexing Strategy

Current Indexes

  1. createdAt (descending): Single-field index for efficient date-based queries

    db.transcriptions.createIndex({ createdAt: -1 });
  2. audioUrl: Index for faster lookups by URL

    db.transcriptions.createIndex({ audioUrl: 1 });
  3. Compound Index: { source: 1, createdAt: -1 } for filtering by source and date

Indexing for 100M+ Records

For a dataset with 100M+ records, the following indexing strategy is recommended:

Primary Index for GET /transcriptions Query

// Single-field descending index optimized for date range queries
db.transcriptions.createIndex({ createdAt: -1 }, {
  name: "createdAt_desc_idx",
  background: true // ignored on MongoDB 4.2+
});

Why this index:

  • The query filters by createdAt >= thirtyDaysAgo and sorts by createdAt: -1
  • A descending index on createdAt allows MongoDB to:
    • Quickly find documents within the date range
    • Return results in sorted order without an additional sort operation
    • Serve covered (index-only) queries when the projection includes only indexed fields
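
The query shape described above (filter on `createdAt >= thirtyDaysAgo`, sort by `createdAt: -1`) can be sketched as a small builder. The function name and the returned `{ filter, sort }` shape are assumptions for illustration; the collection name `transcriptions` follows the index commands in this section.

```javascript
// Hypothetical query builder for GET /transcriptions.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

function buildRecentTranscriptionsQuery(now = new Date()) {
  const thirtyDaysAgo = new Date(now.getTime() - THIRTY_DAYS_MS);
  return {
    // Range filter and sort both use createdAt, so one index serves both.
    filter: { createdAt: { $gte: thirtyDaysAgo } },
    sort: { createdAt: -1 },
  };
}

// With the native driver this would be used roughly as:
//   const { filter, sort } = buildRecentTranscriptionsQuery();
//   const docs = await db.collection('transcriptions').find(filter).sort(sort).toArray();
```

Passing `now` as a parameter keeps the cutoff calculation testable without mocking the system clock.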

Additional Optimizations

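  1. Partial Index (if most queries are for recent data). The date in the filter is evaluated once, when the index is created, so the 90-day cutoff is fixed rather than rolling; the index would need to be rebuilt periodically to keep tracking recent data: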

    db.transcriptions.createIndex(
      { createdAt: -1 },
      { 
        partialFilterExpression: { createdAt: { $gte: new Date(Date.now() - 90*24*60*60*1000) } },
        name: "recent_transcriptions_idx"
      }
    );
  2. TTL Index (for automatic cleanup of old data; documents older than 30 days are deleted permanently):

    db.transcriptions.createIndex(
      { createdAt: 1 },
      { expireAfterSeconds: 2592000 } // 30 days in seconds
    );
  3. Compound Index for Source Filtering:

    db.transcriptions.createIndex(
      { source: 1, createdAt: -1 },
      { name: "source_createdAt_idx" }
    );
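
With the native driver, the index definitions above could be created at application startup instead of from the shell. The sketch below assumes a `collection` object exposing the driver's `createIndexes()` method; the function name `ensureIndexes`, the `enableTtl` flag, and the index name `createdAt_ttl_idx` are hypothetical.

```javascript
// Index definitions in the shape accepted by collection.createIndexes().
const THIRTY_DAYS_SECONDS = 30 * 24 * 60 * 60; // 2592000

const transcriptionIndexes = [
  { key: { createdAt: -1 }, name: 'createdAt_desc_idx' },
  { key: { source: 1, createdAt: -1 }, name: 'source_createdAt_idx' },
];

// Optional TTL index: enabling it permanently deletes documents older than 30 days.
const ttlIndex = {
  key: { createdAt: 1 },
  name: 'createdAt_ttl_idx', // hypothetical name
  expireAfterSeconds: THIRTY_DAYS_SECONDS,
};

// Hypothetical startup helper; `collection` is a native-driver Collection.
async function ensureIndexes(collection, { enableTtl = false } = {}) {
  const specs = enableTtl ? [...transcriptionIndexes, ttlIndex] : transcriptionIndexes;
  return collection.createIndexes(specs);
}
```

Keeping the TTL index behind a flag is a deliberate choice here, since it changes data retention rather than just query speed.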

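Performance Impact:

  • Without index: Full collection scan (O(n)), which could take minutes at this scale
  • With index: Index range scan (O(log n + k), where k is the number of matching documents)
  • For 100M records, locating the start of the range should take milliseconds; total response time depends on how many documents fall within the 30-day window, so large result sets should be paginated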

🚀 Scalability Design

To handle 10k+ concurrent requests, the following architectural changes are recommended:

1. Horizontal Scaling with Load Balancing

  • Deploy multiple instances behind a load balancer (e.g., AWS ALB, NGINX)
  • Use containerization (Docker) for consistent deployments
  • Implement auto-scaling based on CPU/memory metrics
  • Impact: Distributes load across multiple servers

2. Message Queue for Async Processing

  • Implement a queue system (RabbitMQ, AWS SQS, or Redis Queue)
  • Move transcription processing to background workers
  • API returns immediately with a job ID
  • Workers process transcriptions asynchronously
  • Impact: Prevents request timeouts, improves user experience

Example Flow:

Client → API → Queue → Worker → MongoDB
         ↓
    Return job_id
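
The flow above can be illustrated with an in-memory stand-in for the queue. This is only a sketch of the pattern: a real deployment would use one of the brokers listed above, and the class name, job ID format, and status values are assumptions.

```javascript
// In-memory stand-in for a real queue; all names are hypothetical.
class InMemoryJobQueue {
  constructor(worker) {
    this.worker = worker;  // async function that does the slow work
    this.jobs = new Map(); // jobId -> { status, result?, error?, done }
    this.nextId = 1;
  }

  // Called by the API handler: records the job and returns its ID immediately.
  enqueue(payload) {
    const jobId = `job_${this.nextId++}`;
    const job = { status: 'queued' };
    this.jobs.set(jobId, job);
    // Processing starts on a later microtask, so the caller is not blocked.
    job.done = Promise.resolve().then(() => this.process(job, payload));
    return jobId;
  }

  // Runs the worker and records the outcome; never rejects.
  async process(job, payload) {
    job.status = 'processing';
    try {
      job.result = await this.worker(payload);
      job.status = 'completed';
    } catch (error) {
      job.error = error.message;
      job.status = 'failed';
    }
  }

  getStatus(jobId) {
    return this.jobs.get(jobId)?.status ?? 'unknown';
  }
}
```

The API handler would call `enqueue()` and respond with the job ID, while clients poll a status endpoint (or receive a webhook) once the worker finishes.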

3. Caching Layer

  • Add Redis for caching frequently accessed transcriptions
  • Cache recent transcriptions (last 30 days) with TTL
  • Cache Azure API responses to reduce external API calls
  • Impact: Reduces database load and API latency
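
The caching idea above follows the cache-aside pattern: check the cache, fall back to the database, then store the result with a TTL. The sketch below uses an in-process `Map` purely to show the logic; it is not a Redis client, and the class and method names are hypothetical.

```javascript
// In-memory stand-in for Redis; a real deployment would share the cache across instances.
class TtlCache {
  constructor({ ttlMs, now = () => Date.now() }) {
    this.ttlMs = ttlMs;
    this.now = now;           // injectable clock, useful in tests
    this.entries = new Map(); // key -> { value, expiresAt }
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return entry.value;
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Cache-aside: return the cached value, or load it and remember it.
  async getOrLoad(key, loader) {
    const cached = this.get(key);
    if (cached !== undefined) return cached;
    const value = await loader();
    this.set(key, value);
    return value;
  }
}
```

For the GET /transcriptions response, a short TTL is the safer choice, since new transcriptions would otherwise stay invisible until the entry expires.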

4. Database Optimization

  • Use MongoDB replica sets for read scaling
  • Route GET requests to secondaries via a read preference, accepting slightly stale reads
  • Connection pooling (already handled by the native MongoDB driver; tune maxPoolSize as needed)
  • Impact: Distributes read load, improves availability

5. API Rate Limiting

  • Implement rate limiting per client/IP
  • Use middleware like express-rate-limit
  • Impact: Prevents abuse, ensures fair resource usage
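
To show what per-client rate limiting does under the hood, here is a minimal fixed-window counter. It is a sketch, not a replacement for express-rate-limit: the function name, option names, and the in-memory `Map` (which would not be shared across instances) are assumptions.

```javascript
// Hypothetical fixed-window limiter keyed by client ID or IP.
function createRateLimiter({ windowMs, maxRequests, now = () => Date.now() }) {
  const windows = new Map(); // clientKey -> { windowStart, count }

  return function isAllowed(clientKey) {
    const current = now();
    const entry = windows.get(clientKey);
    // Start a fresh window on the first request or once the old window has elapsed.
    if (!entry || current - entry.windowStart >= windowMs) {
      windows.set(clientKey, { windowStart: current, count: 1 });
      return true;
    }
    entry.count += 1;
    return entry.count <= maxRequests;
  };
}
```

In an Express middleware, a `false` result would typically translate into a 429 Too Many Requests response.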

6. Monitoring & Observability

  • Add logging (Winston, Pino)
  • Implement APM (Application Performance Monitoring)
  • Set up alerts for error rates and latency
  • Impact: Early detection of bottlenecks

Implementation Priority

  1. Phase 1 (Immediate): Load balancer + auto-scaling + connection pooling
  2. Phase 2 (Short-term): Message queue + background workers
  3. Phase 3 (Medium-term): Caching layer + read replicas
  4. Phase 4 (Long-term): Advanced monitoring + optimization

Expected Capacity (rough estimates, not benchmarked):

  • Current: ~100-500 concurrent requests
  • With Phase 1: ~2k-5k concurrent requests
  • With Phase 2: ~10k+ concurrent requests
  • With Phase 3: ~50k+ concurrent requests

🤔 Assumptions

  1. Audio Download: Currently mocked. In production, would use axios or node-fetch to download files.
  2. Audio Format: Assumes audio files are in a format supported by Azure Speech SDK. WAV (PCM) is the SDK's default input; compressed formats such as MP3 may need additional decoding or configuration.
  3. File Size: No explicit size limits implemented. Production should enforce limits (e.g., 100MB max).
  4. Authentication: No authentication/authorization implemented. Production should add JWT/OAuth.
  5. MongoDB: Assumes MongoDB is accessible and properly configured.
  6. Azure Credentials: Service gracefully degrades to mock if credentials are missing.
  7. Error Handling: Basic error handling implemented. Production should have more granular error types.

🔧 Production Improvements

Security

  • Add authentication/authorization (JWT, OAuth2)
  • Implement rate limiting per user/IP
  • Add input validation and sanitization
  • Use HTTPS only
  • Implement CORS policies
  • Add request size limits

Performance

  • Implement Redis caching
  • Add database connection pooling optimization
  • Implement CDN for static assets (if any)
  • Add compression middleware (gzip)
  • Optimize MongoDB queries with explain plans

Reliability

  • Add comprehensive error logging (Winston, Sentry)
  • Implement health checks for dependencies
  • Add circuit breakers for external APIs
  • Implement graceful shutdown
  • Add database transaction support
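
As one example from the reliability list, a circuit breaker stops calling an external API after repeated failures and retries only after a cool-down. The sketch below is a simplified version under assumptions: the class name, thresholds, and error message are hypothetical, and concurrent calls during the trial phase are not limited to one.

```javascript
// Hypothetical circuit breaker for calls to an external API such as Azure Speech.
class CircuitBreaker {
  constructor({ failureThreshold = 3, resetTimeoutMs = 30000, now = () => Date.now() } = {}) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
    this.failures = 0;
    this.openedAt = null; // timestamp when the circuit opened, or null when closed
  }

  async call(operation) {
    if (this.openedAt !== null && this.now() - this.openedAt < this.resetTimeoutMs) {
      // Open: fail fast without touching the external service.
      throw new Error('Circuit open: call skipped');
    }
    // Closed, or the reset timeout has elapsed and a trial call is allowed.
    try {
      const result = await operation();
      this.failures = 0;
      this.openedAt = null;
      return result;
    } catch (error) {
      this.failures += 1;
      if (this.failures >= this.failureThreshold) this.openedAt = this.now();
      throw error;
    }
  }
}
```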

Monitoring

  • Add APM (New Relic, Datadog)
  • Implement structured logging
  • Add metrics collection (Prometheus)
  • Set up alerting for errors and latency

Code Quality

  • Increase test coverage (>80%)
  • Add integration tests
  • Implement CI/CD pipeline
  • Add code linting (ESLint)
  • Add pre-commit hooks

Features

  • Support for batch transcription
  • Webhook notifications for completed transcriptions
  • Support for audio streaming
  • Add transcription status tracking
  • Implement file upload instead of URL-only

🖥️ Frontend (Optional)

A React + JavaScript frontend is included for testing the API.

Frontend Setup

  1. Navigate to frontend directory:

    cd frontend
  2. Install dependencies:

    npm install
  3. Start frontend dev server:

    npm run dev
  4. Open browser: Navigate to http://localhost:3001

The frontend provides:

  • Form to create basic and Azure transcriptions
  • Language selector for Azure transcriptions
  • List view of all transcriptions from last 30 days
  • Real-time updates after creating transcriptions

See frontend/README.md for more details.

πŸ“ Example Usage

Using cURL

# Create basic transcription
curl -X POST http://localhost:3000/transcription \
  -H "Content-Type: application/json" \
  -d '{"audioUrl": "https://example.com/audio.mp3"}'

# Create Azure transcription
curl -X POST http://localhost:3000/azure-transcription \
  -H "Content-Type: application/json" \
  -d '{"audioUrl": "https://example.com/audio.mp3", "language": "en-US"}'

# Get recent transcriptions
curl http://localhost:3000/transcriptions

Using JavaScript/TypeScript

// Basic transcription
const response = await fetch('http://localhost:3000/transcription', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ audioUrl: 'https://example.com/audio.mp3' })
});
const data = await response.json();
console.log('Transcription ID:', data.id);
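
The snippet above does not check the HTTP status before reading the body. A slightly more defensive variant might look like the following sketch. The helper name `createTranscription`, the `path` option, and the injectable `fetchImpl` are assumptions; the error response format is not documented here, so only the status code is reported.

```javascript
// Hypothetical client helper; fetchImpl is injectable so it can be tested without a server.
async function createTranscription(
  baseUrl,
  body,
  { path = '/transcription', fetchImpl = fetch } = {}
) {
  const response = await fetchImpl(`${baseUrl}${path}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(body),
  });
  // fetch only rejects on network errors, so HTTP errors must be checked explicitly.
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const data = await response.json();
  return data.id;
}
```

For the Azure endpoint, the same helper could be called with `{ path: '/azure-transcription' }` and a body that includes `language`.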

📄 License

ISC

👤 Author

VoiceOwl Developer Evaluation Task


Note: This is a demonstration project. For production use, implement the improvements listed in the "Production Improvements" section.
