This script benchmarks the response times of four speech-to-text (STT) APIs: Groq, JigsawStack, AssemblyAI, and OpenAI. It runs each API 10 times (configurable) on each of four audio samples of different lengths and calculates the average response time for each service.
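The measurement loop described above can be sketched as follows. This is a minimal illustration, not the repository's `benchmark.ts`: `measureAverage` and the `Transcribe` callback type are hypothetical names, and each callback is assumed to wrap one complete API request for a single audio sample.

```typescript
import { performance } from "node:perf_hooks";

// Hypothetical callback type: one call performs one full transcription request.
type Transcribe = () => Promise<unknown>;

// Runs `transcribe` sequentially `iterations` times and returns the mean
// wall-clock latency in milliseconds (network time included).
async function measureAverage(
  transcribe: Transcribe,
  iterations = 10,
): Promise<number> {
  if (!Number.isInteger(iterations) || iterations < 1) {
    throw new RangeError("iterations must be a positive integer");
  }
  let totalMs = 0;
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await transcribe(); // awaited one at a time so requests do not overlap
    totalMs += performance.now() - start;
  }
  return totalMs / iterations;
}

// Stand-in that waits about 50 ms instead of calling a real STT API.
const fakeApi: Transcribe = () =>
  new Promise<void>((resolve) => setTimeout(() => resolve(), 50));

measureAverage(fakeApi, 3).then((avg) =>
  console.log(`Average response time for FakeAPI: ${avg} ms`),
);
```

Running the requests one after another keeps them from competing for bandwidth, and averaging over several iterations smooths out network jitter.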
Check this out for the full results and breakdown of the benchmark

| Criteria | JigsawStack | Groq | AssemblyAI | OpenAI |
|---|---|---|---|---|
| Model | Insanely-fast-whisper | Whisper-large-v3-turbo | Universal-1 | Whisper-2 |
| Latency (5s audio) | 765ms | 631ms | 4s | 12s |
| Latency (3m video) | 2.7s | 3.5s | 7.8s | 10s |
| Latency (30m video) | 11s | 12s | 29s | 91s |
| Latency (1hr 35m video) | 27s | Errored out | 42s | Errored out |
| Word Error Rate (WER) | 10.30% | 12% | 8.70% | 10.60% |
| Diarization Support | Yes | No | Yes | No |
| Timestamps | Sentence level | Sentence level | Word level | Sentence level |
| Max File Size | Up to 100MB | Up to 25MB | Up to 5GB | Up to 25MB |
| Automatic | Yes | Yes | Yes | Yes |
| Streaming Support | No | No | Yes | No |
| Pricing | $0.05/hr | $0.04/hr | $0.37/hr | $0.36/hr |
| Best For | Speed, low cost, production apps | Low-cost, lightweight apps | Real-time transcription apps | |

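Word error rate is conventionally the word-level edit distance between a reference transcript and the model's output, divided by the number of reference words: WER = (S + D + I) / N, where S, D, and I count substitutions, deletions, and insertions. The benchmark does not state how its WER figures were computed (for example, how text was normalized), so this is only an illustrative sketch; `wordErrorRate` and `toWords` are hypothetical helpers, and the English-only lowercase/punctuation normalization is an assumption.

```typescript
// Assumed normalization: lowercase, drop punctuation (English-only), split on whitespace.
function toWords(text: string): string[] {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9\s']/g, "")
    .split(/\s+/)
    .filter((w) => w.length > 0);
}

// WER = (substitutions + deletions + insertions) / reference word count,
// using a word-level Levenshtein distance computed row by row.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = toWords(reference);
  const hyp = toWords(hypothesis);
  if (ref.length === 0) {
    throw new Error("reference transcript must contain at least one word");
  }
  // prev[j] = edit distance between the first i-1 reference words and the first j hypothesis words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr = new Array<number>(hyp.length + 1);
    curr[0] = i; // all i reference words deleted
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (or match when cost is 0)
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}

console.log(wordErrorRate("the quick brown fox", "the quick brown box")); // 0.25
```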
Before running this script, make sure you have the following:

- Node.js (v16 or higher)
- API keys for:
  - Groq
  - AssemblyAI
  - JigsawStack
  - OpenAI

- Clone the repository (or download the script if it was provided directly):

  ```bash
  git clone https://github.com/JigsawStack/stt-comparison.git
  cd stt-comparison
  ```

- Install dependencies with either Yarn or npm:

  ```bash
  yarn
  # or
  npm install
  ```

- Use the `.env.example` file to create a `.env` file, then replace the placeholders with your actual API keys.

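Because each provider needs its own key, it can help to check for missing keys before any requests go out. The variable names below are hypothetical; match them to whatever `.env.example` actually defines. This sketch assumes the `.env` values have already been loaded into `process.env` (for example by the `dotenv` package).

```typescript
// Hypothetical variable names; align these with the ones in .env.example.
const REQUIRED_KEYS = [
  "GROQ_API_KEY",
  "ASSEMBLYAI_API_KEY",
  "JIGSAWSTACK_API_KEY",
  "OPENAI_API_KEY",
] as const;

// Returns the names of required variables that are unset or blank.
function missingKeys(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_KEYS,
): string[] {
  return required.filter((name) => !env[name]?.trim());
}

const missing = missingKeys(process.env);
if (missing.length > 0) {
  console.warn(`Missing API keys: ${missing.join(", ")}`);
}
```

In a real benchmark run you would likely exit with a non-zero status at this point rather than only printing a warning.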
To run the benchmarking script, execute the following command (use `npx tsx benchmark.ts` if `tsx` is not installed globally):

```bash
tsx benchmark.ts
```

The script runs each API request 10 times (or the specified number of iterations) and logs the average response time, in milliseconds, for each service. Example output:

```
Iteration 1
...
Average response time for Groq: 3512.3759947 ms
Average response time for JigsawStack: 2749.9410608999997 ms
Average response time for AssemblyAI: 7808.462181100001 ms
Average response time for Openai: 10407.212865700001 ms
```

Here are the audio samples used in the benchmark: