Speech-to-Text API Benchmarking Script

This script benchmarks the response times of four speech-to-text (STT) APIs: Groq, JigsawStack, AssemblyAI, and OpenAI. It runs each API 10 times (configurable) across four audio samples of different lengths and calculates the average response time for each.

Check this out for the full results and breakdown of the benchmark

Benchmark overview

| Criteria | JigsawStack | Groq | AssemblyAI | OpenAI |
| --- | --- | --- | --- | --- |
| Model | Insanely-fast-whisper | Whisper-large-v3-turbo | Universal-1 | Whisper-2 |
| Latency (5s audio) | 765ms | 631ms | 4s | 12s |
| Latency (3m video) | 2.7s | 3.5s | 7.8s | 10s |
| Latency (30m video) | 11s | 12s | 29s | 91s |
| Latency (1hr 35m video) | 27s | Error out | 42s | Error out |
| Word Error Rate (WER) | 10.30% | 12% | 8.70% | 10.60% |
| Diarization Support | Yes | No | Yes | No |
| Timestamp | Sentence level | Sentence level | Word level | Sentence level |
| Large File | Up to 100MB | Up to 25MB | 5GB | Up to 25MB |
| Automatic | Yes | Yes | Yes | Yes |
| Streaming Support | No | No | Yes | No |
| Pricing | $0.05/hr | $0.04/hr | $0.37/hr | $0.36/hr |
| Best For | Speed, Low cost, Production apps | Low cost and lightweight app | Real-time transcription apps | |
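
For readers unfamiliar with the Word Error Rate row, the sketch below shows the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and the API's transcript, divided by the number of reference words. This README does not say how the WER figures above were computed, so the function names `tokenize` and `wordErrorRate` and the lowercase, whitespace-based normalization are assumptions made for illustration only.

```typescript
// Hypothetical helper: lowercases and splits on whitespace.
// Real evaluations often also strip punctuation; that step is omitted here.
function tokenize(text: string): string[] {
  return text.toLowerCase().split(/\s+/).filter((word) => word.length > 0);
}

// WER = word-level edit distance / number of words in the reference.
function wordErrorRate(reference: string, hypothesis: string): number {
  const ref = tokenize(reference);
  const hyp = tokenize(hypothesis);
  if (ref.length === 0) {
    throw new Error("reference must contain at least one word");
  }

  // prev[j] holds the edit distance between the first i-1 reference words
  // and the first j hypothesis words.
  let prev: number[] = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const cost = ref[i - 1] === hyp[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution or match
      );
    }
    prev = curr;
  }
  return prev[hyp.length] / ref.length;
}
```

With this definition, one wrong word in a four-word reference gives a WER of 0.25, and a WER above 1 is possible when the hypothesis contains many inserted words.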

Prerequisites

Before running this script, ensure you have the following:

Node.js (v16 or higher)
API keys for:
- Groq SDK
- AssemblyAI
- JigsawStack
- OpenAI

Installation

Clone the repository (or download the script if provided directly):
```
git clone https://github.com/JigsawStack/stt-comparison.git
```
Install dependencies:
```
yarn
```
or
```
npm install
```

Configuration

Set API keys for each of the services:

Use the .env.example file to create a .env file and replace the placeholders with your actual API keys.
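
A quick check that every key is present can save a failed run halfway through the benchmark. The sketch below is one way to do that; the variable names in `REQUIRED_KEYS` are assumptions, so use the names that actually appear in .env.example. The `missingKeys` helper is hypothetical and is not part of benchmark.ts.

```typescript
// Assumed variable names; check .env.example for the real ones.
const REQUIRED_KEYS: readonly string[] = [
  "GROQ_API_KEY",
  "ASSEMBLYAI_API_KEY",
  "JIGSAWSTACK_API_KEY",
  "OPENAI_API_KEY",
];

// Returns the names of required keys that are unset or blank in `env`.
function missingKeys(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_KEYS,
): string[] {
  return required.filter((key) => {
    const value = env[key];
    return value === undefined || value.trim() === "";
  });
}
```

In a Node script this could be called as `missingKeys(process.env)` before any request is sent, stopping early with a message that lists the missing names.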

Running the Script

To run the benchmarking script, execute the following command:

tsx benchmark.ts

The script will run each API request 10 times (or the specified number of iterations) and print out average response times in milliseconds for each service.
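
The measurement described above can be sketched roughly as follows. This is a minimal illustration, not the code in benchmark.ts: the `mean` and `averageResponseTime` helpers and the `call` parameter are hypothetical names, and the real script may time requests differently.

```typescript
// Arithmetic mean of a list of timings in milliseconds.
function mean(samples: number[]): number {
  if (samples.length === 0) {
    throw new Error("at least one sample is required");
  }
  return samples.reduce((sum, value) => sum + value, 0) / samples.length;
}

// Runs `call` sequentially `iterations` times and returns the mean duration in ms.
// `call` stands in for one transcription request to a single service.
async function averageResponseTime(
  name: string,
  call: () => Promise<unknown>,
  iterations = 10,
): Promise<number> {
  const timings: number[] = [];
  for (let i = 0; i < iterations; i++) {
    const start = performance.now();
    await call();
    timings.push(performance.now() - start);
  }
  const average = mean(timings);
  console.log(`Average response time for ${name}: ${average} ms`);
  return average;
}
```

Running the requests one after another, rather than in parallel, keeps each timing independent of the others, at the cost of a longer total run.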

Output

The script logs each API’s average response time. Example output:

Iteration 1
...
Average response time for Groq: 3512.3759947 ms
Average response time for JigsawStack: 2749.9410608999997 ms
Average response time for AssemblyAI: 7808.462181100001 ms
Average response time for Openai:  10407.212865700001 ms

Audio Samples

Here are the audio samples used in the benchmark
