Simple example of MPEG-4 audio track to text conversion

Uses ffmpeg to extract the audio track from an .mp4 file, performs speech recognition with the Vosk API and a Vosk model, and returns the recognized text. A utility for calculating quality metrics against a reference text is included.

Requirements

Python 3.6+
ffmpeg

Stack

Installation

This application uses ffmpeg to convert .mp4 files to .wav. The ffmpeg package must be installed on your system for MPEG-4 video to text conversion to work.
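As an illustration of the conversion step, here is a minimal sketch of calling ffmpeg from Python. It is not taken from transcript_mp4.py: the function names are hypothetical, and the mono, 16 kHz, 16-bit PCM output format is an assumption (a format commonly used with Vosk models), so adjust it to whatever your model expects.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(mp4_path, wav_path, sample_rate=16000):
    """Return an ffmpeg argument list that extracts mono 16-bit PCM audio."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", str(mp4_path),      # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to one channel (assumption)
        "-ar", str(sample_rate),  # resample; 16 kHz is an assumption
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM
        str(wav_path),
    ]


def extract_audio(mp4_path, wav_path=None):
    """Run ffmpeg and return the path of the resulting .wav file."""
    mp4_path = Path(mp4_path)
    wav_path = Path(wav_path) if wav_path else mp4_path.with_suffix(".wav")
    # check=True raises CalledProcessError if ffmpeg exits with an error
    subprocess.run(build_ffmpeg_command(mp4_path, wav_path), check=True)
    return wav_path
```

Calling extract_audio("some_video.mp4") would write some_video.wav next to the input, provided ffmpeg is on the PATH.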

Install requirements

pip install -r requirements.txt

Download Vosk model

Download a model for your language from https://alphacephei.com/vosk/models and put it into the ./model directory.

Example:

wget https://alphacephei.com/vosk/models/vosk-model-ru-0.22.zip
unzip vosk-model-ru-0.22.zip
mv vosk-model-ru-0.22 model

Usage

python transcript_mp4.py some_video.mp4
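For readers curious about what the recognition step can look like, the following is a rough sketch of feeding a .wav file to Vosk. It is an assumption about the general approach, not the actual contents of transcript_mp4.py. The names join_segments and transcribe_wav and the chunk size are hypothetical, and the input is assumed to be a mono 16-bit PCM .wav file.

```python
import json
import wave


def join_segments(result_jsons):
    """Join the "text" field of each Vosk JSON result into one string."""
    texts = (json.loads(r).get("text", "") for r in result_jsons)
    return " ".join(t for t in texts if t)


def transcribe_wav(wav_path, model_dir="model", chunk_frames=4000):
    """Recognize speech in a mono 16-bit PCM .wav file with Vosk."""
    # Imported lazily so the helper above works without vosk installed
    from vosk import Model, KaldiRecognizer

    results = []
    with wave.open(str(wav_path), "rb") as wf:
        rec = KaldiRecognizer(Model(model_dir), wf.getframerate())
        while True:
            data = wf.readframes(chunk_frames)
            if not data:
                break
            if rec.AcceptWaveform(data):   # True when an utterance is complete
                results.append(rec.Result())
        results.append(rec.FinalResult())  # flush the remaining audio
    return join_segments(results)
```

Vosk returns each recognized utterance as a JSON string, so the sketch collects those strings and concatenates their "text" fields at the end.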

Metrics calculation

python wer.py <hypothesis_text_file> <reference_text_file>
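To give an idea of what a word error rate measures, here is a small pure-Python sketch that computes WER as the word-level edit distance divided by the number of reference words. This is only an illustration: wer.py may rely on a library and may normalize the text differently, and the function name word_error_rate is hypothetical.

```python
def word_error_rate(hypothesis, reference):
    """WER = word-level edit distance / number of reference words."""
    hyp, ref = hypothesis.split(), reference.split()
    if not ref:
        raise ValueError("reference text must contain at least one word")
    # prev[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion: reference word missing
                curr[j - 1] + 1,         # insertion: extra hypothesis word
                prev[j - 1] + (r != h),  # substitution, or a free match
            ))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the dog sat down", "the cat sat"))
```

In the call above the hypothesis has one substituted word and one inserted word against a three-word reference, so the distance is 2 and the rate is 2/3.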

Example:

We have an ideal transcript in Russian (./samples/ideal.txt) for this video: https://vod-video.rbc.ru/archive/2021/12/02/den1118.folder/telecast_576p.mp4

We have also made a transcript of the same video with a Vosk model (./samples/test.txt).

So, we can run the calculation:

python wer.py samples/test.txt samples/ideal.txt

Result:

WER (Words Error Rate): 0.14775815217391305
MER (Match Error Rate): 0.14164767176815368
WIL (Word Information Lost): 0.22181904843819444
WIP (Word Information Preserved): 0.7781809515618056
Hits: 2636
Substitutions: 270
Deletions: 38
Insertions: 127
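The four rates above can be reproduced from the four counts. The sketch below uses the standard definitions of these measures. The function name metrics_from_counts is hypothetical, and the code is a check on the arithmetic rather than an excerpt from wer.py.

```python
def metrics_from_counts(hits, substitutions, deletions, insertions):
    """Derive WER, MER, WIP and WIL from word alignment counts."""
    errors = substitutions + deletions + insertions
    ref_words = hits + substitutions + deletions   # words in the reference
    hyp_words = hits + substitutions + insertions  # words in the hypothesis
    wip = (hits / ref_words) * (hits / hyp_words)
    return {
        "wer": errors / ref_words,        # errors per reference word
        "mer": errors / (hits + errors),  # errors per aligned word pair
        "wip": wip,
        "wil": 1 - wip,
    }


m = metrics_from_counts(hits=2636, substitutions=270, deletions=38, insertions=127)
for name, value in m.items():
    print(f"{name.upper()}: {value}")
```

With the counts from the example, there are 435 errors against 2944 reference words, which gives the WER of about 0.1478 reported above.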

About the metrics

RosBusinessConsulting/video-to-text
