GitHub - WhissleAI/visual_speech_recognition: Visual aware Speech Recognition

Introduction

This repo tracks updates to my GSoC 2024 project proposal and contains some helper code for the proposed project.
Please download the relevant AudioSet data from the link.
Download samples of the People's Speech dataset from this link.
Note that you can also use utils.ipynb as a script to download these datasets.
Other tools that I have tried for the proposal include yt-dlp, shot_detection, and video_llava.
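
Since yt-dlp is one of the tools mentioned, here is a minimal sketch of how a single 10-second AudioSet clip might be fetched with it. This is an assumption about usage, not the repo's actual downloader: the function name `build_ytdlp_command`, the output directory, and the example video ID are all hypothetical. The sketch only builds the command line and does not run it.

```python
from pathlib import Path


def build_ytdlp_command(ytid, start, end, out_dir="audioset_clips"):
    """Build (but do not run) a yt-dlp command for one audio clip.

    ytid, start and end mirror the columns of an AudioSet segment row.
    The function name and out_dir are hypothetical, chosen for illustration.
    """
    url = f"https://www.youtube.com/watch?v={ytid}"
    # Output template: <out_dir>/<ytid>_<start>.<ext>, where yt-dlp fills in %(ext)s.
    template = str(Path(out_dir) / f"{ytid}_{int(start)}.%(ext)s")
    return [
        "yt-dlp",
        "--extract-audio",            # keep only the audio track
        "--audio-format", "wav",      # convert the audio to WAV (needs ffmpeg)
        "--download-sections", f"*{start:g}-{end:g}",  # time range in seconds
        "-o", template,
        url,
    ]


# Hypothetical video ID, used only to show the shape of the command.
cmd = build_ytdlp_command("vid_aaaaaaa", 30.0, 40.0)
print(" ".join(cmd))
```

To actually download, the list could be passed to `subprocess.run(cmd, check=True)`, which requires yt-dlp and ffmpeg to be installed.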

/m/0l14jd: Choir
/m/0kpv1t: Music genre
/m/074ft: Song

/m/09x0r: Speech
/m/02rtxlg: Whispering
/m/015lz1: Singing
/t/dd00089: Miscellaneous sources

/m/09l8g: Human voice (note: it is better to add this label later, since some clips tagged with it are not speech)
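
The label IDs listed above can be used to pick out relevant segments from an AudioSet segment list. The following is a minimal sketch, assuming the usual AudioSet segment CSV layout (`YTID, start_seconds, end_seconds, positive_labels`, with `#` comment lines). The group names (`music`, `voice`, `human_voice`), the helper names, and the sample rows are assumptions made for illustration and do not come from this repo.

```python
import csv

# Label IDs from the lists above. The group names are illustrative only.
LABEL_GROUPS = {
    "music": {"/m/0l14jd", "/m/0kpv1t", "/m/074ft"},
    "voice": {"/m/09x0r", "/m/02rtxlg", "/m/015lz1", "/t/dd00089"},
    "human_voice": {"/m/09l8g"},
}


def parse_segments(text):
    """Yield (ytid, start, end, labels) from AudioSet-style segment CSV text."""
    # Drop blank lines and the '#' comment/header lines.
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    # skipinitialspace lets the quoted label field follow ", " correctly.
    for row in csv.reader(lines, skipinitialspace=True):
        ytid, start, end, labels = row[0], float(row[1]), float(row[2]), row[3]
        yield ytid, start, end, set(labels.split(","))


def groups_for(labels):
    """Return the sorted names of all groups that share a label with `labels`."""
    return sorted(name for name, ids in LABEL_GROUPS.items() if labels & ids)


# Made-up rows in the AudioSet CSV layout (the video IDs are not real).
SAMPLE = '''# YTID, start_seconds, end_seconds, positive_labels
vid_aaaaaaa, 30.000, 40.000, "/m/09x0r,/m/074ft"
vid_bbbbbbb, 0.000, 10.000, "/m/0l14jd"
vid_ccccccc, 10.000, 20.000, "/m/01234"
'''

# Keep only segments that match at least one group.
selected = {}
for ytid, start, end, labels in parse_segments(SAMPLE):
    matched = groups_for(labels)
    if matched:
        selected[ytid] = matched

print(selected)
```

Keeping `/m/09l8g` in its own group makes it easy to leave it out at first and add it later, as the note on that label suggests.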

Repository files:
mixed_samples
.gitignore
LICENSE
README.md
alignment.ipynb
analysis_audioset.ipynb
downloader.py
downloader_parallelised.py
get_wer_for_mix_ratios.py
infer_on_video.py
make_manifest.py
mixer_new.py
utils.ipynb