WhissleAI/visual_speech_recognition

Introduction

  • This repo supports my GSoC 2024 project proposal and holds some helper code for the proposed project.
  • Please download the relevant AudioSet data from the link
  • Download samples of the People's Speech dataset from this link
  • Note that you can also use utils.ipynb as a script to download these datasets.
  • Other tools that I tried for the proposal include yt-dlp, shot_detection, and video_llava
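As a rough illustration of how AudioSet clips could be fetched with yt-dlp, the sketch below parses segment rows and builds one command line per clip. It is not the code in utils.ipynb. The function names `parse_segments` and `ytdlp_command` and the `clips` output folder are hypothetical, and the row layout (`YTID, start_seconds, end_seconds, positive_labels`) is assumed to match the published AudioSet segment CSVs.

```python
import csv
import io


def parse_segments(csv_text):
    """Yield (ytid, start, end, labels) from AudioSet-style segment CSV text.

    Assumed row layout: YTID, start_seconds, end_seconds, "label1,label2".
    Lines starting with '#' are treated as comments and skipped.
    """
    lines = (ln for ln in io.StringIO(csv_text) if not ln.startswith("#"))
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < 4:
            continue  # blank or malformed line
        ytid, start, end = row[0], float(row[1]), float(row[2])
        yield ytid, start, end, row[3].split(",")


def ytdlp_command(ytid, start, end, out_dir="clips"):
    """Build (but do not run) a yt-dlp command for one time-ranged clip.

    out_dir is a hypothetical output folder used only for illustration.
    """
    return [
        "yt-dlp",
        # A "*" prefix makes --download-sections take a time range.
        "--download-sections", f"*{start:g}-{end:g}",
        "-o", f"{out_dir}/{ytid}_{start:g}_{end:g}.%(ext)s",
        f"https://www.youtube.com/watch?v={ytid}",
    ]


if __name__ == "__main__":
    sample = '# comment line\n--PJHxphWEs, 30.000, 40.000, "/m/09x0r,/t/dd00088"\n'
    for ytid, start, end, labels in parse_segments(sample):
        print(labels, " ".join(ytdlp_command(ytid, start, end)))
```

The command is returned as a list so it can be passed to `subprocess.run` unchanged; running it requires yt-dlp to be installed and the video to still be available.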

Videos with these labels, and the children of these labels, are to be avoided

  • /m/0l14jd: Choir
  • /m/0kpv1t: Music genre
  • /m/074ft: Song

  • /m/09x0r: Speech
  • /m/02rtxlg: Whispering
  • /m/015lz1: Singing
  • /t/dd00089: Miscellaneous sources

/m/09l8g: Human voice. Note: it is better to add this back later for the ones that are not speech.
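Avoiding a label together with its children amounts to walking the AudioSet ontology from each listed label and dropping any video tagged with a label reached that way. The sketch below is a minimal illustration, not the repo's own code. It assumes ontology entries shaped like those in AudioSet's `ontology.json` (an `id` plus a `child_ids` list). The helper names and the child ID in the toy ontology are hypothetical, and `/m/09l8g` is left out of the list because of the note above.

```python
# Labels listed above as ones to avoid (Human voice, /m/09l8g, is left out here).
AVOID = [
    "/m/0l14jd",   # Choir
    "/m/0kpv1t",   # Music genre
    "/m/074ft",    # Song
    "/m/09x0r",    # Speech
    "/m/02rtxlg",  # Whispering
    "/m/015lz1",   # Singing
    "/t/dd00089",  # Miscellaneous sources
]


def expand_with_children(ontology, root_ids):
    """Return root_ids plus every descendant label found via child_ids."""
    by_id = {node["id"]: node for node in ontology}
    excluded, stack = set(), list(root_ids)
    while stack:
        label = stack.pop()
        if label in excluded:
            continue  # already visited
        excluded.add(label)
        # Labels missing from the ontology simply have no children to follow.
        stack.extend(by_id.get(label, {}).get("child_ids", []))
    return excluded


def keep_segment(labels, excluded):
    """A segment is kept only if none of its labels is excluded."""
    return excluded.isdisjoint(labels)


if __name__ == "__main__":
    # Toy ontology; "/x/speech_child" is a made-up ID for illustration only.
    toy_ontology = [
        {"id": "/m/09x0r", "name": "Speech", "child_ids": ["/x/speech_child"]},
        {"id": "/x/speech_child", "name": "Hypothetical child", "child_ids": []},
    ]
    excluded = expand_with_children(toy_ontology, AVOID)
    print(keep_segment(["/x/speech_child"], excluded))  # tagged with a child of Speech
```

With the real ontology, the list loaded from `ontology.json` would be passed in place of `toy_ontology`.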

About

Visual aware Speech Recognition
