- This repo supplements my GSoC 2024 project proposal with some helper code that I wrote for the proposed project.
- Please download the relevant AudioSet data from the link.
- Download samples of the People's Speech dataset from this link.
- Note that you can also use `utils.ipynb` as a script to download these datasets.
- Other tools that I tried for the proposal include yt-dlp, shot_detection, and video_llava.
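
Since AudioSet refers to clips by YouTube ID plus a start and end time, one way to fetch a single clip is to hand that time range to yt-dlp. The snippet below is only a sketch of that idea and is not taken from `utils.ipynb`: the helper name `build_ytdlp_command`, the output directory, and the file naming are assumptions made for illustration. It only builds the command; running it requires yt-dlp and ffmpeg to be installed.

```python
import subprocess


def build_ytdlp_command(ytid, start, end, out_dir="clips"):
    """Build a yt-dlp command that extracts one audio segment as WAV.

    Hypothetical helper: the name, out_dir and file naming are illustrative.
    """
    url = f"https://www.youtube.com/watch?v={ytid}"
    section = f"*{start:g}-{end:g}"  # "*" marks a time range, not a chapter name
    return [
        "yt-dlp",
        "--extract-audio",            # keep only the audio track
        "--audio-format", "wav",      # convert with ffmpeg
        "--download-sections", section,
        "--output", f"{out_dir}/{ytid}_{start:g}_{end:g}.%(ext)s",
        url,
    ]


def download_clip(ytid, start, end, out_dir="clips"):
    """Run the command; returns True if yt-dlp exited successfully."""
    cmd = build_ytdlp_command(ytid, start, end, out_dir)
    return subprocess.run(cmd, check=False).returncode == 0
```

For example, `download_clip("<YTID>", 30.0, 40.0)` would try to save a 10 second clip under `clips/`. Videos that have been removed or made private will simply fail, so the boolean return value is worth checking.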

AudioSet ontology labels:

- `/m/0l14jd`: Choir
- `/m/0kpv1t`: Music genre
- `/m/074ft`: Song
- `/m/09x0r`: Speech
- `/m/02rtxlg`: Whispering
- `/m/015lz1`: Singing
- `/t/dd00089`: Miscellaneous sources
- `/m/09l8g`: Human voice (note: it is better to add this one later, for the cases that are not speech)
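
The label IDs above can be used to pick matching clips out of an AudioSet segments CSV. The following is a minimal sketch, assuming the usual segments layout (comment lines starting with `#`, then `YTID, start_seconds, end_seconds, "label1,label2"`). The function name `filter_segments` and the sample rows are made up for illustration and do not come from `utils.ipynb`.

```python
import csv
import io

# Label IDs and names copied from the list above.
LABELS = {
    "/m/0l14jd": "Choir",
    "/m/0kpv1t": "Music genre",
    "/m/074ft": "Song",
    "/m/09x0r": "Speech",
    "/m/02rtxlg": "Whispering",
    "/m/015lz1": "Singing",
    "/t/dd00089": "Miscellaneous sources",
    "/m/09l8g": "Human voice",
}


def filter_segments(csv_text, wanted=frozenset(LABELS)):
    """Return (ytid, start, end, matched_labels) for rows with a wanted label."""
    # Drop the '#' comment/header lines before handing the rest to csv.
    lines = (ln for ln in io.StringIO(csv_text) if not ln.startswith("#"))
    matches = []
    for row in csv.reader(lines, skipinitialspace=True):
        if len(row) < 4:
            continue  # blank or malformed line
        ytid, start, end = row[0], float(row[1]), float(row[2])
        hit = set(row[3].split(",")) & set(wanted)
        if hit:
            matches.append((ytid, start, end, sorted(hit)))
    return matches


# Made-up sample rows; "/m/04rlf" stands in for a label outside the list.
SAMPLE = """# YTID, start_seconds, end_seconds, positive_labels
vid_a, 30.000, 40.000, "/m/09x0r,/m/04rlf"
vid_b, 0.000, 10.000, "/m/04rlf"
vid_c, 10.000, 20.000, "/m/015lz1,/m/0l14jd"
"""

for ytid, start, end, hit in filter_segments(SAMPLE):
    print(ytid, start, end, [LABELS[h] for h in hit])
```

Passing a smaller `wanted` set, for example `{"/m/09x0r"}`, restricts the result to speech clips only.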