This file describes how to run the diarization implementation of this project. Before you can run it, you need trained model weights from the sid implementation, so read the README in that folder first.
The whole pipeline is driven by the run.sh script. It first creates a fake diarization data set based on VoxCeleb, then computes MFCCs for it and runs diari_extract.py from the sid implementation on them to obtain the x-vectors. Finally, it performs clustering and analyzes the result.
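The fake diarization set is built by stitching together utterances from different VoxCeleb speakers, so the true speaker turns are known by construction. Below is a minimal sketch of that idea; the function name and the data layout are hypothetical and not taken from run.sh:

```python
import random

def make_fake_recording(utts_by_speaker, n_turns=6, seed=0):
    """Concatenate randomly chosen utterances from random speakers and
    record the reference speaker label of each resulting segment.

    utts_by_speaker: dict mapping speaker id -> list of (utt_id, duration_s)
    Returns (segments, total_duration), where segments is a list of
    (start_s, end_s, speaker_id) tuples -- the reference diarization.
    """
    rng = random.Random(seed)
    speakers = list(utts_by_speaker)
    segments, t = [], 0.0
    for _ in range(n_turns):
        spk = rng.choice(speakers)
        _utt_id, dur = rng.choice(utts_by_speaker[spk])
        segments.append((t, t + dur, spk))
        t += dur
    return segments, t

# Example: two speakers with known utterance durations.
utts = {"spkA": [("a1", 2.0), ("a2", 3.0)], "spkB": [("b1", 1.5)]}
ref, total = make_fake_recording(utts, n_turns=4)
```

Because the segment boundaries and speaker labels are generated, no manual annotation is needed to score the clustering later.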
You can read more about this in my thesis. (Link here; if it is missing, write me in the issues.)
All the important scripts can be found in the toys directory. The local directory contains scripts from the Kaldi repository.
- cluster_xvecs.py performs the clustering using the extracted x-vectors.
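x-vector clustering is commonly done with agglomerative hierarchical clustering and a stopping threshold. The following is a simplified pure-Python sketch of that technique; it is not cluster_xvecs.py, which may use a different linkage, metric, or threshold:

```python
import math

def cosine_dist(a, b):
    """Cosine distance between two vectors (0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def ahc(xvecs, threshold=0.5):
    """Average-linkage agglomerative clustering.
    Repeatedly merges the two closest clusters until the smallest
    inter-cluster distance exceeds the threshold.
    Returns a cluster label (0..k-1) for each input vector."""
    clusters = [[i] for i in range(len(xvecs))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(cosine_dist(xvecs[a], xvecs[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:  # remaining clusters are too far apart
            break
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    labels = [0] * len(xvecs)
    for k, members in enumerate(clusters):
        for m in members:
            labels[m] = k
    return labels

vecs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = ahc(vecs, threshold=0.3)
```

With the toy vectors above, the first two and the last two end up in separate clusters, i.e. two "speakers".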
- colour.py prints the hypothesis and reference labels side by side, colorized. The colors may not match well, though, because this script does not use the Hungarian algorithm to find the best label mapping.
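Matching hypothesis labels to reference labels is an assignment problem; the Hungarian algorithm solves it in polynomial time. For a handful of speakers, a brute-force version conveys the idea. This is a sketch of what such a mapping step would look like, not code from colour.py:

```python
from itertools import permutations

def best_mapping(ref, hyp):
    """Map hypothesis labels onto reference labels so that the number
    of matching frames is maximized. Brute force over permutations,
    which is fine for a few speakers; use the Hungarian algorithm
    (e.g. scipy.optimize.linear_sum_assignment) for many.
    Assumes there are at least as many reference labels as
    hypothesis labels."""
    ref_labels = sorted(set(ref))
    hyp_labels = sorted(set(hyp))
    best_map, best_hits = None, -1
    for perm in permutations(ref_labels, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        hits = sum(mapping[h] == r for h, r in zip(hyp, ref))
        if hits > best_hits:
            best_map, best_hits = mapping, hits
    return best_map, best_hits

ref = ["A", "A", "B", "B", "B", "A"]
hyp = [1, 1, 2, 2, 2, 2]
mapping, hits = best_mapping(ref, hyp)  # maps 1 -> "A", 2 -> "B"
```

Without this step, cluster 1 might get printed in the color of reference speaker B even when it mostly overlaps speaker A, which is exactly the caveat noted for colour.py.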
- compile_diarization_set.py uses the JSON file generated by generate_diarization_set.py to create the actual audio files for the diarization set.
- compute_der.py calculates the DER using pyannote.metrics.
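As a rough illustration of what the DER measures, here is a simplified frame-level error rate over two aligned label sequences. This is only a crude stand-in, not compute_der.py: real DER scoring (as in pyannote.metrics) works on time segments, applies a forgiveness collar, accounts for missed/false speech, and finds the optimal speaker mapping first:

```python
def frame_error_rate(ref, hyp):
    """Fraction of frames whose hypothesis label differs from the
    reference. Assumes both sequences are frame-aligned and that the
    hypothesis labels have already been mapped onto reference labels."""
    assert len(ref) == len(hyp)
    errors = sum(r != h for r, h in zip(ref, hyp))
    return errors / len(ref)

ref = ["A", "A", "A", "B", "B", "B", "B", "B"]
hyp = ["A", "A", "B", "B", "B", "B", "B", "A"]
der = frame_error_rate(ref, hyp)  # 2 mismatched frames out of 8
```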
- dia2srt.py uses the generated hypothesis labels and a dummy reference file to generate SRT subtitles showing which speaker is active at a given point in time. ATTENTION: this script does not take the VAD into account, so silence will be labeled as the NEXT speaker talking.
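Turning frame-level speaker labels into SRT boils down to merging runs of identical labels into segments and formatting the timestamps. A minimal sketch under assumed conventions (the frame length parameter and output wording are made up, not taken from dia2srt.py):

```python
def labels_to_srt(labels, frame_s=0.1):
    """Merge consecutive identical frame labels into segments and emit
    them as SRT subtitle entries. Like dia2srt.py, this ignores VAD:
    silence frames simply keep whatever label they were given."""
    def ts(t):
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        return "%02d:%02d:%02d,%03d" % (h, m, int(s), round((s - int(s)) * 1000))

    segments, start, cur = [], 0, labels[0]
    for i, lab in enumerate(labels[1:], 1):
        if lab != cur:
            segments.append((start * frame_s, i * frame_s, cur))
            start, cur = i, lab
    segments.append((start * frame_s, len(labels) * frame_s, cur))

    lines = []
    for n, (t0, t1, spk) in enumerate(segments, 1):
        lines.append("%d\n%s --> %s\nSpeaker %s\n" % (n, ts(t0), ts(t1), spk))
    return "\n".join(lines)

# 20 frames of speaker A, then 10 of speaker B, at 0.5 s per frame:
srt = labels_to_srt(["A"] * 20 + ["B"] * 10, frame_s=0.5)
```

This produces one subtitle entry per speaker turn; the VAD caveat above means a silent stretch before a turn is swallowed into that turn's entry.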
- gen_xvec_lblb.py generates the reference x-vector labels.
- make_utt2dur.sh generates a file listing the duration of each utterance. (This is only required for generating the fake diarization set; after that, the utt2dur file is no longer needed.)
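The same information can be gathered in Python with the standard-library wave module, assuming the utterances are plain PCM WAV files. A sketch, not a drop-in replacement for make_utt2dur.sh:

```python
import wave

def wav_duration(path):
    """Duration of a PCM WAV file in seconds, computed from the frame
    count and sample rate in the header (no audio data is decoded)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def write_utt2dur(wav_scp, out_path):
    """wav_scp: iterable of (utt_id, wav_path) pairs. Writes a
    Kaldi-style utt2dur file: one '<utt-id> <duration>' line each."""
    with open(out_path, "w") as out:
        for utt_id, path in wav_scp:
            out.write("%s %.2f\n" % (utt_id, wav_duration(path)))
```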
- order_by_key.py sorts a file by key.
- vis_compare.py is the same as colour.py, just without color (I think).