This file describes how to run the diarization implementation of this project. Before you can run it, you need trained model weights from the sid implementation, so read the README in that folder first.
The whole pipeline is driven by the run.sh script. It first creates a fake diarization data set based on VoxCeleb, then computes MFCCs for it and runs diari_extract.py from the sid implementation on them to obtain the x-vectors. Finally, it performs clustering and analyzes the result.
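The fake diarization set is built by stitching together utterances from different VoxCeleb speakers, so the true speaker turns are known by construction. Below is a minimal sketch of that idea; the function name and the data layout are hypothetical and not taken from run.sh:

```python
import random

def make_fake_recording(utts_by_speaker, n_turns=6, seed=0):
    """Concatenate randomly chosen utterances from random speakers and
    record the reference speaker label of each resulting segment.

    utts_by_speaker: dict mapping speaker id -> list of (utt_id, duration_s)
    Returns (segments, total_duration), where segments is a list of
    (start_s, end_s, speaker_id) tuples -- the reference diarization.
    """
    rng = random.Random(seed)
    speakers = list(utts_by_speaker)
    segments, t = [], 0.0
    for _ in range(n_turns):
        spk = rng.choice(speakers)
        _utt_id, dur = rng.choice(utts_by_speaker[spk])
        segments.append((t, t + dur, spk))
        t += dur
    return segments, t

# Example: two speakers with known utterance durations.
utts = {"spkA": [("a1", 2.0), ("a2", 3.0)], "spkB": [("b1", 1.5)]}
ref, total = make_fake_recording(utts, n_turns=4)
```

Because the segment boundaries and speaker labels are generated, no manual annotation is needed to score the clustering later.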
You can read more about this in my thesis. (Link here; if it is missing, write me in the issues.)
All the important scripts can be found in the toys directory. The local directory contains scripts from the Kaldi repository.
- cluster_xvecs.py performs the clustering using the extracted x-vectors.
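x-vector clustering is commonly done with agglomerative hierarchical clustering and a stopping threshold. The following is a simplified pure-Python sketch of that technique; it is not cluster_xvecs.py, which may use a different linkage, metric, or threshold:

```python
import math

def cosine_dist(a, b):
    """Cosine distance between two vectors (0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

def ahc(xvecs, threshold=0.5):
    """Average-linkage agglomerative clustering.
    Repeatedly merges the two closest clusters until the smallest
    inter-cluster distance exceeds the threshold.
    Returns a cluster label (0..k-1) for each input vector."""
    clusters = [[i] for i in range(len(xvecs))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(cosine_dist(xvecs[a], xvecs[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:  # remaining clusters are too far apart
            break
        _, i, j = best
        clusters[i] += clusters[j]
        del clusters[j]
    labels = [0] * len(xvecs)
    for k, members in enumerate(clusters):
        for m in members:
            labels[m] = k
    return labels

vecs = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9)]
labels = ahc(vecs, threshold=0.3)
```

With the toy vectors above, the first two and the last two end up in separate clusters, i.e. two "speakers".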
- colour.py prints the hypothesis and reference labels side by side, colorized. The colors may not match well, though, because this script does not use the Hungarian algorithm to find the best label mapping.
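Matching hypothesis labels to reference labels is an assignment problem; the Hungarian algorithm solves it in polynomial time. For a handful of speakers, a brute-force version conveys the idea. This is a sketch of what such a mapping step would look like, not code from colour.py:

```python
from itertools import permutations

def best_mapping(ref, hyp):
    """Map hypothesis labels onto reference labels so that the number
    of matching frames is maximized. Brute force over permutations,
    which is fine for a few speakers; use the Hungarian algorithm
    (e.g. scipy.optimize.linear_sum_assignment) for many.
    Assumes there are at least as many reference labels as
    hypothesis labels."""
    ref_labels = sorted(set(ref))
    hyp_labels = sorted(set(hyp))
    best_map, best_hits = None, -1
    for perm in permutations(ref_labels, len(hyp_labels)):
        mapping = dict(zip(hyp_labels, perm))
        hits = sum(mapping[h] == r for h, r in zip(hyp, ref))
        if hits > best_hits:
            best_map, best_hits = mapping, hits
    return best_map, best_hits

ref = ["A", "A", "B", "B", "B", "A"]
hyp = [1, 1, 2, 2, 2, 2]
mapping, hits = best_mapping(ref, hyp)  # maps 1 -> "A", 2 -> "B"
```

Without this step, cluster 1 might get printed in the color of reference speaker B even when it mostly overlaps speaker A, which is exactly the caveat noted for colour.py.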
- compile_diarization_set.py uses the JSON file generated by generate_diarization_set.py to create the actual audio files for the diarization set.
- compute_der.py calculates the DER using pyannote.metrics.
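As a rough illustration of what the DER measures, here is a simplified frame-level error rate over two aligned label sequences. This is only a crude stand-in, not compute_der.py: real DER scoring (as in pyannote.metrics) works on time segments, applies a forgiveness collar, accounts for missed/false speech, and finds the optimal speaker mapping first:

```python
def frame_error_rate(ref, hyp):
    """Fraction of frames whose hypothesis label differs from the
    reference. Assumes both sequences are frame-aligned and that the
    hypothesis labels have already been mapped onto reference labels."""
    assert len(ref) == len(hyp)
    errors = sum(r != h for r, h in zip(ref, hyp))
    return errors / len(ref)

ref = ["A", "A", "A", "B", "B", "B", "B", "B"]
hyp = ["A", "A", "B", "B", "B", "B", "B", "A"]
der = frame_error_rate(ref, hyp)  # 2 mismatched frames out of 8
```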
- dia2srt.py uses the generated hypothesis labels and a dummy reference file to generate SRT subtitles showing which speaker is active at a given point in time. ATTENTION: this script does not take the VAD into account, so silence will be labeled as the NEXT speaker talking.
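Turning frame-level speaker labels into SRT boils down to merging runs of identical labels into segments and formatting the timestamps. A minimal sketch under assumed conventions (the frame length parameter and output wording are made up, not taken from dia2srt.py):

```python
def labels_to_srt(labels, frame_s=0.1):
    """Merge consecutive identical frame labels into segments and emit
    them as SRT subtitle entries. Like dia2srt.py, this ignores VAD:
    silence frames simply keep whatever label they were given."""
    def ts(t):
        h, rem = divmod(t, 3600)
        m, s = divmod(rem, 60)
        return "%02d:%02d:%02d,%03d" % (h, m, int(s), round((s - int(s)) * 1000))

    segments, start, cur = [], 0, labels[0]
    for i, lab in enumerate(labels[1:], 1):
        if lab != cur:
            segments.append((start * frame_s, i * frame_s, cur))
            start, cur = i, lab
    segments.append((start * frame_s, len(labels) * frame_s, cur))

    lines = []
    for n, (t0, t1, spk) in enumerate(segments, 1):
        lines.append("%d\n%s --> %s\nSpeaker %s\n" % (n, ts(t0), ts(t1), spk))
    return "\n".join(lines)

# 20 frames of speaker A, then 10 of speaker B, at 0.5 s per frame:
srt = labels_to_srt(["A"] * 20 + ["B"] * 10, frame_s=0.5)
```

This produces one subtitle entry per speaker turn; the VAD caveat above means a silent stretch before a turn is swallowed into that turn's entry.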
- gen_xvec_lblb.py generates the reference x-vector labels.
- make_utt2dur.sh generates a file listing the duration of each utterance. (This is only required for generating the fake diarization set; after that, the utt2dur file is no longer needed.)
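The same information can be gathered in Python with the standard-library wave module, assuming the utterances are plain PCM WAV files. A sketch, not a drop-in replacement for make_utt2dur.sh:

```python
import wave

def wav_duration(path):
    """Duration of a PCM WAV file in seconds, computed from the frame
    count and sample rate in the header (no audio data is decoded)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()

def write_utt2dur(wav_scp, out_path):
    """wav_scp: iterable of (utt_id, wav_path) pairs. Writes a
    Kaldi-style utt2dur file: one '<utt-id> <duration>' line each."""
    with open(out_path, "w") as out:
        for utt_id, path in wav_scp:
            out.write("%s %.2f\n" % (utt_id, wav_duration(path)))
```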
- order_by_key.py sorts a file by key.
- vis_compare.py is the same as colour.py, just without color (I think).