Collection of **self-supervised learning** (SSL) methods for **speaker verification** (SV).
**sslsv** is a PyTorch-based Deep Learning framework consisting of a collection of **Self-Supervised Learning** (SSL) methods for learning speaker representations applicable to different speaker-related downstream tasks, notably **Speaker Verification** (SV).
Our aim is to: **(1) provide self-supervised SOTA methods** by porting algorithms from the computer vision domain; and **(2) evaluate them in a comparable environment**.
---
## News
- **April 2024** – :clap: Introduction of various new methods and a complete refactoring (v2.0).
- **June 2022** – :stars: First release of sslsv (v1.0).
---
## Features
**General**
- **Data**: supervised and self-supervised datasets + augmentation (noise and reverberation)
- **Training**: CPU / multi-GPU (DP and DDP), resuming, early stopping, tensorboard, wandb, ...
- **Evaluation**: speaker verification (cosine and PLDA) and classification (emotion, language, ...)
- **Notebooks**: DET curve, scores distribution, t-SNE on embeddings, ...
- **Misc**: scalable config, typing, documentation and tests
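For context on the cosine evaluation mentioned above: speaker verification with cosine scoring compares two speaker embeddings by the cosine of the angle between them. A minimal stdlib sketch, not sslsv's actual API (the function name and example vectors are illustrative):

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two speaker embeddings.

    A higher score means the two utterances are more likely
    from the same speaker; a trial is accepted when the score
    exceeds a tuned threshold.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Identical embeddings score 1.0; orthogonal ones score 0.0.
score = cosine_score([0.1, 0.9, 0.4], [0.2, 0.8, 0.5])
```

PLDA scoring, also supported, instead models within- and between-speaker variability and requires a trained backend.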
<details>
<summary><b>Encoders</b></summary>
- **TDNN** (`sslsv.encoders.TDNN`)
  X-vectors: Robust DNN Embeddings for Speaker Recognition ([PDF](https://www.danielpovey.com/files/2018_icassp_xvectors.pdf))
- **ECAPA-TDNN** (`sslsv.encoders.ECAPATDNN`)
  ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification ([PDF](https://arxiv.org/abs/2005.07143))
Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning ([arXiv](https://arxiv.org/abs/2207.05506))
*Théo Lepage, Réda Dehak*
- **SimCLR Custom** (`sslsv.methods.SimCLRCustom`)
Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification ([arXiv](https://arxiv.org/abs/2306.03664))
*Théo Lepage, Réda Dehak*
</details>
---
## Requirements
sslsv runs on Python 3.8 with the following dependencies.
| Module        | Versions  |
|---------------|:---------:|
| torch         | >= 1.11.0 |
| torchaudio    | >= 0.11.0 |
| numpy         | *         |
| pandas        | *         |
| soundfile     | *         |
| scikit-learn  | *         |
| speechbrain   | *         |
| tensorboard   | *         |
| wandb         | *         |
| ruamel.yaml   | *         |
| dacite        | *         |
| prettyprinter | *         |
| tqdm          | *         |
**Note**: developers will also need `pre-commit` and `twine` to work on this project.
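One way to sanity-check an existing environment against the two pinned minimums in the table is a small version comparison. A sketch only, not part of sslsv (`meets_minimum` and `MINIMUMS` are hypothetical names):

```python
def meets_minimum(installed: str, required: str) -> bool:
    """Compare dotted version strings numerically, e.g. '1.12.0' >= '1.11.0'."""
    as_tuple = lambda v: tuple(int(part) for part in v.split("."))
    return as_tuple(installed) >= as_tuple(required)

# Minimum versions from the table above.
MINIMUMS = {"torch": "1.11.0", "torchaudio": "0.11.0"}
```

Note that this naive comparison ignores pre-release and local-version suffixes (e.g. `2.0.0+cu118`); `packaging.version.Version` handles those correctly if it is available.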
---
## Datasets
- [MUSAN](http://www.openslr.org/17/)
- [Room Impulse Response and Noise Database](https://www.openslr.org/28/)
Data used for the main experiments (conducted on VoxCeleb1 and VoxCeleb2 with data-augmentation) can be automatically downloaded, extracted and prepared using the following scripts. The resulting `data` folder should have the structure presented below.
```
data
└── voxceleb2_train.csv
```
Other datasets have to be manually downloaded and extracted, but their train and trials files *(the latter only for speaker verification)* can be created using the corresponding scripts from the `tools/prepare_data/` folder.
- Example format of a train file (`voxceleb1_train.csv`)

```
File,Speaker
voxceleb1/id10001/1zcIwhmdeo4/00001.wav,id10001
...
voxceleb1/id11251/s4R4hvqrhFw/00009.wav,id11251
```
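Since the train file is plain CSV, it can be grouped by speaker with the standard library alone. A sketch assuming the two-column format shown above (the in-memory string stands in for opening `data/voxceleb1_train.csv`, and the paths are the sample rows, not a real dataset):

```python
import csv
import io
from collections import defaultdict

# In practice: open("data/voxceleb1_train.csv"); a string stands in here.
sample = io.StringIO(
    "File,Speaker\n"
    "voxceleb1/id10001/1zcIwhmdeo4/00001.wav,id10001\n"
    "voxceleb1/id11251/s4R4hvqrhFw/00009.wav,id11251\n"
)

# Map each speaker label to the list of its utterance paths.
utterances_by_speaker = defaultdict(list)
for row in csv.DictReader(sample):
    utterances_by_speaker[row["Speaker"]].append(row["File"])
```

Grouping by the `Speaker` column like this is the typical first step for sampling same-speaker or cross-speaker pairs during training.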
- Example format of a trials file (`voxceleb1_test_O`)
sslsv contains third-party components and code adapted from other open-source projects, including: [voxceleb_trainer](https://github.com/clovaai/voxceleb_trainer), [voxceleb_unsupervised](https://github.com/joonson/voxceleb_unsupervised) and [solo-learn](https://github.com/vturrisi/solo-learn).
---
## Citations
If you use sslsv, please consider starring this repository on GitHub and citing one of the following papers.
This project is released under the [MIT License](https://github.com/theolepage/sslsv/blob/main/LICENSE.md).