
Commit 0c576eb (1 parent: 73fde10)

docs(readme): improve readme (logo, news, features, results, citations, license)

File tree: 6 files changed (+217, −37 lines)

README.md

Lines changed: 162 additions & 18 deletions
@@ -1,10 +1,40 @@
+<p align="center">
+  <img src="https://i.postimg.cc/CLFZLW7k/sslsv-logo-3-1.png" width=130 />
+</p>
+
+<!-- <p align="center">
+  <img src="https://img.shields.io/badge/License-MIT-green">
+  <img src="https://img.shields.io/badge/Python-3.8-aff?logo=python">
+  <img src="https://img.shields.io/badge/PyTorch-1.11.0-blue?logo=pytorch">
+</p> -->
+
 # sslsv
 
-Collection of **self-supervised learning** (SSL) methods for **speaker verification** (SV).
+**sslsv** is a PyTorch-based Deep Learning framework consisting of a collection of **Self-Supervised Learning** (SSL) methods for learning speaker representations applicable to different speaker-related downstream tasks, notably **Speaker Verification** (SV).
+
+Our aim is to **(1) provide self-supervised SOTA methods** by porting algorithms from the computer vision domain, and **(2) evaluate them in a comparable environment**.
+
+---
+
+## News
+
+* **April 2024** :clap: Introduction of various new methods and a complete refactoring (v2.0).
+* **June 2022** :stars: First release of sslsv (v1.0).
+
+---
 
-## Methods
+## Features
 
-### Encoders
+**General**
+
+- **Data**: supervised and self-supervised datasets + augmentation (noise and reverberation)
+- **Training**: CPU / multi-GPU (DP and DDP), resuming, early stopping, tensorboard, wandb, ...
+- **Evaluation**: speaker verification (cosine and PLDA) and classification (emotion, language, ...)
+- **Notebooks**: DET curve, scores distribution, t-SNE on embeddings, ...
+- **Misc**: scalable config, typing, documentation and tests
+
+<details>
+<summary><b>Encoders</b></summary>
 
 - **TDNN** (`sslsv.encoders.TDNN`)
   X-vectors: Robust dnn embeddings for speaker recognition ([PDF](https://www.danielpovey.com/files/2018_icassp_xvectors.pdf))
@@ -21,8 +51,10 @@ Collection of **self-supervised learning** (SSL) methods for **speaker verificat
 - **ECAPA-TDNN** (`sslsv.encoders.ECAPATDNN`)
   ECAPA-TDNN: Emphasized Channel Attention, Propagation and Aggregation in TDNN Based Speaker Verification ([PDF](https://arxiv.org/abs/2005.07143))
   *Brecht Desplanques, Jenthe Thienpondt, Kris Demuynck*
+</details>
 
-### Methods
+<details>
+<summary><b>Methods</b></summary>
 
 - **CPC** (`sslsv.methods.CPC`)
   Representation Learning with Contrastive Predictive Coding ([arXiv](https://arxiv.org/abs/1807.03748))
@@ -40,6 +72,10 @@ Collection of **self-supervised learning** (SSL) methods for **speaker verificat
   Improved Baselines with Momentum Contrastive Learning ([arXiv](https://arxiv.org/abs/2003.04297))
   *Xinlei Chen, Haoqi Fan, Ross Girshick, Kaiming He*
 
+- **W-MSE** (`sslsv.methods.WMSE`)
+  Whitening for Self-Supervised Representation Learning ([arXiv](https://arxiv.org/abs/2007.06346))
+  *Aleksandr Ermolov, Aliaksandr Siarohin, Enver Sangineto, Nicu Sebe*
+
 - **Barlow Twins** (`sslsv.methods.BarlowTwins`)
   Barlow Twins: Self-Supervised Learning via Redundancy Reduction ([arXiv](https://arxiv.org/abs/2103.03230))
   *Jure Zbontar, Li Jing, Ishan Misra, Yann LeCun, Stéphane Deny*
@@ -71,6 +107,47 @@ Collection of **self-supervised learning** (SSL) methods for **speaker verificat
 - **SwAV** (`sslsv.methods.SwAV`)
   Unsupervised Learning of Visual Features by Contrasting Cluster Assignments ([arXiv](https://arxiv.org/abs/2006.09882))
   *Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, Armand Joulin*
+</details>
+
+<details open>
+<summary><b>Methods (ours)</b></summary>
+
+- **Combiner** (`sslsv.methods.Combiner`)
+  Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning ([arXiv](https://arxiv.org/abs/2207.05506))
+  *Théo Lepage, Réda Dehak*
+
+- **SimCLR Custom** (`sslsv.methods.SimCLRCustom`)
+  Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification ([arXiv](https://arxiv.org/abs/2306.03664))
+  *Théo Lepage, Réda Dehak*
+
+</details>
+
+---
+
+## Requirements
+
+sslsv runs on Python 3.8 with the following dependencies.
+
+| Module        | Versions  |
+|---------------|:---------:|
+| torch         | >= 1.11.0 |
+| torchaudio    | >= 0.11.0 |
+| numpy         | *         |
+| pandas        | *         |
+| soundfile     | *         |
+| scikit-learn  | *         |
+| speechbrain   | *         |
+| tensorboard   | *         |
+| wandb         | *         |
+| ruamel.yaml   | *         |
+| dacite        | *         |
+| prettyprinter | *         |
+| tqdm          | *         |
+
+**Note**: developers will also need `pre-commit` and `twine` to work on this project.
+
+---
 
 ## Datasets
 
@@ -90,7 +167,14 @@ Collection of **self-supervised learning** (SSL) methods for **speaker verificat
 - [MUSAN](http://www.openslr.org/17/)
 - [Room Impulse Response and Noise Database](https://www.openslr.org/28/)
 
-Data used for main experiments (conducted on VoxCeleb1 and VoxCeleb2 + data-augmentation) can be automatically downloaded, extracted and prepared using `utils/prepare_voxceleb.py` and `utils/prepare_augmentation.py`. The resulting `data` folder shoud have the following structure:
+Data used for the main experiments (conducted on VoxCeleb1 and VoxCeleb2 + data-augmentation) can be automatically downloaded, extracted and prepared using the following scripts.
+
+```bash
+python tools/prepare_data/prepare_voxceleb.py data/
+python tools/prepare_data/prepare_augmentation.py data/
+```
+
+The resulting `data` folder should have the structure presented below.
 
 ```
 data
@@ -106,39 +190,99 @@ data
 └── voxceleb2_train.csv
 ```
 
-Other datasets have to be manually downloaded and extracted but their train and trials *(only for speaker verification)* files can be created using the corresponding script from the `utils` folder.
+Other datasets have to be manually downloaded and extracted, but their train and trials files can be created using the corresponding scripts from the `tools/prepare_data/` folder.
 
-<details>
-<summary>Example format of a train file</summary>
-`voxceleb1_train.csv`
+- Example format of a train file (`voxceleb1_train.csv`)
 ```
 File,Speaker
 voxceleb1/id10001/1zcIwhmdeo4/00001.wav,id10001
 ...
 voxceleb1/id11251/s4R4hvqrhFw/00009.wav,id11251
 ```
-</details>
 
-<details>
-<summary>Example format of a trials file</summary>
-`voxceleb1_test_O`
+- Example format of a trials file (`voxceleb1_test_O`)
 ```
 1 voxceleb1/id10270/x6uYqmx31kE/00001.wav voxceleb1/id10270/8jEAjG6SegY/00008.wav
 ...
 0 voxceleb1/id10309/0cYFdtyWVds/00005.wav voxceleb1/id10296/Y-qKARMSO7k/00001.wav
 ```
-</details>
 
-*Please refer to the associated code if you want further details about data preparation.*
+<!-- *Please refer to the associated code if you want further details about data preparation.* -->
+
+---
 
 ## Usage
 
-Start self-supervised training with `python train.py configs/vicreg.yml`.
+1. **Clone the repository**: `git clone https://github.com/theolepage/sslsv.git`.
+2. **Install dependencies**: `pip install -r requirements.txt`.
+3. **Start a training** (*2 GPUs*): `./train_ddp.sh 2 <config_path>`.
+4. **Evaluate your model** (*2 GPUs*): `./evaluate_ddp.sh 2 <config_path>`.
+
+**Note 1**: with a CPU or a single GPU, you can use `sslsv/bin/train.py` and `sslsv/bin/evaluate.py`, respectively.
+
+**Note 2**: alternatively, you can install sslsv with `pip install .` and use its modules directly in your own code.
+
+### Tensorboard
+
+You can visualize your experiments with `tensorboard --logdir models/your_model/`.
 
 ### wandb
 
 Use `wandb online` and `wandb offline` to toggle wandb. To log your experiments you first need to provide your API key with `wandb login API_KEY`.
 
-## Credits
+---
+
+## Documentation
+
+*Documentation is currently being developed...*
+
+---
+
+## Results
+
+### SOTA
+
+- **Train set**: VoxCeleb2
+- **Evaluation**: SV on VoxCeleb1-O (original) trials
+- **Encoder**: Fast ResNet-34
+
+| Method     | Model                         | EER (%) | minDCF (p=0.01) | Checkpoint    |
+|------------|:-----------------------------:|:-------:|:---------------:|:-------------:|
+| **SimCLR** | `ssl/voxceleb2/simclr/simclr` | -       | -               | [:link:](...) |
+| ...        |                               |         |                 |               |
+
+---
+
+## Acknowledgements
+
+sslsv contains third-party components and code adapted from other open-source projects, including: [voxceleb_trainer](https://github.com/clovaai/voxceleb_trainer), [voxceleb_unsupervised](https://github.com/joonson/voxceleb_unsupervised) and [solo-learn](https://github.com/vturrisi/solo-learn).
+
+---
+
+## Citations
+
+If you use sslsv, please consider starring this repository on GitHub and citing one of the following papers.
+
+```BibTeX
+@InProceedings{lepage2023ExperimentingAdditiveMarginsSSLSV,
+  author    = {Lepage, Théo and Dehak, Réda},
+  booktitle = {INTERSPEECH},
+  title     = {Experimenting with Additive Margins for Contrastive Self-Supervised Speaker Verification},
+  year      = {2023},
+  url       = {https://www.isca-speech.org/archive/interspeech_2023/lepage23_interspeech.html},
+}
+
+@InProceedings{lepage2022LabelEfficientSelfSupervisedSV,
+  author    = {Lepage, Théo and Dehak, Réda},
+  booktitle = {INTERSPEECH},
+  title     = {Label-Efficient Self-Supervised Speaker Verification With Information Maximization and Contrastive Learning},
+  year      = {2022},
+  url       = {https://www.isca-speech.org/archive/interspeech_2022/lepage22_interspeech.html},
+}
+```
+
+---
+
+## License
 
-Some parts of the code (data preparation, data augmentation and model evaluation) were adapted from [VoxCeleb trainer](https://github.com/clovaai/voxceleb_trainer) repository.
+This project is released under the [MIT License](https://github.com/theolepage/sslsv/blob/main/LICENSE.md).
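The trials format added to the README (a 0/1 target label followed by an enrolment and a test utterance path) feeds directly into speaker-verification metrics such as the EER reported in the new Results table. The following pure-Python sketch is illustrative only: it is not part of sslsv (whose evaluation lives in `sslsv/bin/evaluate.py`), the cosine-similarity scores are hypothetical, and it assumes both target and non-target trials are present.

```python
def parse_trials(lines):
    """Parse trial lines in the documented format: '<label> <enrol> <test>'."""
    trials = []
    for line in lines:
        label, enrol, test = line.split()
        trials.append((int(label), enrol, test))
    return trials


def compute_eer(labels, scores):
    """Equal Error Rate (as a fraction) via a simple threshold sweep.

    labels: 1 = target (same speaker), 0 = non-target; higher score = more similar.
    Assumes at least one trial of each class.
    """
    n_target = sum(labels)
    n_nontarget = len(labels) - n_target
    fr, fa = 0, n_nontarget  # lowest threshold: every trial is accepted
    eer, best_gap = 1.0, float("inf")
    for _, label in sorted(zip(scores, labels)):
        if label == 1:
            fr += 1  # this target now falls below the threshold
        else:
            fa -= 1  # this non-target is now correctly rejected
        far, frr = fa / n_nontarget, fr / n_target
        if abs(far - frr) < best_gap:  # keep the point where FAR ~= FRR
            best_gap = abs(far - frr)
            eer = (far + frr) / 2
    return eer


trials = parse_trials([
    "1 voxceleb1/id10270/x6uYqmx31kE/00001.wav voxceleb1/id10270/8jEAjG6SegY/00008.wav",
    "0 voxceleb1/id10309/0cYFdtyWVds/00005.wav voxceleb1/id10296/Y-qKARMSO7k/00001.wav",
])
labels = [label for label, _, _ in trials]
scores = [0.82, 0.11]  # hypothetical cosine similarities for the two trials
print(compute_eer(labels, scores))  # 0.0: the target trial scores above the non-target
```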

evaluate_ddp.sh

Lines changed: 10 additions & 1 deletion
@@ -1,3 +1,12 @@
 #!/bin/bash
 
-torchrun --nproc_per_node=2 sslsv/bin/evaluate_distributed.py $@
+if [ $# -eq 0 ]; then
+    echo "Usage: $0 <num_gpus> [args ...]"
+    exit 1
+fi
+
+num_gpus=$1
+
+shift
+
+torchrun --nproc_per_node=$num_gpus sslsv/bin/evaluate_distributed.py "$@"

notebooks/requirements.txt

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+plotnine
+plotly
+seaborn

requirements.txt

Lines changed: 13 additions & 17 deletions
@@ -1,19 +1,15 @@
-torch==1.11.0
-torchaudio==0.11.0
+torch>=1.11.0
+torchaudio>=0.11.0
 
-numpy==1.23.1
-pandas==1.4.3
-soundfile==0.11.0
-scikit-learn==1.0.2
-speechbrain==0.5.13
-tensorboard==2.10.0
-wandb==0.13.3
+numpy
+pandas
+soundfile
+scikit-learn
+speechbrain
+tensorboard
+wandb
 
-ruamel.yaml==0.17.21
-dacite==1.8.1
-prettyprinter==0.18.0
-tqdm==4.64.0
-
-plotnine==0.9.0
-plotly==5.10.0
-seaborn==0.12.0
+ruamel.yaml
+dacite
+prettyprinter
+tqdm
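This change relaxes exact pins (`torch==1.11.0`) to lower bounds (`torch>=1.11.0`). As a rough illustration of how such a dotted-version lower bound is evaluated, here is a naive, hypothetical comparison helper; real resolvers such as pip follow PEP 440 rules, which also handle pre-release and local-version suffixes that this sketch ignores.

```python
def meets_minimum(installed: str, minimum: str) -> bool:
    """Naively check a `package>=minimum` constraint by comparing
    dotted release numbers component by component."""
    def as_tuple(version: str):
        # "1.11.0" -> (1, 11, 0); tuples compare lexicographically
        return tuple(int(part) for part in version.split("."))

    return as_tuple(installed) >= as_tuple(minimum)


print(meets_minimum("1.13.1", "1.11.0"))  # True: satisfies torch>=1.11.0
print(meets_minimum("1.9.0", "1.11.0"))   # False: 9 < 11 on the second component
```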

requirements_strict.txt

Lines changed: 19 additions & 0 deletions
@@ -0,0 +1,19 @@
+torch==1.11.0
+torchaudio==0.11.0
+
+numpy==1.23.1
+pandas==1.4.3
+soundfile==0.11.0
+scikit-learn==1.0.2
+speechbrain==0.5.13
+tensorboard==2.10.0
+wandb==0.13.3
+
+ruamel.yaml==0.17.21
+dacite==1.8.1
+prettyprinter==0.18.0
+tqdm==4.64.0
+
+plotnine==0.9.0
+plotly==5.10.0
+seaborn==0.12.0

train_ddp.sh

Lines changed: 10 additions & 1 deletion
@@ -1,3 +1,12 @@
 #!/bin/bash
 
-torchrun --nproc_per_node=2 sslsv/bin/train_distributed.py $@
+if [ $# -eq 0 ]; then
+    echo "Usage: $0 <num_gpus> [args ...]"
+    exit 1
+fi
+
+num_gpus=$1
+
+shift
+
+torchrun --nproc_per_node=$num_gpus sslsv/bin/train_distributed.py "$@"
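Both wrapper scripts (`train_ddp.sh` and `evaluate_ddp.sh`) now take the GPU count as their first argument and forward everything else to the distributed entry point via `shift` and `"$@"`. The same argument-splitting logic, sketched in Python purely for illustration (a hypothetical helper, not part of sslsv; the config path is an example):

```python
def split_launch_args(argv):
    """Split launcher arguments into (num_gpus, forwarded_args).

    Mirrors the shell scripts: $1 becomes --nproc_per_node, and
    `shift` leaves the remaining arguments as "$@".
    """
    if not argv:
        # Matches the scripts' usage message and non-zero exit.
        raise SystemExit("Usage: launcher <num_gpus> [args ...]")
    return int(argv[0]), argv[1:]


num_gpus, forwarded = split_launch_args(["2", "configs/vicreg.yml"])
print(num_gpus, forwarded)  # 2 ['configs/vicreg.yml']
```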

0 commit comments
