Faster RTTMLoader #94

Open
hbredin opened this issue Apr 7, 2023 · 2 comments

hbredin commented Apr 7, 2023

The RTTMLoader class is extremely slow on large RTTM files that contain annotations for many audio files (e.g. the VoxCeleb dataset).

We should make it faster!

hbredin commented Apr 7, 2023

cc @clement-pages

I am not assigning this issue to you but just wanted to let you know that I took note of what we discussed today.

hbredin commented Apr 11, 2023

I have just pushed two PRs that should make things much faster:

  • this pyannote.database PR relies on the vanilla csv library instead of pandas
  • this pyannote.core PR switches from sortedcontainers.SortedDict to a vanilla dict in Annotation internals (making Annotation.__init__ orders of magnitude faster).
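
To give a rough idea of the first point, here is a minimal sketch of parsing an RTTM file with the standard csv module and grouping speaker turns by file in a plain dict. This is not the code from the PRs: the function name `load_rttm` and the returned structure are hypothetical, and the sketch assumes the usual space-separated RTTM layout (type, file id, channel, onset, duration, then speaker name in the eighth field).

```python
import csv
import io
from collections import defaultdict


def load_rttm(lines):
    """Group RTTM speaker turns by file identifier (hypothetical helper).

    `lines` is any iterable of text lines, e.g. an open file.
    Returns {uri: [(start, end, speaker), ...]} in file order.
    """
    turns = defaultdict(list)
    # RTTM fields are space-separated; skipinitialspace tolerates repeated spaces.
    reader = csv.reader(lines, delimiter=" ", skipinitialspace=True)
    for row in reader:
        # Skip blank lines and any non-SPEAKER entries.
        if not row or row[0] != "SPEAKER":
            continue
        uri = row[1]
        onset = float(row[3])
        duration = float(row[4])
        speaker = row[7]
        turns[uri].append((onset, onset + duration, speaker))
    return dict(turns)


# Tiny in-memory example covering two files.
sample = io.StringIO(
    "SPEAKER file1 1 0.00 1.50 <NA> <NA> alice <NA> <NA>\n"
    "SPEAKER file2 1 0.25 1.00 <NA> <NA> bob <NA> <NA>\n"
    "SPEAKER file1 1 2.00 0.50 <NA> <NA> bob <NA> <NA>\n"
)
turns_by_file = load_rttm(sample)
```

The file is read in a single pass with no DataFrame construction or group-by, which is the kind of overhead the csv-based approach is meant to avoid.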

I still need to make sure those PRs do not break anything, but you can already try them on your use case. This requires installing both pyannote.database and pyannote.core from the corresponding branches.
