VGGish mel-band frame length/hop size? #1251

Closed Answered by palonso
seunggookim asked this question in Essentia Models

Hi @seunggookim,
Yes! The frame size is 400 samples (25 ms at a 16 kHz sample rate) and the hop size is 160 samples (10 ms). Those parameters are inherited from the original implementation, and we hardcode them so that it is not possible to feed the model with incorrect mel-spectrograms.
Maybe you couldn't figure out the exact numbers because of the zero-padding (startFromZero=False) introduced by the FrameCutter algorithm?
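
In case it helps, here is a minimal sketch of how the numbers above relate to time, and how timestamps could be derived by hand. It is plain Python and not part of Essentia. The function names are hypothetical, the 16 kHz rate is assumed from 400 samples = 25 ms, and the offset logic assumes that with startFromZero=False the first frame is centered on sample 0 (zero-padded on the left), while with startFromZero=True it starts at sample 0:

```python
SAMPLE_RATE = 16000  # Hz; assumed, consistent with 400 samples = 25 ms
FRAME_SIZE = 400     # samples, hardcoded in the feature extractor
HOP_SIZE = 160       # samples, hardcoded in the feature extractor


def samples_to_ms(n_samples, sample_rate=SAMPLE_RATE):
    """Convert a length in samples to milliseconds."""
    return 1000.0 * n_samples / sample_rate


def frame_center_times(n_frames, start_from_zero=False):
    """Return the center time (in seconds) of each frame.

    Assumption: with start_from_zero=False, frame i is centered at
    sample i * HOP_SIZE; otherwise it is centered half a frame later.
    """
    offset = FRAME_SIZE / 2 if start_from_zero else 0.0
    return [(i * HOP_SIZE + offset) / SAMPLE_RATE for i in range(n_frames)]


print(samples_to_ms(FRAME_SIZE))  # 25.0
print(samples_to_ms(HOP_SIZE))    # 10.0
print(frame_center_times(3))      # [0.0, 0.01, 0.02]
```

If the centering assumption does not match what you observe, only the `offset` term needs to change; the 10 ms spacing between consecutive frames stays the same.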

I really like the idea of returning timestamps. Please feel free to file a feature request for it :)

Answer selected by seunggookim