Hi 👋
We recently added GLM-ASR to the Open ASR Leaderboard via this PR:
huggingface/open_asr_leaderboard#113
While doing so, we noticed a small delta between the WER numbers reported in this repository and the results we obtained on the leaderboard evaluation setup.
This discrepancy is relatively minor and is in line with what we also observe for other models evaluated under the same pipeline (e.g. Whisper Large v3), so it doesn’t appear to be specific to GLM-ASR.
We wanted to flag this here for visibility, in case differences in evaluation settings (decoding parameters, text normalization, etc.) explain the gap, or in case there are recommended settings we should use to better match the reported results.
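As a quick illustration of why normalization alone can account for a WER delta, here is a minimal sketch. This is not the leaderboard's actual pipeline (which uses the Whisper EnglishTextNormalizer and also expands contractions, numbers, etc.); the `normalize` and `wer` helpers below are simplified stand-ins for illustration only:

```python
import re

def normalize(text):
    # Simplified normalizer: lowercase and strip punctuation.
    # (The leaderboard's EnglishTextNormalizer does considerably more.)
    return re.sub(r"[^\w\s]", "", text.lower()).split()

def wer(ref_words, hyp_words):
    # Word error rate via Levenshtein distance over word sequences.
    d = [[0] * (len(hyp_words) + 1) for _ in range(len(ref_words) + 1)]
    for i in range(len(ref_words) + 1):
        d[i][0] = i
    for j in range(len(hyp_words) + 1):
        d[0][j] = j
    for i in range(1, len(ref_words) + 1):
        for j in range(1, len(hyp_words) + 1):
            cost = 0 if ref_words[i - 1] == hyp_words[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[-1][-1] / len(ref_words)

ref = "Hello, world! It's a test."
hyp = "hello world its a test"
print(wer(ref.split(), hyp.split()))        # 0.8 on raw text (casing/punctuation mismatches)
print(wer(normalize(ref), normalize(hyp)))  # 0.0 after normalization
```

The same transcription can thus score very differently depending on which normalizer each evaluation applies before scoring, which is the kind of setting difference we suspect here.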
Happy to share more details if helpful, and thanks for releasing the model!
Best,
Steven