@SilasK SilasK commented Mar 7, 2025

No description provided.

SilasK commented Mar 10, 2025

Because Whisper can only process 30 s of audio at a time, whisper streaming runs Whisper iteratively.

Whisper is run every second (even though the whisper streaming publication says 2 s would be optimal).
Words are committed if they have high confidence or if two consecutive runs agree on them.
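
To make the commit rule concrete, here is a minimal sketch in Python. It is not the code in this repository: the `Word` class, the `commit_words` function, the `min_confidence` value of 0.9 and the choice to stop at the first unconfirmed word are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    confidence: float  # hypothetical per-word score in [0, 1]


def commit_words(previous, current, min_confidence=0.9):
    """Return the leading words of `current` that can be committed.

    A word is committed if the previous run produced the same word at the
    same position, or if its confidence reaches `min_confidence`.
    """
    committed = []
    for i, word in enumerate(current):
        agrees = i < len(previous) and previous[i].text == word.text
        if agrees or word.confidence >= min_confidence:
            committed.append(word)
        else:
            # Assumption: only an unbroken prefix is committed, so stop here.
            break
    return committed


prev_run = [Word("hello", 0.6), Word("word", 0.4)]
curr_run = [Word("hello", 0.7), Word("world", 0.95), Word("how", 0.5)]
committed_texts = [w.text for w in commit_words(prev_run, curr_run)]
print(committed_texts)
```

In this example "hello" is committed because both runs agree on it, "world" because of its high confidence, and "how" stays uncommitted until a later run confirms it.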

The same audio buffer is kept and rerun until it is chunked. The chunking can work in two different ways.

  1. segment
    After some time (15 s by default) the audio buffer is chunked at the second-to-last word if it is committed, otherwise at the last committed word.

  2. sentence
    After some time the audio buffer is chunked at the second-to-last sentence if everything is committed.
    If no sentence is found after 30 s, the audio is chunked anyway.
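
The two modes as I understand them can be sketched as below. This is an illustration only, not the repository's implementation: `segment_cut`, `sentence_cut` and their parameters are hypothetical names, times are in seconds relative to the start of the buffer, and the fallback used for the forced cut at 30 s is an assumption because I did not describe where that cut lands.

```python
def segment_cut(word_ends, n_committed, buffer_len, threshold=15.0):
    """Segment mode: return the cut time, or None if no cut is due.

    word_ends   -- end time of every word currently in the buffer
    n_committed -- how many leading words are already committed
    """
    if buffer_len < threshold or n_committed == 0:
        return None
    # The second-to-last word is committed when at most one word is pending.
    if len(word_ends) >= 2 and n_committed >= len(word_ends) - 1:
        return word_ends[-2]
    # Otherwise cut at the last committed word.
    return word_ends[n_committed - 1]


def sentence_cut(sentence_ends, word_ends, n_committed, buffer_len,
                 threshold=15.0, hard_limit=30.0):
    """Sentence mode: return the cut time, or None if no cut is due."""
    all_committed = n_committed == len(word_ends)
    if buffer_len >= threshold and all_committed and len(sentence_ends) >= 2:
        return sentence_ends[-2]
    if buffer_len >= hard_limit:
        # Assumption: the forced cut reuses the segment rule.
        return segment_cut(word_ends, n_committed, buffer_len,
                           threshold=hard_limit)
    return None


print(segment_cut([1.0, 2.0, 16.0], n_committed=2, buffer_len=16.5))
print(sentence_cut([4.0, 9.0, 16.0], [4.0, 9.0, 16.0], 3, buffer_len=16.5))
```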

If I understand it correctly, running Whisper takes the same time whether the audio is 1 s or 30 s long. But the quality is usually better when Whisper sees a whole sentence, and when the same sentence can be run from start to end in one go.

From this I would change the sentence chunking so that:
The audio buffer is chunked at the end of a sentence (<15 s), except the last one, or maybe even at the last one if the last sentence is complete.
At 15 s (or the selected threshold) the audio buffer is cut anyway, even if no sentence is found.
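
A rough sketch of the proposed rule, under stated assumptions: `proposed_cut` and all of its parameters are hypothetical, "<15 s" is read as "the sentence ends before the threshold", and the position of the forced cut is left to the caller through `fallback_cut` (for example the end of the last committed word).

```python
def proposed_cut(sentence_ends, last_sentence_complete, buffer_len,
                 fallback_cut, threshold=15.0):
    """Return the time (s) at which to cut the buffer, or None to keep it.

    sentence_ends          -- end time of every sentence found in the buffer
    last_sentence_complete -- True if the final sentence is known to be done
    fallback_cut           -- caller-supplied cut point for the forced case
    """
    # Skip the last sentence unless it is complete.
    usable = sentence_ends if last_sentence_complete else sentence_ends[:-1]
    # Only sentence ends that fall before the threshold count.
    usable = [t for t in usable if t < threshold]
    if usable:
        return usable[-1]
    if buffer_len >= threshold:
        # No sentence boundary was found in time, so cut anyway.
        return fallback_cut
    return None


print(proposed_cut([4.0, 9.0, 12.0], False, buffer_len=12.5, fallback_cut=11.0))
print(proposed_cut([], False, buffer_len=15.2, fallback_cut=14.1))
```

The first call cuts at the end of the second sentence because the third may still be in progress. The second call has no sentence boundary at all and falls back to the forced cut.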

SilasK added a commit to SilasK/whisper_streaming_web that referenced this pull request Mar 12, 2025
@QuentinFuxa QuentinFuxa mentioned this pull request Mar 13, 2025
3 tasks
nick134-bit pushed a commit to nick134-bit/whisper_streaming_web that referenced this pull request Mar 14, 2025
Update README.md: Add Python syntax highlighting to code chunk