Word-level timestamps are very inaccurate #294

I'm using the large-v2 model to transcribe multilingual audio (much of it in German). In many cases, usually at the beginning of a segment, the word-level timestamps are incorrect: the start time is later than the end time. I know that [whisper-timestamped](https://github.com/linto-ai/whisper-timestamped) gives fairly accurate results, but I would like to use faster-whisper rather than the original whisper implementation.

Is there a way to improve timestamp accuracy here?

![image](https://github.com/guillaumekln/faster-whisper/assets/27004843/1a3dfc59-1af2-4767-9be2-8d1316937e93)
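
For reference, here is a minimal sketch of how the inverted timestamps can be flagged. The `Word` class and the `find_inverted_words` helper are hypothetical stand-ins written for illustration, and the sample values are made up rather than taken from a real transcription.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Word:
    """Hypothetical stand-in for a word-level result (times in seconds)."""
    start: float
    end: float
    word: str


def find_inverted_words(words: Iterable[Word]) -> List[Word]:
    """Return the words whose start time is later than their end time."""
    return [w for w in words if w.start > w.end]


# Made-up sample values, not output from a real transcription.
sample = [
    Word(start=1.20, end=0.90, word=" Guten"),
    Word(start=0.90, end=1.40, word=" Tag"),
]

inverted = find_inverted_words(sample)
for w in inverted:
    print(f"inverted: {w.word!r} start={w.start:.2f} end={w.end:.2f}")
```

With faster-whisper, the same check should work on `segment.words` for each segment returned by `WhisperModel.transcribe(..., word_timestamps=True)`, since those word objects expose `start`, `end`, and `word`.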
