I'm using the large-v2 model to transcribe multilingual audio (much of it in German). In many cases, usually at the beginning of a segment, the word-level timestamps are incorrect: the start time is later than the end time. I know that whisper-timestamped gives fairly accurate results, but I would like to use faster-whisper rather than the original Whisper implementation.
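
To make the problem concrete, here is a minimal sketch of how the affected words can be flagged. It assumes each word exposes `start`, `end`, and `word` attributes, as the entries of `segment.words` do when `word_timestamps=True` is passed to `transcribe`. The `Word` class, the `find_inverted_words` helper, and the sample values are made up for illustration and are not real model output.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Word:
    # Hypothetical stand-in for the word objects found in segment.words.
    start: float
    end: float
    word: str


def find_inverted_words(words: Iterable[Word]) -> List[Word]:
    """Return the words whose start time is later than their end time."""
    return [w for w in words if w.start > w.end]


# Made-up values for illustration only.
sample = [
    Word(start=0.84, end=0.50, word=" Guten"),
    Word(start=0.50, end=0.92, word=" Morgen"),
]

for w in find_inverted_words(sample):
    print(f"inverted: {w.word!r} start={w.start:.2f} end={w.end:.2f}")
```

With real output, the same check could be run over the words of each segment to count how often the inversion occurs and where in the segment it falls.
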
Is there a way to improve timestamp accuracy here?
