Skip to content

SoftWhisper simplifies audio and video transcription using the powerful Whisper model. Easily select custom models, languages, and tasks, fine-tune transcription with beam size adjustment, and specify start and end times for targeted segments.

Notifications You must be signed in to change notification settings

MayBeLaterOrNot/SoftWhisper

 
 

Repository files navigation

SoftWhisper March 2025 is out!

Big Changes Around

In the previous release, I was unhappy with the performance and accessibility of our application. Our previous implementation was too heavily reliant on CUDA. AMD users would have to install a specific Pytorch package, but it was too difficult to install, and did not provide much of a benefit anyway.

After some research, I created a ZLUDA branch to mimic CUDA; unfortunately, none of the ZLUDA implementations support Pytorch.

But not all hope was lost.

After more research and frustration, I heard of Whisper.cpp. It is a reimplementation of the OpenAI Whisper API in pure C++, and has minimal dependencies. Since it can easily use Vulkan, combines CPU + GPU acceleration and can be easily compiled on Linux, it would be worth a shot.

The results are very surprising: Whisper.cpp can transcribe 2 hours of audio in around 2-3 minutes with my current hardware. By comparison,with multiprocessing and the regular Whisper API, 20-30 minutes of audio will take you around 40 minutes.

Aligning with my design philosophy that software should be as uncomplicated as possible to use, I'm making available a 64-bit pre-compiled version of Whisper.cpp that supports Vulkan, so all you need to do this time around, if you use Windows, is to download our repository and run the main script with Python:

Python SoftWhisper.py

And that's it! The models will also be downloaded for you if you don't have them.

Please note that I haven't tested this application under Linux; however, just placing a compiled Whisper.cpp of your choice under the same folder as the project should work. The default name the application will look for is Whisper_lin-x64; however, you can also select the directory of your choice by simply starting the application and changing the directory under the option "Whisper.cpp executable."

Installation steps

Windows

Just click on SoftWhisper.bat. If any dependency is missing, you will be prompted to install it. If that fails, install the dependencies manually with the command:
pip install -r requirements.txt

Linux

For now, convenience scripts are not available. Install the dependencies with:
pip install -r requirements.txt

and then run SoftWhisper with:
python SoftWhisper.py

Known bugs

  • Despite being very performant, this software still has many more lines of code than it should, which I will probably address in the future.
  • I couldn't get speaker identification to work properly on this release, so it was disabled and removed from the interface.
  • When you select a new video, it won't load the video right away. You will need to press play.

About

SoftWhisper simplifies audio and video transcription using the powerful Whisper model. Easily select custom models, languages, and tasks, fine-tune transcription with beam size adjustment, and specify start and end times for targeted segments.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 96.8%
  • Batchfile 3.2%