
Slow inference - run the Whisper API without extra encoding/downloading of the file, and use the bytes directly. #26

@VirajVaitha123

Description

Hi,

GPU performance is similar to my CPU's for small to medium videos, due to the extra I/O processing/encoding.

For my application, I use FastAPI for the majority of my core functionality. However, I require a GPU to transcribe video/audio files, and decided to use BananaML, since a serverless GPU seems like it would be much cheaper than Kubernetes.

How can I pass the UploadFile = File(...) object from FastAPI (a spooled temporary file) to my BananaML API, instead of sending an encoded byte string read from an mp3/mp4 file saved locally?
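Ideally I could do something like the following, reading the bytes straight out of the UploadFile in memory with no save/re-read round trip (a sketch; the banana.run arguments and the mp3BytesString input key are my assumptions from the Whisper template and may differ):

```python
import base64

import banana_dev as banana
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

BANANA_API_KEY = "..."     # my Banana API key
WHISPER_MODEL_KEY = "..."  # key of the deployed Whisper model

@app.post("/transcribe")
async def transcribe(file: UploadFile = File(...)):
    # Read straight from the spooled temporary file, no local save needed
    audio_bytes = await file.read()
    # The template still expects a base64 string inside the JSON payload
    model_inputs = {"mp3BytesString": base64.b64encode(audio_bytes).decode("utf-8")}
    return banana.run(BANANA_API_KEY, WHISPER_MODEL_KEY, model_inputs)
```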

Old way (faster on my CPU than on BananaML)

Upload video on web page -> write file into a temporary file -> pass to Whisper
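In code, the old path was roughly this (a sketch, assuming the openai-whisper package with a model loaded in the same process):

```python
import tempfile

import whisper
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
model = whisper.load_model("base")  # local Whisper model on CPU/GPU

@app.post("/transcribe")
async def transcribe(file: UploadFile = File(...)):
    # Spool the upload into a named temp file so Whisper (via ffmpeg) can read it
    with tempfile.NamedTemporaryFile(suffix=".mp4") as tmp:
        tmp.write(await file.read())
        tmp.flush()
        result = model.transcribe(tmp.name)
    return {"text": result["text"]}
```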

New way (GPU with BananaML)

Upload video on web page -> save file locally -> read bytes from the file -> base64-encode into a JSON payload -> pass to Whisper
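In code, that looks something like this (again a sketch; mp3BytesString is the input key I believe the template expects, and banana.run is from the banana_dev client):

```python
import base64

import banana_dev as banana
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/transcribe")
async def transcribe(file: UploadFile = File(...)):
    # 1. Save the upload to a local file
    with open("upload.mp4", "wb") as out:
        out.write(await file.read())
    # 2. Read the same bytes back off disk
    with open("upload.mp4", "rb") as f:
        audio_bytes = f.read()
    # 3. Base64-encode into a JSON-serialisable payload
    model_inputs = {"mp3BytesString": base64.b64encode(audio_bytes).decode("utf-8")}
    # 4. Send to the Whisper model running on BananaML
    return banana.run("api_key", "model_key", model_inputs)
```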

I get that there has to be an extra I/O operation to send the video data to the GPU, but the way recommended in the template is highly inefficient. I wish I could pass the file-like object directly, as is done with FastAPI.

Thanks,

Viraj
