
Slow inference - run the Whisper API without extra encoding/downloading of the file, and use the bytes directly. #26

@VirajVaitha123

Description

Hi,

GPU performance is similar to my CPU's for small to medium videos, due to the extra I/O processing/encoding.

For my application, I use FastAPI for the majority of my core functionality. However, I require a GPU to transcribe video/audio files, and decided to use BananaML, since a serverless GPU seems like it would be much cheaper than Kubernetes.

How can I pass the UploadFile = File(...) object from FastAPI (a spooled temporary file) to my BananaML API, instead of sending an encoded byte string read from an mp3/mp4 file saved locally?
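Ideally I could do something like the following, reading the bytes straight out of the UploadFile in memory with no save/re-read round trip (a sketch; the banana.run arguments and the mp3BytesString input key are my assumptions from the Whisper template and may differ):

```python
import base64

import banana_dev as banana
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

BANANA_API_KEY = "..."     # my Banana API key
WHISPER_MODEL_KEY = "..."  # key of the deployed Whisper model

@app.post("/transcribe")
async def transcribe(file: UploadFile = File(...)):
    # Read straight from the spooled temporary file, no local save needed
    audio_bytes = await file.read()
    # The template still expects a base64 string inside the JSON payload
    model_inputs = {"mp3BytesString": base64.b64encode(audio_bytes).decode("utf-8")}
    return banana.run(BANANA_API_KEY, WHISPER_MODEL_KEY, model_inputs)
```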

Old way (faster on my CPU than on BananaML)

Upload video on web page -> write file into a temporary file -> pass to Whisper
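In code, the old path was roughly this (a sketch, assuming the openai-whisper package with a model loaded in the same process):

```python
import tempfile

import whisper
from fastapi import FastAPI, File, UploadFile

app = FastAPI()
model = whisper.load_model("base")  # local Whisper model on CPU/GPU

@app.post("/transcribe")
async def transcribe(file: UploadFile = File(...)):
    # Spool the upload into a named temp file so Whisper (via ffmpeg) can read it
    with tempfile.NamedTemporaryFile(suffix=".mp4") as tmp:
        tmp.write(await file.read())
        tmp.flush()
        result = model.transcribe(tmp.name)
    return {"text": result["text"]}
```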

New way (GPU with BananaML)

Upload video on web page -> save file locally -> read bytes from the file -> base64-encode into a JSON payload -> pass to Whisper
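In code, that looks something like this (again a sketch; mp3BytesString is the input key I believe the template expects, and banana.run is from the banana_dev client):

```python
import base64

import banana_dev as banana
from fastapi import FastAPI, File, UploadFile

app = FastAPI()

@app.post("/transcribe")
async def transcribe(file: UploadFile = File(...)):
    # 1. Save the upload to a local file
    with open("upload.mp4", "wb") as out:
        out.write(await file.read())
    # 2. Read the same bytes back off disk
    with open("upload.mp4", "rb") as f:
        audio_bytes = f.read()
    # 3. Base64-encode into a JSON-serialisable payload
    model_inputs = {"mp3BytesString": base64.b64encode(audio_bytes).decode("utf-8")}
    # 4. Send to the Whisper model running on BananaML
    return banana.run("api_key", "model_key", model_inputs)
```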

I get that there has to be an extra I/O operation to send the video data to the GPU, but the way recommended in the template is highly inefficient. I wish I could pass the file-like object directly, as is done with FastAPI.

Thanks,

Viraj
