videoblip

Video Action Recognition using Blip and GPT-3

Open In Colab

VideoBlip

VideoBlip is a script that generates natural language descriptions of videos. It extracts frames from a video, passes them through the BLIP captioning model to identify the contents of each frame, and sends those per-frame predictions to GPT-3, which produces a text description of what is happening in the video. The script lets you upload as many files as you want at once. It can serve as a natural-language metadata generator or as a way to build large datasets of videos paired with descriptions, but its core function is converting video content into text.
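The pipeline described above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the function names, the sampling strategy, the BLIP checkpoint, and the GPT-3 prompt are all assumptions, and it uses OpenCV for frame extraction with the legacy `openai` completions API.

```python
def evenly_spaced_frame_indices(total_frames: int, n_samples: int) -> list[int]:
    """Pick n_samples frame indices spread evenly across the video."""
    if total_frames <= 0 or n_samples <= 0:
        return []
    n = min(n_samples, total_frames)
    step = total_frames / n
    return [int(i * step) for i in range(n)]

def describe_video(path: str, n_samples: int = 8) -> str:
    """Caption sampled frames with BLIP, then summarize them with GPT-3."""
    # Heavy dependencies are imported lazily so the helper above stays light.
    import cv2                     # frame extraction
    import openai                  # GPT-3 summarization (legacy API)
    from PIL import Image
    from transformers import BlipProcessor, BlipForConditionalGeneration

    processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
    model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    captions = []
    for idx in evenly_spaced_frame_indices(total, n_samples):
        cap.set(cv2.CAP_PROP_POS_FRAMES, idx)
        ok, frame = cap.read()
        if not ok:
            continue
        # OpenCV returns BGR; BLIP expects an RGB image.
        image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        inputs = processor(images=image, return_tensors="pt")
        out = model.generate(**inputs)
        captions.append(processor.decode(out[0], skip_special_tokens=True))
    cap.release()

    # Hand the per-frame captions to GPT-3 for a single coherent description.
    prompt = "Describe what happens in a video whose frames show:\n" + "\n".join(captions)
    resp = openai.Completion.create(model="text-davinci-003", prompt=prompt, max_tokens=100)
    return resp["choices"][0]["text"].strip()
```

Sampling a fixed number of evenly spaced frames keeps the BLIP and GPT-3 costs constant regardless of video length, at the price of possibly missing brief actions between samples.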

Usage

  1. Make a copy of this notebook so you can replace the API key: go to File > Save a copy in Drive.
  2. Change YOUR_API_KEY to your actual OpenAI API key.
  3. Run the script cell by cell and wait for the processing to finish.
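The API-key step above corresponds to a cell like the following (the exact shape of the notebook cell is an assumption; only the placeholder name comes from the instructions):

```python
import openai

# Replace the placeholder below with your own OpenAI API key.
openai.api_key = "YOUR_API_KEY"
```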

Hardware Configuration

The script performs best on a machine with a GPU. If you don't have a GPU, you can still run the script, but processing may take significantly longer to complete.
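A common way to handle this in PyTorch (which the BLIP model runs on) is to detect a GPU at startup and fall back to the CPU otherwise; a sketch, not the notebook's actual code:

```python
import torch

# BLIP inference is much faster on a GPU; fall back to CPU if none is found.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Running on: {device}")

# The model and its inputs would then be moved to the chosen device, e.g.:
# model = model.to(device)
# inputs = {k: v.to(device) for k, v in inputs.items()}
```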

Example

Check out an example of the script in action here.

Architecture

[Architecture diagram: VideoBlip]

Contributions are welcome!

We're excited to have you contribute! Feel free to fork the repository, add new features, and submit pull requests. Let's make VideoBlip better together!

License

This project is licensed under the MIT license.
