The goal of this project is to provide an easy-to-use, easy-to-deploy pipeline for training, validation, and inference that takes an image as input and outputs a text description. The pipeline consists of a ViT model for feature extraction and a Transformer model for text generation, and is deployed using NVIDIA's PyTriton and a Streamlit app.
The model is trained on the Flickr30k dataset, which consists of 31,783 images collected from Flickr, each paired with 5 captions. The dataset can be downloaded from here.
Clone the repository using the following command:
git clone https://github.com/smackiaa/Scene-Script.git
Now, install the required packages using the following command:
pip install -r requirements.txt
Preprocess the captions text file to remove unnecessary data using the following command:
python preprocess.py --caption_file <path/to/captions/file>
`caption_file` is the path to the captions file.
Note: The preprocessed captions file is saved in the same directory as the original captions file and overwrites it. So either work on a copy of the original file, rename the original before running the command above, or use the preprocessed file provided in the repository's `data` directory.
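The script itself is the source of truth, but a caption-cleaning step of this kind typically applies normalization along these lines; the sketch below is illustrative only, and the exact rules in `preprocess.py` may differ:

```python
import string

# Illustrative only: the kind of normalization a caption-cleaning step
# typically applies (lowercasing, stripping punctuation, collapsing
# whitespace). The actual rules in preprocess.py may differ.
def clean_caption(caption: str) -> str:
    caption = caption.lower()
    caption = caption.translate(str.maketrans("", "", string.punctuation))
    return " ".join(caption.split())

print(clean_caption("Two dogs are  playing, in the park!"))
# -> "two dogs are playing in the park"
```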
- The model configuration JSON file contains the model parameters.
- Three model configurations are provided in the `configs` directory, the smallest with 92 million parameters and the largest with 112 million.
- The default configuration has 97 million parameters, a good balance between model size and performance.
- The configuration file also contains the training parameters: `learning_rate` is set to 1e-6, `batch_size` to 128, and `num_epochs` to 10.
Note: Adjust the batch size and the number of epochs according to your GPU memory and the time available.
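As a quick sanity check, the documented training parameters can be read straight from the JSON file. The filename below is a placeholder for whichever file in `configs` you use, and only the three documented keys are assumed to exist:

```python
import json

# Load a model configuration (path is illustrative; pick one from configs/).
with open("configs/model_config.json") as f:
    config = json.load(f)

# The documented training parameters; other keys depend on the actual schema.
print(config["learning_rate"])  # 1e-6
print(config["batch_size"])     # 128
print(config["num_epochs"])     # 10
```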
To train the model, run the following command:
python train.py --model_config <model-config> --images_dir <path/to/images/directory> --caption_file <path/to/captions/file>
`model_config` is the path to the model configuration file, a JSON file that contains the model parameters and the training parameters.
Note: The default is the 97-million-parameter configuration.
`images_dir` is the path to the directory containing the images.
`caption_file` is the path to the captions file.
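For example, with the Flickr30k files in place, a run might look like this (the config filename and data paths below are placeholders, not files guaranteed to exist in the repository):
python train.py --model_config configs/model_config.json --images_dir data/flickr30k_images --caption_file data/captions.txt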
- The validation loop runs after training, and the validation loss is calculated.
- The validation split is set to 0.1 of the total dataset. To change the validation set size, change the `split` parameter in the `train.py` file (a minimal sketch of this split logic follows this list).
- Once the training and validation loops have completed, the model is saved in the `weights` directory.
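For reference, a fractional split like this is commonly implemented with PyTorch's `random_split`; the variable names below are illustrative and may not match those in `train.py`:

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Dummy stand-in for the image-caption dataset.
dataset = TensorDataset(torch.arange(100))

split = 0.1  # fraction of the data held out for validation
n_val = int(len(dataset) * split)
n_train = len(dataset) - n_val

# Randomly partition the dataset into training and validation subsets.
train_set, val_set = random_split(dataset, [n_train, n_val])
print(len(train_set), len(val_set))  # 90 10
```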
To run inference on a single image, run the following command:
python inference.py --model_config <model-config> --model_weights <path/to/model/weights> --image_path <path/to/image>
`model_config` is the path to the model configuration file, a JSON file that contains the model parameters.
Note: The default is the 97-million-parameter configuration.
`model_weights` is the path to the model weights file.
Note: The default is the `weights/scene_script.pth` file.
`image_path` is the path to the image.
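For example (the config filename and image path here are placeholders):
python inference.py --model_config configs/model_config.json --model_weights weights/scene_script.pth --image_path example.jpg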
- NVIDIA PyTriton is a lightweight Python wrapper for the Triton Inference Server.
- The library allows serving machine learning models directly from Python through NVIDIA's Triton Inference Server.
- It can be installed using the following command:
pip install -U nvidia-pytriton
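For orientation, here is a minimal sketch of how an image-captioning model could be bound and served with PyTriton. The model name, tensor names, batch size, and the dummy inference function are illustrative assumptions, not the repository's actual deployment code:

```python
import numpy as np
from pytriton.decorators import batch
from pytriton.model_config import ModelConfig, Tensor
from pytriton.triton import Triton

@batch
def infer_fn(image):
    # Placeholder inference: a real deployment would run the ViT + Transformer
    # pipeline here and return the generated captions for the batch.
    captions = np.array([[b"a generated caption"]] * image.shape[0])
    return {"caption": captions}

with Triton() as triton:
    # Bind the Python inference function to a named Triton model.
    triton.bind(
        model_name="scene_script",
        infer_func=infer_fn,
        inputs=[Tensor(name="image", dtype=np.uint8, shape=(-1, -1, 3))],
        outputs=[Tensor(name="caption", dtype=bytes, shape=(1,))],
        config=ModelConfig(max_batch_size=8),
    )
    # Blocks and serves requests until interrupted.
    triton.serve()
```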