An implementation of an image captioning system using a CNN encoder and an LSTM decoder. This project generates natural-language descriptions of images using deep learning.
- Encoder: ResNet-152 pretrained on ImageNet for image feature extraction
- Decoder: LSTM network for sequence generation
- Attention Mechanism: Generates captions by focusing on relevant parts of the image
- Pre-trained Models: Ready-to-use models available for quick inference
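At a high level, the encoder-decoder wiring described above can be sketched as follows. This is a minimal PyTorch sketch, not the project's exact code: a tiny stand-in CNN replaces the pretrained ResNet-152 backbone, and all sizes are illustrative.

```python
import torch
import torch.nn as nn

class EncoderCNN(nn.Module):
    """Stand-in for the ResNet-152 encoder. In the real project the
    pretrained backbone's final FC layer is replaced by a linear
    projection to the embedding size."""
    def __init__(self, embed_size):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(16, embed_size)

    def forward(self, images):
        feats = self.conv(images).flatten(1)  # (batch, 16)
        return self.fc(feats)                 # (batch, embed_size)

class DecoderRNN(nn.Module):
    """LSTM decoder: conditions caption generation on the image feature."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.fc = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        # Prepend the image feature as the first "token" of the sequence.
        emb = self.embed(captions)                           # (B, T, E)
        inputs = torch.cat([features.unsqueeze(1), emb], 1)  # (B, T+1, E)
        hidden, _ = self.lstm(inputs)
        return self.fc(hidden)                               # (B, T+1, V)
```

The decoder emits one vocabulary distribution per step; during training the image feature seeds the LSTM and the ground-truth caption tokens follow.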
- Clone the repository:
  git clone https://github.com/Saksham7685/image-captioning.git
- Install dependencies:
  pip install -r requirements.txt
To train the model:
- Download the COCO dataset using the provided download.sh script
- Preprocess the data:
python build_vocab.py
python resize.py
- Start training:
python train.py
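The build_vocab.py step maps caption words to integer ids before training. A minimal sketch of that idea follows; the frequency threshold and special tokens here are common conventions and an assumption, not necessarily the script's exact behavior.

```python
from collections import Counter

def build_vocab(captions, min_count=2):
    """Build a word-to-id mapping from training captions.

    Words occurring fewer than min_count times are dropped and will
    map to <unk> at encoding time.
    """
    counter = Counter(w for cap in captions for w in cap.lower().split())
    # Special tokens first so they get fixed, low ids.
    words = ["<pad>", "<start>", "<end>", "<unk>"]
    words += sorted(w for w, c in counter.items() if c >= min_count)
    return {w: i for i, w in enumerate(words)}
```

Rare words are collapsed into `<unk>` to keep the output softmax small and to avoid fitting one-off tokens.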
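A single training step typically uses teacher forcing: the decoder sees the ground-truth previous words and is trained to predict the next one. The sketch below illustrates this; the encoder/decoder interfaces and names are assumptions, not the exact code in train.py.

```python
import torch
import torch.nn as nn

def train_step(encoder, decoder, images, captions, optimizer, criterion):
    """One optimization step for an encoder-decoder captioner.

    `decoder(features, tokens)` is assumed to prepend the image feature
    and return logits of shape (batch, len(tokens) + 1, vocab_size).
    """
    features = encoder(images)
    # Teacher forcing: feed all caption tokens except the last,
    # and score the predictions against the full caption.
    outputs = decoder(features, captions[:, :-1])
    loss = criterion(outputs.reshape(-1, outputs.size(-1)),
                     captions.reshape(-1))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Cross-entropy over the vocabulary is the standard criterion here, with `<pad>` usually masked via the loss's `ignore_index` argument.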
Sample generated caption: a dog is sitting on a green grass covered field
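A caption like the one above is typically produced by greedy decoding at inference time: the image feature seeds the LSTM, and the most likely word is fed back in at each step. This is a sketch under the assumption that the decoder exposes `embed`, `lstm`, and `fc` submodules; the project's sampling code may differ.

```python
import torch

def generate_caption(decoder, features, vocab, max_len=20):
    """Greedy decoding: repeatedly pick the most likely next word
    until <end> is emitted or max_len is reached."""
    idx_to_word = {i: w for w, i in vocab.items()}
    states = None
    inputs = features.unsqueeze(1)  # (1, 1, embed_size): image feature first
    words = []
    for _ in range(max_len):
        hidden, states = decoder.lstm(inputs, states)
        logits = decoder.fc(hidden.squeeze(1))      # (1, vocab_size)
        token = logits.argmax(1)                    # most likely word id
        word = idx_to_word[token.item()]
        if word == "<end>":
            break
        words.append(word)
        inputs = decoder.embed(token).unsqueeze(1)  # feed prediction back in
    return " ".join(words)
```

Beam search usually gives better captions than this greedy loop, at the cost of decoding several hypotheses in parallel.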