This project is about image captioning. We follow the prevailing state-of-the-art design based on convolutional and recurrent neural networks (CNN-RNN): a CNN extracts features from the image, and an RNN then generates a caption from those features. To address the problem of objects missing from the predicted text, we append an attention network that forces the visual features to be considered at each time step. We experiment with various CNN configurations and use an LSTM as our RNN model. Throughout this project, we evaluate our networks on the MS COCO dataset.
Group Members: Lin-Ying Cheng, Che-Ming Chia, Shang-Wei Hung, Tsun-Hsu Lee
The original image-captioning code is from pytorch-tutorial/image-captioning; we tweaked it and added extra functionality.
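For orientation, here is a minimal sketch of the CNN-RNN pipeline in the style of that tutorial. The layer sizes, the frozen CNN, and the batch-norm projection are illustrative assumptions; the project's actual models live in `src/model.py`.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class EncoderCNN(nn.Module):
    """CNN encoder: a pretrained ResNet whose classifier head is
    replaced by a linear projection into the word-embedding space."""
    def __init__(self, embed_size):
        super().__init__()
        resnet = models.resnet152(pretrained=True)
        self.resnet = nn.Sequential(*list(resnet.children())[:-1])  # drop fc
        self.linear = nn.Linear(resnet.fc.in_features, embed_size)
        self.bn = nn.BatchNorm1d(embed_size, momentum=0.01)

    def forward(self, images):
        with torch.no_grad():  # keep the pretrained CNN frozen
            features = self.resnet(images)
        return self.bn(self.linear(features.reshape(features.size(0), -1)))

class DecoderRNN(nn.Module):
    """LSTM decoder: takes the image feature as the first input step,
    then predicts the caption word by word."""
    def __init__(self, embed_size, hidden_size, vocab_size):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.lstm = nn.LSTM(embed_size, hidden_size, batch_first=True)
        self.linear = nn.Linear(hidden_size, vocab_size)

    def forward(self, features, captions):
        embeddings = self.embed(captions)
        # Prepend the image feature to the word embeddings.
        inputs = torch.cat((features.unsqueeze(1), embeddings), 1)
        hiddens, _ = self.lstm(inputs)
        return self.linear(hiddens)  # per-step vocabulary logits
```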
- COCO: a large-scale object detection, segmentation, and captioning dataset.
- PyTorch version: 1.1.0
- CUDA version: 9.0.176
- Python version: 3.6.8
- CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz
- GPU: GeForce GTX 1080 Ti (11172MB GRAM)
- RAM: 32GB
download_dataset.sh -- Download the COCO dataset, including images and captions
src/build_vocab.py -- Build a vocabulary wrapper for the COCO caption annotations (see the sketch after this list)
src/data.py -- Module for preprocessing the images
src/demo_training.ipynb -- Run a demo of training a model
src/demo_testing.ipynb -- Run a demo of testing a model
src/main.py -- Main script that you can run from the terminal
src/model.py -- CNN and RNN models
src/resize.py -- Module for resizing the images
src/utils.py -- Utility functions and our ImageDescriptor model
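As a rough idea of what `build_vocab.py` produces, here is a sketch of a word/index wrapper in the style of the pytorch-tutorial code this project builds on. The special-token names and the frequency threshold are assumptions; the real wrapper is in `src/build_vocab.py`.

```python
from collections import Counter

class Vocabulary:
    """Bidirectional word <-> index mapping."""
    def __init__(self):
        self.word2idx, self.idx2word, self.idx = {}, {}, 0

    def add_word(self, word):
        if word not in self.word2idx:
            self.word2idx[word] = self.idx
            self.idx2word[self.idx] = word
            self.idx += 1

    def __call__(self, word):
        # Out-of-vocabulary words fall back to <unk>.
        return self.word2idx.get(word, self.word2idx['<unk>'])

    def __len__(self):
        return len(self.word2idx)

def build_vocab(counter: Counter, threshold: int = 4) -> Vocabulary:
    """Keep words appearing at least `threshold` times in the captions."""
    vocab = Vocabulary()
    for token in ('<pad>', '<start>', '<end>', '<unk>'):
        vocab.add_word(token)
    for word, count in counter.items():
        if count >= threshold:
            vocab.add_word(word)
    return vocab
```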
```bash
pip install -r requirements.txt --user
git clone https://github.com/pdollar/coco.git
cd coco/PythonAPI/
make
python setup.py build
python setup.py install --user
cd ../../
git clone https://github.com/lychengr3x/Image-Descriptor.git
cd Image-Descriptor
```
If you want to use the preprocessed dataset, you can skip this step.
```bash
chmod +x download_dataset.sh
./download_dataset.sh
```
You can do it from scratch:
```bash
cd src
# training set
python build_vocab.py
python resize.py
# validation set
python build_vocab.py --caption_path='../data/annotations/captions_val2014.json' --vocab_path='../data/vocab_val.pkl'
python resize.py --image_dir='../data/val2014/'
```
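For reference, the resizing step amounts to something like the following sketch. The default paths and the resampling filter are assumptions; the exact flags live in `src/resize.py`.

```python
import os
from PIL import Image

def resize_images(image_dir, output_dir, size=(256, 256)):
    """Resize every image in image_dir to a fixed size for the CNN."""
    os.makedirs(output_dir, exist_ok=True)
    for i, name in enumerate(os.listdir(image_dir)):
        with Image.open(os.path.join(image_dir, name)) as img:
            img.convert('RGB').resize(size, Image.LANCZOS).save(
                os.path.join(output_dir, name))
        if (i + 1) % 1000 == 0:
            print(f'Resized {i + 1} images')

resize_images('../data/train2014/', '../data/resized2014/')
```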
Alternatively, simply download the preprocessed dataset:
- `annotations`: This directory includes two files, `captions_train2014.json` and `captions_val2014.json`. (link)
- `vocab`: This includes the vocabularies of the training set and validation set, `vocab.pkl` and `vocab_val.pkl`. (link)
- `resized2014`: This directory includes all resized images (`256x256`) of the training set and validation set. (link)
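Once the files are in place, a quick sanity check is to unpickle the vocabulary. This assumes the `Vocabulary` class from `build_vocab.py` is importable, since pickle needs it to reconstruct the object:

```python
import pickle
from build_vocab import Vocabulary  # required for unpickling

with open('../data/vocab.pkl', 'rb') as f:
    vocab = pickle.load(f)
print(f'Vocabulary size: {len(vocab)}')
```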
It takes around 30 minutes.
```bash
# no attention layer
nohup python main.py --mode='train' > log.txt &
# with attention layer
nohup python main.py --mode='train' --attention=True > log.txt &
```
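Conceptually, the attention layer enabled by `--attention=True` re-weights the visual features at every decoding step so the decoder cannot ignore them. The sketch below shows one common formulation (additive, Bahdanau-style attention); it illustrates the idea, not the exact layer in `src/model.py`:

```python
import torch
import torch.nn as nn

class Attention(nn.Module):
    """Additive attention over spatial CNN features."""
    def __init__(self, feature_dim, hidden_dim, attn_dim):
        super().__init__()
        self.feat_proj = nn.Linear(feature_dim, attn_dim)
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, features, hidden):
        # features: (batch, num_regions, feature_dim)
        # hidden:   (batch, hidden_dim), decoder state at this time step
        energy = torch.tanh(self.feat_proj(features)
                            + self.hidden_proj(hidden).unsqueeze(1))
        alpha = torch.softmax(self.score(energy).squeeze(2), dim=1)
        # Weighted sum of region features -> context vector for the LSTM.
        context = (alpha.unsqueeze(2) * features).sum(dim=1)
        return context, alpha
```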
How to specify a model: take `resnet152` for example; assign `--encoder=resnet` and `--encoder_ver=152`.
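One plausible way these two flags combine into a torchvision model (a hypothetical snippet; see `src/main.py` for the real argument handling):

```python
import argparse
import torchvision.models as models

parser = argparse.ArgumentParser()
parser.add_argument('--encoder', type=str, default='resnet')
parser.add_argument('--encoder_ver', type=str, default='152')
args = parser.parse_args()

# 'resnet' + '152' -> torchvision.models.resnet152
cnn = getattr(models, args.encoder + args.encoder_ver)(pretrained=True)
```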
- Here are two of the trained `resnet101` models: `resnet101-epoch-7.ckpt` (link) and `resnet101-epoch-15.ckpt` (link)
- Here is a demo that shows how to train in a Jupyter notebook: `demo_training.ipynb` (link)
To get a caption for a specific image:
```bash
# no attention layer
python main.py --mode=test --encoder=resnet --encoder_ver=101 --image_path=../png/example.png --model_dir=../models --checkpoint=resnet101-epoch-7.ckpt
# with attention layer
python main.py --mode=test --encoder=resnet --encoder_ver=101 --attention=True --image_path=../png/example.png --model_dir=../models --checkpoint=resnet101-epoch-7.ckpt
```
To get the loss on the validation set at a specific epoch, run the command in the background. It takes around 20 minutes.
```bash
# no attention layer
nohup python main.py --mode=val --encoder=resnet --encoder_ver=101 --model_dir=../models --checkpoint=epoch-7.ckpt > val_loss.txt &
# with attention layer
nohup python main.py --mode=val --encoder=resnet --encoder_ver=101 --attention=True --model_dir=../models --checkpoint=epoch-7.ckpt > val_loss_att.txt &
```
- Here is a demo that shows how to test in a Jupyter notebook: `demo_testing.ipynb` (link)
If you want to re-run `demo_testing.ipynb` (link) directly, make sure you download the files from the links above and put them in the right places, as shown below. In addition, the COCO API must be installed, as described in step 2.
```
.
|--- png/
|    |--- example.png
|    |--- test_01_resize.jpg
|    |--- test_02_resize.jpg
|    |--- test_03_resize.jpg
|    |--- test_04_resize.jpg
|--- src/
|    |--- build_vocab.py
|    |--- data.py
|    |--- demo_training.ipynb
|    |--- demo_testing.ipynb
|    |--- main.py
|    |--- model.py
|    |--- resize.py
|    |--- utils.py
|--- models/
|    |--- config-resnet101.txt
|    |--- resnet101-epoch-7.ckpt
|    |--- resnet101-epoch-15.ckpt
|--- data/
     |--- resized2014/
     |--- annotations/
     |    |--- captions_train2014.json
     |    |--- captions_val2014.json
     |--- vocab.pkl
     |--- vocab_val.pkl
```
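A quick way to verify the layout before launching the notebook (a hypothetical helper; the paths are taken from the tree above and assume you run it from the repository root):

```python
import os

required = [
    'png/example.png',
    'models/resnet101-epoch-7.ckpt',
    'models/resnet101-epoch-15.ckpt',
    'data/annotations/captions_train2014.json',
    'data/annotations/captions_val2014.json',
    'data/vocab.pkl',
    'data/vocab_val.pkl',
]
missing = [p for p in required if not os.path.exists(p)]
print('All files in place!' if not missing else f'Missing: {missing}')
```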