PyTorch implementation for reproducing the key results of the AttnGANTRANS models in the paper Transformer Models for Enhancing AttnGAN based Text to Image Generation by S. Naveen, M. S. S. Ram Kiran, M. Indupriya, T. V. Manikanta and P. V. Sudeep.
`bird` is implemented in Google Colab. `coco` is implemented on our local machine (NVIDIA Quadro RTX 8000).
- Python 3.6
- PyTorch
In addition, please add the project folder to PYTHONPATH and `pip install` the following packages when running on a local machine:
- `python-dateutil`
- `easydict`
- `pandas`
- `torchfile`
- `nltk`
- `scikit-image`
If using Colab, all the dependencies will be available by default.
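As a quick sanity check before training on a local machine, the following minimal sketch reports which of the packages listed above cannot be imported. The helper name `missing_packages` and the `REQUIRED` mapping are illustrative and not part of this repository; the import names (`dateutil`, `skimage`) are the standard module names for `python-dateutil` and `scikit-image`.

```python
import importlib.util

# Mapping from pip package name (as listed above) to its import name.
# This mapping and the helper below are illustrative, not part of the repo.
REQUIRED = {
    "python-dateutil": "dateutil",
    "easydict": "easydict",
    "pandas": "pandas",
    "torchfile": "torchfile",
    "nltk": "nltk",
    "scikit-image": "skimage",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages; try: pip install " + " ".join(missing))
    else:
        print("All listed packages are importable.")
```

Running the script prints either a suggested `pip install` command or a confirmation that everything is importable.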
- Add our preprocessed metadata for bird and coco to your directory and save them to `data/`.
- Download the birds dataset and extract it to `data/birds/`.
- Download the coco dataset and extract the images to `data/coco/`.
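The directory layout described in the steps above can be verified with a short script. This is a sketch only: it checks that the three directories named above exist under a project root, and the helper name `missing_data_dirs` is hypothetical. It does not validate the contents of the datasets or the metadata.

```python
from pathlib import Path

# Directories named in the data steps above, relative to the project root.
EXPECTED_DIRS = ("data", "data/birds", "data/coco")


def missing_data_dirs(project_root="."):
    """Return the expected directories that do not exist under project_root."""
    root = Path(project_root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]


if __name__ == "__main__":
    missing = missing_data_dirs()
    if missing:
        print("Missing directories: " + ", ".join(missing))
    else:
        print("Data directories are in place.")
```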
- Pre-train encoder models: Use the bird and coco files to train the encoder models.
- Train models: Use the bird and coco files to train the models.
`*.yml` files are example configuration files for training and evaluating our models.
- Encoder
  - AttnGANGPT, AttnGANBERT, AttnGANXL for `bird`.
  - AttnGANGPT for `coco`.
- Pretrained Models
  - AttnGANGPT, AttnGANBERT, AttnGANXL for `bird`.
  - AttnGANGPT for `coco`.
For sampling, change the following settings in the cfg file.
- To generate images for the pre-extracted embeddings: set `cfg.TRAIN.FLAG = False` and `cfg.B_VALIDATION = True`.
- To generate images for custom text input: set `cfg.TRAIN.FLAG = False` and `cfg.B_VALIDATION = False`.

Note: For T2I generation from custom text, add the custom examples to the `example_caption.txt` file.
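The two flag combinations above can be summarized in a small sketch. It uses `types.SimpleNamespace` as a stand-in for the real cfg object, and the function names and mode labels are hypothetical. The `"train"` branch is an assumption: the text above only states that `cfg.TRAIN.FLAG` must be `False` for sampling.

```python
from types import SimpleNamespace


def make_cfg(train_flag, b_validation):
    """Build a stand-in config holding only the two flags discussed above."""
    return SimpleNamespace(
        TRAIN=SimpleNamespace(FLAG=train_flag),
        B_VALIDATION=b_validation,
    )


def select_mode(cfg):
    """Map the two flags to the run mode they are described as selecting."""
    if cfg.TRAIN.FLAG:
        # Assumption: a True flag means training rather than sampling.
        return "train"
    if cfg.B_VALIDATION:
        return "sample_pre_extracted_embeddings"
    return "sample_custom_text"


if __name__ == "__main__":
    print(select_mode(make_cfg(False, True)))
    print(select_mode(make_cfg(False, False)))
```

Keeping the mode decision in one function makes it easy to confirm which path a given configuration file will take before launching a long run.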
- We compute inception score for models trained on birds using StackGAN-inception-model.
- We compute inception score for models trained on coco using improved-gan/inception_score.
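For orientation, the inception score is the exponential of the average KL divergence between each image's predicted class distribution p(y|x) and the marginal p(y). The pure-Python sketch below illustrates that definition on a list of probability vectors. It is not the code of the tools referenced above, which run an Inception network on the images and may differ in details such as averaging over several splits. The function name `inception_score` and its input format are assumptions made for illustration.

```python
import math


def inception_score(probs):
    """Compute exp(mean_x KL(p(y|x) || p(y))) from per-image class probabilities.

    probs: non-empty list of equal-length lists, each summing to 1.
    """
    n = len(probs)
    num_classes = len(probs[0])
    # Marginal class distribution p(y), averaged over all images.
    marginal = [sum(p[k] for p in probs) / n for k in range(num_classes)]
    kl_sum = 0.0
    for p in probs:
        # Terms with p_k == 0 contribute nothing to the KL divergence.
        kl_sum += sum(
            pk * math.log(pk / marginal[k])
            for k, pk in enumerate(p)
            if pk > 0.0
        )
    return math.exp(kl_sum / n)


if __name__ == "__main__":
    confident = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(inception_score(confident))
```

The score is lowest (1.0) when every image receives the same prediction, and it approaches the number of classes when predictions are confident and evenly spread across classes.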
Below are example results showing the generated images and attention maps of each AttnGANTRANS model for the respective text captions.
- The first row in both results is generated by the AttnGANBERT model.
- The second row in both results is generated by the AttnGANXL model.
- The third row in both results is generated by the AttnGANGPT model.
This bird has wings that are green and has a yellow belly. | This bird has wings that are black and has large eyes with yellow and blue as the main colors and black as an accent. |
---|---|
Evaluation code for `bird` is configured in this file to generate a URL for the API.
If you find AttnGANTRANS useful in your research, please consider citing:
@article{naveen2021attngantrans,
  author = {S. Naveen and M. S. S. Ram Kiran and M. Indupriya and T. V. Manikanta and P. V. Sudeep},
  title = {Transformer Models for Enhancing AttnGAN based Text to Image Generation},
  year = {2021},
  journal = {{IMAVIS}}
}
Reference