Image Generation Using VQVAE

🚀 Introduction to Generative AI with VQVAE

The Vector Quantized Variational Autoencoder (VQ-VAE) is a powerful generative-AI approach for creating high-quality images. This project implements a VQ-VAE architecture combined with an autoregressive prior (GPT) to generate novel images with impressive fidelity and diversity.

Unlike traditional GANs or vanilla VAEs, the VQ-VAE framework offers several advantages in the generative AI space:

  • Discrete latent representations that capture meaningful semantic features
  • High-quality image generation without mode collapse issues
  • Controllable generation through manipulations in latent space
  • Efficient sampling compared to diffusion models

🔍 Technical Overview of VQVAE Architecture

VQ-VAE differs from standard VAEs in two fundamental ways:

  1. The encoder network outputs discrete codes rather than continuous vectors
  2. A learnable prior replaces the static prior distribution

The vector quantization (VQ) mechanism enables the model to avoid posterior collapse, a common issue in VAE frameworks where latents are ignored when paired with powerful autoregressive decoders. By using discrete latent representations and training an autoregressive prior, this model can generate high-quality images while maintaining diversity.
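The core of the quantization mechanism is a nearest-neighbour lookup: each continuous encoder output is snapped to the closest vector in a learned codebook. A minimal NumPy sketch (the `quantize` helper and the tiny 2-D codebook are illustrative, not code from this repository):

```python
import numpy as np

def quantize(z_e, codebook):
    """Map each continuous encoder output to its nearest codebook entry.

    z_e: (N, D) array of encoder outputs; codebook: (K, D) array of embeddings.
    Returns the discrete code indices and the quantized vectors z_q.
    """
    # Squared Euclidean distance from every latent to every codebook vector.
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)  # nearest-neighbour code assignment
    z_q = codebook[indices]         # look the discrete codes back up as vectors
    return indices, z_q

codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
idx, z_q = quantize(np.array([[0.9, 1.2], [0.1, -0.2]]), codebook)
# idx -> [1, 0]: each latent snapped to its closest codebook entry
```

In training, gradients flow through this non-differentiable lookup via the straight-through estimator, copying the decoder's gradient from `z_q` directly back to `z_e`.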

VQVAE Model Architecture

Quantization Module

🛠️ Two-Stage Training Process

This generative AI system is trained in two distinct stages:

Stage 1: VQVAE Training

  • The VQVAE is trained on an image reconstruction task to learn discrete features from the input data
  • The encoder compresses images into a discrete latent space
  • The decoder learns to reconstruct the original images from these discrete codes
  • The vector quantization layer maps continuous representations to the nearest vectors in a learned codebook
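The Stage 1 objective combines three terms: reconstruction error, a codebook loss that pulls embeddings toward the encoder outputs, and a commitment loss that keeps the encoder close to the codebook. A minimal NumPy sketch of the loss values (`vqvae_loss` and `beta` are illustrative names; the stop-gradient bookkeeping used in a real framework is noted only in comments):

```python
import numpy as np

def vqvae_loss(x, x_rec, z_e, z_q, beta=0.25):
    """VQ-VAE Stage 1 loss: reconstruction + codebook + commitment terms.

    In a real framework the codebook term applies stop-gradient to z_e and
    the commitment term applies stop-gradient to z_q; here we only compute
    the scalar values to show the structure of the objective.
    """
    recon = ((x - x_rec) ** 2).mean()              # how well images are rebuilt
    codebook_loss = ((z_q - z_e) ** 2).mean()      # moves codebook toward encoder
    commitment = beta * ((z_e - z_q) ** 2).mean()  # keeps encoder near codebook
    return recon + codebook_loss + commitment

# Toy example: perfect reconstruction, unit gap between z_e and z_q.
loss = vqvae_loss(np.zeros(4), np.zeros(4), np.ones(4), np.zeros(4))
# loss -> 0 + 1.0 + 0.25 = 1.25
```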

Stage 2: Autoregressive Prior Training

  • After VQVAE training, we collect all discrete latent codes from our training images
  • A GPT model serves as the autoregressive prior, learning to predict the next latent codes based on previous ones
  • This prior model captures the statistical dependencies between latent codes, enabling coherent image generation
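Concretely, each image's 2-D grid of code indices is flattened into a 1-D sequence (e.g. in raster-scan order), and the GPT prior is trained on the resulting next-token prediction task. A pure-Python sketch with a hypothetical `next_token_pairs` helper:

```python
def next_token_pairs(code_grid):
    """Flatten a 2-D grid of discrete latent codes into a sequence and build
    (context, target) pairs for next-token training of the prior."""
    seq = [c for row in code_grid for c in row]  # raster-scan ordering
    return [(seq[:i], seq[i]) for i in range(1, len(seq))]

# A 2x2 grid of code indices from the trained VQ-VAE yields 3 training pairs.
pairs = next_token_pairs([[3, 7], [1, 4]])
# pairs -> [([3], 7), ([3, 7], 1), ([3, 7, 1], 4)]
```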

Discrete Latent Codes from Trained VQVAE:

Training GPT Prior with Future Token Prediction:

📊 Results and Evaluation

VQVAE Reconstructions

The VQVAE model demonstrates strong reconstruction capabilities, preserving key visual elements while compressing the image to discrete latent codes:

Generated Images

Novel images generated by sampling from the GPT prior and decoding with the VQVAE decoder:
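Generation amounts to rolling out the prior one code at a time and decoding the finished sequence. The sketch below assumes hypothetical `sample_next` and `decode` interfaces standing in for the trained GPT prior and VQ-VAE decoder, not the repository's actual classes:

```python
def generate_image(sample_next, decode, num_codes):
    """Autoregressive generation sketch: draw latent codes one at a time
    from the prior, then map the code sequence through the decoder.

    sample_next(context) -> next code index (a real prior samples from a
    softmax over the codebook); decode(codes) -> image.
    """
    codes = []
    for _ in range(num_codes):
        codes.append(sample_next(codes))  # GPT conditions on codes so far
    return decode(codes)

# Toy stand-ins: a deterministic "prior" and an identity "decoder".
img = generate_image(lambda ctx: len(ctx) % 4, lambda c: c, 6)
# img -> [0, 1, 2, 3, 0, 1]
```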

💡 Key Advantages of VQVAE in Generative AI

  • Discrete Latent Space: Unlike continuous latent models, VQVAE creates a more structured and interpretable representation
  • High Fidelity: Generates sharp, detailed images without the blurriness common in vanilla VAEs
  • Efficient Sampling: Once trained, generation is faster than many iterative approaches like diffusion models
  • Scalability: The architecture can be adapted to various domains beyond images (audio, video, etc.)
  • Controllable Generation: The discrete nature of the latent space facilitates manipulation and controlled generation

🔄 Comparison with Other Generative AI Approaches

| Model Type | Latent Space | Training Stability | Sample Quality | Sampling Speed |
|---|---|---|---|---|
| VQ-VAE + GPT | Discrete | High | High | Fast |
| GAN | Continuous | Low (mode collapse) | High | Fast |
| Vanilla VAE | Continuous | High | Medium | Fast |
| Diffusion Models | N/A | High | Very High | Slow |

🚶‍♀️ Next Steps

Potential improvements and extensions to this generative AI system:

  • Implement conditional generation capabilities
  • Explore hierarchical VQ-VAE architectures for higher resolution images
  • Incorporate attention mechanisms in the prior model
  • Experiment with different codebook sizes and dimensions
  • Apply the model to specialized domains like medical imaging or satellite imagery
