DDPM Report

| Title | Venue | Year | Code | Review |
| --- | --- | --- | --- | --- |
| DDPM, Denoising Diffusion Probabilistic Models | NeurIPS | 2020 | code | review |

Contribution

  • presented high-quality image samples using diffusion models
  • found connections between diffusion models and:
    • variational inference for training Markov chains
    • denoising score matching
    • annealed Langevin dynamics
      • (and energy-based models by extension)
    • autoregressive models
    • progressive lossy compression

What is a diffusion model?

Imagine we take an image, add a bit of Gaussian noise to it, and repeat this many times; eventually we'll have an unrecognizable image of static, i.e., a sample of pure noise.

  • A diffusion model is trained to undo this process.
  • Diffusion models are inspired by non-equilibrium thermodynamics.
  • Define a Markov chain of diffusion steps that slowly adds random noise to the data, then learn to reverse the diffusion process to construct the desired data samples from noise (a toy sketch of the noising idea follows below).
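
To make the intuition concrete, here is a toy sketch (not the paper's code) that repeatedly adds a little Gaussian noise to an image tensor until it looks like pure static; the shapes, step count, and noise scale are arbitrary illustrative choices, and the actual forward process defined later also rescales the signal at each step.

```python
import torch

# Toy illustration of "add a bit of Gaussian noise, many times".
# Shapes, step count, and noise scale are arbitrary; the real forward process
# (defined later) also rescales the signal by sqrt(1 - beta_t) at each step.
x = torch.rand(3, 64, 64)        # pretend this is a normalized RGB image x_0
num_steps = 1000
noise_scale = 0.05

for _ in range(num_steps):
    x = x + noise_scale * torch.randn_like(x)

# After many steps the original image is buried in noise.
print(x.std().item())            # the std has grown far beyond the image's own scale
```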

The components of a Diffusion Model

What is Forward Process?

  • Simply put, it is the process of gradually adding noise to the original image.
  • The image becomes pure Gaussian noise at the final step.
  • This process involves no model parameters and is fixed throughout training.
  • $x_0$: the target image
  • $x_T$: pure Gaussian noise

What is Reverse Process?

  • The task of the model is to remove the noise so that it can restore the original image.
  • The goal of each step in restoring the image from Gaussian noise is to turn $x_t$ back into $x_{t-1}$.

Define $x_t$

  • The original image is labeled $x_0$; the final pure-noise sample is labeled $x_T$.
  • The sequence from $x_1$ to $x_T$ is written $x_{1:T}$.

What is ELBO?

Evidence lower bound, ELBO

Introduction of ELBO


We want a surrogate normal distribution $q(Z)$ to be as close as possible to the posterior distribution $P(Z|X)$.

  • Note that $P(Z|X)$ is generally an intractable distribution that we cannot work with directly.
  • We want the Kullback-Leibler Divergence (KLD) between $q(Z)$ and $P(Z|X)$ to be as small as possible.
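
For completeness (this identity is assumed rather than stated in the notes above), maximizing the ELBO is equivalent to minimizing that KLD, because

$$\log P(X) = \underbrace{\mathbb{E}_{q(Z)}\Big[\log \frac{P(X, Z)}{q(Z)}\Big]}_{\text{ELBO}} + D_{KL}\big(q(Z)\,||\,P(Z|X)\big)$$

and $\log P(X)$ does not depend on $q$, so pushing the ELBO up pushes the KLD down.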

ELBO in VAE

$$\mathbb{E}_{q_{\phi}(z|x)} \Big[ \log p_{\theta}(x|z) \Big] - D_{KL} \Big( q_{\phi}(z|x) \; || \; p_\theta(z) \Big) \leq \log p_\theta(x)$$
  • In a VAE (Variational Autoencoder), $q_\phi$ is the encoder and $p_\theta$ is the decoder.
  • We use the negative ELBO (Evidence Lower Bound) as the loss function to train the model so that the output gets as close as possible to the input.
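
As a sketch (not the paper's or any particular library's implementation), a typical VAE training loss built from this bound, assuming a Gaussian encoder $q_\phi(z|x) = N(\mu, \mathrm{diag}(e^{\text{logvar}}))$, a standard-normal prior, and Bernoulli-style pixel reconstruction:

```python
import torch
import torch.nn.functional as F

def vae_negative_elbo(x, x_recon, mu, logvar):
    """Negative ELBO: reconstruction term plus KL(q_phi(z|x) || N(0, I)).
    Assumes x and x_recon lie in [0, 1]; all names here are illustrative."""
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")    # approx. -E_q[log p_theta(x|z)]
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())   # D_KL(q_phi(z|x) || p(z))
    return recon + kl                                              # minimize this to maximize the ELBO
```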

ELBO in Diffusion Model

  • Combining the concept of the Markov chain, we replace the latent variable $Z$ with $x_{1:T}$, and we get:
$$\mathbb{E}_{q(x_{1:T}|x_0)} \Big[ \log{p_{\theta}(x_0 | x_{1:T})} \Big] - D_{\text{KL}} \Big( q(x_{1:T}|x_0) \; || \; p_{\theta}(x_{1:T}) \Big) \leq \log p_\theta(x_0)$$
  • $x_1, ..., x_T$ are latents of the same dimensionality as the data
  • $x_0 \sim q(x_0)$
  • Schematic diagram:
  • Note that the encoding part of the diffusion model is fixed.

Details of Forward Process

$$\begin{aligned} q(x_{1:T}|x_0) &:= \prod^T_{t=1} q(x_t|x_{t-1}), \\ q(x_t|x_{t-1}) &:= N(x_t;\sqrt{1-\beta_t}\,x_{t-1}, \beta_t I) \end{aligned}$$
  • The process of adding noise to the original image according to a variance schedule.
  • It can be seen as the encoder of a VAE (Variational Autoencoder), but it contains no model parameters; it only produces samples according to a pre-set schedule.
  • The forward process is fixed to a Markov chain that gradually adds Gaussian noise to the data according to a variance schedule $\beta_1, ..., \beta_T$; the amount of noise added at each step is controlled entirely by this fixed schedule (a sketch of one noising step follows below).
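
A minimal sketch of one forward (noising) step, assuming the linear variance schedule reported in the paper ($\beta_1 = 10^{-4}$ to $\beta_T = 0.02$, $T = 1000$); the tensor shapes are illustrative and this is not the authors' implementation:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)    # linear schedule reported in the paper

def forward_step(x_prev, t):
    """Sample x_t ~ q(x_t | x_{t-1}) = N(sqrt(1 - beta_t) * x_{t-1}, beta_t * I)."""
    beta_t = betas[t]
    return torch.sqrt(1.0 - beta_t) * x_prev + torch.sqrt(beta_t) * torch.randn_like(x_prev)

x = torch.rand(3, 64, 64)                # a toy x_0
for t in range(T):                       # after all T steps, x is (approximately) pure noise
    x = forward_step(x, t)
```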

Details of Reverse Process

$-\text{ELBO} = L = L_T + L_{1:T-1} + L_0$

  • Here, after the derivation of the mathematical formulas (see appendix), the negative ELBO decomposes as:
$$\begin{aligned} L &= \mathbb{E}_q\biggl[ -\log{\frac{p_{\theta}(x_{0:T})}{q(x_{1:T}|x_0)}} \biggr] \\ &= \mathbb{E}_{q} \Big[ \log \frac{q(x_T|x_0)}{p(x_T)} + \sum_{t>1} \log \frac{q(x_{t-1}|x_t, x_0)}{p_{\theta}(x_{t-1}|x_t)} - \log p_{\theta}(x_0|x_1) \Big] \\ &= L_T + L_{1:T-1} + L_{0} \end{aligned}$$

$L_{1:T-1}$

When calculating $L_{1:T-1}$, since both $p_\theta(x_{t-1}|x_t)$ and $q(x_{t-1}|x_t, x_0)$ are normal distributions, we can directly apply the closed-form formula for the Kullback-Leibler divergence (KLD) between two normal distributions to compute each $L_{t-1}$.
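
For reference (not stated in the excerpt above), the closed-form KLD between two $d$-dimensional Gaussians $N(\mu_1, \Sigma_1)$ and $N(\mu_2, \Sigma_2)$ is

$$D_{KL}\big(N(\mu_1, \Sigma_1)\,||\,N(\mu_2, \Sigma_2)\big) = \frac{1}{2}\Big[ \log\frac{|\Sigma_2|}{|\Sigma_1|} - d + \operatorname{tr}\big(\Sigma_2^{-1}\Sigma_1\big) + (\mu_2-\mu_1)^\top \Sigma_2^{-1}(\mu_2-\mu_1) \Big]$$

With both covariances fixed to (scaled) identity matrices, only the squared difference of the means survives up to a constant, which is exactly the form of $L_{t-1}$ below.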

$L_{t-1}$

$$ L_{t-1} = D_{KL}(q(x_{t-1}|x_t, x_0) || p_\theta(x_{t-1}|x_t)) $$

  • Since $q(x_{t-1}|x_t, x_0)$ is the posterior distribution of the forward process, and because the variance schedule is fixed, it is tractable: it is a normal distribution with mean $\tilde\mu_t$ and variance $\tilde\beta_t$ (proof):
$$\begin{aligned} & q(x_{t-1}|x_t, x_0) = N(x_{t-1};\tilde\mu_t(x_t,x_0), \tilde\beta_t I) \\ & \tilde\mu_t(x_t, x_0) := \frac{\sqrt{\bar\alpha_{t-1}}\beta_t}{1-\bar{\alpha}_t}x_0 + \frac{\sqrt{\alpha_t}(1-\bar\alpha_{t-1})}{1-\bar\alpha_t}x_t \\ & \tilde\beta_t := \frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_{t}}\beta_t \end{aligned}$$
  • Substituting $\tilde\mu_t$ into the KLD formula (up to an additive constant that does not depend on $\theta$):
$$L_{t-1} = \mathbb{E}_q \Big[ \frac{1}{2\sigma^2_t}||\tilde\mu_t(x_t, x_0) - \mu_\theta(x_t, t)||^2 \Big]$$
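
A short sketch (illustrative names and shapes, not the authors' code) computing $\tilde\mu_t$ and $\tilde\beta_t$ from the formulas above, again assuming the paper's linear $\beta$ schedule:

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alphas_bar = torch.cumprod(alphas, dim=0)        # bar{alpha}_t = prod_{s <= t} alpha_s

def posterior_params(x0, xt, t):
    """Mean and variance of q(x_{t-1} | x_t, x_0), per the formulas above."""
    alpha_bar_t = alphas_bar[t]
    alpha_bar_prev = alphas_bar[t - 1] if t > 0 else torch.tensor(1.0)
    mu_tilde = (torch.sqrt(alpha_bar_prev) * betas[t] / (1.0 - alpha_bar_t)) * x0 \
             + (torch.sqrt(alphas[t]) * (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t)) * xt
    beta_tilde = (1.0 - alpha_bar_prev) / (1.0 - alpha_bar_t) * betas[t]
    return mu_tilde, beta_tilde
```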

Predict $\epsilon$ instead of predicting $\mu_\theta$

  • Through the nice property derived from reparameterization, we can express both $\mu_\theta$ and $\tilde\mu_t$ in terms of $\epsilon$, $x_t$, and $\alpha$.
  • Using the reparameterization trick, $x_t$ can be written directly in terms of $x_0$ and $\epsilon$:
$$x_t = \sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon, \quad \epsilon \sim N(0, I)$$
  • Substituting this into $L_{t-1}$ gives:
$$L_{t-1} = \mathbb{E}_{x_0, \epsilon} \Big[ \frac{\beta^2_t}{2\sigma_t^2 \alpha_t(1-\bar{\alpha}_t)} ||\epsilon - \epsilon_\theta(x_t, t) ||^2 \Big]$$
  • Denote the weighting coefficient as $w_t$:
$$\mathbb{E}_{x_0, \epsilon} \Big[ w_t ||\epsilon - \epsilon_\theta(x_t, t) ||^2 \Big]$$
  • The paper proposes ignoring $w_t$, which down-weights the easy low-noise steps and lets training focus on the more challenging, heavily noised denoising tasks.
    • This can be understood as the model predicting the noise $\epsilon_\theta(x_t, t)$ at time $t$ from $x_t$ and $t$.
    • The loss function measures the gap between the random noise $\epsilon$ and the prediction $\epsilon_\theta(x_t, t)$.
  • Ignoring $w_t$ yields the simplified objective used in the training algorithm (a training-step sketch follows below).
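
Putting the simplified objective together, here is a sketch of one training step in the spirit of the paper's training algorithm; `model` stands for any noise-prediction network $\epsilon_\theta(x_t, t)$ (a U-Net in the paper) and is just a placeholder argument here.

```python
import torch
import torch.nn.functional as F

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alphas_bar = torch.cumprod(1.0 - betas, dim=0)

def training_loss(model, x0):
    """Simplified DDPM objective: E[ ||eps - eps_theta(x_t, t)||^2 ] with w_t dropped."""
    b = x0.shape[0]
    t = torch.randint(0, T, (b,))                                   # t sampled uniformly
    eps = torch.randn_like(x0)                                      # eps ~ N(0, I)
    a_bar = alphas_bar[t].view(b, 1, 1, 1)
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1.0 - a_bar) * eps    # x_t = sqrt(a_bar) x_0 + sqrt(1 - a_bar) eps
    return F.mse_loss(model(x_t, t), eps)                           # unweighted noise-prediction loss
```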

Reference