This repository contains a PyTorch implementation of a Variational Autoencoder (VAE) for the MNIST dataset. The VAE is a generative model that learns to encode and decode images, allowing for the generation of new images similar to the training data.
$ git clone https://github.com/danielwang7/MNIST-VAE.git
$ cd MNIST-VAE
$ pip install -r requirements.txt

model.py → Implementation of the model and ELBO loss; obtained from the PyTorch VAE repository
train.py → Contains logic for training the VAE
visualize.py → Contains logic for generating visualizations of the latent space and samples
Example of a 20x20 visualization of the learned MNIST manifold:
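As a rough, hypothetical sketch of how such a manifold grid can be produced (the actual logic lives in visualize.py; a 2-D latent space and a decoder that outputs 784 pixel values are assumptions here), one can decode a regular grid of latent codes and tile the results:

```python
import torch

@torch.no_grad()
def manifold_grid(decoder, n=20, span=2.0):
    """Decode an n x n grid of 2-D latent codes into one tiled (28*n, 28*n) image."""
    lin = torch.linspace(-span, span, n)
    rows = []
    for yi in lin:
        # decode each latent code in this row and reshape it into a 28x28 digit
        row = [decoder(torch.stack([xi, yi])).view(28, 28) for xi in lin]
        rows.append(torch.cat(row, dim=1))   # concatenate digits horizontally
    return torch.cat(rows, dim=0)            # stack the rows vertically
```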
The theory and mathematics behind VAEs are detailed in the paper by D.P. Kingma and M. Welling, linked in the acknowledgements.
In this section, I give a brief summary of the goal of the model, hopefully to aid your understanding.
The central problem is as follows:
- We have a large dataset of independent and identically distributed samples of a stochastic variable $X$
- We want to generate new samples from $X$, which means finding the marginal likelihood $p(x)$
However, this is difficult since:
- A specific sample of $X$ is generated from the conditional distribution $p_\theta(x \mid z)$, where $z$ is a latent variable coming from a prior distribution with parameters $\theta$
  - In other words, we assume there is some latent variable $z$ that explains the variation in $x$
- Finding $p(x)$ is intractable. For instance, we are not able to compute the integral $p_\theta(x) = \int p_\theta(x \mid z)\, p(z)\, dz$ over all possible $z$
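As a concrete sketch of this generative story (the latent dimension and decoder layers here are assumptions for illustration, not necessarily what model.py uses), sampling a new image is easy, while evaluating $p(x)$ exactly would require integrating over every possible $z$:

```python
import torch
import torch.nn as nn

latent_dim = 2  # dimensionality of z (assumed for illustration)
prior = torch.distributions.Normal(torch.zeros(latent_dim), torch.ones(latent_dim))

# A toy decoder standing in for p_theta(x | z): maps a latent code to
# per-pixel Bernoulli parameters for a flattened 28x28 MNIST image.
decoder = nn.Sequential(
    nn.Linear(latent_dim, 400),
    nn.ReLU(),
    nn.Linear(400, 784),
    nn.Sigmoid(),
)

# Generating a sample: draw z from the prior, push it through the decoder.
z = prior.sample()                # z ~ p(z) = N(0, I)
x_probs = decoder(z)              # parameters of p_theta(x | z)
x_new = torch.bernoulli(x_probs)  # a sampled (flattened) image

# Evaluating p(x) for a given image, however, means integrating over ALL z:
#   p(x) = ∫ p_theta(x | z) p(z) dz
# which has no closed form once the decoder is a neural network.
```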
We define the following terms for our solution to this problem:
- Prior: $p(z) = \mathcal{N}(0, I)$; note that it is Gaussian by assumption
- True Posterior: $p_\theta(z \mid x)$
  - This is the actual distribution over latent codes given data
  - Again, this distribution is intractable, which is the key motivation for VAEs
Encoder and Decoder:
- Approximate Posterior: $q_\phi(z \mid x) \approx p_\theta(z \mid x)$
  - This is the encoder's output: a learned approximation of the true posterior. We learn it with a neural network, using the ELBO as the loss
- Likelihood: $p_\theta(x \mid z)$
  - This is the decoder: it generates/reconstructs $x$ given a latent variable $z$; we also learn this with a neural network
  - We can then reconstruct a data point by applying Bayes' rule: $p_\theta(z \mid x) = \dfrac{p_\theta(x \mid z)\, p(z)}{p_\theta(x)}$
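As a rough sketch of how these two networks might look in PyTorch (the class names and layer sizes are assumptions for illustration, not necessarily the architecture in model.py): the encoder maps an image to the mean and log-variance of $q_\phi(z \mid x)$, and the decoder maps a latent code back to pixel space.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Approximate posterior q_phi(z | x): outputs the mean and log-variance of a diagonal Gaussian."""
    def __init__(self, latent_dim=2):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(784, 400), nn.ReLU())
        self.fc_mu = nn.Linear(400, latent_dim)
        self.fc_logvar = nn.Linear(400, latent_dim)

    def forward(self, x):
        h = self.hidden(x)
        return self.fc_mu(h), self.fc_logvar(h)

class Decoder(nn.Module):
    """Likelihood p_theta(x | z): maps a latent code to per-pixel Bernoulli probabilities."""
    def __init__(self, latent_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 400), nn.ReLU(),
            nn.Linear(400, 784), nn.Sigmoid(),
        )

    def forward(self, z):
        return self.net(z)
```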
Without going into the derivation, the ELBO is as follows and is used as the loss for the encoder and decoder (in practice we minimize its negative):

$$\mathcal{L}(\theta, \phi; x) = \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] - D_{KL}\big(q_\phi(z \mid x) \,\|\, p(z)\big)$$
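A minimal sketch of that loss in code, assuming the Bernoulli decoder and diagonal-Gaussian encoder sketched above (the exact reduction and weighting may differ from what model.py does):

```python
import torch
import torch.nn.functional as F

def vae_loss(x, x_recon, mu, logvar):
    """Negative ELBO: reconstruction term + KL(q_phi(z|x) || N(0, I)), summed over the batch."""
    # E_q[log p_theta(x | z)] approximated with a single sample -> binary cross-entropy
    recon = F.binary_cross_entropy(x_recon, x, reduction="sum")
    # Closed-form KL divergence between N(mu, sigma^2) and the standard normal prior
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl
```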
- One last detail is the reparameterization trick: we need a way to sample from the distribution that is differentiable, so that we can optimize it with stochastic gradient descent
  - To do this, we separate out the stochastic part of $z$ and transform it with the encoder's input and parameters via a transformation function $g$, as sketched in the code below:
    - $\epsilon \sim p(\epsilon)$ → generate from a distribution independent of the encoder parameters, such as a simple Gaussian
    - $z = \mu + \sigma \epsilon$, where $\epsilon \sim \mathcal{N}(0, 1)$ → transform the sample into the desired distribution
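A sketch of the trick in code, under the same diagonal-Gaussian assumption: the randomness lives entirely in `eps`, so gradients flow through `mu` and `sigma` back into the encoder.

```python
import torch

def reparameterize(mu, logvar):
    """Sample z = mu + sigma * eps with eps ~ N(0, I), keeping z differentiable w.r.t. mu and logvar."""
    sigma = torch.exp(0.5 * logvar)   # convert log-variance to standard deviation
    eps = torch.randn_like(sigma)     # eps ~ N(0, I), independent of encoder parameters
    return mu + sigma * eps           # the differentiable transformation g(eps, mu, sigma)
```

Putting the pieces together, one training step would encode `x` to get `mu` and `logvar`, reparameterize to get `z`, decode `z`, and take a gradient step on `vae_loss` as sketched above.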
This has been a very brief overview of the VAE math, and it is exactly what this code implements.
The model is adapted from the PyTorch VAE repository, which can be found here
The dataset is MNIST, which can be found here
The theory behind the VAE is from the original paper Auto-Encoding Variational Bayes by D. P. Kingma and M. Welling.
