DigitSynthesiserCNN - UNet Diffusion Generative Model

Description

Implemented a UNet-256 architecture with diffusion-based training on the MNIST dataset to generate unique handwritten digits (0-9), building the model from scratch in PyTorch with custom residual blocks and sinusoidal timestep embeddings.
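The sinusoidal timestep embedding mentioned above turns an integer diffusion step t into a vector the network can condition on. Below is a minimal sketch of one common formulation (the sin/cos scheme borrowed from Transformer positional encodings). The function name `timestep_embedding`, the `max_period=10000` constant and the even `dim` requirement are assumptions, not taken from the notebook. It uses plain Python lists so it runs without PyTorch; a PyTorch version would build the same values as tensors.

```python
import math

def timestep_embedding(t, dim, max_period=10000.0):
    """Hypothetical helper: sinusoidal embedding of timestep t as a list of length dim."""
    if dim % 2:
        raise ValueError("dim must be even")
    half = dim // 2
    # Geometric progression of frequencies, starting at 1 and shrinking towards 1/max_period
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    args = [t * f for f in freqs]
    # First half sines, second half cosines
    return [math.sin(a) for a in args] + [math.cos(a) for a in args]

emb = timestep_embedding(500, 8)
print(len(emb))  # 8
```

Because each frequency varies at a different rate, nearby timesteps get similar but distinct vectors, which the residual blocks can then consume (typically after a small learned projection).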

Model Architecture

The model used in this project closely follows the U-Net design from Ronneberger et al. (2015). The main difference is size: the deepest layer of Ronneberger's network has 1024 feature channels, whereas mine has only 256. This is largely due to the difference between our datasets. MNIST images (28x28 grayscale) are far smaller than the biomedical images used in the original paper, and the relationships between pixel regions are much simpler. Consequently, a network with reduced capacity can still capture the underlying structure of handwritten digits, with the added benefits of faster training and inference and a lower risk of overfitting.
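To give a rough sense of how much capacity is saved by capping the network at 256 channels, the sketch below counts the parameters in the two 3x3 convolutions at each level of the contracting (encoder) path. The widths `[64, 128, 256, 512, 1024]` match the original U-Net; the `[64, 128, 256]` widths for this project are a hypothetical assumption, and the count deliberately ignores the decoder, residual connections and time-embedding layers.

```python
def conv3x3_params(c_in, c_out):
    # weights (c_out * c_in * 3 * 3) plus one bias per output channel
    return c_out * c_in * 9 + c_out

def encoder_params(channels, in_ch=1):
    # Sum the two 3x3 convolutions at each resolution level of the contracting path
    total, prev = 0, in_ch
    for c in channels:
        total += conv3x3_params(prev, c) + conv3x3_params(c, c)
        prev = c
    return total

original = encoder_params([64, 128, 256, 512, 1024])  # Ronneberger et al. widths
small = encoder_params([64, 128, 256])                # hypothetical widths capped at 256

print(original)  # 18842048
print(small)     # 1144256
print(f"ratio: {original / small:.1f}x")
```

Most of the original encoder's parameters sit in the two deepest levels, so dropping them removes the bulk of the capacity while keeping the same overall encoder/decoder shape.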

Results and Process

Example unique digits diffused from random noise

Training

I ran into issues during training, as my tests showed that the model required particular input dimensions. As a result, I resized the MNIST images from 28x28 to 32x32. Unfortunately, I did not account for this change in the noise sampling function, so the sampling noise no longer matched the 32x32 training images. While the model may well have been improving, the improvement was not visible in the generated samples because of this mismatch.
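One likely reason a U-Net needs "particular sizing" is that each level halves the spatial resolution, and the skip connections only line up if every halving is exact. The sketch below assumes three downsampling steps (the notebook's actual depth is not stated here): 28 halves to 14 and then 7, which cannot be halved evenly, whereas 32 halves cleanly. The helpers `downsample_sizes` and `noise_shape` are hypothetical; `noise_shape` simply records that the starting noise for sampling must use the same 32x32 resolution as training.

```python
IMG_SIZE = 32  # MNIST digits resized from 28x28 to 32x32

def downsample_sizes(size, n_down):
    """Spatial sizes seen at each U-Net level when every level halves the resolution."""
    sizes = [size]
    for _ in range(n_down):
        if size % 2:
            raise ValueError(f"{size}x{size} cannot be halved evenly")
        size //= 2
        sizes.append(size)
    return sizes

def noise_shape(batch_size, channels=1, size=IMG_SIZE):
    # Shape to pass to torch.randn(...) when sampling; it must match the
    # resolution the model was trained on (32x32 here, not MNIST's native 28x28).
    return (batch_size, channels, size, size)

print(downsample_sizes(32, 3))  # [32, 16, 8, 4]
try:
    downsample_sizes(28, 3)     # 28 -> 14 -> 7, and 7 cannot be halved
except ValueError as err:
    print(err)                  # 7x7 cannot be halved evenly
print(noise_shape(16))          # (16, 1, 32, 32)
```

Tying both the resize and the sampling shape to a single constant such as `IMG_SIZE` is one way to avoid the mismatch described above.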

Once I updated the sampling to use the 32x32 resolution, there was a clear jump in the quality of the model's output, as evident in the figure below.
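For context, a DDPM sampler starts from pure Gaussian noise and repeatedly subtracts the model's predicted noise. The sketch below follows the standard DDPM reverse update with the variance choice sigma_t^2 = beta_t and the linear beta schedule from Ho et al. It is an illustration under assumptions, not the notebook's code: `predict_noise` is a hypothetical stand-in for the trained U-Net (it returns zeros), and the image is a flattened single-channel list rather than a `(N, 1, 32, 32)` tensor.

```python
import math
import random

def make_schedule(T, beta_start=1e-4, beta_end=0.02):
    # Linear beta schedule (DDPM defaults) and the cumulative products alpha_bar_t
    betas = [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars

def predict_noise(x, t):
    # Hypothetical stand-in for the trained U-Net's noise prediction eps_theta(x_t, t)
    return [0.0] * len(x)

def sample(model=predict_noise, size=32, T=1000, seed=0):
    betas, alphas, alpha_bars = make_schedule(T)
    rng = random.Random(seed)
    # Start from pure Gaussian noise at the SAME resolution used in training
    x = [rng.gauss(0.0, 1.0) for _ in range(size * size)]
    for t in reversed(range(T)):
        eps = model(x, t)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        # Add fresh noise on every step except the last (sigma_t^2 = beta_t)
        x = [m + math.sqrt(betas[t]) * rng.gauss(0.0, 1.0) for m in mean] if t > 0 else mean
    return x  # flattened size*size image

image = sample(T=50)
print(len(image))  # 1024
```

If `size` here disagreed with the training resolution, a real U-Net would either fail on the shape or be asked to denoise inputs unlike anything it saw during training, which matches the symptom described above.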

Forward Noising Process on MNIST Dataset

The figure above shows the DDPM (Denoising Diffusion Probabilistic Model) forward noising process, in which Gaussian noise is added step by step to clean MNIST images until they become indistinguishable from pure noise. This process is central to a diffusion model: it produces the noisy inputs, and the noise targets, that the model is trained to denoise.
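The forward process does not have to be run step by step during training: for a fixed schedule, x_t can be sampled directly as sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps. The sketch below assumes the linear schedule from the DDPM paper (beta from 1e-4 to 0.02 over T = 1000 steps) and pixel values already scaled to roughly [-1, 1]; the helper names `make_alpha_bars` and `q_sample` are hypothetical, and plain lists stand in for tensors.

```python
import math
import random

def make_alpha_bars(T=1000, beta_start=1e-4, beta_end=0.02):
    # Linear beta schedule from the DDPM paper; alpha_bar_t = prod_{s<=t} (1 - beta_s)
    alpha_bars, prod = [], 1.0
    for i in range(T):
        beta = beta_start + (beta_end - beta_start) * i / (T - 1)
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng=random):
    """Sample x_t ~ q(x_t | x_0) in closed form: sqrt(ab) * x0 + sqrt(1 - ab) * eps."""
    ab = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(ab) * x + math.sqrt(1.0 - ab) * e for x, e in zip(x0, eps)]
    return xt, eps  # eps is the regression target for the denoiser

alpha_bars = make_alpha_bars()
x0 = [0.5] * 10  # toy "image" of 10 pixels
xt, eps = q_sample(x0, 999, alpha_bars, random.Random(0))
```

Chaining the single-step updates x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps yields the same distribution as this closed form, which is why training can pick a random t for each image instead of noising it recursively.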

Resources

ConvNets in Practice, Yann LeCun, NYU (general understanding of convolutional networks and how they function)
Understanding Sinusoidal Positional Encoding in Transformers (used to implement the sinusoidal timestep embeddings for diffusion training)
Deep Unsupervised Learning using Nonequilibrium Thermodynamics, Sohl-Dickstein et al., 2015 (used to build a basic understanding of how diffusion models work)
U-Net: Convolutional Networks for Biomedical Image Segmentation, Ronneberger et al., 2015 (the architecture this model is based on, and a popular backbone for diffusion models)

RubinInsert/DigitSynthesiserCNN