Scripts for training a small image diffusion model on the CIFAR-10 dataset, plus a Gradio UI for testing it.
Try it out here! You may need to restart the Space if it is asleep: https://huggingface.co/spaces/cameron-d/CIFAR-10_Diffusion_Model_Space
Based on Hugging Face's Diffusion Course: https://huggingface.co/learn/diffusion-course/en/unit2/3
Training dataset: CIFAR-10 https://www.cs.toronto.edu/~kriz/cifar.html
The model uses a UNet architecture with four down blocks and four up blocks. Images are 32x32 pixels.
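
To make the "four down and up blocks" concrete, the sketch below traces how the spatial resolution of a 32x32 image would change through such a UNet. It assumes the convention used by the Hugging Face diffusers `UNet2DModel`, where every down block except the last halves the resolution and every up block except the last doubles it. The function name `trace_resolutions` is hypothetical, and the configuration actually used in this project may differ.

```python
def trace_resolutions(image_size: int = 32, num_blocks: int = 4):
    """Return the output resolution of each down block and each up block.

    Assumption: every block except the last one in each path changes the
    resolution by a factor of 2 (the diffusers UNet2DModel convention).
    """
    down, up = [], []
    size = image_size

    # Down path: all blocks but the final one end with a 2x downsample.
    for i in range(num_blocks):
        if i < num_blocks - 1:
            size //= 2
        down.append(size)

    # Up path: all blocks but the final one end with a 2x upsample.
    for i in range(num_blocks):
        if i < num_blocks - 1:
            size *= 2
        up.append(size)

    return down, up


down, up = trace_resolutions()
print("down block outputs:", down)
print("up block outputs:  ", up)
```

Under these assumptions the bottleneck operates on 4x4 feature maps, which is about as small as a 32x32 input can usefully go; adding a fifth block would reduce it to 2x2.
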
The model was trained for 200 epochs. The generated images are of mixed quality, but are generally recognizable as CIFAR-10 images. The "car" (CIFAR-10's "automobile" label) and "truck" classes produce the best results, most likely because these objects have a more rigid and predictable structure.
Future plans:
- Continue experimenting with larger, higher-resolution datasets to try to achieve better results.
- Use CLIP to train a text-to-image model.
