EEEM068 Spring 2025 Applied Machine Learning Project: Human Faces Generation with Diffusion Models. The project's model runs are available on Weights & Biases here. Project planning documentation can be accessed here, and a webpage of the final paper is available here and in pdf form here.
You can find a demo here maintained by @liangxg787.
For diffusion literature references, please consult the Zotero-synced Notion table here.
---
config:
theme: default
layout: elk
---
graph LR;
A["Score-Based Generative Modeling through Stochastic Differential Equations"]
B["Denoising Diffusion Probabilistic Models"];
C["Denoising Diffusion Implicit Models"];
D["Improved Denoising Diffusion Probabilistic Models"];
E["Diffusion Models Beat GANs on Image Synthesis"];
F["Generative Modeling by Estimating Gradients of the Data Distribution"];
G["Deep Unsupervised Learning using Nonequilibrium Thermodynamics"];
H["Scalable Diffusion Models with Transformers"];
I["High-Resolution Image Synthesis with Latent Diffusion Models"];
J["Pseudo Numerical Methods for Diffusion Models on Manifolds"];
K["Common Diffusion Noise Schedules and Sample Steps are Flawed"];
L["Progressive Distillation for Fast Sampling of Diffusion Models"];
M["GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models"]
N["Classifier-Free Guidance"]
O["The Unreasonable Effectiveness of Deep Features as a Perceptual Metric"]
P["GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium"]
Q["Improved Techniques for Training GANs"]
%% R["U-Net: Convolutional Networks for Biomedical Image Segmentation"]
%% S["Auto-Encoding Variational Bayes"]
%% T["An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"]
U["Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding"]
%% V["Generating Diverse High-Fidelity Images with VQ-VAE-2"]
W["On Aliased Resizing and Surprising Subtleties in GAN Evaluation"]
%% X["Learning Transferable Visual Models From Natural Language Supervision"]
%% Y["Scaling Vision Transformers"]
%% Z["BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"]
%% AA["Attention Is All You Need"]
subgraph "Foundation"
G --> F --> A --> B;
end
subgraph "Architecture"
B --> D --> E --> I;
%% YX --> I;
%% AA --> Z --> I;
%% AA --> Y --> H;
E --> H;
%% T --> H;
%% R;
%% subgraph "VAE"
%% V;
%% S;
%% end
end
subgraph "Sampling"
A --> C;
A --> J;
end
subgraph "Guidance"
M --> N --> U;
end
subgraph "Parameterization"
L--> K;
end
subgraph "Loss Function"
O
end
subgraph "Evaluation"
P --> Q;
P --> W
end
A --> K;
%% R --> B;
D --> O;
E --> N;
F --> P;
%% V --> I;
%% S --> H;
faice/code
├── args.py
├── assets
├── conf
│   └── training_config.py
├── datasets
│   ├── celeba_hq_split
│   └── celeba_hq_stable_diffusion
├── eda
│   └── Training_Diffusion_Models.ipynb
├── experiments
│   ├── task1
│   ├── task2
│   ├── task3
│   ├── task4
│   ├── task5
│   ├── task6
│   └── task7
├── main.py
├── Makefile
├── models
│   ├── transformer.py
│   ├── unet_resnet.py
│   ├── unet_with_pretrain.py
│   ├── unet.py
│   ├── vae.py
│   └── vqmodel.py
├── pipelines
│   ├── base_pipeline.py
│   ├── ccddpm_pipeline.py
│   ├── consistency.py
│   ├── custom_pipelines.py
│   ├── dit_vae.py
│   ├── dit.py
│   ├── ldmp.py
│   ├── lora_inference.py
│   ├── stable_diffusion.py
│   ├── train_text_to_image_lora.py
│   ├── vae_train.py
│   └── vqvae_train.py
├── test
│   ├── test_stable_diffusion.py
│   ├── test_vae.py
│   └── test_vqvae.py
└── utils
    ├── loggers.py
    ├── loss.py
    ├── metrics.py
    ├── model_tools.py
    ├── plot.py
    ├── training.py
    └── transforms.py

The experiments can be run on Eureka2 and Babbage from the CSEE department. Otherwise, I have written an Otter setup documentation here.
You should design the experiments according to your tasks and put them in the experiments folder. You have the option to use the Makefile as well.
python main.py \
--dataset face \
--scheduler ddpm \
--beta_schedule linear \
--model unet \
--unet_variant ddpm \
--image_size 128 \
--num_epochs 500 \
--train_batch_size 64 \
--eval_batch_size 64 \
--wandb_run_name baseline \
--calculate_fid \
--calculate_is \
--verbose

Given that runs are computationally expensive, I recommend using the --verbose flag to check your parameters before running the experiments.
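If you script several runs (for example, from the experiments folder), the flags above can be composed programmatically. Below is a sketch, assuming main.py accepts exactly the flags shown in the baseline command; the helper name and defaults are illustrative:

```python
import subprocess

def run_experiment(overrides: dict, dry_run: bool = True) -> list:
    """Build the main.py command line; launch it only when dry_run is False."""
    # Baseline flags from the command above; override per experiment.
    base = {
        "dataset": "face", "scheduler": "ddpm", "beta_schedule": "linear",
        "model": "unet", "unet_variant": "ddpm", "image_size": 128,
        "num_epochs": 500, "train_batch_size": 64, "eval_batch_size": 64,
        "wandb_run_name": "baseline",
    }
    base.update(overrides)
    cmd = ["python", "main.py"]
    for key, value in base.items():
        cmd += [f"--{key}", str(value)]
    cmd += ["--calculate_fid", "--calculate_is", "--verbose"]
    if not dry_run:
        subprocess.run(cmd, check=True)
    return cmd

# Inspect the command for a DDIM variant before actually launching it.
cmd = run_experiment({"scheduler": "ddim", "wandb_run_name": "ddim_run"})
```

Keeping dry_run=True by default lets you review the assembled command first, which pairs well with the --verbose check recommended above.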
- If you're running the experiments on Otter, please lock the batch size to 24 or 16 for memory reasons.
- If you're running the experiments on Eureka2, please set the batch size to 64 for faster training.
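If you want to avoid hard-coding these numbers in each experiment script, the per-machine guidance above can be captured in a small helper. This is a sketch, not part of main.py; the function and dictionary names are illustrative:

```python
# Recommended --train_batch_size / --eval_batch_size per machine,
# mirroring the guidance above. "otter_small" (16) is the fallback
# for particularly memory-hungry runs on Otter.
BATCH_SIZES = {
    "otter": 24,
    "otter_small": 16,
    "eureka2": 64,
}

def batch_size_for(machine: str) -> int:
    """Look up the recommended batch size; fall back to the safest value."""
    return BATCH_SIZES.get(machine.lower(), 16)

batch = batch_size_for("Eureka2")  # 64
```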
After you've run the experiments, please summarize the results in the Notion page.
For unconditional generation, please download the attached dataset celeba_hq_split.zip from the email and extract it into the datasets folder in order to run the code.
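Extraction can be done with any archive tool, or with a short Python snippet like the one below (the archive location is an assumption; adjust it to wherever you saved the attachment):

```python
import zipfile
from pathlib import Path

def extract_dataset(archive: Path, target: Path) -> None:
    """Extract a dataset zip (e.g. celeba_hq_split.zip) into the datasets folder."""
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)

# Adjust this path to wherever you saved the email attachment.
archive = Path("celeba_hq_split.zip")
if archive.exists():
    extract_dataset(archive, Path("datasets"))  # -> datasets/celeba_hq_split/...
```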
For conditional generation, please download the dataset sent in the WhatsApp group. The layout of your dataset should be as follows:
code/datasets
├── celeba_hq_split
│   ├── test
│   ├── train
│   ├── celebaAHQ_test.xlsx
│   └── celebaAHQ_train.xlsx
└── celeba_hq_stable_diffusion
    ├── captions_hq.json
    ├── request_hq.txt
    ├── test_300
    └── train_2700

DO NOT COMMIT THE CREDENTIALS
Please use the provided API key and entity in the .env file in order to store the runs on Weights & Biases.
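The variables need to be in the process environment before the run starts. A minimal, stdlib-only loader is sketched below (python-dotenv is an equally valid choice; the function name is illustrative):

```python
import os
from pathlib import Path

def load_env(path: str = ".env") -> None:
    """Minimal .env loader: KEY=VALUE lines, '#' comments, no quoting rules."""
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # setdefault: values already exported in the shell take precedence.
        os.environ.setdefault(key.strip(), value.strip())

if Path(".env").exists():
    load_env()
# wandb then picks up WANDB_ENTITY / WANDB_API_KEY from the environment.
```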
# sample .env file
WANDB_ENTITY=<your_wandb_entity>
WANDB_API_KEY=<your_wandb_api_key>

This project is licensed under the GNU General Public License v3.0 (GPL-3.0). This means:
- Attribution: You must give appropriate credit to the original author (me, Frank Lu) if you use or modify this project.
- Non-Proprietary: Any derivative works or modifications must also be shared under the same license. This project cannot be used in proprietary or closed-source software.
- Open Source Forever: Any modifications you make must remain open-source under the GPL-3.0 license. This helps ensure the code remains accessible and beneficial to everyone.
You can read the full license in the LICENSE file.
