# Human Faces Generation with Diffusion Models πŸ«₯

EEEM068 Spring 2025 Applied Machine Learning Project: Human Faces Generation with Diffusion Models. The project's model runs are available on Weights & Biases here. Project planning documentation can be accessed here, and the final paper is available as a webpage here and as a PDF here.

A demo, maintained by @liangxg787, is available here.


## πŸ“š Literature Review

For diffusion literature references, please consult the Zotero-synced Notion table here.

```mermaid
---
config:
  theme: default
  layout: elk
---
graph LR;
  A["Score-Based Generative Modeling through Stochastic Differential Equations"]
  B["Denoising Diffusion Probabilistic Models"];
  C["Denoising Diffusion Implicit Models"];
  D["Improved Denoising Diffusion Probabilistic Models"];
  E["Diffusion Models Beat GANs on Image Synthesis"];
  F["Generative Modeling by Estimating Gradients of the Data Distribution"];
  G["Deep Unsupervised Learning using Nonequilibrium Thermodynamics"];
  H["Scalable Diffusion Models with Transformers"];
  I["High-Resolution Image Synthesis with Latent Diffusion Models"];
  J["Pseudo Numerical Methods for Diffusion Models on Manifolds"];
  K["Common Diffusion Noise Schedules and Sample Steps are Flawed"];
  L["Progressive Distillation for Fast Sampling of Diffusion Models"];
  M["GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models"]
  N["Classifier-Free Guidance"]
  O["The Unreasonable Effectiveness of Deep Features as a Perceptual Metric"]
  P["GANs Trained by a Two Time-Scale Update Rule Converge to a Local Nash Equilibrium"]
  Q["Improved Techniques for Training GANs"]
  %% R["U-Net: Convolutional Networks for Biomedical Image Segmentation"]
  %% S["Auto-Encoding Variational Bayes"]
  %% T["An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale"]
  U["Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding"]
  %% V["Generating Diverse High-Fidelity Images with VQ-VAE-2"]
  W["On Aliased Resizing and Surprising Subtleties in GAN Evaluation"]
  %% X["Learning Transferable Visual Models From Natural Language Supervision"]
  %% Y["Scaling Vision Transformers"]
  %% Z["BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding"]
  %% AA["Attention Is All You Need"]
  subgraph "Foundation"
    G --> F --> A --> B;
  end
  subgraph "Architecture"
    B --> D --> E --> I;
    %% AA --> Z --> I;
    %% AA --> Y --> H;
    E --> H;
    %% T --> H;
    %% R;
    %% subgraph "VAE"
    %%   V;
    %%   S;
    %% end
  end
  subgraph "Sampling"
    A --> C;
    A --> J;
  end
  subgraph "Guidance"
    M --> N --> U;
  end
  subgraph "Parameterization"
    L --> K;
  end
  subgraph "Loss Function"
    O
  end
  subgraph "Evaluation"
    P --> Q;
    P --> W
  end
  A --> K;
  %% R --> B;
  D --> O;
  E --> N;
  F --> P;
  %% V --> I;
  %% S --> H;
```

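For orientation, the core mechanism shared by the DDPM-line papers above: the forward process gradually destroys an image with Gaussian noise under a variance schedule, and a network is trained to predict that noise. A minimal NumPy sketch of the forward (noising) step, for illustration only (function names and defaults are assumptions, not code from this repo):

```python
import numpy as np

def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linear variance schedule from the DDPM paper."""
    return np.linspace(beta_start, beta_end, num_steps)

def forward_noise(x0, t, betas, rng):
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(abar_t) * x_0, (1 - abar_t) * I)."""
    alphas_bar = np.cumprod(1.0 - betas)  # cumulative product of alphas
    noise = rng.standard_normal(x0.shape)
    xt = np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise
    return xt, noise  # the model learns to predict `noise` from (xt, t)
```

At the final timestep the cumulative signal coefficient is nearly zero, so x_T is essentially pure Gaussian noise, which is what the samplers (DDPM, DDIM, PNDM) start from.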

## πŸ’» Code Layout

```
faice/code
β”œβ”€β”€ args.py
β”œβ”€β”€ assets
β”œβ”€β”€ conf
β”‚   └── training_config.py
β”œβ”€β”€ datasets
β”‚   β”œβ”€β”€ celeba_hq_split
β”‚   └── celeba_hq_stable_diffusion
β”œβ”€β”€ eda
β”‚   └── Training_Diffusion_Models.ipynb
β”œβ”€β”€ experiments
β”‚   β”œβ”€β”€ task1
β”‚   β”œβ”€β”€ task2
β”‚   β”œβ”€β”€ task3
β”‚   β”œβ”€β”€ task4
β”‚   β”œβ”€β”€ task5
β”‚   β”œβ”€β”€ task6
β”‚   └── task7
β”œβ”€β”€ main.py
β”œβ”€β”€ Makefile
β”œβ”€β”€ models
β”‚   β”œβ”€β”€ transformer.py
β”‚   β”œβ”€β”€ unet_resnet.py
β”‚   β”œβ”€β”€ unet_with_pretrain.py
β”‚   β”œβ”€β”€ unet.py
β”‚   β”œβ”€β”€ vae.py
β”‚   └── vqmodel.py
β”œβ”€β”€ pipelines
β”‚   β”œβ”€β”€ base_pipeline.py
β”‚   β”œβ”€β”€ ccddpm_pipeline.py
β”‚   β”œβ”€β”€ consistency.py
β”‚   β”œβ”€β”€ custom_pipelines.py
β”‚   β”œβ”€β”€ dit_vae.py
β”‚   β”œβ”€β”€ dit.py
β”‚   β”œβ”€β”€ ldmp.py
β”‚   β”œβ”€β”€ lora_inference.py
β”‚   β”œβ”€β”€ stable_diffusion.py
β”‚   β”œβ”€β”€ train_text_to_image_lora.py
β”‚   β”œβ”€β”€ vae_train.py
β”‚   └── vqvae_train.py
β”œβ”€β”€ test
β”‚   β”œβ”€β”€ test_stable_diffusion.py
β”‚   β”œβ”€β”€ test_vae.py
β”‚   └── test_vqvae.py
└── utils
    β”œβ”€β”€ loggers.py
    β”œβ”€β”€ loss.py
    β”œβ”€β”€ metrics.py
    β”œβ”€β”€ model_tools.py
    β”œβ”€β”€ plot.py
    β”œβ”€β”€ training.py
    └── transforms.py
```

## πŸ§ͺ Running the Experiments

⚠️ Please first request cluster access to Eureka2 and Babbage from the CSEE department. Otherwise, I have written Otter setup documentation here.

You should design your experiments according to your tasks and put them in the `experiments` folder. You also have the option to use the `Makefile`.

```shell
python main.py \
    --dataset face \
    --scheduler ddpm \
    --beta_schedule linear \
    --model unet \
    --unet_variant ddpm \
    --image_size 128 \
    --num_epochs 500 \
    --train_batch_size 64 \
    --eval_batch_size 64 \
    --wandb_run_name baseline \
    --calculate_fid \
    --calculate_is \
    --verbose
```

Given that runs are computationally expensive, I recommend using the `--verbose` flag to check your parameters before running the experiments.
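For reference, the flags in the example command above could be declared with a parser along these lines. This is a hypothetical sketch, not the project's actual `args.py`; defaults and choices are assumptions mirroring the example:

```python
import argparse

def build_parser():
    """Sketch of a CLI matching the example command (names are assumptions)."""
    p = argparse.ArgumentParser(description="faice training entry point")
    p.add_argument("--dataset", default="face")
    p.add_argument("--scheduler", default="ddpm")
    p.add_argument("--beta_schedule", default="linear")
    p.add_argument("--model", default="unet")
    p.add_argument("--unet_variant", default="ddpm")
    p.add_argument("--image_size", type=int, default=128)
    p.add_argument("--num_epochs", type=int, default=500)
    p.add_argument("--train_batch_size", type=int, default=64)
    p.add_argument("--eval_batch_size", type=int, default=64)
    p.add_argument("--wandb_run_name", default="baseline")
    p.add_argument("--calculate_fid", action="store_true")
    p.add_argument("--calculate_is", action="store_true")
    p.add_argument("--verbose", action="store_true")
    return p
```

With `--verbose`, `main.py` can print the parsed namespace and exit early so you can confirm the configuration before committing GPU hours.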

⚠️ Unless you're running hyperparameter tuning, please make sure your experiment batch size stays consistent across runs for the ablation study:

  1. If you're running the experiments on Otter, please lock the batch size to 24 or 16 for memory reasons.
  2. If you're running the experiments on Eureka2, please set the batch size to 64 for faster training.

After you've run the experiments, please summarize the results in the Notion page.

πŸ§‘β€πŸ³ Dataset Preparation

For unconditional generation, please download the attached dataset `celeba_hq_split.zip` from the email and extract it into the `datasets` folder in order to run the code.

For conditional generation, please download the dataset sent in the WhatsApp group. The layout of your dataset should be as follows:

```
code/datasets
β”œβ”€β”€ celeba_hq_split
β”‚   β”œβ”€β”€ test
β”‚   β”œβ”€β”€ train
β”‚   β”œβ”€β”€ celebaAHQ_test.xlsx
β”‚   └── celebaAHQ_train.xlsx
└── celeba_hq_stable_diffusion
    β”œβ”€β”€ captions_hq.json
    β”œβ”€β”€ request_hq.txt
    β”œβ”€β”€ test_300
    └── train_2700
```

## 🚨 Credentials

**DO NOT COMMIT THE CREDENTIALS.**

Please use the provided API key and entity in the `.env` file in order to store the runs on Weights & Biases.

```shell
# sample .env file
WANDB_ENTITY=<your_wandb_entity>
WANDB_API_KEY=<your_wandb_api_key>
```
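For illustration, a stdlib-only sketch of how these two variables could be read without adding a dependency (the project itself may rely on `python-dotenv` or `wandb login` instead; this helper is an assumption, not repo code):

```python
def load_env(path=".env"):
    """Minimal .env parser: KEY=VALUE lines; blank lines and '#' comments skipped."""
    env = {}
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            env[key.strip()] = value.strip()
    return env
```

The W&B client picks up `WANDB_ENTITY` and `WANDB_API_KEY` from the process environment, so exporting the parsed values (or sourcing the file in your shell) is enough for runs to land in the right project.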

## πŸ‘‰ License

This project is licensed under the GNU General Public License v3.0 (GPL-3.0). This means:

  1. Attribution: You must give appropriate credit to the original author (me, Frank Lu) if you use or modify this project.
  2. Non-Proprietary: Any derivative works or modifications must also be shared under the same license. This project cannot be used in proprietary or closed-source software.
  3. Open Source Forever: Any modifications you make must remain open-source under the GPL-3.0 license. This helps ensure the code remains accessible and beneficial to everyone.

You can read the full license in the LICENSE file.
