Created by Altay
This repository contains a framework for fine-tuning Stable Diffusion models on custom datasets, optimized for RTX GPUs. The project uses LoRA (Low-Rank Adaptation) for memory-efficient training.
## Project Structure

```
├── config/                      # Configuration files
│   ├── default_config.json      # Default training configuration
│   └── test_config.json         # Test configuration (created by test script)
├── dataset/                     # Your custom dataset (images + captions)
├── output/                      # Training outputs
│   ├── samples/                 # Generated samples during training
│   ├── logs/                    # Training logs
│   └── lora/                    # LoRA weights
├── scripts/                     # Batch scripts for common operations
│   ├── train.bat                # Full training script
│   ├── generate.bat             # Image generation script
│   ├── test_train.bat           # Quick test training
│   └── test_cuda.bat            # CUDA/GPU testing script
└── src/                         # Source code
    ├── train.py                 # Main training script
    ├── generate.py              # Image generation script
    ├── models/                  # Model-related code
    │   └── stable_diffusion.py  # Stable Diffusion model setup
    └── utils/                   # Utility functions
        ├── checkpoint.py        # Model checkpointing
        ├── cuda_test.py         # CUDA testing
        ├── dataset.py           # Custom dataset implementation
        ├── generate.py          # Sample generation during training
        ├── logging.py           # Logging utilities
        └── vae.py               # VAE encoder utilities
```
## Requirements

- NVIDIA GPU with CUDA support
- Python 3.8+ with pip
## Installation

- Install dependencies:

  ```
  pip install -r requirements.txt
  ```

- Test your CUDA installation:

  ```
  scripts/test_cuda.bat
  ```

- Prepare your dataset in the `dataset` directory:
  - PNG images with corresponding TXT files of the same name
  - Each TXT file contains a single-line caption describing the image
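The image/caption pairing convention above can be sketched as a small helper. This is an illustrative function (`load_caption_pairs` is a hypothetical name, not the actual implementation in `src/utils/dataset.py`):

```python
from pathlib import Path

def load_caption_pairs(dataset_dir):
    """Collect (image_path, caption) pairs: each PNG must have a
    same-named TXT file containing a one-line caption."""
    pairs = []
    for img_path in sorted(Path(dataset_dir).glob("*.png")):
        txt_path = img_path.with_suffix(".txt")
        if not txt_path.exists():
            continue  # skip images with no matching caption file
        caption = txt_path.read_text(encoding="utf-8").strip()
        pairs.append((img_path, caption))
    return pairs
```

Images without a matching caption file are simply skipped rather than raising an error; the real loader may behave differently.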
## Training

Run a short training test to verify everything is set up correctly:

```
scripts/test_train.bat
```

Start the full training process:

```
scripts/train.bat
```

## Configuration

All training parameters can be modified in `config/default_config.json`:
- `model`: Base model configuration
- `training`: Training parameters (learning rate, batch size, etc.)
- `generation`: Parameters for image generation
- `sample_prompts`: Prompts used to generate samples during training
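A config with these four sections might look like the following. The individual field names shown inside each section are illustrative assumptions, not the exact keys in `default_config.json`:

```json
{
  "model": {
    "base_model": "runwayml/stable-diffusion-v1-5"
  },
  "training": {
    "learning_rate": 1e-4,
    "batch_size": 1,
    "gradient_accumulation_steps": 4,
    "max_train_steps": 2000
  },
  "generation": {
    "num_inference_steps": 30,
    "guidance_scale": 7.5
  },
  "sample_prompts": [
    "a photo of a cat in a garden"
  ]
}
```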
## Generating Images

After training, generate images with your fine-tuned model:

```
scripts/generate.bat
```

## How It Works

The `src/train.py` script implements a fine-tuning pipeline optimized for efficiency:
- Data Loading: Uses a custom dataset to load images and captions
- LoRA Adaptation: Applies Low-Rank Adaptation to the UNet part of Stable Diffusion
- Optimization: Uses 8-bit Adam and mixed precision to save memory
- Monitoring: Generates sample images throughout training to visualize progress
- Checkpointing: Saves the best model based on loss and visual quality
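The LoRA Adaptation step above can be illustrated with plain matrices: instead of updating a full weight matrix W, training learns a low-rank pair B·A whose scaled product is added to the frozen W. This is a pure-Python sketch of the math, independent of this repo's code:

```python
def matmul(a, b):
    """Multiply two matrices given as lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_forward(x, W, A, B, alpha, r):
    """Compute y = x @ (W + (alpha / r) * B @ A).

    W is frozen; only A (r x d_out) and B (d_in x r) would receive
    gradients during training, so trainable parameters scale with r.
    """
    delta = matmul(B, A)          # rank-r update, same shape as W
    scale = alpha / r             # standard LoRA scaling factor
    W_eff = [[W[i][j] + scale * delta[i][j]
              for j in range(len(W[0]))] for i in range(len(W))]
    return matmul(x, W_eff)
```

With rank r much smaller than the weight dimensions, B and A together hold far fewer parameters than W, which is where the memory savings come from.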
## Memory Optimizations

- LoRA fine-tuning: Only updates a small set of parameters
- 8-bit Adam optimizer: Reduces optimizer memory usage
- Mixed precision (fp16): Uses half-precision for calculations
- Gradient accumulation: Simulates larger batch sizes
- Attention slicing: Reduces peak memory usage during attention computation
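Gradient accumulation from the list above can be sketched in a few lines: gradients from several micro-batches are summed and averaged before a single optimizer step, which matches one step on the combined batch. This is a generic scalar sketch, not this repo's training loop:

```python
def grad(w, x, y):
    """Gradient of the squared error 0.5 * (w*x - y)**2 w.r.t. scalar w."""
    return (w * x - y) * x

def accumulated_step(w, batch, lr, accum_steps):
    """Split `batch` into `accum_steps` micro-batches, accumulate
    gradients without updating, then take ONE averaged optimizer step."""
    total = 0.0
    micro = len(batch) // accum_steps
    for i in range(accum_steps):
        for x, y in batch[i * micro:(i + 1) * micro]:
            total += grad(w, x, y)        # accumulate only, no update yet
    return w - lr * total / len(batch)    # single step on averaged gradient

def full_batch_step(w, batch, lr):
    """Reference: one step computed on the whole batch at once."""
    g = sum(grad(w, x, y) for x, y in batch) / len(batch)
    return w - lr * g
```

Because only one micro-batch's activations are alive at a time, peak memory stays at micro-batch scale while the effective batch size is the full batch.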
## Using Your Own Dataset

1. Replace the sample dataset with your own images
2. Update the sample prompts in the config file
3. Adjust training parameters based on your GPU capabilities
4. Run the training script
## Troubleshooting

- Out of memory errors: Reduce the batch size or increase gradient accumulation steps
- Poor quality results: Increase training steps or adjust learning rate
- CUDA errors: Update your GPU drivers or check CUDA compatibility
## License

This project is licensed under the MIT License; see the LICENSE file for details.