An AI-powered cake image generator project that uses Stable Diffusion to create realistic custom cake images based on text descriptions.
This project allows you to fine-tune a Stable Diffusion model using your own collection of cake photos and then generate new, unique cake designs through a simple chat interface. Perfect for bakery owners, cake designers, or anyone who wants to experiment with AI-generated pastry designs.
- Custom Model Training: Fine-tune Stable Diffusion on your own collection of cake photos
- Text-to-Image Generation: Create photorealistic cake images by describing them in natural language
- Chat Interface: User-friendly chat interface to request and generate cake images
- Flexible Deployment: Run locally on your own machine or on free cloud services
- Python 3.10 or newer (Python 3.10 recommended for best compatibility)
- GPU for training (can use Google Colab or other free GPU services)
- 20-30 high-quality cake images for training
- 10GB+ free disk space
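Before training, it can help to verify that the image folder actually meets the 20-30 image guideline. The following `check_dataset` helper is a hypothetical sketch (not part of the repository) that counts files with the extensions the resize and training steps expect:

```python
# Quick sanity check for the training folder: confirms you have enough
# images and that every file uses an extension the later scripts expect.
# `check_dataset` is a hypothetical helper, not part of this repository.
import os

VALID_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def check_dataset(folder, min_images=20, max_images=30):
    """Return (ok, message) for a folder of training images."""
    files = [f for f in os.listdir(folder)
             if os.path.splitext(f)[1].lower() in VALID_EXTENSIONS]
    n = len(files)
    if n < min_images:
        return False, f"only {n} images found; aim for {min_images}-{max_images}"
    return True, f"{n} usable images found"
```

Run it against your data folder before starting a multi-hour training job.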
1. Clone the repository:

   ```bash
   git clone https://github.com/yourusername/cake-generator.git
   cd cake-generator
   ```

2. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

   If you encounter dependency conflicts, try the simplified installation:

   ```bash
   pip install huggingface-hub==0.17.3
   pip install "torch>=2.2.0" "torchvision>=0.13.0" "numpy>=1.26.0" "pillow>=9.0.0" "tqdm>=4.65.0"
   pip install transformers==4.33.0 accelerate==0.22.0 diffusers==0.23.0 gradio==3.48.0
   ```
```
cake-generator/
│
├── data/                    # Folder for cake images
│   └── torte/               # Subfolder with training images
│
├── models/                  # Folder where trained models will be saved
│
├── utils/                   # Utility functions
│
├── scripts/
│   ├── train.py             # Script for model training
│   └── generate.py          # Script for generating new images
│
├── app/
│   └── chat_interface.py    # Chat interface for interacting with the model
│
├── requirements.txt         # Project dependencies
└── README.md                # Project documentation
```
1. Collect Images: Gather 20-30 high-quality photos of cakes, ideally from different angles and with good lighting.

2. Organize Images:

   ```bash
   mkdir -p data/torte
   cp /path/to/your/cake/photos/* data/torte/
   ```

3. Resize Images (optional but recommended):

   ```python
   from PIL import Image
   import os

   input_dir = "data/torte"
   for file in os.listdir(input_dir):
       if file.endswith(('.jpg', '.png', '.jpeg')):
           img_path = os.path.join(input_dir, file)
           img = Image.open(img_path).convert("RGB")  # normalize mode so JPEG save works
           img = img.resize((512, 512), Image.LANCZOS)
           img.save(img_path)
   ```
1. Start the training process:

   ```bash
   python scripts/train.py --data_dir=./data/torte --output_dir=./models/torte_model --use_lora
   ```

   For more advanced training (requires more resources):

   ```bash
   python scripts/train.py --data_dir=./data/torte --output_dir=./models/torte_model --max_train_steps=800
   ```

2. Training parameters:

   - `--data_dir`: Directory with cake images
   - `--output_dir`: Where to save the trained model
   - `--base_model`: Base model to fine-tune (default: "runwayml/stable-diffusion-v1-5")
   - `--use_lora`: Use LoRA for efficient training (recommended for systems with limited resources)
   - `--max_train_steps`: Number of training steps (higher = better results but more time)
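To see why `--use_lora` is so much lighter on memory, consider a back-of-the-envelope parameter count. LoRA freezes each original weight matrix and trains only a low-rank update. The dimensions below are illustrative, not the actual Stable Diffusion layer sizes:

```python
# Compare trainable parameters for full fine-tuning vs. LoRA on one
# weight matrix W of shape (d_out, d_in). LoRA freezes W and trains a
# low-rank update B @ A, with B of shape (d_out, r) and A of shape
# (r, d_in). The sizes used here are illustrative examples only.
def full_finetune_params(d_in, d_out):
    return d_in * d_out

def lora_params(d_in, d_out, rank):
    return d_out * rank + rank * d_in

d = 768   # a typical attention projection size
r = 4     # a common LoRA rank
full = full_finetune_params(d, d)   # 589,824 trainable parameters
lora = lora_params(d, d, r)         # 6,144 trainable parameters
print(f"full: {full}, lora: {lora}, ratio: {full // lora}x")  # ratio: 96x
```

The same ratio applies across every adapted layer, which is why LoRA training fits on modest GPUs.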
1. Generate a cake image:

   ```bash
   python scripts/generate.py --model_path=./models/torte_model --prompt="chocolate cake with vanilla frosting and strawberry decorations" --output_path=./chocolate_cake.png
   ```

2. Generation parameters:

   - `--model_path`: Path to your trained model
   - `--prompt`: Text description of the cake
   - `--output_path`: Where to save the generated image
   - `--num_images`: Number of images to generate (default: 1)
   - `--seed`: Random seed for reproducibility
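As a sketch of how a script like `scripts/generate.py` might expose these options, here is a minimal `argparse` setup mirroring the documented flags (the repository's actual parser may differ in detail):

```python
# A sketch of argument parsing for the generation flags documented
# above. Defaults shown for --output_path are illustrative; the real
# scripts/generate.py may choose different ones.
import argparse

def build_parser():
    p = argparse.ArgumentParser(description="Generate cake images")
    p.add_argument("--model_path", required=True, help="Path to the trained model")
    p.add_argument("--prompt", required=True, help="Text description of the cake")
    p.add_argument("--output_path", default="./cake.png", help="Where to save the image")
    p.add_argument("--num_images", type=int, default=1, help="Number of images to generate")
    p.add_argument("--seed", type=int, default=None, help="Random seed for reproducibility")
    return p

args = build_parser().parse_args([
    "--model_path=./models/torte_model",
    "--prompt=chocolate cake with vanilla frosting",
    "--num_images=2",
])
print(args.num_images)  # 2
```

Fixing `--seed` to the same integer across runs is what makes a generation reproducible; leaving it unset yields a different image each time.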
1. Launch the chat interface:

   ```bash
   python app/chat_interface.py
   ```

2. Access the interface:

   - The interface will be available at http://127.0.0.1:7860 in your web browser
   - Type cake descriptions in the text box and press "Send" to generate images
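Chat messages are free-form text, while diffusion models respond best to dense, descriptive prompts. One common pattern is to normalize the message and append quality keywords before it reaches the model. The helper below is hypothetical and may not match what `app/chat_interface.py` actually does:

```python
# A hypothetical example of turning a chat message into a diffusion
# prompt: collapse stray whitespace/newlines, then append style
# keywords so short requests still produce detailed images. The actual
# app/chat_interface.py may do this differently (or not at all).
QUALITY_SUFFIX = "professional food photography, high detail, soft lighting"

def message_to_prompt(message: str) -> str:
    text = " ".join(message.split())  # collapse whitespace and newlines
    if not text:
        return ""
    return f"{text}, {QUALITY_SUFFIX}"

print(message_to_prompt("three-tier   wedding cake\nwith roses"))
# three-tier wedding cake with roses, professional food photography, high detail, soft lighting
```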
If you don't have a powerful GPU, you can use Google Colab's free GPU:
- Upload your cake images to Google Drive
- Create a new Colab notebook
- Mount your Google Drive:
  ```python
  from google.colab import drive
  drive.mount('/content/drive')
  ```
- Install dependencies and run training in the notebook
- Download the trained model to your local machine
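For the last step, a convenient approach is to zip the model directory into your mounted Drive and download the archive from there. The helper below is a sketch; the paths in the usage comment are examples, not fixed locations:

```python
# One way to package a trained model for download from Colab: zip the
# output directory into the mounted Drive, then fetch it from Drive.
# Paths are examples; adjust them to where your training run wrote the
# model and where your Drive is mounted.
import shutil

def archive_model(model_dir: str, archive_base: str) -> str:
    """Zip `model_dir` and return the path of the created archive."""
    return shutil.make_archive(archive_base, "zip", root_dir=model_dir)

# e.g. inside the notebook, after training:
# archive_model("./models/torte_model", "/content/drive/MyDrive/torte_model")
```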
- Out of Memory Errors: Reduce image resolution or use the `--use_lora` option
- Slow Training: Training can take 1-3 hours with a GPU; be patient
- Low-Quality Images: Increase `--max_train_steps` or add more training images
- Installation Issues: Try installing packages one by one or use a different Python version
- Add more training images for better results
- Use SDXL as base model for higher quality images
- Implement ControlNet for better control over cake shape/structure
- Deploy as a web service using Hugging Face Spaces
This project is licensed under the MIT License - see the LICENSE file for details.
- Hugging Face for Diffusers library
- Stability AI for Stable Diffusion
- The open-source AI community
For questions, suggestions, or improvements, please open an issue or contact the repository maintainer.
Improving image quality and training outcomes:
- Implement data augmentation to increase the training dataset
- Add validation during training to monitor quality
- Integrate SDXL as a base model for higher quality
- Add support for negative prompts during generation