This file provides comprehensive guidance for AI assistants (Claude, ChatGPT, etc.) working with the Stable Diffusion Server codebase.
A production-ready AI image generation server supporting multiple diffusion models with cloud storage integration, Gradio UI, and FastAPI backend.
- Text-to-Image: Flux Schnell and SDXL model support
- Style Transfer: ControlNet-guided image transformation
- Inpainting: Mask-based image editing with refinement
- Cloud Storage: R2/GCS integration with automatic caching
- UI Components: Gradio interfaces for local development
```bash
# Environment setup
pip install uv && uv venv && source .venv/bin/activate
uv pip install -r requirements.txt -r dev-requirements.txt
python -c "import nltk; nltk.download('stopwords')"
```
```bash
# Local testing
python flux_schnell.py          # Test Flux model
python gradio_ui.py             # Launch UI
uvicorn main:app --port 8000    # Run API server

# With environment variables
GOOGLE_APPLICATION_CREDENTIALS=secrets/google-credentials.json \
PYTHONPATH=. uvicorn --port 8000 --timeout-keep-alive 600 --workers 1 --limit-concurrency 4 main:app
```

- Primary SDXL Pipeline (`pipe`) - ProteusV0.2 with LCM scheduler
- Flux Schnell Pipeline (`flux_pipe`) - Fast text-to-image generation
- Image2Image Pipeline (`img2img`) - Style transfer operations
- Inpainting Pipelines (`inpaintpipe`, `inpaint_refiner`) - Mask-based editing
- ControlNet Pipelines - Canny edge and line-guided generation
- CPU offloading for all pipelines to manage GPU memory
- Component sharing between pipelines (shared UNet, VAE, encoders)
- Attention slicing and VAE slicing for efficiency
- Optional Optimum Quanto quantization support
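Under these constraints, pipeline setup typically follows a pattern like the sketch below. The model ID and function name are illustrative, not the server's actual code, and the diffusers calls should be checked against the installed version; imports are kept inside the function so the sketch can be read (and imported) without diffusers/torch present.

```python
def build_pipelines(model_id: str = "stabilityai/stable-diffusion-xl-base-1.0"):
    """Sketch: build an SDXL pipeline with the memory optimizations above.

    Hypothetical setup helper, not the server's actual initialization code.
    """
    import torch
    from diffusers import AutoPipelineForImage2Image, StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(
        model_id, torch_dtype=torch.float16
    )

    # CPU offloading keeps only the active submodule on the GPU
    pipe.enable_model_cpu_offload()

    # Attention/VAE slicing trade a little speed for lower peak memory
    pipe.enable_attention_slicing()
    pipe.enable_vae_slicing()

    # Share UNet/VAE/encoders with a second pipeline instead of loading copies
    img2img = AutoPipelineForImage2Image.from_pipe(pipe)
    return pipe, img2img
```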
- `/create_and_upload_image` - Text-to-image with cloud upload
- `/inpaint_and_upload_image` - Inpainting with cloud upload
- `/style_transfer_and_upload_image` - Style transfer with cloud upload
- `/style_transfer_bytes_and_upload_image` - File upload support
- Follow existing patterns: Use the same error handling, retry logic, and memory management
- Maintain compatibility: Ensure new features work with existing pipeline architecture
- Test thoroughly: Use both Gradio UI and API endpoints for validation
- Document changes: Update relevant sections in CLAUDE.md and this file
```python
# Always use type hints
def generate_image(prompt: str, width: int = 1024) -> Image.Image:
    ...

# Use inference mode for all model operations
with torch.inference_mode():
    image = pipe(prompt=prompt).images[0]

# Implement proper error handling with retries
for attempt in range(retries + 1):
    try:
        ...  # Generation logic
        break
    except Exception as err:
        if attempt >= retries:
            raise
        logger.warning(f"Failed attempt {attempt + 1}/{retries + 1}: {err}")
```

- Load model in main.py initialization section
- Enable CPU offloading and memory optimizations
- Share components with existing pipelines where possible
- Add corresponding API endpoint following existing patterns
- Test with Gradio UI integration
- Update `stable_diffusion_server/image_processing.py`
- Ensure compatibility with existing dimension requirements (64-pixel alignment)
- Test with various input formats and sizes
- Update error handling for edge cases
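The 64-pixel alignment requirement can be enforced with a small helper along these lines (a hypothetical `align_to_64`, not the actual implementation in `image_processing.py`):

```python
def align_to_64(width: int, height: int) -> tuple[int, int]:
    """Round dimensions to the nearest multiple of 64, with a floor of 64.

    Diffusion UNets downsample by powers of two, so off-grid sizes can
    fail or produce artifacts; snapping to the grid avoids both.
    """
    def snap(value: int) -> int:
        return max(64, round(value / 64) * 64)

    return snap(width), snap(height)
```

For example, `align_to_64(1023, 700)` snaps to `(1024, 704)`.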
- Check existing `stable_diffusion_server/bucket_api.py` implementation
- Follow the check-exists-before-generate pattern
- Handle both R2 and GCS storage backends
- Test upload/download functionality thoroughly
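The check-exists-before-generate pattern itself is storage-agnostic and can be sketched with injected callables; in the real code these would be wired to the `bucket_api.py` helpers, and the names below are illustrative:

```python
from typing import Callable

def get_or_generate(
    save_path: str,
    exists: Callable[[str], bool],         # e.g. a blob-existence check
    generate: Callable[[], bytes],         # the expensive model call
    upload: Callable[[str, bytes], None],  # e.g. a bucket upload helper
) -> str:
    """Skip generation entirely when the blob is already in the bucket."""
    if not exists(save_path):
        upload(save_path, generate())
    return save_path
```

With an in-memory dict standing in for the bucket, calling `get_or_generate` twice for the same path runs the generator only once.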
- Black images: Usually indicates CUDA memory issues, server auto-restarts
- OOM errors: Reduce concurrency, enable more aggressive CPU offloading
- Slow inference: Check if models are properly using CPU offloading
- "Too bumpy" images: Automatic detection triggers regeneration with modified prompts
- Poor style transfer: Ensure canny edge detection is working correctly
- Blurry outputs: Check if proper refinement passes are enabled
- Timeouts: Update progress.txt file during long operations
- Upload failures: Verify cloud storage credentials and bucket permissions
- Rate limiting: Adjust the `--limit-concurrency` and `--backlog` settings
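The regenerate-on-bad-output behavior mentioned above follows a loop like this sketch; the detector and the prompt tweak are stand-ins, as the real heuristics live in the server code:

```python
import random
from typing import Callable

def generate_with_quality_check(
    prompt: str,
    generate: Callable[[str], bytes],
    looks_bad: Callable[[bytes], bool],  # e.g. a "too bumpy" detector
    max_tries: int = 3,
) -> bytes:
    """Regenerate with a lightly modified prompt when the output fails QC."""
    image = generate(prompt)
    for _ in range(max_tries - 1):
        if not looks_bad(image):
            break
        # Nudge the sampler out of the bad mode by varying the prompt
        prompt = f"{prompt}, {random.choice(['detailed', 'smooth', 'high quality'])}"
        image = generate(prompt)
    return image
```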
```bash
# Storage (choose one)
STORAGE_PROVIDER=r2|gcs
BUCKET_NAME=your-bucket-name
BUCKET_PATH=static/uploads
R2_ENDPOINT_URL=https://account.r2.cloudflarestorage.com
PUBLIC_BASE_URL=your-domain.com
GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json

# Model paths (optional)
DF11_MODEL_PATH=DFloat11/FLUX.1-schnell-DF11
CONTROLNET_LORA=black-forest-labs/flux-controlnet-line-lora
LOAD_LCM_LORA=1
```

```
models/
├── ProteusV0.2/                      # Primary SDXL model
├── stable-diffusion-xl-base-1.0/     # Base SDXL model
├── lcm-lora-sdxl/                    # LCM LoRA weights
└── diffusers/
    └── controlnet-canny-sdxl-1.0/    # ControlNet model
```
- Run existing tests: `pytest -q`
- Check code style: `flake8`
- Test UI functionality: Launch `python gradio_ui.py` and verify all features
- Test API endpoints: Send requests to key endpoints and verify responses
- Memory usage: Monitor GPU/CPU usage during generation
```bash
# Test image generation
curl "http://localhost:8000/create_and_upload_image?prompt=test&save_path=test.webp"

# Test style transfer
curl -X POST "http://localhost:8000/style_transfer_bytes_and_upload_image" \
  -F "prompt=anime style" -F "image_file=@test.jpg" -F "save_path=output.webp"
```

- Use `enable_sequential_cpu_offload()` for lowest memory usage
- Share model components between pipelines
- Consider quantization for memory-constrained environments
- Monitor and tune batch sizes for optimal throughput
- Use Flux Schnell for fastest generation (4-8 steps)
- Enable LCM LoRA for SDXL speed improvements
- Implement proper caching with `check_if_blob_exists()`
- Use appropriate guidance scales (0.0 for Flux, 7+ for SDXL)
- Always validate and sanitize prompts using `shorten_too_long_text()`
- Validate image dimensions and file formats
- Use UUID prefixes for generated filenames to prevent conflicts
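The save-path side of this can be sketched as follows; `make_save_path` and the length limit are hypothetical, and the real prompt sanitizer is `shorten_too_long_text()` in the server package:

```python
import uuid

MAX_SLUG_CHARS = 200  # illustrative limit, not the server's actual value

def make_save_path(prompt: str, ext: str = "webp") -> str:
    """Build a collision-free filename from a sanitized prompt slug."""
    slug = "".join(c if c.isalnum() else "-" for c in prompt.lower())[:MAX_SLUG_CHARS]
    # UUID prefix guarantees uniqueness even for identical prompts
    return f"{uuid.uuid4().hex[:8]}-{slug}.{ext}"
```

Two calls with the same prompt yield distinct paths, so concurrent requests never overwrite each other's uploads.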
- Never expose cloud storage credentials in code
- Use proper environment variable management
- Implement rate limiting and request validation
- Monitor for suspicious usage patterns
- Code follows existing patterns and style
- New features include appropriate error handling
- Memory management is properly implemented
- Tests pass and new functionality is tested
- Documentation is updated (CLAUDE.md, this file, docstrings)
- No sensitive information is committed
- Memory safety: Proper pipeline management and GPU memory usage
- Error handling: Robust retry logic and graceful degradation
- API consistency: Following established endpoint patterns
- Performance impact: Changes don't negatively affect generation speed
- Security: Input validation and credential management