Feature Request
Description
I would like to request support for image generation capabilities in oMLX, specifically:
- Text-to-Image (T2I) - Generate images from text prompts
- Image-to-Image (I2I) - Transform/modify existing images based on text prompts
Motivation
Currently, oMLX supports:
- Text LLMs
- Vision-Language Models (VLM) for image understanding
- OCR models
- Embedding and reranker models
However, there is currently no support for image generation models. Adding it would make oMLX a more complete local AI inference server, covering the full spectrum of multimodal capabilities.
Proposed API Design
Following OpenAI's DALL-E API specification, the server could expose endpoints such as POST /v1/images/generations (text-to-image) and POST /v1/images/edits (image-to-image).
Example request:
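As a sketch only: the snippet below builds an OpenAI-style text-to-image request body. The base URL, endpoint path, field names, and model id are assumptions mirroring OpenAI's Images API, not an existing oMLX interface.

```python
import json

# Assumed local oMLX server address (illustrative, not an existing endpoint).
BASE_URL = "http://localhost:8080/v1"

def build_generation_request(prompt: str, n: int = 1, size: str = "1024x1024") -> dict:
    """Build an OpenAI Images API-style request body for text-to-image."""
    return {
        "model": "mlx-community/sdxl-turbo",  # placeholder model id
        "prompt": prompt,
        "n": n,                               # number of images to generate
        "size": size,                         # e.g. "512x512", "1024x1024"
        "response_format": "b64_json",        # return images inline as base64
    }

body = build_generation_request("a watercolor fox in a snowy forest")
print(json.dumps(body, indent=2))
# The request would then be sent as:
#   POST {BASE_URL}/images/generations  with this JSON body
```

An image-to-image request could follow the same shape via POST /v1/images/edits, adding the source image (multipart upload in OpenAI's spec).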
Potential MLX Models to Support
- Stable Diffusion XL (SDXL) variants
- Stable Diffusion 3 (if available in MLX format)
- FLUX models
- Other diffusion models from mlx-community
Benefits
- Unified API: Single server for text, vision, and image generation
- Apple Silicon Optimization: Leverage MLX's performance on M-series chips
- Offline Capability: Local image generation without external API calls
- Privacy: Keep generated images entirely on-device
Additional Context
I noticed the project already has excellent VLM support. Image generation could potentially leverage similar patterns from the MLX ecosystem (e.g., existing MLX diffusion-model implementations, if available).
Thank you for considering this feature request!