Feature Request: Image Generation API (Text-to-Image / Image-to-Image) #477

@iazrael

Feature Request

Description

I would like to request support for image generation capabilities in oMLX, specifically:

  1. Text-to-Image (T2I) - Generate images from text prompts
  2. Image-to-Image (I2I) - Transform/modify existing images based on text prompts

Motivation

Currently, oMLX supports:

  • Text LLMs
  • Vision-Language Models (VLM) for image understanding
  • OCR models
  • Embedding and reranker models

However, there is no support for image generation models. Adding this would make oMLX a more complete local AI inference server, covering the full spectrum of multimodal capabilities.

Proposed API Design

Following OpenAI's DALL-E API specification, the endpoints could be:

Example request:
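As a rough sketch of what a text-to-image call might look like if oMLX mirrored OpenAI's Images API: the path `/v1/images/generations` and the body fields (`model`, `prompt`, `n`, `size`, `response_format`) follow OpenAI's published shape, while the base URL, the model name, and both helper functions are hypothetical placeholders, not anything oMLX currently provides. The request is only built here, not sent.

```python
import base64
import json
import urllib.request

# Assumption: address of a local oMLX server; adjust to your own setup.
BASE_URL = "http://localhost:8000/v1"


def build_generation_request(prompt, model, size="1024x1024", n=1):
    """Build (but do not send) a text-to-image request in the OpenAI Images shape."""
    payload = {
        "model": model,
        "prompt": prompt,
        "n": n,
        "size": size,
        "response_format": "b64_json",  # ask for base64 image data instead of URLs
    }
    return urllib.request.Request(
        f"{BASE_URL}/images/generations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_images(response_body):
    """Extract raw image bytes from an OpenAI-style response body."""
    parsed = json.loads(response_body)
    return [base64.b64decode(item["b64_json"]) for item in parsed["data"]]


if __name__ == "__main__":
    # "hypothetical-sdxl-model" is a placeholder, not a real model id.
    req = build_generation_request("a watercolor fox", model="hypothetical-sdxl-model")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

An image-to-image counterpart would presumably map to OpenAI's `/v1/images/edits`, which takes the source image and prompt as multipart form data rather than JSON.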

Potential MLX Models to Support

  • Stable Diffusion XL (SDXL) variants
  • Stable Diffusion 3 (if available in MLX format)
  • FLUX models
  • Other diffusion models from mlx-community

Benefits

  1. Unified API: Single server for text, vision, and image generation
  2. Apple Silicon Optimization: Leverage MLX's performance on M-series chips
  3. Offline Capability: Local image generation without external API calls
  4. Privacy: Keep generated images entirely on-device

Additional Context

I noticed the project already has excellent VLM support via . Image generation could potentially leverage similar patterns from the MLX ecosystem (e.g., for diffusion models if available).

Thank you for considering this feature request!
