MLX Community Projects #654
Replies: 47 comments 18 replies
-
Text generation: mlx-tuning-fork
-
Text generation: https://github.com/mzbac/mlx-moe-models
-
An implementation of reinforcement learning algorithms in MLX, based on the implementations from CleanRL. Still a WIP because it's missing a benchmark and some other minor things, but the implementations work correctly.
-
mlx-models: currently supports vision models by loading/converting from PyTorch checkpoints. Support for text and audio models will be added later as well.
-
Hi, I would love to add chat-with-mlx. It is a chat UI + RAG implementation on MLX. I will add more features later on (a more advanced RAG pipeline + multimodal support).
-
I have an example of training a simple language model using BitLinear instead of nn.Linear. It's a port of Karpathy's minGPT to MLX, along with a custom implementation of a BitLinear module: https://github.com/adhulipa/mlx-mingpt. I noticed this collection already has the far meatier
-
Transformer Lab (https://github.com/transformerlab/transformerlab-app) is an LLM research platform that lets you run, train, perform RAG with, and evaluate LLMs through a GUI.
-
MLX RAG with GGUF models: https://github.com/Jaykef/mlx-rag-gguf
The code here builds on https://github.com/vegaluisjose/mlx-rag, optimized to support RAG-based inference with .gguf models. I am using BAAI/bge-small-en as the embedding model, TinyLlama-1.1B-Chat-v1.0-GGUF as the base model, and a custom vector-database script for indexing the text in a PDF file. Inference speeds reach ~413 tokens/sec for prompts and ~36 tokens/sec for generation on my 8GB M2 Air.
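The pipeline described above (embed chunks, index them, retrieve by similarity, then generate from the retrieved context) can be sketched with a toy retriever. Everything here is illustrative: the hashed bag-of-words embedder stands in for a real embedding model such as BAAI/bge-small-en, and the list of (chunk, vector) pairs stands in for the vector-database script.

```python
# Toy sketch of the retrieval step in a RAG pipeline; not code from the repo.
import math

def toy_embed(text, dim=16):
    """Stand-in embedder: bag-of-words hashed into a fixed-size vector."""
    v = [0.0] * dim
    for word in text.lower().split():
        v[sum(ord(ch) for ch in word) % dim] += 1.0
    return v

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

chunks = [
    "MLX is an array framework for Apple silicon.",
    "GGUF is a file format for quantized models.",
    "Paris is the capital of France.",
]
index = [(c, toy_embed(c)) for c in chunks]  # the "vector database"

query = "What file format do quantized models use?"
q = toy_embed(query)
top = max(index, key=lambda item: cosine(q, item[1]))  # best-matching chunk
prompt = f"Context: {top[0]}\nQuestion: {query}"       # fed to the base model
print(prompt)
```

A real pipeline would retrieve the top-k chunks, not just one, and hand the assembled prompt to the GGUF base model for generation.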
-
@Jaykef Very cool, thanks for sharing
-
Vision: MLX3D, a library for deep learning on 3D data using mlx.
-
JSON schema decoding (allowing function calling, including an OpenAI-compatible server with tools) using MLX: https://github.com/otriscon/llm-structured-output
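Not code from the linked repo, but a minimal sketch of the core idea behind schema-constrained decoding: at each step, a schema-driven state machine (stubbed here as a hand-written set of valid token ids) masks the model's logits so only structurally valid tokens can be chosen.

```python
# Hypothetical sketch of schema-constrained decoding via logit masking.
import math

def constrained_argmax(logits, valid_ids):
    """Pick the highest-scoring token among those the schema allows."""
    masked = [x if i in valid_ids else -math.inf for i, x in enumerate(logits)]
    return max(range(len(masked)), key=lambda i: masked[i])

# Toy vocab: 0='{', 1='}', 2='"', 3='x'
# After emitting '{', a JSON-schema state machine might only allow '"' or '}'.
logits = [0.1, 0.5, 0.9, 2.0]  # raw model scores; 'x' scores highest
valid = {1, 2}                  # but only '}' and '"' keep the JSON valid
print(constrained_argmax(logits, valid))  # → 2, i.e. '"' wins among valid tokens
```

Running this loop token by token, with the valid set recomputed from the schema state after each emission, is what guarantees the output parses against the schema.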
-
Hello! For the text generation part, I'm happy to share that I proposed and contributed the integration of MLX with LibreChat.ai. So now you can use your local LLM, powered by MLX, through a fancy interface, privately. Enjoy! :D See danny-avila/LibreChat#2580. If in the future the community proposes API servers that also support multimodality, transcription, or image generation, for example, I will add them to LibreChat ;) It would also be great to have an LLM API supporting a /models endpoint and multiple models simultaneously :D
-
Hello, MLX community! We are happy to share that we have contributed the first strong sub-4-bit LLM model zoo for the MLX community.
It covers modern LLM families including Llama 3/2, Phi-3, Mistral, 01-Yi, and Qwen. An mlx-style inference toolkit is also shared for local web chatting.
We are an active team here, supporting a better low-bit community on the local platform. Enjoy!
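As a rough illustration of what sub-4-bit weight formats generally do (not this team's actual scheme), here is group-wise affine quantization to 3 bits in plain Python: each small group of weights shares a scale and zero-point, and values are rounded onto an 8-level grid.

```python
# Illustrative group-wise affine quantization; group size and bit width are
# toy choices, not the parameters of any particular released model zoo.
bits, group = 3, 4
levels = (1 << bits) - 1  # 7 steps between min and max of each group

def quantize_group(ws):
    lo, hi = min(ws), max(ws)
    scale = (hi - lo) / levels or 1.0   # avoid zero scale for constant groups
    q = [round((w - lo) / scale) for w in ws]  # integers in [0, levels]
    return q, scale, lo

def dequantize_group(q, scale, lo):
    return [x * scale + lo for x in q]

weights = [0.12, -0.30, 0.05, 0.44, -0.11, 0.02, 0.27, -0.08]
out = []
for i in range(0, len(weights), group):
    q, s, z = quantize_group(weights[i:i + group])
    out.extend(dequantize_group(q, s, z))

# Reconstruction error is bounded by half a quantization step per group.
max_err = max(abs(a - b) for a, b in zip(weights, out))
print(round(max_err, 4))
```

Real sub-4-bit methods add tricks on top of this (better grids, outlier handling, calibration), which is what makes "strong" low-bit models hard.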
-
mlx_micrograd: an mlx port of Karpathy's micrograd, a tiny scalar-valued autograd engine with a small PyTorch-like neural network library on top.
Installation:
```
pip install mlx_micrograd
```
Example usage, showing a number of the supported operations:
```python
from mlx_micrograd.engine import Value

a = Value(-4.0)
b = Value(2.0)
c = a + b
d = a * b + b**3
c += c + 1
c += 1 + c + (-a)
d += d * 2 + (b + a).relu()
d += 3 * d + (b - a).relu()
e = c - d
f = e**2
g = f / 2.0
g += 10.0 / f
print(f'{g.data}')  # prints array(24.7041, dtype=float32), the outcome of this forward pass
g.backward()
print(f'{a.grad}')  # prints array(138.834, dtype=float32), i.e. the numerical value of dg/da
print(f'{b.grad}')  # prints array(645.577, dtype=float32), i.e. the numerical value of dg/db
```
-
https://github.com/Trans-N-ai/swama Swama is a high-performance machine learning runtime written in pure Swift, designed specifically for macOS and built on Apple's MLX framework. It provides a powerful and easy-to-use solution for local LLM (Large Language Model) and VLM (Vision Language Model) inference.
-
Realized Toolio isn't listed.
-
I built this: https://github.com/arthurcolle/mlx.erl. Still a WIP.
-
Are there any benchmark numbers for training on reasonable datasets with MLX?
-
Text generation: mlx-coconut. Huge fan, by the way!
-
BlossomTuneLLM-MLX combines mlx-lm with Flower to enable federated fine-tuning of SLMs (small language models) on macOS devices. The project is the MLX-native evolution of an earlier codebase for FlowerTune LLM.
How it works:
-
Inferencer is an inference app that uses mlx-lm to expose token entropy and probabilities, allowing control over the generated output.
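As a sketch of the signal such an app exposes (not Inferencer's actual code), per-token entropy can be computed from the raw logits of the next-token distribution:

```python
# Illustrative computation of next-token entropy from logits.
import math

def softmax(logits):
    m = max(logits)                         # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def token_entropy(logits):
    """Shannon entropy (in nats) of the next-token distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0)

print(token_entropy([1.0, 1.0, 1.0, 1.0]))   # uniform over 4 tokens → ln(4) ≈ 1.386
print(token_entropy([10.0, 0.0, 0.0, 0.0]))  # near-certain token → close to 0
```

High entropy flags steps where the model is uncertain, which is exactly where exposing probabilities and letting the user intervene is most useful.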
-
It would be great to have my mlx-lm-lora and mlx-lm-lens packages in there too.
-
Hi! I'd like to share M-Courtyard, a macOS desktop app for fine-tuning LLMs. It provides a full GUI workflow.
Key features:
GitHub: https://github.com/Mcourtyard/m-courtyard
[Screenshots EN 1 and EN 3 attached]
Happy to hear any feedback!
-
It would be great to have the mlx-docs-l10n project maintained by the @localizethedocs organization. See the announcement post for more details. If other MLX documentation localization projects are completed in the future, maybe we could have a new category, Translation. For example:
-
Hi all! It would be great to add these to the list: mlx-ruby, Ruby bindings for MLX with a nice Ruby-esque DSL. There are some examples of MLX models exported through mlx-onnx running on WebGPU in the browser here. Cheers!
-
Published a benchmark- and crisis-recognition-focused MLX write-up from Calm Engineering on local fine-tuning of Phi-3.5/Qwen2.5-class models (3B–7B) on an M3 Max (64GB): https://blog.calm.com/engineering/fine-tuning-slmllms-using-mlx
What's in it:
Thought this community might be interested in it.
-
mlx-code: a local Claude Code-style coding agent built on mlx-lm.
Current features:
This is very early, first-pass code, basically a minimal proof of concept to see how far a simple local agent can go.
-
I've been maintaining Awesome MLX, a curated list of 80+ MLX community projects, organized by category (inference, training, audio, vision, Swift, etc.), with a quick-start guide and model recommendations by RAM size. Anyone can add their project via a simple issue form. PRs are welcome too!
-
MOLA: a multi-LoRA inference server for MLX. One base model stays loaded, adapters are routed per request, and there are no reloads. With 8 adapters on Qwen3.5-9B (M5 Max, 64GB): 732 tok/s same-adapter, 555 tok/s mixed at c64 (~24% overhead). It uses mx.gather_mm for batched multi-adapter decode. Still alpha; it needs a small mlx-lm patch (script included). OpenAI-compatible API.
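A numpy sketch of the per-request adapter routing described above (not MOLA's code; in MLX, mx.gather_mm would fuse the gather and matmul into one kernel, while plain fancy indexing stands in here). The base projection is shared across the batch, and only the low-rank LoRA update is routed by adapter id. All dimensions are toy values.

```python
# Illustrative batched multi-adapter LoRA decode step; dims are hypothetical.
import numpy as np

rng = np.random.default_rng(0)
d, r, n_adapters, batch = 8, 2, 3, 4

W = rng.normal(size=(d, d))              # shared base weight, loaded once
A = rng.normal(size=(n_adapters, d, r))  # per-adapter LoRA down-projections
B = rng.normal(size=(n_adapters, r, d))  # per-adapter LoRA up-projections
x = rng.normal(size=(batch, d))          # one token's activations per request
adapter_ids = np.array([0, 2, 0, 1])     # which adapter each request routes to

# Base projection is shared by every request; only the low-rank update differs.
base = x @ W
lora = np.einsum('bd,bdr,bre->be', x, A[adapter_ids], B[adapter_ids])
y = base + lora
print(y.shape)  # (4, 8)
```

Because the whole mixed-adapter batch is handled in one gathered matmul rather than one matmul per adapter, the overhead of mixing adapters stays small, which is the effect the tok/s numbers above reflect.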


-
Let's collect some cool MLX integrations and community-led projects here for visibility!
If you have a project you would like to feature, leave a comment and we will add it. If the project is built with MLX Swift, add it to the MLX Swift Community Project page.
Text Generation
Vision
Speech and Audio
Multi-modal
Misc
Educational
picoGPT