[research] omlx — SSD KV caching doubles agent swarm capacity on Apple Silicon #54
Description
oMLX — LLM inference server with SSD KV caching for Apple Silicon
What it does: oMLX is a native macOS LLM inference server built on Apple MLX with a two-tier KV cache: hot blocks stay in RAM, cold blocks spill to SSD in safetensors format. Unlike Ollama (which evicts KV state from RAM and recomputes), oMLX persists KV cache across restarts and context switches. It exposes OpenAI-compatible Chat Completions and Anthropic-compatible Messages endpoints, includes continuous batching, and ships a menu-bar UI + web admin dashboard.
Why it matters for ShellForge: ShellForge's swarm mode is fundamentally RAM-limited — the README itself documents the ceiling (e.g. 3-4 agents on M4 Pro 48 GB with qwen3:30b Q4). oMLX's SSD spill layer could double or triple that capacity without requiring more RAM: once a batch agent's KV blocks go cold between tool calls, they move to SSD and free up space for new agents. Because oMLX is OpenAI API-compatible, Crush (which already targets Ollama's OpenAI-compat endpoint) can point at it with a single endpoint config change — no code changes to the governance layer.
GitHub: https://github.com/jundot/omlx ⭐ 7,168 (created Feb 2026)
License: Apache 2.0 ✅
Rough integration effort: Moderate — swap OLLAMA_HOST for the oMLX endpoint in shellforge setup, add an oMLX status check to shellforge status, and document the SSD cache config (--kv-cache-ssd-path) in the README swarm section.
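Assuming oMLX serves its OpenAI-compatible API on a local port, the swap could look like the fragment below. The `omlx serve` subcommand, port, and spill path are assumptions made for illustration; only the `--kv-cache-ssd-path` flag is taken from the description above.

```shell
# Hypothetical launch: enable the SSD spill tier (subcommand and path are
# assumptions; --kv-cache-ssd-path is the flag named in this issue)
omlx serve --kv-cache-ssd-path /Volumes/Fast/omlx-kv &

# Point ShellForge/Crush at oMLX instead of Ollama. Crush already targets
# an OpenAI-compatible endpoint, so only the host changes (port assumed).
export OLLAMA_HOST=http://127.0.0.1:8080
```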