89 tok/s. Zero containers. 115GB GPU memory. Compiled from source.
```bash
curl -fsSL https://raw.githubusercontent.com/bong-water-water-bong/halo-ai/main/install.sh | bash
```

A complete AI platform for the AMD Ryzen AI MAX+ 395 — LLM inference, chat, deep research, voice, image generation, RAG, and workflows. All bare metal, all compiled from source, all on one chip with 128GB unified memory.
| Service | Port | Purpose |
|---|---|---|
| Lemonade | 8080 | Unified AI API (OpenAI/Ollama/Anthropic compatible) |
| llama.cpp | 8081 | LLM inference — Vulkan + HIP dual backends |
| Open WebUI | 3000 | Chat with RAG, documents, multi-model |
| Vane | 3001 | Deep research (Perplexica) |
| SearXNG | 8888 | Private search |
| Qdrant | 6333 | Vector DB for RAG |
| n8n | 5678 | Workflow automation |
| whisper.cpp | 8082 | Speech-to-text |
| Kokoro | 8083 | Text-to-speech |
| ComfyUI | 8188 | Image generation |
| Dashboard | 3003 | GPU metrics + service health (DreamServer fork) |
All services bind to 127.0.0.1. Access via SSH tunnel or Caddy reverse proxy.
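For example, you can forward the chat UI and the API to your workstation over SSH (the hostname `halo-box` and the user are placeholders — substitute your own SSH target):

```shell
# Forward Open WebUI (3000) and the Lemonade API (8080) to this machine.
# -N: no remote command, just the tunnel. Replace user@halo-box with your host.
ssh -N \
  -L 3000:127.0.0.1:3000 \
  -L 8080:127.0.0.1:8080 \
  user@halo-box
```

While the tunnel is up, http://localhost:3000 on your workstation reaches Open WebUI on the box.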
| Model | Speed | Size |
|---|---|---|
| Qwen3-30B-A3B (MoE) | 89 tok/s | 18 GB |
| Llama 3 70B | ~18 tok/s | 40 GB |
| Llama 3 70B Q8 | ~10 tok/s | 70 GB |
Full benchmarks: BENCHMARKS.md
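Once a model is active, anything that speaks the OpenAI API can query it through Lemonade on port 8080. A minimal sketch — the exact path and the model name are assumptions here; check `halo-models.sh list` and the Lemonade docs for the names on your install:

```shell
# Chat completion against the local OpenAI-compatible endpoint.
# Model name is illustrative; use one reported by halo-models.sh list.
curl -s http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3-30B-A3B",
        "messages": [{"role": "user", "content": "Hello"}]
      }'
```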
```text
SSH / Caddy (only network entry point)
          │
┌─────────▼──────────┐
│   Lemonade :8080   │ ← OpenAI-compatible API
└──┬──────┬──────┬───┘
   ▼      ▼      ▼
llama.cpp  whisper  Kokoro    ← GPU inference (gfx1151, 115GB GTT)
          │
Open WebUI :3000              ← Chat UI
Vane       :3001              ← Deep research
n8n        :5678              ← Workflows
ComfyUI    :8188              ← Image gen
Qdrant :6333   SearXNG :8888  ← RAG + search
```
```bash
halo-models.sh list              # Browse models for Strix Halo
halo-models.sh download <name>   # Download a model
halo-models.sh activate <name>   # Switch models (auto-benchmark)
halo-driver-swap.sh vulkan       # Switch to Vulkan backend
halo-driver-swap.sh hip          # Switch to HIP/ROCm backend
halo-update.sh update            # Update everything (snapshot-protected)
halo-proxy-setup.sh              # Enable Caddy reverse proxy
halo-backup.sh                   # Backup service data
```

The halo-ai agent runs continuously:
- Monitors all services every 30s, auto-restarts failures
- Btrfs snapshot before every repair
- GPU health, thermals, memory monitoring
- Auto-applies system security patches
- Tracks inference performance trends
- Silent unless it cannot fix something
Local Copilot via Lemonade VS Code extension. Your code AI runs on your Strix Halo — private, offline, zero API costs.
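Most OpenAI-compatible tools can be pointed at the local stack the same way — a sketch using the environment variables the official OpenAI SDKs read (each tool may also have its own setting for this):

```shell
# Point OpenAI-compatible clients at the local Lemonade endpoint
# instead of the cloud. The key is unused by the local server but
# some clients refuse to start without one.
export OPENAI_BASE_URL="http://127.0.0.1:8080/v1"
export OPENAI_API_KEY="none"
```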
- SECURITY.md — Access model, SSH config, Caddy setup, firewall
- BENCHMARKS.md — Full performance data
- Credits
Before anything else, massive respect and gratitude to DreamServer and the Light-Heart-Labs team.
DreamServer is the reason halo-ai exists. Full stop.
They were the first to look at the Strix Halo chip and see not just an inference engine, but a complete AI platform. While everyone else was benchmarking single models, Light-Heart-Labs built an entire ecosystem — LLM inference, chat UI, voice I/O, RAG pipelines, autonomous agents, image generation, workflow automation, privacy tools, and a monitoring dashboard — all integrated, all working together, all on one chip.
That vision is what halo-ai is built on. The architecture, the service selection, the idea that a single Strix Halo box can replace a rack of cloud services — that came from DreamServer. Their open-source dashboard panel is what powers the halo-ai control center. We did not just take inspiration — we directly forked their work and built on it.
Fork: github.com/bong-water-water-bong/DreamServer
Lemonade by AMD is the backbone of this stack. Their unified API layer provides OpenAI, Ollama, and Anthropic compatibility through a single endpoint. Their dedicated engineering on gfx1151 ROCm support, NPU+GPU hybrid inference, and the llamacpp-rocm nightly builds turned scattered hardware support into a production-ready platform. Without Lemonade, Strix Halo AI would still be a series of disconnected experiments.
Every service in this stack is open source. We compile from source, but we build nothing — these teams do:
| Project | What It Does | License |
|---|---|---|
| llama.cpp | The LLM inference engine that powers the entire stack | MIT |
| Open WebUI | Full-featured chat interface with RAG and multi-model support | MIT |
| Vane (Perplexica) | AI-powered deep research with cited sources | MIT |
| whisper.cpp | Fast, GPU-accelerated speech-to-text | MIT |
| Kokoro-FastAPI | High-quality text-to-speech API | Apache 2.0 |
| ComfyUI | Node-based image generation pipeline | GPL 3.0 |
| SearXNG | Privacy-respecting meta-search engine | AGPL 3.0 |
| Qdrant | High-performance vector database for RAG | Apache 2.0 |
| n8n | Workflow automation with 400+ integrations | Sustainable Use |
| Caddy | Reverse proxy with automatic TLS | Apache 2.0 |
| ROCm / TheRock | AMD GPU compute stack for gfx1151 | Various |
- kyuz0 — Docker toolboxes and comprehensive backend benchmarks
- Gygeek — Framework laptop setup guides
- The Framework laptop community for extensive hardware testing
- The Arch Linux community for bleeding-edge packages
- Everyone contributing to ROCm gfx1151 support in kernel, Mesa, and userspace
Apache 2.0