Run Qwen3.5 entirely on your iPhone. No cloud. No API keys. No subscriptions.
Demo video (demo.mp4): iPhone 17 Pro in airplane mode, no internet, ~26 tok/s.
Two years ago, GPT-4o cost $20/month and required a datacenter. Today, a model with comparable benchmark scores runs on your iPhone for free. Permanently.
"According to benchmarks Qwen3.5 4B is as good as GPT 4o. GPT 4o came out ~2 years ago (May 2024). Qwen 3.5 4B runs easily on modern mobile devices. So the gap between frontier intelligence in a datacenter and running a model of equal quality on your iPhone could be 2-3 years."
— Awni Hannun, co-creator of MLX
| Model | Size | Speed | Notes |
|---|---|---|---|
| Qwen3.5 4B | 2.9 GB | ~26 tok/s | Default. GPT-4o quality per benchmarks. |
| Qwen3.5 2B | 1.5 GB | ~55 tok/s | Fast. Good for older devices. |
| Qwen3.5 0.8B | 0.6 GB | ~80 tok/s | Fastest. Simple tasks only. |
| Qwen3.5 9B | 5.6 GB | ~20 tok/s | Best quality. Requires 8 GB RAM. |
All models are 4-bit quantized via MLX from Hugging Face.
- 100% on-device inference via Apple MLX
- Web search via Brave Search API (optional, API key stored in Keychain)
- Conversation persistence across app restarts
- Repetition detection and auto-stop
- Model switcher with download manager
- Voice input via on-device speech recognition
- Dark mode toggle (System / Light / Dark)
- Haptic feedback on send and generation complete
- Markdown rendering, copy button, tok/s stats
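The repetition detection and auto-stop feature listed above could work roughly as follows. This is a minimal sketch, not the app's actual implementation: the function names, the token-ID representation, and the `maxWindow` / `minRepeats` defaults are all assumptions chosen for illustration. It checks whether the generated token stream ends in the same short pattern repeated several times in a row.

```swift
/// Returns true when the last `window * minRepeats` tokens consist of one
/// `window`-sized pattern repeated back to back. (Hypothetical helper.)
func endsWithRepeatedRun(_ tokens: [Int], window: Int, minRepeats: Int) -> Bool {
    guard window > 0, minRepeats > 1, tokens.count >= window * minRepeats else {
        return false
    }
    let tail = tokens.suffix(window * minRepeats)   // slice keeps original indices
    let pattern = Array(tail.suffix(window))
    var index = tail.startIndex
    while index < tail.endIndex {
        if Array(tail[index..<index + window]) != pattern { return false }
        index += window
    }
    return true
}

/// Tries every pattern length from 1 to `maxWindow` and reports whether
/// generation looks stuck in a loop. Defaults are illustrative assumptions.
func shouldAutoStop(_ tokens: [Int], maxWindow: Int = 8, minRepeats: Int = 3) -> Bool {
    guard maxWindow >= 1 else { return false }
    for window in 1...maxWindow {
        if endsWithRepeatedRun(tokens, window: window, minRepeats: minRepeats) {
            return true
        }
    }
    return false
}
```

A generation loop would call `shouldAutoStop` after each new token and end the response once it returns true. Longer windows catch repeated phrases, while a window of 1 catches a single token emitted over and over.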
- iPhone 15 Pro or later (8 GB RAM or more) for the default 4B model and the optional 9B model
- iOS 17+
- ~3 GB free storage for default model
- Wi-Fi for the initial model download only
Tested on iPhone 17 Pro (12 GB RAM). Older devices may work with the smaller models (0.8B, 2B), but this is not guaranteed. For the best experience, use an iPhone with 8+ GB RAM.
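To make the RAM guidance above concrete, here is a hedged sketch of how an app might suggest a model from the device's reported memory. The `ModelChoice` enum, the function name, and the 7 GiB / 5 GiB thresholds are assumptions for illustration, not values taken from this project. The thresholds sit slightly below the nominal 8 GB and 6 GB on the assumption that the reported figure can come in under the marketed size.

```swift
import Foundation

/// Models a picker could suggest automatically (names from the table above).
enum ModelChoice: String {
    case qwen0_8B = "Qwen3.5 0.8B"
    case qwen2B = "Qwen3.5 2B"
    case qwen4B = "Qwen3.5 4B"
}

/// Maps physical memory to a suggested model.
/// Thresholds are illustrative assumptions, not project settings.
func suggestedModel(physicalMemoryBytes: UInt64) -> ModelChoice {
    let gib = Double(physicalMemoryBytes) / 1_073_741_824
    if gib >= 7.0 { return .qwen4B }   // 8 GB class devices and up
    if gib >= 5.0 { return .qwen2B }   // 6 GB class devices
    return .qwen0_8B                   // everything smaller
}
```

On a device, the input would come from `ProcessInfo.processInfo.physicalMemory`. The 9B model is left out on purpose, since it is better treated as an explicit opt-in than an automatic suggestion.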
- Install Xcode 15+
- Create a free Apple Developer account at developer.apple.com (free tier works)
- In Xcode: Settings → Accounts → add your Apple ID
```bash
git clone https://github.com/carolinacherry/local-ai.git
cd local-ai
open 4B.xcodeproj
```
- Connect your iPhone via USB
- Select your device in the Xcode toolbar
- Set your development team: Project → Signing & Capabilities → Team
- Press Run (Cmd+R)
- First run: iPhone Settings → General → VPN & Device Management → trust your developer certificate
- The app downloads the 4B model on first launch
Web search is optional. In the app's Settings, enter your Brave Search API key (free tier: 2,000 searches/month). The app auto-detects queries that need fresh data and prefetches search results before generation starts.
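One way the "needs fresh data" check and the prefetch request could look is sketched below. The keyword list and both function names are hypothetical, and the app's real detection logic may be quite different. The endpoint and the `q` / `count` parameters follow Brave's public web search API, which also expects the API key in an `X-Subscription-Token` request header.

```swift
import Foundation

/// Hypothetical hints that a query depends on recent information.
let freshnessHints = ["today", "latest", "current", "news", "price", "weather", "score", "this week"]

/// Crude substring heuristic: true when the query mentions any hint.
func needsFreshData(_ query: String) -> Bool {
    let lowered = query.lowercased()
    return freshnessHints.contains { lowered.contains($0) }
}

/// Builds the Brave web search URL for a query; URLComponents handles
/// percent-encoding. The API key is NOT part of the URL: it belongs in the
/// `X-Subscription-Token` header of the request.
func braveSearchURL(for query: String, count: Int = 5) -> URL? {
    var components = URLComponents(string: "https://api.search.brave.com/res/v1/web/search")
    components?.queryItems = [
        URLQueryItem(name: "q", value: query),
        URLQueryItem(name: "count", value: String(count)),
    ]
    return components?.url
}
```

In this sketch the app would call `needsFreshData` on the user's message, fetch the URL only when it returns true, and prepend the top results to the prompt. Substring matching is deliberately simple and will produce false positives (for example, "score" matches "underscore").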
| Component | Implementation |
|---|---|
| Inference | MLX Swift (ml-explore/mlx-swift-lm) |
| Model download | HuggingFace Swift Transformers |
| Web search | Brave Search API with app-driven prefetch |
| Voice input | iOS Speech framework (on-device) |
| API key storage | iOS Keychain |
| Persistence | JSON in Documents directory |
| UI | SwiftUI |
| Backend | None. Zero server components. |
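The "JSON in Documents directory" persistence row can be illustrated with a small `Codable` store. This is a sketch under assumptions: the `ChatMessage` and `ConversationStore` types and their fields are hypothetical, not the app's actual data model.

```swift
import Foundation

/// Hypothetical message type; the real app's schema may differ.
struct ChatMessage: Codable, Equatable {
    let role: String
    let text: String
}

/// Saves and loads a conversation as one JSON file.
struct ConversationStore {
    let fileURL: URL

    /// Encodes the messages and writes them atomically, so a crash
    /// mid-write does not leave a half-written file behind.
    func save(_ messages: [ChatMessage]) throws {
        let data = try JSONEncoder().encode(messages)
        try data.write(to: fileURL, options: .atomic)
    }

    /// Returns an empty conversation when no file has been saved yet.
    func load() throws -> [ChatMessage] {
        guard FileManager.default.fileExists(atPath: fileURL.path) else { return [] }
        let data = try Data(contentsOf: fileURL)
        return try JSONDecoder().decode([ChatMessage].self, from: data)
    }
}
```

In an iOS app, `fileURL` would typically be built from `FileManager.default.urls(for: .documentDirectory, in: .userDomainMask)` plus a file name of your choosing. Calling `load()` at launch and `save(_:)` after each completed reply would give persistence across restarts.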
| Model | TTFT | tok/s | RAM |
|---|---|---|---|
| Qwen3.5 4B 4-bit | ~0.2s | ~26 | 2.9 GB |
| Qwen3.5 2B 4-bit | ~0.15s | ~55 | 1.5 GB |
| Qwen3.5 0.8B 4-bit | ~0.1s | ~80 | 0.6 GB |
| Qwen3.5 9B 4-bit | ~0.3s | ~20 | 5.6 GB |
Speeds vary by prompt length and device thermal state.
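As a rough guide to what these figures mean in practice, total response time can be approximated as TTFT plus the token count divided by the decode rate. The helper below is an illustrative sketch (the function name is an assumption), and it ignores thermal throttling and prompt-length effects.

```swift
/// Rough wall-clock estimate for a reply: time to first token plus
/// steady-state decoding time. Returns nil for a non-positive rate.
func estimatedSeconds(forTokens tokenCount: Int, ttft: Double, tokensPerSecond: Double) -> Double? {
    guard tokensPerSecond > 0, tokenCount >= 0 else { return nil }
    return ttft + Double(tokenCount) / tokensPerSecond
}
```

For a 260-token reply, the table's figures give about 0.2 + 260 / 26 = 10.2 s on the 4B model and about 0.3 + 260 / 20 = 13.3 s on the 9B model.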
- MLX Swift by Apple
- Incept5/mlxchat by @jtdavies for the `enable_thinking` approach
- Qwen3.5 by Alibaba
Built by @carolinacherry