A native Mac app that lets you chat with Google's Gemma 4 AI model — completely offline. Your conversations never leave your machine. No account needed, no API keys, no monthly fees.
- 100% offline — works without internet, on a plane, anywhere
- Private — zero telemetry, no data sent anywhere, ever
- Thinking mode — toggle with Cmd+T for step-by-step reasoning
- Conversation history — auto-saved, searchable across all chats
- Inline controls — temperature, tokens, thinking mode right above the input
- Multimodal — text, images and audio (Gemma 4 E4B)
- Benchmark slideshow — see how it compares to paid models
- Signed & notarized — Apple Developer ID, opens without security warnings
- Mac with Apple Silicon (M1, M2, M3, M4)
- 16 GB RAM
- macOS 13 Ventura or later
- ~7 GB free disk space
- Download the DMG
- Open it, drag to Applications
- Open the app, click "Iniciar modelo" ("Start model")
- Wait ~10 seconds while the model loads into memory
- Chat
The first launch downloads the model (~5 GB) from HuggingFace. After that, everything runs locally.
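The downloaded weights land in the standard Hugging Face hub cache, whose folder names follow a fixed convention (`/` in the repo id becomes `--`, prefixed with the repo type). A small sketch of that mapping, using the repo id from the uninstall commands later in this README:

```python
def hf_cache_dir(repo_id: str, repo_type: str = "model") -> str:
    """Map a Hugging Face repo id to its local cache folder name.

    Follows the hub cache convention: 'models--<org>--<name>'.
    """
    return f"{repo_type}s--" + repo_id.replace("/", "--")

# Repo id as it appears in this README's uninstall section:
print(hf_cache_dir("mlx-community/gemma-4-e4b-it-4bit"))
# → models--mlx-community--gemma-4-e4b-it-4bit
```

This is why deleting that one directory under `~/.cache/huggingface/hub/` removes the model completely.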
Works without internet. On a desert island, on a plane, at your grandparents' village. If your Mac turns on, your AI works.
Gemma 4 E4B (free, local) vs paid API models:
| Benchmark | Gemma 4 E4B | GPT-4o mini | Claude 3.5 Haiku |
|---|---|---|---|
| MMLU Pro | 69.4 | 63.1 | 65.0 |
| LiveCodeBench | 52.0 | 23.4 | 31.4 |
| GPQA Diamond | 58.6 | 44.2 | 41.6 |
On these benchmarks it outperforms both paid lightweight models. See the full comparison →
The app bundles a Python virtual environment with vMLX (an MLX-based inference engine optimized for Apple Silicon). When you click "Iniciar modelo", it starts a local server and loads the Gemma 4 E4B model into GPU memory. The Electron frontend communicates with it via a local API. Apart from the one-time model download, nothing touches the network.
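The app's local API is internal, but the general pattern is easy to sketch: the frontend POSTs a JSON chat payload (including the temperature and token controls shown above the input) to the local server. The URL, port, and field names below are assumptions for illustration, not the app's real API:

```python
import json
import urllib.request

def build_chat_request(prompt: str, temperature: float = 0.7,
                       max_tokens: int = 512, thinking: bool = False) -> dict:
    """Assemble a chat payload; field names are illustrative assumptions."""
    return {
        "model": "gemma-4-e4b-it-4bit",
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
        "thinking": thinking,
    }

payload = build_chat_request("Hello!", temperature=0.2)

# Sending it to the local server (hypothetical port; the server only
# listens on the loopback interface, so nothing leaves the machine):
req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# response = urllib.request.urlopen(req)  # uncomment with the app running
```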
| Shortcut | Action |
|---|---|
| Cmd+T | Toggle thinking mode |
| Cmd+N | New conversation |
| Enter | Send message |
| Shift+Enter | New line |
| Cmd+Q | Quit (stops model, frees RAM) |
The vMLX engine was fully audited before integration:
- No telemetry — zero outbound connections
- No eval/exec — the only eval-like call is `mx.eval()` (MLX GPU synchronization)
- No pickle — all weights loaded via safetensors
- API key auth — optional, for network exposure
- Agentic tools disabled — shell execution features are off by default
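The no-pickle point is worth unpacking: a safetensors file begins with an 8-byte little-endian length followed by a plain JSON header, so weight metadata can be inspected without executing any code (unlike `pickle.load()`, which can run arbitrary code). A minimal parser sketch of that header format, demonstrated on an in-memory blob:

```python
import json
import struct

def read_safetensors_header(blob: bytes) -> dict:
    """Parse a safetensors header: 8-byte LE length, then JSON.

    Pure data parsing — no code execution is possible here,
    in contrast to unpickling.
    """
    (n,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8 : 8 + n])

# Build a tiny in-memory safetensors-style blob for demonstration:
header = {"w": {"dtype": "F32", "shape": [2], "data_offsets": [0, 8]}}
hjson = json.dumps(header).encode()
blob = struct.pack("<Q", len(hjson)) + hjson + b"\x00" * 8

print(read_safetensors_header(blob)["w"]["shape"])  # → [2]
```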
```bash
# Remove the app
rm -rf "/Applications/Gemma 4 Local.app"

# Remove the model (~5 GB)
rm -rf ~/.cache/huggingface/hub/models--mlx-community--gemma-4-e4b-it-4bit

# Remove the project (if installed from source)
rm -rf ~/.gemma4-local
```

Nothing else is modified. No daemons, no PATH changes, no config files.
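A quick way to confirm the uninstall left nothing behind is to check that the three paths from the commands above are gone:

```python
import os

def leftovers(paths: list[str]) -> list[str]:
    """Return any of the given paths that still exist on disk."""
    return [p for p in paths if os.path.exists(os.path.expanduser(p))]

app_paths = [
    "/Applications/Gemma 4 Local.app",
    "~/.cache/huggingface/hub/models--mlx-community--gemma-4-e4b-it-4bit",
    "~/.gemma4-local",
]
print(leftovers(app_paths))  # [] means a clean uninstall
```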
This is an independent project. Not affiliated with Google, Alphabet, or DeepMind. "Gemma" is a trademark of Google LLC. Model weights are licensed under Apache 2.0 by Google. See NOTICE for full trademark attributions.
MIT — the app code is yours to use, modify, and distribute.