| AI Capabilities | Voice & TTS | Code & Tools |
|---|---|---|
| 9B parameter LLM with thinking mode | Streaming audio — hear responses as they generate (~3s to first audio) | Code execution in 7 languages |
| Vision (images) and video analysis | Neural TTS with voice cloning | Syntax-highlighted Code Playground |
| 30 structured analysis patterns | Speech-to-text via Whisper | Document generation (PDF, DOCX, XLSX, PPTX) |
| Smart context windowing with semantic recall | Support for 10 languages | Web search via self-hosted SearXNG |
| Character-accurate letter counting | Voice Studio for managing cloned voices | BM25-ranked persistent memory |
| | Auto-unloads from VRAM when idle | |
```bash
git clone https://github.com/nisakson2000/Gizmo-AI.git
cd Gizmo-AI
bash scripts/download-model.sh   # Downloads ~14GB (LLM + TTS + vision projector)
bash scripts/build-llamacpp.sh   # Builds model server (~5-10min)
bash scripts/start.sh            # Starts all 6 services
# Open http://localhost:3100
```
```
┌─────────────── gizmo-net (10.90.0.0/24) ──────────────┐
│                                                       │
┌──────────┐ │ ┌────────────────┐      ┌───────────────┐ │
│ gizmo-ui │────▶│ │ gizmo-       │────────▶ │ gizmo-llama   │ │
│  :3100   │ │ │ orchestrator   │      │   :8080       │ │
│ SvelteKit│ │ │ :9100 FastAPI  │      │  Qwen3.5-9B   │ │
│ + nginx  │ │ └───────┬────────┘      │   [GPU]       │ │
└──────────┘ │    ┌────┼─────┐         └───────────────┘ │
│┌────▼──┐   │ ┌───▼─────┐  ┌─────────────┐              │
││searxng│   │ │gizmo-tts│  │gizmo-whisper│              │
││ :8300 │   │ │  :8400  │  │   :8200     │              │
││ [CPU] │   │ │  [GPU]  │  │   [CPU]     │              │
│└───────┘   │ └─────────┘  └─────────────┘              │
└──────────┴────────────────────────────────────────────┘
```
| Service | Port | Role | GPU |
|---|---|---|---|
| gizmo-llama | 8080 | LLM inference (Qwen3.5-9B Q8_0 + vision) | Yes |
| gizmo-orchestrator | 9100 | FastAPI backend — routing, streaming, tools | No |
| gizmo-ui | 3100 | SvelteKit web UI via nginx | No |
| gizmo-tts | 8400 | Qwen3-TTS neural voice cloning | Yes |
| gizmo-whisper | 8200 | faster-whisper speech-to-text | No |
| gizmo-searxng | 8300 | Self-hosted web search | No |
### Chat & Conversation
- Streaming chat with persistent server-side history and LLM-generated titles
- Regenerate & edit — re-roll any response or edit a sent message, with `< 1/N >` variant navigation
- Full-text search — sidebar filters by title; press Enter for deep message content search
- Conversation export as formatted Markdown
- Double-click to rename conversations
- Scroll-to-bottom floating button when scrolled up
- Mobile swipe gestures for sidebar (swipe right to open, left to close)
### AI Capabilities
- Mode switcher — 6 behavioral modes (Chat, Brainstorm, Coder, Research, Planner, Roleplay) + custom mode creation with prompt editor
- Usage analytics — token counts, response times, and cloud cost comparison dashboard at `/analytics`
- Thinking mode — step-by-step reasoning in collapsible blocks (toggle on/off)
- Vision — analyze images via multimodal vision projector (mmproj)
- Video analysis — upload video, extract frames, analyze visual content with playback
- Audio transcription — upload M4A/MP3/WAV for Whisper transcription + LLM analysis
- Multi-round tool calling — model autonomously chains up to 5 rounds of tool calls
- Web search via self-hosted SearXNG — no API keys
- Document upload — PDFs, text, code up to 50MB
- Memory — BM25-ranked facts with recency weighting + semantic session recall (CPU embeddings)
- Cross-conversation recall — two-tier semantic search across all past conversations with topic room categorization
- Conversation compaction — rolling LLM summaries preserve context awareness in long conversations
- Knowledge extraction — automatic temporal fact tracking with entity normalization and invalidation
- Smart context windowing — keeps most relevant older messages by semantic similarity
- Recitation — fetches authoritative text from the web for poems, lyrics, speeches
- Character analysis — accurate letter counting via pre-computed character maps
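The memory ranking described above combines a lexical score with a temporal signal. A minimal sketch of BM25 scoring with a recency multiplier, assuming a simple half-life decay (the function names, decay constant, and sample memories are illustrative, not Gizmo's actual implementation):

```python
import math

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Classic BM25 over pre-tokenized documents (generic sketch)."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    scores = [0.0] * n
    for term in query_terms:
        df = sum(1 for d in docs if term in d)
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        for i, d in enumerate(docs):
            tf = d.count(term)
            norm = tf + k1 * (1 - b + b * len(d) / avgdl)
            scores[i] += idf * tf * (k1 + 1) / norm
    return scores

def recency_weight(age_days, half_life_days=30.0):
    """Halve a memory's weight every `half_life_days` (decay is illustrative)."""
    return 0.5 ** (age_days / half_life_days)

# (tokens, age in days): a fresh preference vs. a stale hardware fact
memories = [(["user", "prefers", "dark", "themes"], 2),
            (["user", "gpu", "is", "an", "rtx", "4090"], 90)]
scores = bm25_scores(["user", "gpu"], [tokens for tokens, _ in memories])
weighted = [s * recency_weight(age) for s, (_, age) in zip(scores, memories)]
```

Note how the second memory wins on raw BM25 score but loses after recency weighting; tuning the half-life trades freshness against long-term recall.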
### Voice & TTS
- Streaming TTS — sentence-level audio streaming via WebSocket (~3s to first audio vs 7-45s batch mode), with gapless browser playback
- Voice Studio — upload reference audio, name and save voices, adjustable clip duration
- Qwen3-TTS — GPU-accelerated neural voice cloning (x-vector mode) via faster-qwen3-tts
- Speed control — 0.5x to 2.0x
- 10 languages — English, Chinese, Japanese, Korean, German, French, Russian, Portuguese, Spanish, Italian
- Speech-to-text — dictate via microphone with Whisper
- Auto-unload — TTS model frees VRAM after 60s idle
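Sentence-level streaming works by cutting the LLM's token stream at sentence boundaries so synthesis can start before the full reply exists. A sketch of that chunking step, assuming a simple punctuation-based splitter (the regex and function name are illustrative):

```python
import re

SENTENCE_END = re.compile(r'(?<=[.!?])\s+')

def sentences_from_stream(token_stream):
    """Yield complete sentences as soon as they appear in a token stream,
    so TTS can begin synthesizing while the model is still generating."""
    buffer = ""
    for token in token_stream:
        buffer += token
        parts = SENTENCE_END.split(buffer)
        for sentence in parts[:-1]:  # all but the last part are finished
            yield sentence
        buffer = parts[-1]
    if buffer.strip():               # flush whatever remains at end of stream
        yield buffer.strip()

chunks = list(sentences_from_stream(["Hello", " world. ", "How are", " you? ", "Bye"]))
# chunks == ["Hello world.", "How are you?", "Bye"]
```

Each yielded sentence would then be sent to the TTS service and played back gaplessly as it arrives.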
### Code & Tools
- Sandbox — 7 languages (Python, JavaScript, Bash, C, C++, Go, Lua) in isolated containers (no network, 256MB RAM, read-only fs)
- Code Playground — `/code` route with syntax highlighting (highlight.js), resizable split pane, auto-save, copy/download, word wrap, output file display
- AI code assistant — isolated chat overlay with multi-round tool calling
- Document generation — PDF, DOCX, XLSX, PPTX, CSV, TXT via natural language
- Markup preview — live rendering for HTML, CSS, SVG, Markdown
- Memory Manager — browse, add, and delete memories from the UI
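The sandbox constraints listed above map directly onto standard container flags. A hedged sketch of how such an invocation could be assembled (the image name and mount layout are assumptions; `--network`, `--memory`, and `--read-only` are standard podman/docker options):

```python
import subprocess

def sandbox_cmd(language_image, source_path, memory="256m"):
    """Build a container command matching the isolation described above:
    no network, capped RAM, read-only root filesystem. Illustrative only."""
    return [
        "podman", "run", "--rm",
        "--network=none",        # no network access
        f"--memory={memory}",    # RAM cap
        "--read-only",           # read-only filesystem
        "-v", f"{source_path}:/code/main.py:ro",
        language_image, "python3", "/code/main.py",
    ]

cmd = sandbox_cmd("docker.io/library/python:3.12-alpine", "/tmp/snippet.py")
# To actually execute: subprocess.run(cmd, capture_output=True, timeout=10)
```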
### Patterns & Routing
- 30 patterns — Fabric-inspired cognitive templates (extract_wisdom, summarize, analyze_threat, etc.)
- Intelligent routing — model sees only 3-8 relevant tools per request via keyword pre-routing
- Auto or explicit — patterns activate by keyword matching or `[pattern:name]` prefix
- Pattern-scoped tools — each pattern declares which tools are available
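Keyword pre-routing like the above can be sketched as a lookup from tool names to trigger phrases, with only matching tools exposed to the model (all tool names and keywords below are hypothetical, not Gizmo's actual routing table):

```python
TOOL_KEYWORDS = {
    "web_search":   ["search", "news", "look up", "latest"],
    "run_code":     ["run", "execute", "python", "script"],
    "generate_doc": ["pdf", "docx", "spreadsheet", "report"],
    "memory_store": ["remember", "recall", "forget"],
}

def route_tools(message, max_tools=8, always=("web_search",)):
    """Return only the tools whose keywords appear in the message,
    so the model sees a short tool list instead of every tool."""
    text = message.lower()
    hits = [tool for tool, kws in TOOL_KEYWORDS.items()
            if any(kw in text for kw in kws)]
    for tool in always:              # keep a small always-on set
        if tool not in hits:
            hits.append(tool)
    return hits[:max_tools]

selected = route_tools("Run this Python script and save the output as a PDF")
```

This keeps the tool schema in the prompt short, which both saves context tokens and reduces the chance of the model calling an irrelevant tool.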
### Task Tracker
- Built-in task and note management at `/tracker`
- Tags, priorities, due dates, recurrence (daily/weekly/biweekly/monthly/yearly), subtasks
- Free-text search across titles, descriptions, and tags
- Keyboard navigation — `j`/`k` navigate, `x` toggle status, `n` new task, `/` search
- Inline title editing (double-click), collapsible subtasks, undo delete with toast
- LLM chat overlay for natural language task creation
### UI & Accessibility
- 9 Nintendo themes — NES, SNES, GBA, N64, GameCube, Wii, DS, 3DS, Switch with console frames, sound effects, screen overlays, and boot animations
- Keyboard shortcuts — Ctrl+Shift+N (new chat), Ctrl+Shift+T (think), Ctrl+/ (focus), Escape (close)
- Mobile support — swipe gestures, always-visible message actions on touch devices
- Accessibility — focus trapping in modals, aria-expanded, sidebar keyboard nav, prefers-reduced-motion
- Service health — live status dashboard for all backend services
- Dual API — WebSocket for streaming UI, REST (`/api/chat`) for programmatic access
- Tailscale HTTPS — secure access from any device on your tailnet
- 100% local — your data never leaves your machine
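The REST side of the dual API can be scripted directly. A minimal stdlib sketch of posting to `/api/chat`; the JSON field name `message` is an assumption, so check the orchestrator's actual schema before relying on it:

```python
import json
import urllib.request

def build_chat_request(message, base_url="http://localhost:3100"):
    """Build a POST to the documented /api/chat endpoint.
    The payload shape here is assumed, not confirmed."""
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps({"message": message}).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request("Summarize today's AI news")
# Send with: urllib.request.urlopen(req, timeout=120)
```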
### Android App
- Native Compose chat — streaming responses, markdown rendering, syntax highlighting
- Multi-server profiles — connect to LAN, Tailscale, or any Gizmo instance
- Thinking mode — collapsible reasoning blocks, tool call status cards
- Vision + documents — attach images and files for analysis
- Conversation management — search, rename, delete with undo
- Mode selector — Chat, Brainstorm, Coder, Research, and custom modes
- Auto-reconnect — exponential backoff on network interruption
- Build from source — containerized Podman build, no Android Studio needed
- CI releases — GitHub Actions builds APK on version tags
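Exponential backoff, as used for the app's reconnect logic, doubles the wait between attempts up to a cap. A language-neutral sketch in Python (the base, cap, and jitter choice are illustrative, not the app's actual tuning):

```python
import random

def backoff_delays(attempts, base=1.0, cap=30.0, jitter=False):
    """Exponential backoff schedule: the delay doubles each retry up to a cap.
    Optional jitter spreads reconnects out to avoid thundering herds."""
    delays = []
    for attempt in range(attempts):
        delay = min(cap, base * (2 ** attempt))
        if jitter:
            delay = random.uniform(0, delay)
        delays.append(delay)
    return delays

schedule = backoff_delays(6)  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```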
| | Minimum | Tested |
|---|---|---|
| GPU | NVIDIA, 16GB+ VRAM | RTX 4090, 24GB |
| RAM | 32GB | 64GB DDR5 |
| Disk | 50GB free | NVMe SSD |
| OS | Linux (Ubuntu, Fedora, Arch) | Bazzite OS (Fedora) |
| Runtime | Podman or Docker + NVIDIA container support | Podman 5.8 |
### VRAM breakdown

| Component | VRAM | Notes |
|---|---|---|
| Qwen3.5-9B weights (Q8_0) | ~9.5 GB | Always loaded |
| KV cache (Q8_0, 32K context) | ~6.2 GB | Grows with conversation |
| Qwen3-TTS | ~4.0 GB | Auto-unloads after 60s idle |
| Peak total | ~20.7 GB | LLM + TTS active |
| Whisper | 0 GB | Runs on CPU |
Full documentation is available on the Wiki.
MIT