HAL 9000

"I am completely operational, and all my circuits are functioning perfectly."

Local, multimodal AI agent — sees, hears, thinks, speaks, and acts on your machine.
Cross-platform (macOS, Windows, Linux) · Free mode (zero API keys) · Claude Code co-work hub.

Product Page · Free Mode · Quick Start · Co-Work · Changelog


A local, multimodal AI agent that sees you via webcam, hears your voice, thinks via LLM, speaks with a cloned voice, acts on your machine, and integrates with Claude Code via MCP. Runs entirely on your machine with a browser-based control panel.

Works on macOS, Windows, and Linux. One codebase, auto-detects OS at runtime.


What HAL Can Do

Capability How
See Webcam feed with browser HUD — scanlines, corner brackets, REC indicator
Hear Browser mic recording (Web Audio API, live waveform, silence detection) + Whisper STT (API or local faster-whisper)
Think Multi-provider LLM (GPT-4o, Claude, Gemini, Ollama) with function calling
Speak 3 voice providers — Edge TTS (free/fast), ElevenLabs (paid/best), XTTS (local/cloned)
Act 43 cross-platform tools — shell, apps, files, web search, memory, clipboard, app automation, Claude Code delegation, background tasks, artifacts, multi-agent orchestration
Chat Terminal-style chat with streaming responses, 35 slash commands (categorized menu, keyboard nav), mic button — type or speak to HAL
Disambiguate Smart choice sheet UI — HAL presents numbered options, user clicks to select
Integrate MCP server exposes 21 tools to Claude Code/Desktop for bidirectional AI collaboration
Remember Typed persistent memory — facts, decisions, preferences, session summaries
Know Knowledge upload (drag-drop or button) — PDFs, docs, code, images. BM25 keyword search, deep-read or skim modes. Also loads local files + remote llms.txt URLs at boot
Co-Work Background task runner, artifact workspace, multi-agent orchestration, cross-agent context handoff

Architecture

┌──────────────────────────────────────────────────────────┐
│                      HAL9000 ENGINE                      │
│                                                          │
│  Vision ──┐                                              │
│            ├──→ Brain (LLM + function calling)           │
│  Browser ──┘        │              │                     │
│  Mic + Chat         ▼              ▼                     │
│                   Voice       Tools (43)                 │
│            (Edge/11Labs/XTTS) (OS agent layer)           │
│                     │                                    │
│                     ▼                                    │
│          Browser Audio + Waveform                        │
│                                                          │
│  Knowledge ─── Memory (typed) ─── Session Tracking       │
│  TaskRunner ── Orchestrator ── Artifact Store            │
└──────────────────────────────────────────────────────────┘
         │                              │
    Flask Server                  MCP Server
    localhost:9000              (Claude Code integration)

Free Mode

Run HAL with zero API keys and zero cost — fully local operation.

# 1. Install Ollama (local LLM)
brew install ollama        # macOS
# or: curl -fsSL https://ollama.com/install.sh | sh   # Linux
# or: download from ollama.com                         # Windows

# 2. Pull a model
ollama pull llama3.1

# 3. Set up HAL
git clone https://github.com/shandar/HAL9000.git
cd HAL9000 && python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt

# 4. One line in .env
echo "FREE_MODE=true" > .env

# 5. Run
python server.py

Layer Free Provider Paid Alternative
Brain Ollama (Llama 3.1, Mistral, Phi-3) GPT-4o, Claude, Gemini
STT faster-whisper (local Whisper) OpenAI Whisper API
TTS Edge TTS (default, always free) ElevenLabs, XTTS

FREE_MODE=true overrides AI_PROVIDER, STT_PROVIDER, and TTS_PROVIDER in one toggle. You can also mix — e.g., FREE_MODE=true with STT_PROVIDER=whisper_api for local brain + cloud STT.
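
As a rough illustration of that precedence (not HAL's actual config.py; the helper below is hypothetical, but the defaults mirror the tables in this README), an explicit .env value wins, otherwise FREE_MODE picks the local provider:

import os

FREE_MODE = os.getenv("FREE_MODE", "false").lower() == "true"

def _pick(key: str, free_default: str, paid_default: str) -> str:
    """Explicit .env value wins; otherwise FREE_MODE selects the local default."""
    if key in os.environ:
        return os.environ[key]
    return free_default if FREE_MODE else paid_default

AI_PROVIDER  = _pick("AI_PROVIDER",  "ollama",         "openai")
STT_PROVIDER = _pick("STT_PROVIDER", "faster_whisper", "whisper_api")
TTS_PROVIDER = _pick("TTS_PROVIDER", "edge",           "edge")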


Cross-Platform Support

HAL auto-detects your OS and uses the right system commands:

Feature macOS Windows Linux
Volume AppleScript nircmd / PowerShell pactl / amixer
Brightness ioreg WMI brightnessctl
Notifications osascript Toast API notify-send
Clipboard pbcopy/pbpaste Get/Set-Clipboard xclip / wl-clipboard
Screenshot screencapture PIL.ImageGrab scrot / grim
Battery pmset psutil / WMI psutil / sysfs
WiFi networksetup netsh wlan nmcli
Apps open -a + .app scan start + Start Menu scan gtk-launch + .desktop scan
Terminal AppleScript Terminal Windows Terminal / cmd gnome-terminal / konsole
Embedded Terminal ✅ xterm.js + PTY ❌ External only ✅ xterm.js + PTY

No #ifdef, no separate builds — one pip install, one python server.py.
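
Under the hood this is plain runtime detection. A minimal sketch of the idea behind core/platform/__init__.py (the class names here are hypothetical, not HAL's actual API):

import platform

def get_platform():
    """Pick the OS-specific implementation at runtime."""
    system = platform.system()
    if system == "Darwin":
        from core.platform.mac import MacPlatform as Impl          # AppleScript, pbcopy, screencapture
    elif system == "Windows":
        from core.platform.windows import WindowsPlatform as Impl  # PowerShell, WMI, Toast
    else:
        from core.platform.linux import LinuxPlatform as Impl      # pactl, xclip, notify-send
    return Impl()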


System Requirements

Minimum (Free Mode — Ollama + faster-whisper)

Component Requirement
OS macOS 12+, Windows 10+, or Ubuntu 20.04+ (any modern Linux)
CPU 4 cores (Intel i5 / Apple M1 / AMD Ryzen 5 or better)
RAM 8 GB (Ollama loads models into memory — llama3.1 8B needs ~5 GB)
Disk 6 GB free (3 GB for Ollama model + 1 GB for faster-whisper model + HAL)
Python 3.10 or higher
Browser Any modern browser (Chrome, Firefox, Safari, Edge)
Microphone Required for voice input (built-in or USB)
Webcam Optional — required only for vision features
Network Not required (fully offline operation)

Recommended (Paid Providers — GPT-4o, Claude, ElevenLabs)

Component Requirement
RAM 4 GB (no local models loaded)
Disk 500 MB free
Network Required (API calls to OpenAI/Anthropic/Google)
API Keys At least OPENAI_API_KEY for GPT-4o + Whisper STT

Performance Notes

Mode Brain Latency STT Latency TTS Latency RAM Usage
Free (Ollama llama3.1) ~2-5s (CPU), ~1-2s (Apple Silicon) ~1-3s (faster-whisper base) ~0.7s (Edge TTS) ~5-6 GB
Free (Ollama phi3) ~1-2s (CPU) ~1-3s ~0.7s ~3 GB
Paid (GPT-4o) ~1-2s (API) ~0.5s (Whisper API) ~0.7s (Edge) ~200 MB
Paid (Claude) ~1-3s (API) ~0.5s ~1.2s (ElevenLabs) ~200 MB

Apple Silicon users: Ollama runs significantly faster on M1/M2/M3/M4 chips with Metal acceleration. An M1 MacBook Air can run llama3.1 8B comfortably.

GPU users (Linux/Windows): Ollama supports NVIDIA CUDA. With a 6 GB+ VRAM GPU, expect 2-3x faster inference than CPU.


Quick Start

git clone https://github.com/shandar/HAL9000.git
cd HAL9000
python -m venv .venv
source .venv/bin/activate         # Windows: .venv\Scripts\activate
pip install -r requirements.txt
cp .env.example .env              # fill in your API keys (or set FREE_MODE=true)
python server.py                  # start the web control panel

Open http://localhost:9000 → click Activate.

Free mode? Just echo "FREE_MODE=true" > .env — no API keys needed. See Free Mode.

Claude Code Integration

# Register HAL as an MCP server for Claude Code
claude mcp add hal-9000 -- python /path/to/HAL9000/hal_mcp_server.py

Now Claude Code can see through your webcam, speak aloud, control your machine, access HAL's memory, and hand off session context.
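
For a feel of the MCP side, here is a minimal sketch of exposing a single tool with the official Python MCP SDK (FastMCP). The tool shown is illustrative; hal_mcp_server.py exposes 21 real ones:

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("hal-9000")

@mcp.tool()
def hal_speak(text: str) -> str:
    """Speak a line aloud through HAL's active TTS provider (illustrative)."""
    # A real implementation would call into HAL's voice layer here.
    return f"spoke: {text}"

if __name__ == "__main__":
    mcp.run()   # stdio transport, as used by `claude mcp add`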


Co-Work Features

HAL operates as a co-work hub — coordinating work across HAL, Claude Code CLI, and Claude Desktop.

Typed Memory & Context Handoff

  • Memories are typed: fact, decision, preference, task, session_summary (entry schema sketched after this list)
  • Sessions auto-summarize on shutdown — HAL remembers what happened
  • Claude Code can call hal_get_context to load relevant context at session start
  • Manual "wrap up" via hal_save_session or the save_session tool
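
Each entry in memory/facts.json follows the typed schema listed under Project Structure; the values below are only illustrative:

entry = {
    "id": "mem_0042",                          # illustrative values throughout
    "type": "decision",                        # fact | decision | preference | task | session_summary
    "content": "Use Edge TTS as the default voice provider.",
    "timestamp": "2025-01-15T10:32:00Z",
    "source": "chat",
    "session_id": "session_2025-01-15",
    "metadata": {},
}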

Background Task Runner

  • Submit long-running coding tasks via background_task tool
  • Tasks run asynchronously via claude --print with real-time progress (see the runner sketch after this list)
  • Configurable concurrency (default 2), 600s timeout, cancellation
  • Task queue panel in the UI shows status with live updates
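
Mechanically this is a subprocess plus a concurrency gate. A simplified sketch using the defaults above (2 concurrent tasks, 600 s timeout); this is not the actual task_runner.py:

import asyncio

MAX_CONCURRENT_TASKS = 2
TASK_TIMEOUT = 600          # seconds
_slots = asyncio.Semaphore(MAX_CONCURRENT_TASKS)

async def run_background_task(prompt: str) -> str:
    """Run one coding task through `claude --print` and return its output."""
    async with _slots:                      # enforce the concurrency limit
        proc = await asyncio.create_subprocess_exec(
            "claude", "--print", prompt,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.STDOUT,
        )
        try:
            out, _ = await asyncio.wait_for(proc.communicate(), TASK_TIMEOUT)
        except asyncio.TimeoutError:
            proc.kill()                     # timeout / cancellation path
            return "task timed out"
        return out.decode()

# e.g. asyncio.run(run_background_task("write unit tests for core/vision.py"))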

Shared Workspace (Artifacts)

  • HAL creates visual artifacts (code, markdown, HTML, Mermaid diagrams) via create_artifact
  • Artifacts appear in a tabbed workspace panel alongside the chat
  • 3-column layout when artifacts are active
  • Copy button, close button, sandboxed HTML rendering

Multi-Agent Orchestration

  • Spawn multiple named Claude Code agents on parallel tasks via orchestrate
  • Conflict detection when agents modify the same files (see the sketch after this list)
  • Agent dashboard with status indicators, file lists, cancel controls
  • Results summarized and stored in session memory
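
Conflict detection boils down to intersecting the file sets each agent touched. A toy sketch of that check (the function name is illustrative, not the orchestrator's API):

from itertools import combinations

def find_conflicts(agent_files: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return (agent_a, agent_b, shared_files) for every pair that touched the same file."""
    conflicts = []
    for (a, files_a), (b, files_b) in combinations(agent_files.items(), 2):
        shared = files_a & files_b
        if shared:
            conflicts.append((a, b, shared))
    return conflicts

# find_conflicts({"refactor": {"core/brain.py"}, "tests": {"core/brain.py", "tests/test_brain.py"}})
# -> [("refactor", "tests", {"core/brain.py"})]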

API Keys

With FREE_MODE=true, no API keys are needed at all. See Free Mode.

Key Where Required
OPENAI_API_KEY platform.openai.com Only if using GPT-4o brain or Whisper API STT
ANTHROPIC_API_KEY console.anthropic.com Only if AI_PROVIDER=anthropic
GEMINI_API_KEY aistudio.google.com Only if AI_PROVIDER=gemini
ELEVENLABS_API_KEY elevenlabs.io Only if TTS_PROVIDER=elevenlabs
ELEVENLABS_VOICE_ID ElevenLabs voice library Only if TTS_PROVIDER=elevenlabs

No API key needed for: Edge TTS (default voice), Ollama (local LLM), faster-whisper (local STT).


Voice Providers

HAL supports 3 TTS providers, switchable at runtime from the dashboard:

Provider Cost Speed Quality Config
Edge TTS (default) Free ~0.7s Good — deep male voice TTS_PROVIDER=edge
ElevenLabs Paid ~1.2s Best — natural prosody TTS_PROVIDER=elevenlabs
XTTS (local) Free ~4.5s Good — voice cloning TTS_PROVIDER=local (requires Python 3.11)

Switch from the dashboard UI or set TTS_PROVIDER in .env.


Web Dashboard

Access at http://localhost:9000 after starting the server.

Panel Description
HAL panel HAL 9000 eye with real-time waveform visualization overlaid on red block during speech
HAL image controls 3D-style Vision/Voice/Claude buttons positioned on the HAL image strip
Webcam HUD Live MJPEG feed with scanlines, corner brackets, REC indicator — collapses when vision is off
Voice selector Segmented switch to swap between Edge/ElevenLabs/XTTS at runtime
Chat window Terminal-style chat with prompt prefixes, streaming responses, formatted lists, mic button — Enter to send, Shift+Enter for newline
Slash commands 35 categorized commands with keyboard navigation — type / to open menu
Choice sheet Slide-up modal for disambiguation — auto-detects when HAL presents numbered options
Task queue Collapsible panel showing background tasks and agents with live status
Workspace Tabbed artifact panel — code, diagrams, HTML — stacks above chat when artifacts are created
Embedded terminal Full interactive xterm.js terminal (PTY-backed) — run shell, Claude Code, review artifacts in-app (macOS/Linux)
Resizable layout Drag handles between columns to resize HAL, workspace, and chat panels
Power button Circular SVG power icon in top toolbar — activates/deactivates HAL
Status bar Connection status, timestamp, version
Boot greeting Time-aware creative HAL-style greeting with 20 randomized boot lines

The UI uses a sci-fi industrial aesthetic — brushed metal bezels, LED indicator lights, recessed panels.

PWA support: Add to Home Screen on mobile for a native app experience.


Project Structure

HAL9000/
├── server.py              # Flask web server + API endpoints (localhost only)
├── hal9000.py             # HAL engine — lifecycle, main loop, browser audio
├── hal_mcp_server.py      # MCP server for Claude Code/Desktop integration (21 tools)
├── config.py              # Settings + env loading with safe parsing
├── requirements.txt
├── .env.example
├── .mcp.json              # Project MCP config for Claude Code
│
├── core/
│   ├── brain.py           # Multi-provider LLM + function calling (thread-safe)
│   ├── vision.py          # Webcam capture + MJPEG stream
│   ├── hearing.py         # Mic recording + VAD + Whisper STT
│   ├── voice.py           # Multi-provider TTS (Edge/ElevenLabs/XTTS)
│   ├── memory_store.py    # Typed memory store with auto-migration
│   ├── task_runner.py     # Async background task queue for Claude Code
│   ├── orchestrator.py    # Multi-agent coordinator with conflict detection
│   ├── terminal_server.py # Embedded WebSocket terminal (xterm.js PTY bridge, port 9001)
│   ├── platform/           # Cross-platform OS abstraction (auto-detected)
│   │   ├── __init__.py     # Auto-detect: Darwin → mac, Windows → windows, Linux → linux
│   │   ├── base.py         # Abstract PlatformAPI interface (15 methods)
│   │   ├── mac.py          # macOS: AppleScript, osascript, pbcopy, screencapture
│   │   ├── windows.py      # Windows: PowerShell, WMI, Toast, PIL.ImageGrab
│   │   └── linux.py        # Linux: pactl, xclip, notify-send, brightnessctl
│   │
│   ├── tools/              # Tool registry + 43 tools across 8 domain modules
│   │   ├── __init__.py     # Registry, execute(), format converters, security
│   │   ├── shell.py        # run_shell (whitelisted commands)
│   │   ├── apps.py         # open/quit/list apps, open URLs, app_action (cross-platform)
│   │   ├── files.py        # list/read/write/search/info
│   │   ├── system.py       # volume, brightness, notifications, clipboard, screenshot (cross-platform)
│   │   ├── web.py          # web_search, fetch_url
│   │   ├── memory.py       # remember, recall, forget, list_memories, save_session
│   │   ├── delegation.py   # claude_code, background_task, orchestrate, agents (cross-platform)
│   │   └── artifacts.py    # create_artifact, update_artifact
│   └── knowledge.py       # Knowledge loader (files + URLs) + upload ingestion + BM25 search
│
├── knowledge/             # Drop files here or upload via UI — HAL indexes at boot + runtime
│   ├── sources.txt        # Remote URLs to fetch (llms.txt, etc.)
│   └── *.txt              # Local knowledge files
│
├── memory/                # Persistent typed memory (created at runtime)
│   └── facts.json         # Typed entries: {id, type, content, timestamp, source, session_id, metadata}
│
├── assets/
│   ├── HAL.png            # Dashboard hero image
│   ├── HAL-eye.png        # App icon / PWA icon source
│   ├── manifest.json      # PWA manifest
│   ├── sw.js              # Service worker for PWA
│   └── voice/             # XTTS reference clips (optional)
│
└── templates/
    └── index.html         # Web dashboard + chat UI + waveform + task panel + workspace

Security

HAL has been security-hardened:

Measure Detail
Command whitelist run_shell only allows 77 approved commands (ls, git, npm, etc.)
Blocked commands sudo, shutdown, diskutil, etc. are explicitly blocked
AppleScript escaping All user strings escaped before osascript interpolation
App action blocklist app_action blocks do shell script, system events, etc.
Localhost binding Flask binds to 127.0.0.1 by default (override with HAL_HOST)
Code exec guard /api/run only accepts requests from localhost (127.0.0.1 / ::1)
WebSocket origin check Terminal WebSocket validates origin header — rejects cross-site connections
Input length limits Chat input capped at 2000 chars
Secret file blocking read_file refuses to read .env, credentials.json, etc.
Safe config parsing Malformed env vars fall back to defaults instead of crashing
No shell=True All subprocess calls use argument lists, never shell interpretation
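
A condensed sketch of the whitelist pattern described above (the command sets shown are a tiny subset, and the function body is illustrative rather than HAL's exact run_shell):

import shlex
import subprocess

ALLOWED = {"ls", "git", "npm", "cat", "pwd"}       # subset of the 77 approved commands
BLOCKED = {"sudo", "shutdown", "diskutil"}         # always rejected

def run_shell(command: str) -> str:
    args = shlex.split(command)
    if not args or args[0] in BLOCKED or args[0] not in ALLOWED:
        return f"blocked: {command!r}"
    # Argument list, never shell=True, so user input is not shell-interpreted.
    result = subprocess.run(args, capture_output=True, text=True, timeout=60)
    return result.stdout + result.stderr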

Configuration

All settings in .env. See .env.example for the full list.

Setting Default Description
FREE_MODE false One toggle for zero-cost: Ollama + faster-whisper + Edge TTS
AI_PROVIDER openai Brain provider: openai, anthropic, gemini, ollama
STT_PROVIDER whisper_api Speech-to-text: whisper_api (cloud) or faster_whisper (local)
TTS_PROVIDER edge Voice provider: edge, elevenlabs, local
OLLAMA_MODEL llama3.1 Ollama model name (when using Ollama brain)
OLLAMA_BASE_URL http://localhost:11434 Ollama server URL
WHISPER_MODEL_SIZE base faster-whisper model: tiny, base, small, medium
EDGE_VOICE en-US-GuyNeural Edge TTS voice ID
FRAME_INTERVAL 2.0 Seconds between webcam samples
MIC_RECORD_SECONDS 5 Max recording duration per utterance
SILENCE_THRESHOLD 500 Audio amplitude below this = silence
SERVER_PORT 9000 Web server port
TOOL_MAX_ITERATIONS 5 Max tool calls per conversation turn
TASK_TIMEOUT 600 Background task timeout (seconds)
MAX_CONCURRENT_TASKS 2 Max parallel background tasks
MAX_AGENTS 4 Max orchestrated agents
HAL_TERMINAL_PORT 9001 WebSocket port for embedded terminal
HAL_HOST 127.0.0.1 Server bind address (use 0.0.0.0 for LAN access)
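
The safe-parsing behavior noted in the Security table means a malformed value degrades to its default instead of crashing the server. A minimal sketch of that pattern (the helper name is hypothetical):

import os

def env_float(key: str, default: float) -> float:
    """Read a numeric setting from .env; fall back to the default on malformed input."""
    try:
        return float(os.getenv(key, default))
    except (TypeError, ValueError):
        return default

FRAME_INTERVAL = env_float("FRAME_INTERVAL", 2.0)               # e.g. FRAME_INTERVAL=abc -> 2.0
SILENCE_THRESHOLD = int(env_float("SILENCE_THRESHOLD", 500))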

Commit Convention

feat(core): add wake word detection
fix(vision): handle no-camera fallback
chore(deps): update anthropic sdk

Docs