"Sometimes you gotta run before you can walk." β Tony Stark
```
     ██╗ █████╗ ██████╗ ██╗   ██╗██╗███████╗
     ██║██╔══██╗██╔══██╗██║   ██║██║██╔════╝
     ██║███████║██████╔╝██║   ██║██║███████╗
██   ██║██╔══██║██╔══██╗╚██╗ ██╔╝██║╚════██║
╚█████╔╝██║  ██║██║  ██║ ╚████╔╝ ██║███████║
 ╚════╝ ╚═╝  ╚═╝╚═╝  ╚═╝  ╚═══╝  ╚═╝╚══════╝
```
Just A Rather Very Intelligent System
v1.0 - Stark Industries, R&D Division
OpenJarvis is a modular, open-source AI assistant backend built for people who think a chatbot is beneath them. It listens to your voice, sees your screen, knows your context, reacts to real-world events, and runs autonomous operators, all while you sip your scotch and work on the suit.
```
         ┌───────────────────────────────┐
         │          YOU (Stark)          │
         └───────────────┬───────────────┘
                         │ voice / text / screen
         ┌───────────────┴───────────────┐
         │          JARVIS CORE          │
         │                               │
         │      Intelligence Engine      │
         │       Agents      Memory      │
         │       Learning    EventBus    │
         └───────┬───────┬───────┬───────┘
                 │       │       │
    ┌────────────┘       │       └────────────┐
┌───┴───────────┐ ┌──────┴─────────────┐ ┌────┴────────────┐
│  Voice Loop   │ │  Operators / Cron  │ │    Channels     │
│  Wake Word    │ │  Event Triggers    │ │    Telegram     │
│  STT → TTS    │ │  File/HTTP/Metric  │ │    Discord      │
└───────────────┘ └────────────────────┘ └─────────────────┘
```
Five primitives (Intelligence, Engine, Agents, Memory, Learning) compose into anything from a voice-controlled desktop companion to a fleet of autonomous operators monitoring your infrastructure.
Full always-on voice pipeline. Say "Jarvis" and it wakes, listens, thinks, and speaks back.
```bash
jarvis listen                                # wake-word mode
jarvis listen --no-wake-word                 # always listening
jarvis listen --once                         # one shot, exits cleanly
jarvis listen --screenshot --screenshot-ocr  # Jarvis sees your screen too
```

Pipeline: Mic → Energy VAD → STT (Whisper/Deepgram) → Wake Word → Agent → TTS (Kokoro/OpenAI) → Playback
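The energy-gate stage at the front of a pipeline like this can be sketched in a few lines. The frame length and threshold below are illustrative assumptions, not OpenJarvis defaults:

```python
import math

def is_speech(frame, threshold=0.01):
    """Energy-based VAD: a frame counts as speech if its RMS amplitude
    exceeds a fixed threshold (samples are floats in [-1.0, 1.0])."""
    rms = math.sqrt(sum(s * s for s in frame) / len(frame))
    return rms >= threshold

# Near-silence vs. a clearly audible frame:
silence = [0.0001] * 160
speech = [0.2, -0.3, 0.25, -0.2] * 40
print(is_speech(silence), is_speech(speech))  # False True
```

Real deployments replace this with webrtcvad (the `voice-vad` extra) because a plain energy gate misfires on keyboard clatter and background hum.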
Jarvis can see your displays. Capture full screen or a region, extract text with OCR, feed it to any LLM.
jarvis ask "what is this error?" --screenshot
jarvis ask "summarize the document on screen" --screenshot --screenshot-ocr
jarvis ask "describe the left monitor" --screenshot --screenshot-region 0,0,1920,1080A structured living profile that Jarvis always knows β your identity, contacts, active projects, preferences.
```bash
jarvis profile import    # first-run wizard
jarvis profile show
jarvis profile set name "Tony Stark"
jarvis profile prefer "never send emails without my go-ahead"
jarvis profile contact add "Pepper" --role ceo --note "handles everything"
jarvis profile project add "Mark VIII" --status active --desc "repulsor upgrade"
```

Autonomous agents that wake up and act when things happen in the real world, not just on a timer.
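Conceptually, a profile like this becomes a block of context prepended to every query. A minimal sketch of that flattening step (field names here are assumptions based on the CLI above, not the real schema):

```python
profile = {
    "name": "Tony Stark",
    "preferences": ["never send emails without my go-ahead"],
    "contacts": [{"name": "Pepper", "role": "ceo", "note": "handles everything"}],
}

def render_context(p: dict) -> str:
    """Flatten a profile dict into lines an agent can prepend to its prompt."""
    lines = [f"User: {p['name']}"]
    lines += [f"Preference: {pref}" for pref in p.get("preferences", [])]
    lines += [f"Contact: {c['name']} ({c['role']}): {c['note']}"
              for c in p.get("contacts", [])]
    return "\n".join(lines)

print(render_context(profile))
```

Because the profile is structured rather than free text, individual fields can be updated by `jarvis profile set` without the model re-learning anything.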
```toml
[[operator.event_triggers]]
type = "file"
path = "~/inbox"
pattern = "*.pdf"
events = ["created"]

[[operator.event_triggers]]
type = "system_metric"
metric = "cpu_percent"
threshold = 85.0
operator = ">"

[[operator.event_triggers]]
type = "http_poll"
url = "https://status.openai.com"
fire_on_change = true

[[operator.event_triggers]]
type = "bus_event"
event_type = "channel_message_received"
filter_key = "channel"
filter_value = "telegram"
```

Four trigger types: file changes, system metrics (CPU/RAM/disk), HTTP content changes, and the internal event bus.
From a simple one-shot responder to a full ReAct loop with tool use:

| Agent | What it does |
|---|---|
| `simple` | Direct Q&A - fast, no tools |
| `orchestrator` | Breaks tasks into subtasks, delegates |
| `native_react` | ReAct loop with tool calls |
| `operative` | Operator-grade autonomous executor |
| `critic` | Self-critiques and revises output |
| `planner` | Long-horizon planning |
| `summarizer` | Distils and compresses |
| `multimodal` | Vision + text |
| `code` | Code generation and execution |
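The difference between `simple` and `native_react` is the loop: a ReAct agent alternates model calls with tool calls until it can answer. A stubbed sketch of the shape of that loop (the model and tool here are stand-ins, not the real agent API):

```python
def react_loop(question, llm, tools, max_steps=5):
    """Minimal ReAct: ask the model; if it requests a tool, run it and feed
    the observation back; stop when it returns a final answer."""
    transcript = [question]
    for _ in range(max_steps):
        action = llm(transcript)
        if action["type"] == "final":
            return action["answer"]
        result = tools[action["tool"]](action["input"])
        transcript.append(f"Observation: {result}")
    return "gave up"

# Stub model: call the calculator once, then answer with its observation.
def stub_llm(transcript):
    if any(t.startswith("Observation:") for t in transcript):
        return {"type": "final", "answer": transcript[-1].removeprefix("Observation: ")}
    return {"type": "tool", "tool": "calc", "input": "6 * 7"}

tools = {"calc": lambda expr: str(eval(expr))}
print(react_loop("what is 6 * 7?", stub_llm, tools))  # 42
```

The `max_steps` cap is the important design choice: without it, a confused model can loop on tool calls forever.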
SQLite (default) + FAISS vector search + BM25 + ColBERT reranking. Context is automatically injected into every query; Jarvis remembers.
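One common way to merge a vector ranking with a BM25 ranking before handing candidates to a reranker is reciprocal rank fusion; whether OpenJarvis uses RRF specifically is an assumption here, but the idea is:

```python
def rrf(rankings, k=60):
    """Reciprocal rank fusion: score each doc by the sum of 1/(k + rank)
    over every ranking it appears in, then sort by total score."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["note-3", "note-1", "note-7"]   # nearest by embedding
bm25_hits = ["note-3", "note-9", "note-1"]     # best lexical matches
print(rrf([vector_hits, bm25_hits]))  # ['note-3', 'note-1', 'note-9', 'note-7']
```

Documents that both retrievers agree on float to the top, which is exactly what you want before an expensive ColBERT rerank pass.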
Telegram, Discord, Slack, Gmail, Twitter/X, Reddit, Twilio, and more.
```bash
jarvis channel add telegram --token YOUR_BOT_TOKEN
jarvis channel add discord --token YOUR_BOT_TOKEN
```

```bash
# Clone the suit
git clone https://github.com/akhilyad/__OpenJarvis.git
cd __OpenJarvis

# Sync with uv (recommended)
uv sync

# Or pip
pip install -e .
```

```bash
uv sync --extra voice --extra speech

# Optional: better VAD
uv sync --extra voice-vad

# Optional: Kokoro local TTS
pip install kokoro
```

```bash
pip install mss Pillow   # capture
pip install pytesseract  # OCR (also needs the Tesseract binary)
```

```bash
uv sync --extra operators-events  # psutil + watchdog
```

```bash
jarvis init                 # initialise ~/.openjarvis/
jarvis profile import       # tell Jarvis who you are
jarvis doctor               # health check
jarvis ask "hello, Jarvis"  # first contact
```

```bash
# Ask
jarvis ask "what is the weather in Kolkata?"
jarvis ask "draft a reply to this email" --screenshot --screenshot-ocr
jarvis chat                          # interactive session

# Voice
jarvis listen                        # always-on voice loop
jarvis listen --no-wake-word --once  # one command, done

# Agents + Tools
jarvis ask "search and summarise AI news" --agent orchestrator --tools web_search
jarvis ask "write and run this script" --agent code --tools shell_exec

# Memory
jarvis memory search "project deadline"
jarvis memory add "Mark VIII repulsor upgrade due 2026-05-01"

# Operators
jarvis operators list
jarvis operators activate inbox-monitor
jarvis operators run-once inbox-monitor

# Profile
jarvis profile show
jarvis profile prefer "always use bullet points"

# System
jarvis doctor      # health check
jarvis model list  # available models
jarvis serve       # start the REST API server
```

Config lives at `~/.openjarvis/config.toml`:
```toml
[intelligence]
default_model = "gpt-4o"
preferred_engine = "openai"

[speech]
backend = "faster_whisper"
wake_word = "jarvis"
vad_engine = "energy"
tts_backend = "kokoro"
silence_timeout_ms = 1500

[memory]
default_backend = "sqlite"

[telemetry]
enabled = true
```

| Extra | What you get |
|---|---|
| `voice` | sounddevice + soundfile (mic + playback) |
| `voice-vad` | webrtcvad (better speech detection in noise) |
| `voice-wakeword` | openwakeword (hot-word model, no STT in the hot path) |
| `bundle-voice` | voice + speech + Kokoro TTS |
| `screen` | mss + Pillow (screen capture + resize) |
| `screen-ocr` | + pytesseract (text extraction from screen) |
| `operators-events` | psutil + watchdog (event-driven operators) |
| `memory-faiss` | FAISS vector search |
| `inference-cloud` | OpenAI + Anthropic |
| `inference-mlx` | Apple MLX (macOS only) |
| `inference-vllm` | vLLM (GPU server) |
```bash
# Full suit: everything
uv sync --extra bundle-voice --extra screen --extra operators-events --extra memory-faiss
```

- Feature 1 - Voice Loop (`jarvis listen`)
- Feature 2 - Event-Driven Operators
- Feature 3 - Personal Context Layer
- Feature 4 - Screen Awareness
- Feature 5 - HUD / Heads-Up Display
- Feature 6 - Home Automation Bridge
- Feature 7 - Minions Protocol (multi-agent swarm)
- Feature 8 - Agent-to-Agent (A2A) communication
- Feature 9 - Self-Improving Prompts
- Feature 10 - Mobile Companion App
"Fortunately, I am Iron Man." β and even I have a punch list.
| # | Severity | Issue |
|---|---|---|
| 1 | CRITICAL | screen_capture not auto-registered in ToolRegistry |
| 2 | HIGH | voice/loop.py imports from CLI layer (arch violation) |
| 3 | HIGH | _can_fire() race condition in watcher threads |
| 4 | HIGH | VAD has no max utterance duration limit |
| 5 | MEDIUM | profile/store.py reads file without explicit UTF-8 encoding |
| 6 | MEDIUM | Screen resize silently skipped if Pillow absent |
Being fixed in the next sprint.
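Issue #5 is a whole class of bug in one line: `open()` without an `encoding` argument falls back to the platform locale, which breaks non-ASCII profiles on Windows. The fix pattern, sketched here rather than taken from the actual store code:

```python
import tempfile
from pathlib import Path

def read_profile(path: Path) -> str:
    # Always pin the encoding; locale defaults (cp1252 etc.) corrupt UTF-8.
    return path.read_text(encoding="utf-8")

# Round-trip a non-ASCII profile entry to show it survives everywhere:
with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "profile.toml"
    p.write_text('note = "résumé attached"', encoding="utf-8")
    print(read_profile(p))
```

The same rule applies on the write side: every `open()`, `read_text()`, and `write_text()` touching the profile should name its encoding explicitly.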
Pull requests welcome. If you break the suit, fix the suit.
- Fork
- `git checkout -b feature/repulsor-upgrade`
- `git commit -m 'add repulsor upgrade'`
- `git push origin feature/repulsor-upgrade`
- Open a PR
MIT - "I prefer to think of it as liberating."
Built with arc reactor energy.
"Jarvis, sometimes I think you're the only one who gets me."
