A next-generation, voice-activated AI assistant powered by Large Language Models, real-time internet search, and autonomous system automation, wrapped in a futuristic holographic GUI.
J.A.R.V.I.S. isn't just another chatbot: it's a full-spectrum AI assistant designed to function as your personal intelligent companion. Inspired by Tony Stark's legendary AI from the Marvel universe, this project brings science fiction to life.
J.A.R.V.I.S. combines the reasoning power of Meta's LLaMA 3.3 70B (via Groq's fast inference engine) with real-time internet awareness, autonomous desktop automation, AI image generation, natural voice interaction, and a cinematic holographic interface. Everything is orchestrated from a desktop app on your machine, while LLM and image inference run on the Groq and HuggingFace cloud APIs.
Whether you're asking complex questions, commanding your computer hands-free, generating artwork, researching live topics, or having a deep philosophical conversation, J.A.R.V.I.S. handles it with sub-second response latencies thanks to Groq's inference speed.
    ┌────────────────────────────────────────────────────────────────────┐
    │                      J.A.R.V.I.S. CORE ENGINE                      │
    ├──────────────┬────────────┬────────────────┬───────────────────────┤
    │ Voice        │ Brain      │ Search         │ Creative              │
    │ Pipeline     │ Pipeline   │ Pipeline       │ Pipeline              │
    ├──────────────┼────────────┼────────────────┼───────────────────────┤
    │ STT Engine   │ LLaMA 3.3  │ Google SERP    │ Stable Diffusion XL   │
    │ (Google +    │ 70B via    │ Scraping +     │ 1.0 via HuggingFace   │
    │ Selenium     │ Groq API   │ AI Synthesis   │ Inference API         │
    │ WebRTC)      │            │                │                       │
    ├──────────────┼────────────┼────────────────┼───────────────────────┤
    │ Edge TTS     │ Mixtral    │ Real-time      │ Async Batch Gen       │
    │ Neural       │ 8x7B for   │ Context        │ (4 imgs parallel)     │
    │ Synthesis    │ Content    │ Injection      │                       │
    ├──────────────┴────────────┴────────────────┴───────────────────────┤
    │                  FUTURISTIC HOLOGRAPHIC PyQt5 GUI                  │
    │  Animated Wave BG | HUD Core | Neon Rings | Particle FX            │
    │  Glow Cursor Trail | Hex Grid | Bokeh FX | Parallax                │
    ├────────────────────────────────────────────────────────────────────┤
    │                 AUTOMATION & SYSTEM CONTROL ENGINE                 │
    │  App Launch/Kill | YouTube | Google | Volume | Content Writer      │
    └────────────────────────────────────────────────────────────────────┘
| Capability | Model | Details |
|---|---|---|
| Conversational AI | LLaMA 3.3 70B Versatile | Deep reasoning, contextual awareness, 8192-token context window |
| Decision Making | LLaMA 3.3 70B Versatile | Multi-intent query classification with zero-shot routing |
| Content Generation | Mixtral 8x7B 32K | Long-form writing, code generation, email drafting |
| Fallback Intelligence | LLaMA 3.1 8B Instant | Ultra-fast fallback for high-load scenarios |
J.A.R.V.I.S. uses a First-Layer Decision Making Model (DMM) that classifies every user query into one or more actionable intents before routing them to the appropriate pipeline. This isn't simple keyword matching: it's an AI model interpreting natural language intent at a semantic level.
Multi-intent handling means you can say: "Open Chrome, search for quantum computing on Google, play some lo-fi music, and tell me a joke", and J.A.R.V.I.S. will parse this into 4 separate commands and execute them concurrently using async concurrency.
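As a rough illustration of the multi-intent flow described above, the sketch below splits already-classified decisions into an intent and an argument, then runs one handler per decision with `asyncio.gather`. The decision strings, the `INTENT_PREFIXES` list, and the `handle` stub are assumptions made for this example; the labels the real DMM emits are not shown in this README.

```python
import asyncio

# Hypothetical intent prefixes; the real DMM's label set is not shown in this README.
INTENT_PREFIXES = ("google search", "youtube search", "open", "close", "play", "general")


def split_intent(decision: str) -> tuple[str, str]:
    """Split one classified decision such as 'open chrome' into (intent, argument)."""
    text = decision.strip().lower()
    for prefix in INTENT_PREFIXES:
        if text == prefix or text.startswith(prefix + " "):
            return prefix, text[len(prefix):].strip()
    # Anything unrecognised is treated as plain conversation.
    return "general", text


async def handle(intent: str, argument: str) -> str:
    """Stand-in for a real pipeline call; it only reports what would run."""
    await asyncio.sleep(0)  # yield control, as a real I/O-bound handler would
    return f"{intent}: {argument}"


async def route(decisions: list[str]) -> list[str]:
    """Run every classified decision concurrently and keep the input order."""
    tasks = [handle(*split_intent(d)) for d in decisions]
    return await asyncio.gather(*tasks)


decisions = [
    "open chrome",
    "google search quantum computing",
    "play lo-fi music",
    "general tell me a joke",
]
results = asyncio.run(route(decisions))
```

`results` keeps the order of the input decisions, because `asyncio.gather` returns results in the order the awaitables were passed.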
- Google Speech Recognition with ambient noise calibration for clear voice capture
- Multi-language input support: speak in Hindi, Spanish, French, or any language; J.A.R.V.I.S. translates to English in real time using neural translation
- Edge TTS Neural Synthesis: responses are spoken back in a natural, human-like voice (configurable voice personas)
- Intelligent response truncation: for long answers, J.A.R.V.I.S. speaks a concise summary and displays the full text on screen
- Selenium-powered WebRTC STT: browser-grade speech recognition running headlessly for maximum accuracy
- Continuous listening mode: always-on voice detection with automatic silence handling
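The response truncation mentioned in the list above could look something like the following sketch. The thresholds, the sentence-splitting rule, and the closing phrase are assumptions for illustration; the project's actual rule is not described here.

```python
import re


def spoken_version(answer: str, max_sentences: int = 2, max_chars: int = 250) -> str:
    """Return the text to speak: the whole answer if short, else its first sentences."""
    if len(answer) <= max_chars:
        return answer
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    head = " ".join(sentences[:max_sentences])
    # Hypothetical hand-off phrase; the full text would be shown in the GUI.
    return head + " The rest of the answer is on the screen."
```

The full `answer` string would still be sent to the chat panel unchanged; only the text passed to the TTS engine is shortened.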
Unlike traditional chatbots trapped in their training data, J.A.R.V.I.S. has live internet access:
- Google SERP scraping: fetches the top 5 real-time search results with titles and descriptions
- AI-powered synthesis: doesn't just dump search results; the AI reads, understands, and synthesizes a coherent, professional answer from live data
- Automatic routing: the DMM determines when a query requires real-time data vs. when the AI's built-in knowledge is sufficient
- Current events awareness: ask about today's news, stock prices, sports scores, weather, or any live topic
- Real-time date/time injection: always knows the current day, date, and time without external API calls
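Date/time injection needs nothing beyond the standard library. The sketch below builds a small context string that could be prepended to the model's messages; the function name and the exact wording of the context are assumptions, not the project's own prompt.

```python
from datetime import datetime
from typing import Optional


def realtime_context(now: Optional[datetime] = None) -> str:
    """Build a short block of current day/date/time to inject into the prompt."""
    now = now or datetime.now()  # local system clock, no external API call
    return (
        "Use this real-time information if needed.\n"
        f"Day: {now.strftime('%A')}\n"
        f"Date: {now.strftime('%d %B %Y')}\n"
        f"Time: {now.strftime('%H:%M:%S')}\n"
    )
```

Passing `now` explicitly keeps the function easy to test; in normal use it would be called with no arguments on every request.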
J.A.R.V.I.S. can create stunning, photorealistic images from natural language descriptions:
- Stable Diffusion XL 1.0 via HuggingFace Inference API
- Batch generation: generates 4 unique variations simultaneously using async parallelism
- 1024×1024 resolution: high-fidelity outputs with 50-step inference
- Random seed diversity: each image uses a unique random seed for maximum creative variation
- Auto-display: generated images are automatically opened and displayed
- Natural language prompts: just say "Generate an image of a cyberpunk city at sunset"
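The batch-with-distinct-seeds pattern from the list above can be sketched without touching the network. Here `fake_generate` stands in for the HTTP request to the image API, and the payload field names are hypothetical; consult the HuggingFace Inference API documentation for the real request format.

```python
import asyncio
import random


def build_payloads(prompt: str, count: int = 4) -> list[dict]:
    """Create one request payload per image, each with its own seed."""
    seeds = random.sample(range(1_000_000), count)  # sample() guarantees distinct seeds
    return [
        # Field names are illustrative, not the documented API schema.
        {"inputs": prompt, "seed": seed, "width": 1024, "height": 1024, "steps": 50}
        for seed in seeds
    ]


async def fake_generate(payload: dict) -> str:
    """Stand-in for the API call; returns the file name an image would be saved as."""
    await asyncio.sleep(0)  # placeholder for network latency
    return f"image_seed_{payload['seed']}.png"


async def generate_batch(prompt: str, count: int = 4) -> list[str]:
    """Issue all requests concurrently and collect the resulting file names."""
    payloads = build_payloads(prompt, count)
    return await asyncio.gather(*(fake_generate(p) for p in payloads))


files = asyncio.run(generate_batch("a cyberpunk city at sunset"))
```

Because the requests are I/O-bound, running them with `asyncio.gather` lets the four generations overlap instead of waiting for each one in turn.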
J.A.R.V.I.S. can take full control of your desktop environment:
| Command | What It Does |
|---|---|
| "Open Chrome" | Launches any application or website |
| "Close Notepad" | Force-terminates any running application |
| "Play Shape of You" | Searches and plays any song/video on YouTube |
| "Google search quantum computing" | Opens Google with your search query |
| "YouTube search Python tutorials" | Opens YouTube with your search query |
| "Volume up / down / mute" | Controls system audio levels |
| "Write an email about project update" | AI generates content and opens it in your text editor |
Cross-platform support: works on Windows, macOS, and Linux with platform-specific command routing.
Async parallel execution: multiple automation commands run concurrently using asyncio.gather(), meaning "Open Chrome, play music, and mute the volume" all execute concurrently, not sequentially.
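Platform-specific routing usually comes down to choosing a different command per operating system. The following sketch only builds the argument lists and does not run them; the function names are hypothetical, and the Linux branch assumes the application's executable is on `PATH`.

```python
import platform
from typing import Optional


def launch_command(app: str, system: Optional[str] = None) -> list[str]:
    """Return the argv that would launch `app` on the given (or current) platform."""
    system = system or platform.system()
    if system == "Windows":
        # 'start' is a cmd built-in; the empty string is its window-title argument.
        return ["cmd", "/c", "start", "", app]
    if system == "Darwin":
        return ["open", "-a", app]
    return [app]  # assumption: on Linux the executable name is on PATH


def kill_command(app: str, system: Optional[str] = None) -> list[str]:
    """Return the argv that would force-terminate `app`."""
    system = system or platform.system()
    if system == "Windows":
        return ["taskkill", "/F", "/IM", f"{app}.exe"]
    return ["pkill", "-f", app]  # pkill is available on macOS and most Linux systems
```

The returned lists can be handed to `subprocess.run` or `asyncio.create_subprocess_exec`; keeping command construction separate from execution makes the routing easy to test on any machine.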
The frontend is an Iron Man-inspired holographic interface built with PyQt5:
- Animated Wave Background: multi-layered sine waves with purple/blue gradients that react to mouse movement with parallax effects
- Floating Particle System: glowing particles drift across the screen with physics-based motion
- HUD Core Display: rotating concentric rings with neon glow effects, reminiscent of the Arc Reactor
- Glow Cursor Trail: your mouse leaves a luminous trail with decaying alpha transparency
- Hex Grid Overlay: subtle hexagonal grid pattern adds depth and sci-fi atmosphere
- Bokeh Light Effects: soft, out-of-focus light orbs create cinematic depth-of-field
- Rotating Neon Rings: animated orbital rings with gradient coloring
- Chat Interface: real-time message display with smooth scrolling and gradient chat bubbles
- Mic Animation: visual feedback showing listening/processing states
- Mode Switching: dynamic background modes (Idle, Listening, Speaking, Thinking) with distinct visual themes
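To make the "decaying alpha" idea behind the cursor trail concrete, here is a toolkit-independent sketch of the bookkeeping involved. The class name, the decay factor, and the point limit are assumptions; the actual GUI code may structure this differently.

```python
from collections import deque


class CursorTrail:
    """Keep the most recent cursor positions, fading older ones on every update."""

    def __init__(self, max_points: int = 20, decay: float = 0.85):
        self.decay = decay
        # Each entry is [x, y, alpha]; the deque drops the oldest point when full.
        self.points = deque(maxlen=max_points)

    def add(self, x: float, y: float) -> None:
        """Fade every existing point, then record the new position at full opacity."""
        for point in self.points:
            point[2] *= self.decay
        self.points.append([x, y, 1.0])
```

A paint handler would then draw each point with its stored alpha, so the newest position glows brightest and the tail fades out.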
Project structure:

    JARVIS/
    ├── Main.py                      # Core engine: orchestrates all systems
    ├── Chatbot.py                   # Standalone chatbot interface
    ├── Requirements.txt             # All dependencies
    ├── .env                         # API keys & configuration (create your own)
    │
    ├── Backend/                     # Intelligence & Processing Layer
    │   ├── __init__.py              # Package initialization
    │   ├── Model.py                 # First-Layer Decision Making Model (DMM)
    │   ├── Chatbot.py               # Conversational AI engine (Groq + LLaMA)
    │   ├── RealtimeSearchEngine.py  # Live internet search + AI synthesis
    │   ├── Automation.py            # Desktop automation (apps, browser, system)
    │   ├── ImageGeneration.py       # AI image generation (Stable Diffusion XL)
    │   ├── TextToSpeech.py          # Neural TTS (Edge TTS + pygame)
    │   └── SpeechToText.py          # Voice recognition (Selenium + WebRTC)
    │
    ├── Frontend/                    # Holographic GUI Layer
    │   ├── GUI.py                   # 853-line PyQt5 futuristic interface
    │   ├── Graphics/                # UI assets (icons, animations, graphics)
    │   │   ├── Jarvis.gif           # Animated J.A.R.V.I.S. logo
    │   │   ├── Mic_on.png / Mic_off.png  # Microphone state indicators
    │   │   ├── Home.png             # Navigation icons
    │   │   ├── Chats.png            # Chat panel icon
    │   │   └── ...                  # Additional UI assets
    │   └── Files/                   # Runtime state files
    │
    └── Data/                        # Runtime Data Store
        ├── ChatLog.json             # Persistent conversation history
        ├── Images/                  # AI-generated images storage
        └── speech.mp3               # Current TTS audio file

- Python 3.10+ (Python 3.11 or 3.12 recommended)
- Chrome browser installed (for Selenium-based speech recognition)
- Microphone access for voice commands
- Internet connection for AI inference and real-time search
Clone the repository, create a virtual environment, and install the dependencies:

    git clone https://github.com/Sidhant185/JARVIS.git
    cd JARVIS

    # macOS / Linux
    python3 -m venv .venv
    source .venv/bin/activate

    # Windows
    python -m venv .venv
    .venv\Scripts\activate

    pip install -r Requirements.txt

Create a .env file in the root directory:

    Username = YourName
    AssistantName = Jarvis
    GrogAPIKey = your_groq_api_key_here
    InputLanguage = en
    AssistantVoice = en-CA-LiamNeural
    HuggingFaceAPIKey = your_huggingface_api_key_here

Getting API Keys:
- Groq API Key: Sign up free at console.groq.com to get fast LLM inference
- HuggingFace API Key: Sign up free at huggingface.co for Stable Diffusion XL image generation
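Before the first run it can help to confirm that the .env file contains every key listed above. The sketch below uses a small standard-library parser so it stays self-contained; the helper names are hypothetical, and the project itself lists python-dotenv for loading configuration.

```python
# Key names copied exactly as they are spelled in the .env example of this README.
REQUIRED_KEYS = (
    "Username",
    "AssistantName",
    "GrogAPIKey",
    "InputLanguage",
    "AssistantVoice",
    "HuggingFaceAPIKey",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple 'Key = value' lines, ignoring blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """List required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

For example, `missing_keys(parse_env(open(".env").read()))` returns an empty list when the configuration is complete.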
Start the assistant:

    python Main.py

J.A.R.V.I.S. will initialize all subsystems and begin listening for your voice commands. Speak naturally, and the AI will respond with intelligence and precision.
"How are you doing today?"
"Explain quantum computing in simple terms"
"Write me a poem about the ocean"
"What's the meaning of life?"
"Help me debug this Python error"
"Tell me a joke"
"Who is the current President of the United States?"
"What happened in the news today?"
"Tell me about the latest iPhone release"
"What's the weather like in New York?"
"What are the top trending topics right now?"
"Generate an image of a futuristic city at night"
"Create an image of a dragon flying over mountains"
"Generate an image of a photorealistic portrait of a robot"
"Open Chrome"
"Open Chrome and Spotify"
"Close Notepad"
"Play Shape of You on YouTube"
"Search for machine learning tutorials on Google"
"Volume up"
"Mute the system"
"Write an application letter for a software engineer position"
"Open Chrome, search for AI news on Google, and play some music"
"Tell me a joke and open Spotify"
"What time is it and remind me about the meeting"
Change the AI's voice by modifying AssistantVoice in .env:
| Voice Code | Description |
|---|---|
| `en-CA-LiamNeural` | Canadian English (Male, Deep) |
| `en-US-AriaNeural` | American English (Female, Warm) |
| `en-US-GuyNeural` | American English (Male, Professional) |
| `en-GB-SoniaNeural` | British English (Female, Elegant) |
| `en-IN-NeerjaNeural` | Indian English (Female, Clear) |
| `en-AU-NatashaNeural` | Australian English (Female, Friendly) |
Set InputLanguage in .env to accept voice input in different languages:
| Code | Language |
|---|---|
| `en` | English |
| `hi` | Hindi |
| `es` | Spanish |
| `fr` | French |
| `de` | German |
| `ja` | Japanese |
| `ko` | Korean |
| `zh` | Chinese |
J.A.R.V.I.S. will automatically translate non-English input to English before processing.
| Layer | Technology | Purpose |
|---|---|---|
| LLM Inference | Groq Cloud | Ultra-fast inference (~10x faster than OpenAI) |
| Primary Model | LLaMA 3.3 70B Versatile | Reasoning, classification, conversation |
| Content Model | Mixtral 8x7B 32K | Long-form content generation |
| Fallback Model | LLaMA 3.1 8B Instant | High-speed fallback |
| Image Generation | Stable Diffusion XL 1.0 | Text-to-image synthesis |
| Voice Recognition | Google Speech Recognition | Real-time voice-to-text |
| Text-to-Speech | Microsoft Edge TTS | Neural voice synthesis |
| Audio Playback | pygame | Cross-platform audio |
| Web Automation | Selenium + ChromeDriver | Headless browser control |
| Search Engine | googlesearch-python | Live Google SERP scraping |
| Translation | mtranslate | Multi-language translation |
| GUI Framework | PyQt5 | Desktop application interface |
| Image Processing | Pillow (PIL) | Image handling and display |
| HTTP Client | aiohttp + requests | Async and sync networking |
| Environment | python-dotenv | Secure configuration management |
Each backend module can be tested independently:

    # Test the Decision Making Model
    python -m Backend.Model
    # Test the Chatbot
    python -m Backend.Chatbot
    # Test Real-Time Search
    python -m Backend.RealtimeSearchEngine
    # Test Automation
    python -m Backend.Automation
    # Test Image Generation
    python -m Backend.ImageGeneration
    # Test Text-to-Speech
    python -m Backend.TextToSpeech
    # Test Speech-to-Text
    python -m Backend.SpeechToText

Contributions are welcome! Here's how to get started:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request

- Plugin system for extending capabilities
- Mobile companion app (React Native)
- Smart home integration (HomeKit / Home Assistant)
- Dashboard for analytics and usage stats
- Wake word detection ("Hey Jarvis")
- Spotify API integration for music control
- Email reading and drafting via Gmail API
- Calendar integration with Google Calendar
This project is licensed under the MIT License; see the LICENSE file for details.
Sidhant Pande
- GitHub: @Sidhant185
If you found this project impressive, please give it a star!
"Sometimes you gotta run before you can walk." - Tony Stark
