Skip to content

Sidhant185/JARVIS

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

2 Commits
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 
ย 

Repository files navigation

J.A.R.V.I.S.

๐Ÿง  J.A.R.V.I.S. โ€” Just A Rather Very Intelligent System

A next-generation, voice-activated AI assistant powered by cutting-edge Large Language Models, real-time internet intelligence, and autonomous system automation โ€” wrapped in a stunning futuristic holographic GUI.


๐ŸŒŸ What Is J.A.R.V.I.S.?

J.A.R.V.I.S. isn't just another chatbot โ€” it's a full-spectrum AI operating system designed to function as your personal intelligent companion. Inspired by Tony Stark's legendary AI from the Marvel universe, this project brings science fiction to life.

J.A.R.V.I.S. combines the raw reasoning power of Meta's LLaMA 3.3 70B (via Groq's lightning-fast inference engine) with real-time internet awareness, autonomous desktop automation, AI image generation, natural voice interaction, and a cinematic holographic interface โ€” all running locally on your machine.

Whether you're asking complex questions, commanding your computer hands-free, generating artwork, researching live topics, or having a deep philosophical conversation โ€” J.A.R.V.I.S. handles it all effortlessly with sub-second response latencies thanks to Groq's industry-leading inference speed.


๐Ÿ—๏ธ System Architecture

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚                    J.A.R.V.I.S. CORE ENGINE                    โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ฌโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚  ๐ŸŽ™๏ธ Voice   โ”‚  ๐Ÿง  Brain   โ”‚  ๐ŸŒ Search    โ”‚  ๐ŸŽจ Creative            โ”‚
โ”‚  Pipeline  โ”‚  Pipeline  โ”‚  Pipeline    โ”‚  Pipeline              โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ STT Engine โ”‚ LLaMA 3.3  โ”‚ Google SERP  โ”‚ Stable Diffusion XL   โ”‚
โ”‚ (Google +  โ”‚ 70B via    โ”‚ Scraping +   โ”‚ 1.0 via HuggingFace   โ”‚
โ”‚  Selenium  โ”‚ Groq API   โ”‚ AI Synthesis โ”‚ Inference API          โ”‚
โ”‚  WebRTC)   โ”‚            โ”‚              โ”‚                        โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ Edge TTS   โ”‚ Mixtral    โ”‚ Real-time    โ”‚ Async Batch Gen        โ”‚
โ”‚ Neural     โ”‚ 8x7B for   โ”‚ Context      โ”‚ (4 imgs parallel)      โ”‚
โ”‚ Synthesis  โ”‚ Content    โ”‚ Injection    โ”‚                        โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚              ๐Ÿ–ฅ๏ธ FUTURISTIC HOLOGRAPHIC PyQt5 GUI                โ”‚
โ”‚  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”   โ”‚
โ”‚  โ”‚ Animated Wave BG โ”‚ HUD Core โ”‚ Neon Rings โ”‚ Particle FX  โ”‚   โ”‚
โ”‚  โ”‚ Glow Cursor Trail โ”‚ Hex Grid โ”‚ Bokeh FX  โ”‚ Parallax     โ”‚   โ”‚
โ”‚  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜   โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚           โš™๏ธ AUTOMATION & SYSTEM CONTROL ENGINE                 โ”‚
โ”‚  App Launch/Kill โ”‚ YouTube โ”‚ Google โ”‚ Volume โ”‚ Content Writer   โ”‚
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

๐Ÿš€ Key Features & Capabilities

๐Ÿง  Advanced AI Brain โ€” Multi-Model Intelligence

Capability Model Details
Conversational AI LLaMA 3.3 70B Versatile Deep reasoning, contextual awareness, 8192-token context window
Decision Making LLaMA 3.3 70B Versatile Multi-intent query classification with zero-shot routing
Content Generation Mixtral 8x7B 32K Long-form writing, code generation, email drafting
Fallback Intelligence LLaMA 3.1 8B Instant Ultra-fast fallback for high-load scenarios

J.A.R.V.I.S. uses a sophisticated First-Layer Decision Making Model (DMM) that classifies every user query into one or more actionable intents before routing them to the appropriate pipeline. This isn't simple keyword matching โ€” it's an AI model understanding natural language intent at a deep semantic level.

Multi-intent handling means you can say: "Open Chrome, search for quantum computing on Google, play some lo-fi music, and tell me a joke" โ€” and J.A.R.V.I.S. will parse this into 4 separate commands and execute them all simultaneously using async concurrency.

๐ŸŽ™๏ธ Natural Voice Interface โ€” Speak, Don't Type

  • Google Speech Recognition with ambient noise calibration for crystal-clear voice capture
  • Multi-language input support โ€” speak in Hindi, Spanish, French, or any language; J.A.R.V.I.S. translates to English in real-time using neural translation
  • Edge TTS Neural Synthesis โ€” responses are spoken back in a natural, human-like voice (configurable voice personas)
  • Intelligent response truncation โ€” for long answers, J.A.R.V.I.S. speaks a concise summary and displays the full text on screen
  • Selenium-powered WebRTC STT โ€” browser-grade speech recognition running headlessly for maximum accuracy
  • Continuous listening mode โ€” always-on voice detection with automatic silence handling

๐ŸŒ Real-Time Internet Intelligence

Unlike traditional chatbots trapped in their training data, J.A.R.V.I.S. has live internet access:

  • Google SERP scraping โ€” fetches top 5 real-time search results with titles and descriptions
  • AI-powered synthesis โ€” doesn't just dump search results; the AI reads, understands, and synthesizes a coherent, professional answer from live data
  • Automatic routing โ€” the DMM intelligently determines when a query requires real-time data vs. when the AI's built-in knowledge is sufficient
  • Current events awareness โ€” ask about today's news, stock prices, sports scores, weather, or any live topic
  • Real-time date/time injection โ€” always knows the current day, date, and time without external API calls

๐ŸŽจ AI Image Generation โ€” Stable Diffusion XL

J.A.R.V.I.S. can create stunning, photorealistic images from natural language descriptions:

  • Stable Diffusion XL 1.0 via HuggingFace Inference API
  • Batch generation โ€” generates 4 unique variations simultaneously using async parallelism
  • 1024ร—1024 resolution โ€” high-fidelity outputs with 50-step inference
  • Random seed diversity โ€” each image uses a unique random seed for maximum creative variation
  • Auto-display โ€” generated images are automatically opened and displayed
  • Natural language prompts โ€” just say "Generate an image of a cyberpunk city at sunset"

โš™๏ธ Desktop Automation Engine โ€” Your Computer, Hands-Free

J.A.R.V.I.S. can take full control of your desktop environment:

Command What It Does
"Open Chrome" Launches any application or website
"Close Notepad" Force-terminates any running application
"Play Shape of You" Searches and plays any song/video on YouTube
"Google search quantum computing" Opens Google with your search query
"YouTube search Python tutorials" Opens YouTube with your search query
"Volume up / down / mute" Controls system audio levels
"Write an email about project update" AI generates content and opens it in your text editor

Cross-platform support โ€” works on Windows, macOS, and Linux with platform-specific command routing.

Async parallel execution โ€” multiple automation commands run concurrently using asyncio.gather(), meaning "Open Chrome, play music, and mute the volume" all execute simultaneously, not sequentially.

๐Ÿ–ฅ๏ธ Futuristic Holographic GUI โ€” A Visual Masterpiece

The frontend is a jaw-dropping, Iron Manโ€“inspired holographic interface built with PyQt5:

  • ๐ŸŒŠ Animated Wave Background โ€” multi-layered sine waves with purple/blue gradients that react to mouse movement with parallax effects
  • โœจ Floating Particle System โ€” glowing particles drift across the screen with physics-based motion
  • ๐Ÿ”ฎ HUD Core Display โ€” rotating concentric rings with neon glow effects, reminiscent of the Arc Reactor
  • ๐Ÿ’ซ Glow Cursor Trail โ€” your mouse leaves a luminous trail with decaying alpha transparency
  • ๐Ÿ”ท Hex Grid Overlay โ€” subtle hexagonal grid pattern adds depth and sci-fi atmosphere
  • ๐ŸŒŸ Bokeh Light Effects โ€” soft, out-of-focus light orbs create cinematic depth-of-field
  • ๐Ÿ”„ Rotating Neon Rings โ€” animated orbital rings with gradient coloring
  • ๐Ÿ’ฌ Chat Interface โ€” real-time message display with smooth scrolling and gradient chat bubbles
  • ๐ŸŽค Mic Animation โ€” visual feedback showing listening/processing states
  • ๐ŸŒˆ Mode Switching โ€” dynamic background modes (Idle, Listening, Speaking, Thinking) with distinct visual themes

๐Ÿ“ Project Structure

JARVIS/
โ”œโ”€โ”€ Main.py                          # ๐Ÿš€ Core engine โ€” orchestrates all systems
โ”œโ”€โ”€ Chatbot.py                       # ๐Ÿ’ฌ Standalone chatbot interface
โ”œโ”€โ”€ Requirements.txt                 # ๐Ÿ“ฆ All dependencies
โ”œโ”€โ”€ .env                             # ๐Ÿ”‘ API keys & configuration (create your own)
โ”‚
โ”œโ”€โ”€ Backend/                         # ๐Ÿง  Intelligence & Processing Layer
โ”‚   โ”œโ”€โ”€ __init__.py                  # Package initialization
โ”‚   โ”œโ”€โ”€ Model.py                     # ๐Ÿงฉ First-Layer Decision Making Model (DMM)
โ”‚   โ”œโ”€โ”€ Chatbot.py                   # ๐Ÿค– Conversational AI engine (Groq + LLaMA)
โ”‚   โ”œโ”€โ”€ RealtimeSearchEngine.py      # ๐ŸŒ Live internet search + AI synthesis
โ”‚   โ”œโ”€โ”€ Automation.py                # โš™๏ธ Desktop automation (apps, browser, system)
โ”‚   โ”œโ”€โ”€ ImageGeneration.py           # ๐ŸŽจ AI image generation (Stable Diffusion XL)
โ”‚   โ”œโ”€โ”€ TextToSpeech.py              # ๐Ÿ”Š Neural TTS (Edge TTS + pygame)
โ”‚   โ””โ”€โ”€ SpeechToText.py              # ๐ŸŽ™๏ธ Voice recognition (Selenium + WebRTC)
โ”‚
โ”œโ”€โ”€ Frontend/                        # ๐Ÿ–ฅ๏ธ Holographic GUI Layer
โ”‚   โ”œโ”€โ”€ GUI.py                       # ๐ŸŽจ 853-line PyQt5 futuristic interface
โ”‚   โ”œโ”€โ”€ Graphics/                    # ๐Ÿ–ผ๏ธ UI assets (icons, animations, graphics)
โ”‚   โ”‚   โ”œโ”€โ”€ Jarvis.gif               # Animated J.A.R.V.I.S. logo
โ”‚   โ”‚   โ”œโ”€โ”€ Mic_on.png / Mic_off.png # Microphone state indicators
โ”‚   โ”‚   โ”œโ”€โ”€ Home.png                 # Navigation icons
โ”‚   โ”‚   โ”œโ”€โ”€ Chats.png                # Chat panel icon
โ”‚   โ”‚   โ””โ”€โ”€ ...                      # Additional UI assets
โ”‚   โ””โ”€โ”€ Files/                       # ๐Ÿ“‚ Runtime state files
โ”‚
โ””โ”€โ”€ Data/                            # ๐Ÿ’พ Runtime Data Store
    โ”œโ”€โ”€ ChatLog.json                 # ๐Ÿ“ Persistent conversation history
    โ”œโ”€โ”€ Images/                      # ๐Ÿ–ผ๏ธ AI-generated images storage
    โ””โ”€โ”€ speech.mp3                   # ๐Ÿ”Š Current TTS audio file

โšก Quick Start

Prerequisites

  • Python 3.10+ (Python 3.11 or 3.12 recommended)
  • Chrome browser installed (for Selenium-based speech recognition)
  • Microphone access for voice commands
  • Internet connection for AI inference and real-time search

1. Clone the Repository

git clone https://github.com/Sidhant185/JARVIS.git
cd JARVIS

2. Create & Activate Virtual Environment

# macOS / Linux
python3 -m venv .venv
source .venv/bin/activate

# Windows
python -m venv .venv
.venv\Scripts\activate

3. Install Dependencies

pip install -r Requirements.txt

4. Configure Environment Variables

Create a .env file in the root directory:

Username = YourName
AssistantName = Jarvis
GrogAPIKey = your_groq_api_key_here
InputLanguage = en
AssistantVoice = en-CA-LiamNeural
HuggingFaceAPIKey = your_huggingface_api_key_here

๐Ÿ”‘ Getting API Keys:

  • Groq API Key: Sign up free at console.groq.com โ€” get blazing-fast LLM inference
  • HuggingFace API Key: Sign up free at huggingface.co โ€” for Stable Diffusion XL image generation

5. Launch J.A.R.V.I.S.

python Main.py

J.A.R.V.I.S. will initialize all subsystems and begin listening for your voice commands. Speak naturally, and the AI will respond with intelligence and precision.


๐ŸŽค Voice Commands โ€” What You Can Say

๐Ÿ’ฌ General Conversation

"How are you doing today?"
"Explain quantum computing in simple terms"
"Write me a poem about the ocean"
"What's the meaning of life?"
"Help me debug this Python error"
"Tell me a joke"

๐ŸŒ Real-Time Queries

"Who is the current President of the United States?"
"What happened in the news today?"
"Tell me about the latest iPhone release"
"What's the weather like in New York?"
"What are the top trending topics right now?"

๐ŸŽจ Image Generation

"Generate an image of a futuristic city at night"
"Create an image of a dragon flying over mountains"
"Generate an image of a photorealistic portrait of a robot"

โš™๏ธ Automation

"Open Chrome"
"Open Chrome and Spotify"
"Close Notepad"
"Play Shape of You on YouTube"
"Search for machine learning tutorials on Google"
"Volume up"
"Mute the system"
"Write an application letter for a software engineer position"

๐Ÿง  Multi-Intent Commands

"Open Chrome, search for AI news on Google, and play some music"
"Tell me a joke and open Spotify"
"What time is it and remind me about the meeting"

๐Ÿ”ง Configuration & Customization

Voice Personas

Change the AI's voice by modifying AssistantVoice in .env:

Voice Code Description
en-CA-LiamNeural Canadian English (Male, Deep)
en-US-AriaNeural American English (Female, Warm)
en-US-GuyNeural American English (Male, Professional)
en-GB-SoniaNeural British English (Female, Elegant)
en-IN-NeerjaNeural Indian English (Female, Clear)
en-AU-NatashaNeural Australian English (Female, Friendly)

Input Language Support

Set InputLanguage in .env to accept voice input in different languages:

Code Language
en English
hi Hindi
es Spanish
fr French
de German
ja Japanese
ko Korean
zh Chinese

J.A.R.V.I.S. will automatically translate non-English input to English before processing.


๐Ÿ›ก๏ธ Technology Stack

Layer Technology Purpose
LLM Inference Groq Cloud Ultra-fast inference (~10x faster than OpenAI)
Primary Model LLaMA 3.3 70B Versatile Reasoning, classification, conversation
Content Model Mixtral 8x7B 32K Long-form content generation
Fallback Model LLaMA 3.1 8B Instant High-speed fallback
Image Generation Stable Diffusion XL 1.0 Text-to-image synthesis
Voice Recognition Google Speech Recognition Real-time voice-to-text
Text-to-Speech Microsoft Edge TTS Neural voice synthesis
Audio Playback pygame Cross-platform audio
Web Automation Selenium + ChromeDriver Headless browser control
Search Engine googlesearch-python Live Google SERP scraping
Translation mtranslate Multi-language translation
GUI Framework PyQt5 Desktop application interface
Image Processing Pillow (PIL) Image handling and display
HTTP Client aiohttp + requests Async and sync networking
Environment python-dotenv Secure configuration management

๐Ÿงช Module Testing

Each backend module can be tested independently:

# Test the Decision Making Model
python -m Backend.Model

# Test the Chatbot
python -m Backend.Chatbot

# Test Real-Time Search
python -m Backend.RealtimeSearchEngine

# Test Automation
python -m Backend.Automation

# Test Image Generation
python -m Backend.ImageGeneration

# Test Text-to-Speech
python -m Backend.TextToSpeech

# Test Speech-to-Text
python -m Backend.SpeechToText

๐Ÿค Contributing

Contributions are welcome! Here's how to get started:

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Commit your changes: git commit -m 'Add amazing feature'
  4. Push to the branch: git push origin feature/amazing-feature
  5. Open a Pull Request

Ideas for Contributions

  • ๐Ÿ”Œ Plugin system for extending capabilities
  • ๐Ÿ“ฑ Mobile companion app (React Native)
  • ๐Ÿ  Smart home integration (HomeKit / Home Assistant)
  • ๐Ÿ“Š Dashboard for analytics and usage stats
  • ๐Ÿ” Wake word detection ("Hey Jarvis")
  • ๐ŸŽต Spotify API integration for music control
  • ๐Ÿ“ง Email reading and drafting via Gmail API
  • ๐Ÿ—“๏ธ Calendar integration with Google Calendar

๐Ÿ“œ License

This project is licensed under the MIT License โ€” see the LICENSE file for details.


๐Ÿ‘ค Author

Sidhant Pande


โญ If you found this project impressive, please give it a star! โญ

"Sometimes you gotta run before you can walk." โ€” Tony Stark

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages