parthwhy/ai-avatar-assistant

🧠 SAGE - Smart AI Desktop Assistant

Your Intelligent Voice-Controlled Desktop Companion


SAGE (Smart AI General-purpose Engine) is an intelligent desktop assistant that combines voice commands, AI orchestration, and automation to help you control your computer hands-free.

Features • Demo • Installation • Usage • API Keys


🎬 Demo

📹 Demo video coming soon! Record a video showing SAGE in action and add it here.


📸 Screenshots

SAGE Interface

SAGE Particle UI with voice control and text input


✨ Features

🎤 Voice Control

  • Wake Word Detection - Say "Hey SAGE" to activate
  • Natural Language Processing - Speak naturally
  • Text-to-Speech Responses - SAGE talks back
  • Continuous Listening - Always ready to help

🤖 AI-Powered Orchestration

  • Smart Task Planning - Breaks complex tasks into steps
  • Automatic Tool Selection - Picks the right tool for the job
  • Multi-Step Workflows - Chains actions together
  • Real-time Progress Display - See what SAGE is thinking
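
As a rough illustration of task planning and multi-step workflows, the sketch below runs a plan step by step and reports progress as it goes. The plan shape, the `TOOLS` registry, and the tool names are hypothetical; SAGE's actual orchestrator may structure this differently.

```python
from typing import Callable, Dict, List

# Hypothetical tool registry: tool name -> callable taking keyword arguments.
TOOLS: Dict[str, Callable[..., str]] = {
    "open_app": lambda name: f"opened {name}",
    "type_text": lambda text: f"typed {len(text)} characters",
}


def run_plan(plan: List[dict], on_progress: Callable[[str], None] = print) -> List[str]:
    """Execute each step in order, reporting progress and stopping at an unknown tool."""
    results = []
    for index, step in enumerate(plan, start=1):
        tool = TOOLS.get(step["tool"])
        if tool is None:
            on_progress(f"step {index}: unknown tool {step['tool']!r}, stopping")
            break
        on_progress(f"step {index}/{len(plan)}: {step['tool']}")
        results.append(tool(**step.get("args", {})))
    return results


# A plan such as a planner model might return for "open Notepad and type hello"
# (this JSON-like shape is an assumption, not SAGE's documented format).
plan = [
    {"tool": "open_app", "args": {"name": "Notepad"}},
    {"tool": "type_text", "args": {"text": "hello"}},
]
```

Passing a UI callback as `on_progress` instead of `print` is one way a progress display could be driven.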

💬 Dual Input Mode

  • Voice Commands - Hands-free operation
  • Text Input - Type commands directly in the UI
  • Hybrid Mode - Switch seamlessly between both
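
One common way to support switching between voice and text is to have both sources push commands onto a single queue that the assistant consumes. The sketch below shows that pattern with the standard library only; the function names and the `(source, text)` tuple are assumptions made for illustration.

```python
import queue
import threading
from typing import Iterable, List, Tuple

Command = Tuple[str, str]  # (source, text)


def feed(source: str, texts: Iterable[str], commands: "queue.Queue[Command]") -> None:
    """Push every command from one input source onto the shared queue."""
    for text in texts:
        commands.put((source, text))


def collect(voice: Iterable[str], typed: Iterable[str]) -> List[Command]:
    """Run both sources on their own threads, then drain the shared queue."""
    commands: "queue.Queue[Command]" = queue.Queue()
    threads = [
        threading.Thread(target=feed, args=("voice", voice, commands)),
        threading.Thread(target=feed, args=("text", typed, commands)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    drained = []
    while not commands.empty():
        drained.append(commands.get())
    return drained
```

In a long-running assistant the consumer would more likely block on `commands.get()` in a loop rather than drain the queue once.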

📧 Communication

  • Email - Send emails via Gmail (browser-based)
  • WhatsApp - Messages, voice calls, video calls
  • Smart Contacts - Contact database with lookup
  • Templates - Pre-built email templates
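
A contact lookup can be as simple as a case-insensitive match over a JSON list. The sketch below assumes a schema with `name`, `phone`, and `email` fields; the real layout of `data/contacts.json` may differ, and the sample entry is invented.

```python
import json
from typing import Optional

# Hypothetical contents of data/contacts.json; the real schema may differ.
SAMPLE = '{"contacts": [{"name": "John Smith", "phone": "+10000000000", "email": "john@example.com"}]}'


def find_contact(name: str, contacts: list) -> Optional[dict]:
    """Return the first contact whose name contains the query, ignoring case."""
    query = name.strip().lower()
    if not query:
        return None
    for contact in contacts:
        if query in contact["name"].lower():
            return contact
    return None


contacts = json.loads(SAMPLE)["contacts"]
```

With this in place, a command such as "Send WhatsApp to John" could resolve "John" to a phone number before the messaging tool runs.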

🔧 System Control

  • App Control - Open/close any application
  • Volume & Brightness - Adjust system settings
  • Power Management - Lock, sleep, shutdown
  • Text Typing - Type anywhere on screen

📅 Productivity

  • Meeting Scheduler - Google Meet + Calendar
  • Calculator - Math calculations
  • Web Search - Quick searches
  • File Search - Find files in Downloads
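
For the file search feature, a minimal sketch using `pathlib` is shown below. It lists files with a given extension, newest first. The function name and the choice to search subfolders are assumptions, not SAGE's documented behaviour.

```python
from pathlib import Path
from typing import List


def search_files(folder: Path, extension: str) -> List[Path]:
    """Return files under `folder` with the given extension, newest first."""
    suffix = "." + extension.lower().lstrip(".")
    matches = [p for p in folder.rglob("*") if p.is_file() and p.suffix.lower() == suffix]
    return sorted(matches, key=lambda p: p.stat().st_mtime, reverse=True)


# Typical call. Path.home() / "Downloads" is the usual location on Windows,
# but the folder SAGE actually scans is an assumption here.
# search_files(Path.home() / "Downloads", "pdf")
```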

🎯 More Features

| Feature | Description |
| --- | --- |
| 🎵 Spotify Control | Play songs, skip tracks, control playback |
| 📝 Content Generation | Create documents, emails, invitations with AI |
| 👁️ Screen Analysis | AI vision to understand what's on screen |
| 🔄 Task Recording | Record and replay mouse/keyboard actions |
| ⚡ Auto Tool Generation | Creates new automation tools on-demand |

🏗️ Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        SAGE Architecture                        │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│   ┌──────────┐    ┌──────────────┐    ┌──────────────────────┐  │
│   │  Voice   │───▶│ Orchestrator │───▶│       Tools          │  │
│   │  Input   │    │  (Groq AI)   │    │                      │  │
│   └──────────┘    └──────────────┘    │  • System Control    │  │
│        │                │             │  • Communication     │  │
│        ▼                ▼             │  • Productivity      │  │
│   ┌──────────┐    ┌──────────────┐    │  • Media Control     │  │
│   │   Text   │    │    Code      │    │  • AI Tools          │  │
│   │  Input   │    │  Generator   │    └──────────────────────┘  │
│   └──────────┘    │ (OpenRouter) │              │               │
│                   └──────────────┘              ▼               │
│                                         ┌──────────────┐        │
│                                         │   Response   │        │
│                                         │  (TTS + UI)  │        │
│                                         └──────────────┘        │
└─────────────────────────────────────────────────────────────────┘

🚀 Installation

Prerequisites

  • Python 3.10+
  • Windows 10/11
  • Microphone (for voice commands)

Quick Start

# 1. Clone the repository
git clone https://github.com/parthwhy/ai-avatar-assistant.git
cd ai-avatar-assistant

# 2. Create and activate a virtual environment (Windows)
python -m venv venv
venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Configure API keys (see below)
copy .env.example .env
# Edit .env with your API keys

# 5. Run SAGE
python main.py

🔑 API Keys Setup

SAGE requires API keys to function. All keys have free tiers available.

Required Keys

| Service | Purpose | Get Key | Free Tier |
| --- | --- | --- | --- |
| Groq | Main AI (Llama 3.3 70B) | console.groq.com | ✅ Yes |
| Picovoice | Wake word detection | console.picovoice.ai | ✅ Yes |

Optional Keys (Enhanced Features)

| Service | Purpose | Get Key | Free Tier |
| --- | --- | --- | --- |
| OpenRouter | Code generation, Screen analysis | openrouter.ai | ✅ Limited |
| Gemini | Fallback AI provider | aistudio.google.com | ✅ Yes |

Configuration

  1. Copy the example file:

    copy .env.example .env
  2. Edit .env and add your keys:

    GROQ_API_KEY=your_groq_key_here
    PICOVOICE_ACCESS_KEY=your_picovoice_key_here
    OPENROUTER_API_KEY=your_openrouter_key_here  # Optional
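
A startup check along the following lines can catch a missing or unedited key before any API call is made. This is a minimal sketch: `missing_keys` is a hypothetical helper, and it assumes the variables are already in the environment (for example after calling `load_dotenv()` from python-dotenv, which is listed in the requirements).

```python
import os
from typing import List, Mapping

# The two keys the README marks as required.
REQUIRED_KEYS = ("GROQ_API_KEY", "PICOVOICE_ACCESS_KEY")


def missing_keys(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the required variables that are unset, empty, or still placeholders."""
    missing = []
    for key in REQUIRED_KEYS:
        value = env.get(key, "").strip()
        # The example file uses values such as "your_groq_key_here".
        if not value or value.startswith("your_"):
            missing.append(key)
    return missing
```

Calling `missing_keys()` at startup and exiting with a clear message when the list is non-empty keeps configuration errors separate from runtime failures.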

🎮 Usage

Starting SAGE

python main.py

Voice Commands

Simply say "Hey SAGE" followed by your command:

| Category | Example Commands |
| --- | --- |
| Apps | "Open Chrome", "Close Notepad", "Open Spotify" |
| System | "Set volume to 50", "Lock the screen", "What time is it" |
| Email | "Send email to manager about sick leave" |
| WhatsApp | "Send WhatsApp to John saying hello" |
| Meetings | "Schedule meeting with Sarah tomorrow at 3 PM" |
| Music | "Play Shape of You on Spotify", "Next song" |
| Math | "What is 25 times 4", "Calculate 100 divided by 7" |
| Search | "Search downloads for PDF files" |
| Content | "Write a birthday invitation for Saturday" |
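
To give a feel for how a spoken math command could be handled, the sketch below maps operator words to symbols and evaluates the result with a restricted `ast` walker instead of `eval`. The word list and function names are assumptions; in SAGE itself the language model may do this translation.

```python
import ast
import operator
import re

# Spoken operator phrases mapped to symbols; this vocabulary is an assumption.
WORDS = {"times": "*", "multiplied by": "*", "divided by": "/", "plus": "+", "minus": "-"}
OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul, ast.Div: operator.truediv}


def _evaluate(node: ast.AST) -> float:
    """Evaluate a parsed expression, allowing only numbers and + - * /."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](_evaluate(node.left), _evaluate(node.right))
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_evaluate(node.operand)
    raise ValueError("unsupported expression")


def calculate(command: str) -> float:
    """Turn a phrase like 'What is 25 times 4' into a number."""
    text = command.lower()
    for phrase, symbol in WORDS.items():
        text = text.replace(phrase, symbol)
    # Keep only characters that can appear in a basic arithmetic expression.
    expression = "".join(re.findall(r"[\d.+\-*/() ]", text)).strip()
    return _evaluate(ast.parse(expression, mode="eval").body)
```

Walking the syntax tree means anything other than plain arithmetic, such as a function call or a power operator, is rejected rather than executed.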

Text Input

You can also type commands directly in the input box at the bottom of the UI.


🤖 AI Models

| Component | Model | Provider | Purpose |
| --- | --- | --- | --- |
| Orchestrator | Llama 3.3 70B | Groq | Task planning & execution |
| Code Generator | Qwen 2.5 Coder 32B | OpenRouter | Auto-generate tools |
| Screen Analyzer | Qwen 2.5 VL 72B | OpenRouter | Vision analysis |
| Content Generator | Llama 3.3 70B | Groq | Documents & emails |
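
Since Gemini is listed as a fallback provider, a provider chain might look roughly like the sketch below: try each provider in order and return the first successful reply. The stub functions stand in for real client calls, and none of these names come from the SAGE codebase.

```python
from typing import Callable, List, Sequence, Tuple

Provider = Tuple[str, Callable[[str], str]]


def ask_with_fallback(prompt: str, providers: Sequence[Provider]) -> Tuple[str, str]:
    """Try each provider in order and return (provider_name, reply) from the first success."""
    errors: List[str] = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # a real client would catch narrower error types
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


# Stand-in callables; real ones would wrap the Groq and Gemini clients.
def groq_stub(prompt: str) -> str:
    raise TimeoutError("rate limited")


def gemini_stub(prompt: str) -> str:
    return f"echo: {prompt}"
```

Collecting the error from every failed provider makes the final exception more useful when all of them are down.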

📁 Project Structure

sage/
├── main.py                 # Entry point
├── config/                 # Configuration
│   ├── settings.py         # Settings management
│   └── api_keys.py         # API key handling
├── core/                   # Core AI logic
│   ├── orchestrator.py     # Main AI orchestrator
│   ├── task_executor.py    # Task execution
│   ├── code_generator.py   # Auto tool generation
│   └── intent_parser.py    # Intent classification
├── tools/                  # All automation tools
│   ├── system/             # System control
│   ├── productivity/       # Productivity tools
│   ├── communication/      # Email, WhatsApp
│   ├── media/              # Spotify control
│   └── ai/                 # AI-powered tools
├── voice/                  # Voice modules
│   ├── wake_word.py        # Wake word detection
│   ├── speech_to_text.py   # Speech recognition
│   └── tts.py              # Text-to-speech
├── ui/                     # User interface
│   └── particle_window.py  # Main GUI
├── data/                   # Data files
│   └── contacts.json       # Contact database
├── tests/                  # Test files
└── examples/               # Demo scripts

🛠️ Development

Running Tests

# Run all tests
python -m pytest tests/

# Run specific test
python tests/test_all_functionalities.py

Adding New Tools

  1. Create a new file in tools/<category>/
  2. Define your function with proper docstring
  3. Register it in core/orchestrator.py

See CONTRIBUTING.md for detailed guidelines.
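
The three steps above could be sketched with a decorator that records each tool's name, docstring, and parameters. This is an illustration only: `TOOL_REGISTRY`, `register_tool`, and `set_volume` are hypothetical, and how `core/orchestrator.py` actually registers tools is not shown in this README.

```python
import inspect
from typing import Callable, Dict

# Hypothetical registry; the project's real registration mechanism may differ.
TOOL_REGISTRY: Dict[str, dict] = {}


def register_tool(func: Callable) -> Callable:
    """Record a tool's name, docstring, and parameters so a planner can describe it."""
    doc = inspect.getdoc(func)
    if not doc:
        raise ValueError(f"{func.__name__} needs a docstring")
    TOOL_REGISTRY[func.__name__] = {
        "func": func,
        "description": doc.splitlines()[0],
        "parameters": list(inspect.signature(func).parameters),
    }
    return func


@register_tool
def set_volume(level: int) -> str:
    """Set the system volume to a percentage between 0 and 100."""
    level = max(0, min(100, level))  # clamp out-of-range requests
    return f"volume set to {level}"
```

Requiring a docstring at registration time matters because the description is typically what the language model sees when it chooses a tool.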


📋 Requirements

groq>=0.4.0
requests>=2.31.0
python-dotenv>=1.0.0
pvporcupine>=3.0.0
speechrecognition>=3.10.0
pyttsx3>=2.90
pyaudio>=0.2.13
pyautogui>=0.9.54
pyperclip>=1.8.2
pynput>=1.7.6
Pillow>=10.0.0

🤝 Contributing

Contributions are welcome! Please read CONTRIBUTING.md for guidelines.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments


Made by Parth

⭐ Star this repo if you find it useful!

About

🎙️ AI-powered voice assistant for Windows with 40+ tools, auto-tool generation, and natural language control