VGS Companion - Visual Guidance System

Full AI-Powered Assistance Through Phase 7

"Turning Vision Into Independence"


What's Built

VGS Companion is a complete visual assistance system with 7 phases of functionality:

| Phase   | Feature                                 | Status   |
|---------|-----------------------------------------|----------|
| Phase 1 | Camera + YOLO + Basic TTS               | Complete |
| Phase 2 | Position Awareness (Left/Center/Right)  | Complete |
| Phase 3 | Distance Estimation                     | Complete |
| Phase 4 | Indoor Navigation (Doors/Stairs)        | Complete |
| Phase 5 | Custom YOLO Training Setup              | Complete |
| Phase 6 | Multi-language TTS (EN/HI/TE)           | Complete |
| Phase 7 | Adaptive Learning                       | Complete |

Quick Start

```bash
# Install dependencies
pip install -r requirements.txt

# Install Ollama for AI features (optional)
winget install Ollama.Ollama
ollama pull llava

# Run the assistant
python src/main.py
```

Voice Commands

| Command                          | Action                               |
|----------------------------------|--------------------------------------|
| "Hey" / "Hey Assistant"          | Activate the assistant               |
| "What do you see?"               | Describe surroundings with position  |
| "Where is the [object]?"         | Locate a specific object             |
| "How far is it?"                 | Get distance to the nearest object   |
| "Help"                           | List capabilities                    |
| "Language Hindi/English/Telugu"  | Change language                      |
| "Stop" / "Bye"                   | Return to standby                    |
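The commands above could be routed with a simple dispatcher. This is an illustrative sketch only; `parse_command` and the action labels are hypothetical names, not the actual logic in `src/listener.py`:

```python
def parse_command(text: str) -> str:
    """Map a recognized utterance to an action label (illustrative)."""
    t = text.lower().strip()
    if t in ("hey", "hey assistant"):
        return "activate"
    if "what do you see" in t:
        return "describe"
    if t.startswith("where is the"):
        return "locate"
    if "how far" in t:
        return "distance"
    if t == "help":
        return "help"
    if t.startswith("language"):
        return "set_language"
    if t in ("stop", "bye"):
        return "standby"
    return "unknown"
```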

Project Structure

```
vgs/
├── src/
│   ├── main.py              # Main companion loop
│   ├── camera.py            # Camera capture
│   ├── detector.py          # Basic YOLO detection
│   ├── enhanced_detector.py # Phase 2-7 detection
│   ├── speaker.py           # Multi-language TTS
│   ├── wake_word.py         # Wake word detection
│   ├── listener.py          # Speech recognition
│   ├── companion.py         # LLaVA integration
│   └── train_model.py       # Phase 5 training setup
├── models/
│   ├── yolov8n.pt           # YOLOv8 model
│   └── vosk/                # Offline speech model
├── data/
│   └── user_feedback.json   # Phase 7 learning data
├── requirements.txt
└── README.md
```

Phase Details

Phase 1: MVP

  • Camera capture with OpenCV
  • YOLOv8 object detection
  • Text-to-speech output
  • Fully offline operation

Phase 2: Position Awareness

Objects are described by their position:

  • Left: x < 35% of frame
  • Center: 35% - 65% of frame
  • Right: x > 65% of frame

Example: "I see a person on your left, and a car ahead."
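The thresholds above can be sketched as a small helper; the function name is illustrative and assumes the bounding-box centre x is given in pixels:

```python
def classify_position(x_center: float, frame_width: int) -> str:
    """Return 'left', 'center', or 'right' using the 35% / 65% thresholds."""
    ratio = x_center / frame_width
    if ratio < 0.35:
        return "left"
    if ratio > 0.65:
        return "right"
    return "center"
```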

Phase 3: Distance Estimation

Uses known object sizes to estimate distance:

```
distance = (real_height × focal_length) / pixel_height
```

Known object sizes (in meters):

| Object | Height |
|--------|--------|
| Person | 1.7 m  |
| Car    | 1.5 m  |
| Chair  | 0.9 m  |
| Bottle | 0.25 m |
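The formula and table above combine into a short estimator. This is a minimal sketch assuming a pinhole-camera focal length in pixels; the default of 600 px and the function name are assumptions, not values from the codebase:

```python
# Known real-world heights in metres, from the table above.
KNOWN_HEIGHTS = {"person": 1.7, "car": 1.5, "chair": 0.9, "bottle": 0.25}

def estimate_distance(label, pixel_height, focal_length=600.0):
    """Estimate distance in metres; None if the object's height is unknown."""
    real_height = KNOWN_HEIGHTS.get(label)
    if real_height is None or pixel_height <= 0:
        return None
    return (real_height * focal_length) / pixel_height
```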

Phase 4: Indoor Navigation

Detects:

  • Doors (vertical lines)
  • Stairs (horizontal line patterns)
  • Walls (color detection)

Provides navigation hints like "Door on your left" or "Watch your step!"
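The line-based heuristics could work roughly as follows. This sketch classifies line segments (as returned by a Hough transform) into near-vertical (door edges) and near-horizontal (stair treads); the angle tolerance and names are assumptions, not the actual values in `src/enhanced_detector.py`:

```python
import math

def classify_lines(lines, angle_tol=15.0):
    """Count near-vertical and near-horizontal segments (x1, y1, x2, y2)."""
    vertical = horizontal = 0
    for x1, y1, x2, y2 in lines:
        angle = abs(math.degrees(math.atan2(y2 - y1, x2 - x1))) % 180
        if angle < angle_tol or angle > 180 - angle_tol:
            horizontal += 1
        elif abs(angle - 90) < angle_tol:
            vertical += 1
    return vertical, horizontal
```

Several horizontal lines stacked in the lower half of the frame would then trigger a "Watch your step!" hint.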

Phase 5: Custom Model Training

  1. Collect images of Indian road environments:

     • Autorickshaws, handcarts, potholes, cattle
     • Street vendors, speed breakers, manholes

  2. Run setup:

     ```bash
     python src/train_model.py setup
     ```

  3. Annotate images with LabelImg or CVAT

  4. Train the model:

     ```bash
     python src/train_model.py train
     ```
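Under the hood, the train step presumably calls the ultralytics API. The sketch below only assembles the training arguments; the dataset path and hyperparameters are assumptions, not the values used by `src/train_model.py`:

```python
def build_train_args(data_yaml="data/indian_roads.yaml", epochs=50, imgsz=640):
    """Collect keyword arguments for YOLO.train() (values are assumptions)."""
    return {"data": data_yaml, "epochs": epochs, "imgsz": imgsz}

# Actual training would then look like:
#   from ultralytics import YOLO
#   YOLO("models/yolov8n.pt").train(**build_train_args())
```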

Phase 6: Multi-language TTS

Supported languages:

  • English (en) - Default
  • Hindi (hi) - "नमस्ते! मैं आपका दृष्टि सहायक हूं" ("Hello! I am your vision assistant")
  • Telugu (te) - "నమస్కారం! నేను మీ విజువల్ అసిస్టెంట్" ("Hello! I am your visual assistant")

Switch with: "Hey Assistant, change language to Hindi"
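Resolving the spoken language name to a code could be as simple as the sketch below; the mapping and function name are assumptions about `src/speaker.py`, not its actual API:

```python
# Spoken language names mapped to the TTS language codes listed above.
LANGUAGES = {"english": "en", "hindi": "hi", "telugu": "te"}

def resolve_language(command: str) -> str:
    """Extract a supported language code from a language-change command."""
    for name, code in LANGUAGES.items():
        if name in command.lower():
            return code
    return "en"  # fall back to the default language
```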

Phase 7: Adaptive Learning

Tracks user behavior to prioritize important objects:

  • Records which objects user asks about
  • Tracks positive/negative reactions
  • Adjusts detection priority based on user preferences

Data is stored in `data/user_feedback.json`.
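A minimal sketch of how such learning data might be tracked, assuming a simple per-object count of questions and reaction score; the schema and helper names are illustrative, not the actual format of `user_feedback.json`:

```python
def update_feedback(feedback, obj, reaction):
    """Record one interaction; reaction is +1 (positive) or -1 (negative)."""
    entry = feedback.setdefault(obj, {"asks": 0, "score": 0})
    entry["asks"] += 1
    entry["score"] += reaction
    return feedback

def priority(feedback, obj):
    """Higher for objects the user asks about and reacts positively to."""
    entry = feedback.get(obj, {"asks": 0, "score": 0})
    return entry["asks"] + 0.5 * entry["score"]

# Persisting to disk would then be a plain json.dump() of the feedback dict.
```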


Configuration

Edit src/config.py:

```python
CAMERA_INDEX = 0              # Camera device
MODEL_PATH = 'models/yolov8n.pt'
CONFIDENCE = 0.4              # Detection threshold
WAKE_WORD = "hey assistant"   # Wake phrase
SPEECH_RATE = 160             # Words per minute
OLLAMA_MODEL = "llava"        # AI model
```

Dependencies

```text
ultralytics>=8.0.0       # YOLO
opencv-python>=4.7.0     # Camera
pyttsx3>=2.90            # Offline TTS
SpeechRecognition>=3.8   # STT
vosk>=0.3.0              # Offline STT
ollama>=0.1.0            # LLM
elevenlabs>=1.0.0        # Premium TTS (optional)
python-dotenv>=1.0.0     # Config
numpy>=1.23.0            # Math
```

Hardware Requirements

| Component            | Purpose        |
|----------------------|----------------|
| Raspberry Pi 4 (4GB) | Main processor |
| USB webcam           | Camera         |
| Microphone           | Voice input    |
| Earbuds/Speaker      | Audio output   |

Troubleshooting

| Issue                  | Solution                                     |
|------------------------|----------------------------------------------|
| Wake word not detected | Speak louder; reduce background noise        |
| Ollama slow            | Use a smaller model or disable AI features   |
| TTS not working        | Run `espeak "test"` to verify audio output   |
| Camera not found       | Check the `cv2.VideoCapture(0)` device index |

VGS Companion - All 7 Phases Complete
