"Turning Vision Into Independence"
VGS Companion is a complete visual assistance system with 7 phases of functionality:
| Phase | Feature | Status |
|---|---|---|
| Phase 1 | Camera + YOLO + Basic TTS | ✅ |
| Phase 2 | Position Awareness (Left/Center/Right) | ✅ |
| Phase 3 | Distance Estimation | ✅ |
| Phase 4 | Indoor Navigation (Doors/Stairs) | ✅ |
| Phase 5 | Custom YOLO Training Setup | ✅ |
| Phase 6 | Multi-language TTS (EN/HI/TE) | ✅ |
| Phase 7 | Adaptive Learning | ✅ |
```bash
# Install dependencies
pip install -r requirements.txt

# Install Ollama for AI features (optional)
winget install Ollama.Ollama
ollama pull llava

# Run the assistant
python src/main.py
```

| Command | Action |
|---|---|
| "Hey" / "Hey Assistant" | Activate assistant |
| "What do you see?" | Describe surroundings with position |
| "Where is the [object]?" | Locate specific object |
| "How far is it?" | Get distance to nearest object |
| "Help" | List capabilities |
| "Language Hindi/English/Telugu" | Change language |
| "Stop" / "Bye" | Return to standby |
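After the wake word fires, the recognized utterance has to be matched to one of the actions above. A minimal sketch of keyword-based routing; the action labels and the `route_command` name are illustrative, not the actual functions in `src/main.py`:

```python
def route_command(text: str) -> str:
    """Map a recognized utterance to an action label (keyword matching)."""
    t = text.lower()
    if "what do you see" in t:
        return "describe_scene"
    if "where is" in t:
        return "locate_object"
    if "how far" in t:
        return "estimate_distance"
    if "help" in t:
        return "list_capabilities"
    if "language" in t:
        return "switch_language"
    if t.rstrip("!.") in ("stop", "bye"):
        return "standby"
    return "unknown"
```

Substring matching keeps the router tolerant of speech-recognition noise around the key phrase (e.g. "um, what do you see there").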
```
vgs/
├── src/
│   ├── main.py               # Main companion loop
│   ├── camera.py             # Camera capture
│   ├── detector.py           # Basic YOLO detection
│   ├── enhanced_detector.py  # Phase 2-7 detection
│   ├── speaker.py            # Multi-language TTS
│   ├── wake_word.py          # Wake word detection
│   ├── listener.py           # Speech recognition
│   ├── companion.py          # LLaVA integration
│   └── train_model.py        # Phase 5 training setup
├── models/
│   ├── yolov8n.pt            # YOLOv8 model
│   └── vosk/                 # Offline speech model
├── data/
│   └── user_feedback.json    # Phase 7 learning data
├── requirements.txt
└── README.md
```
- Camera capture with OpenCV
- YOLOv8 object detection
- Text-to-speech output
- Fully offline operation
Objects are described by their position:
- Left: object center x < 35% of frame width
- Center: 35% - 65% of frame width
- Right: object center x > 65% of frame width
Example: "I see a person on your left, and a car ahead."
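The banding above reduces to a small pure function. A sketch, with the function name chosen for illustration:

```python
def position_label(x_center: float, frame_width: int) -> str:
    """Classify an object's horizontal position using the 35%/65% bands."""
    ratio = x_center / frame_width
    if ratio < 0.35:
        return "on your left"
    if ratio > 0.65:
        return "on your right"
    return "ahead"
```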
Uses known object sizes to estimate distance:

```
distance = (real_height × focal_length) / pixel_height
```
Known object sizes (in meters):
| Object | Height |
|---|---|
| Person | 1.7m |
| Car | 1.5m |
| Chair | 0.9m |
| Bottle | 0.25m |
Detects:
- Doors (vertical lines)
- Stairs (horizontal line patterns)
- Walls (color detection)
Provides navigation hints like "Door on your left" or "Watch your step!"
1. Collect images of Indian road environments:
   - Autorickshaws, handcarts, potholes, cattle
   - Street vendors, speed breakers, manholes
2. Run setup:
   ```bash
   python src/train_model.py setup
   ```
3. Annotate images with LabelImg or CVAT
4. Train model:
   ```bash
   python src/train_model.py train
   ```
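Ultralytics training expects a dataset YAML describing image paths and class names. A hypothetical example for the custom classes listed above; the paths and class order are assumptions, not what `train_model.py setup` actually generates:

```yaml
# Hypothetical dataset config (Ultralytics YOLO format)
path: datasets/indian_roads
train: images/train
val: images/val
names:
  0: autorickshaw
  1: handcart
  2: pothole
  3: cattle
  4: street_vendor
  5: speed_breaker
  6: manhole
```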
Supported languages:
- English (en) - Default
- Hindi (hi) - "नमस्ते! मैं आपका दृष्टि सहायक हूं"
- Telugu (te) - "నమస్కారం! నేను మీ విజువల్ అసిస్టెంట్"
Switch with: "Hey Assistant, change language to Hindi"
Tracks user behavior to prioritize important objects:
- Records which objects user asks about
- Tracks positive/negative reactions
- Adjusts detection priority based on user preferences
Data is stored in `data/user_feedback.json`.
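The feedback loop amounts to a persistent tally. A minimal sketch; the function names and JSON layout (a flat label-to-count map) are illustrative, not necessarily the schema the project writes:

```python
import json
from collections import Counter
from pathlib import Path

FEEDBACK_FILE = Path("data/user_feedback.json")  # path from this README

def record_query(label: str, path: Path = FEEDBACK_FILE) -> None:
    """Increment the count for an object the user asked about."""
    counts = Counter()
    if path.exists():
        counts.update(json.loads(path.read_text()))
    counts[label] += 1
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(counts))

def priority(label: str, path: Path = FEEDBACK_FILE) -> int:
    """Higher tally -> announced earlier when describing a scene."""
    if not path.exists():
        return 0
    return json.loads(path.read_text()).get(label, 0)
```

Detections can then be sorted by `priority` before being spoken, so frequently requested objects come first.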
Edit `src/config.py`:

```python
CAMERA_INDEX = 0                  # Camera device
MODEL_PATH = 'models/yolov8n.pt'  # YOLO weights
CONFIDENCE = 0.4                  # Detection threshold
WAKE_WORD = "hey assistant"       # Wake phrase
SPEECH_RATE = 160                 # Words per minute
OLLAMA_MODEL = "llava"            # AI model
```

Dependencies (`requirements.txt`):

```
ultralytics>=8.0.0      # YOLO
opencv-python>=4.7.0    # Camera
pyttsx3>=2.90           # Offline TTS
SpeechRecognition>=3.8  # STT
vosk>=0.3.0             # Offline STT
ollama>=0.1.0           # LLM
elevenlabs>=1.0.0       # Premium TTS (optional)
python-dotenv>=1.0.0    # Config
numpy>=1.23.0           # Math
```
| Component | Purpose |
|---|---|
| Raspberry Pi 4 (4GB) | Main processor |
| USB/Webcam | Camera |
| Microphone | Voice input |
| Earbuds/Speaker | Audio output |
| Issue | Solution |
|---|---|
| Wake word not detected | Speak louder, reduce background noise |
| Ollama slow | Use smaller model or disable |
| TTS not working | Run `espeak "test"` to verify |
| Camera not found | Check the `cv2.VideoCapture(0)` index |
VGS Companion - All 7 Phases Complete