A modular, multi-camera AI visual interface designed for real-time object recognition, AI interaction, and future robotic control.
WALDO is a Python-based AI assistant interface with live video input from up to 4 USB cameras. Each feed is processed independently by an associated AI instance, while a central "interface AI" receives typed input from the user and intelligently routes context-aware prompts.
This system is designed for rapid prototyping of AI perception systems, with the end goal of integrating with physical robotics such as Le Robot (open-source robotic arm).
- Modular Project Structure with separate files for UI, camera, AI logic, and API integration
- 4-Camera Feed Grid with on/off toggles and a camera index overlay
- Interface GUI built using Tkinter:
  - Live 2x2 camera feed display (20-25 FPS)
  - Scrollable AI output log
  - Resizable, expandable interface layout (via PanedWindow)
  - Multiline input box with Shift+Enter for newline, Enter to send
- API Integration using the NRP (Nautilus Research Platform) API with the llava-onevision model
- Query Routing via ai_manager.py to link camera-specific prompts to the appropriate AI instance
- .bat Launcher for easy startup without opening a terminal
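The Enter-vs-Shift+Enter behavior of the input box can be sketched as below. This is a minimal illustration, not the project's actual gui_interface.py; the helper names are hypothetical, and the Shift detection relies on Tk's standard modifier bit in `event.state`.

```python
import tkinter as tk

SHIFT_MASK = 0x0001  # Tk's modifier bit for the Shift key in event.state

def should_send(event_state: int) -> bool:
    """Plain Enter submits the prompt; Shift+Enter inserts a newline."""
    return not (event_state & SHIFT_MASK)

def attach_input_box(parent, on_send):
    """Bind a multiline Text widget so Enter sends and Shift+Enter
    adds a newline (hypothetical helper, not the project's API)."""
    box = tk.Text(parent, height=3, wrap="word")

    def _on_return(event):
        if should_send(event.state):
            on_send(box.get("1.0", "end-1c"))
            box.delete("1.0", "end")
            return "break"   # stop Tk from inserting the newline
        return None          # Shift held: let Tk insert '\n'

    box.bind("<Return>", _on_return)
    return box
```

In the full interface, the returned widget would be added to the bottom pane of a `tk.PanedWindow`, with the scrollable AI log above it, giving the resizable layout described in the features list.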
- Python 3.10+
- Virtual environment activated (venv)
- Dependencies:
# Create and activate a virtual environment (if not done yet)
python -m venv venv
.\venv\Scripts\activate  # Windows

# Upgrade pip
pip install --upgrade pip

# Core dependencies
pip install numpy scipy matplotlib pandas scikit-learn
pip install opencv-python opencv-contrib-python
pip install pillow
pip install pyserial
pip install requests
pip install python-dotenv      # Optional: store API keys in a .env file
pip install openai==1.30.1     # Pinned -- match your API version & structure
pip install torch torchvision torchaudio transformers
pip install ultralytics
project/
├── camera_viewer.py # Handles video grid and feed logic
├── gui_interface.py # User interface (Tkinter)
├── ai_manager.py # Prompt routing & AI instance control
├── vision_ai.py # NRP API integration
├── .env # Contains your NRP_API_KEY
└── run_interface.bat # Windows launcher
NRP_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxx
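The key in .env is read at startup and passed to the OpenAI SDK with a custom base_url pointing at the NRP endpoint. The sketch below shows only the payload shape for pairing a text prompt with one camera frame; the endpoint URL is a placeholder, and the commented wiring is an assumption about how vision_ai.py might use it.

```python
import base64

def build_vision_messages(prompt: str, jpeg_bytes: bytes) -> list:
    """Build an OpenAI-style chat payload that pairs a text prompt
    with one camera frame encoded as a base64 data URL."""
    b64 = base64.b64encode(jpeg_bytes).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }]

# Hypothetical wiring for vision_ai.py -- the endpoint URL is a placeholder:
#
#   import os
#   from dotenv import load_dotenv
#   from openai import OpenAI
#   load_dotenv()  # reads NRP_API_KEY from .env
#   client = OpenAI(api_key=os.environ["NRP_API_KEY"],
#                   base_url="https://<your-nrp-endpoint>/v1")
#   reply = client.chat.completions.create(
#       model="llava-onevision",
#       messages=build_vision_messages("What do you see?", frame_jpeg))
```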
Run the interface by double-clicking run_interface.bat
- No longer requires phrases like "What do you see on camera..."
- Interprets context (e.g., "Where is the red box?" → uses AI judgment to query and correlate feeds)
- Maintains ongoing conversation context (multimodal grounding)
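A simplified version of the routing decision in ai_manager.py might look like this. It is only a sketch under assumed conventions (cameras indexed 0-3, a regex fallback for explicit "camera N" phrases); the context-aware correlation described above is left to the interface AI.

```python
import re

NUM_CAMERAS = 4

def route_query(user_text: str) -> list[int]:
    """Pick which camera AI instances should receive this prompt.

    Explicit mentions like 'camera 2' go only to that feed; anything
    else fans out to every feed so the interface AI can correlate the
    answers (simplified sketch, not the project's actual logic).
    """
    mentioned = [int(n) for n in
                 re.findall(r"\bcamera\s*(\d+)\b", user_text, re.IGNORECASE)]
    valid = [n for n in mentioned if 0 <= n < NUM_CAMERAS]
    return valid if valid else list(range(NUM_CAMERAS))
```

A query like "Where is the red box?" fans out to all four feeds, while "What is on camera 2?" is routed to a single instance.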
- Exclude toggle buttons and number overlays from AI input images
- Display bounding boxes and labels on objects identified in live feeds
- Include (x, y) pixel coordinates for object location
- Estimate 3D position (x, y, z) using triangulation across camera angles
- Requires calibrated camera positioning configuration file
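The triangulation step above can be illustrated with linear (DLT) triangulation: given each camera's calibrated 3x4 projection matrix, a 3D point is recovered from its pixel observations in two feeds by solving a homogeneous least-squares system. The toy projection matrices below stand in for the real calibration file.

```python
import numpy as np

def triangulate(P1, P2, xy1, xy2):
    """Linear (DLT) triangulation: recover a 3D point from its pixel
    observations in two cameras with known 3x4 projection matrices."""
    x1, y1 = xy1
    x2, y2 = xy2
    # Each observation contributes two rows of the homogeneous system A X = 0.
    A = np.stack([
        x1 * P1[2] - P1[0],
        y1 * P1[2] - P1[1],
        x2 * P2[2] - P2[0],
        y2 * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]               # null vector = homogeneous 3D point
    return X[:3] / X[3]      # dehomogenize to (x, y, z)

# Two toy cameras: identity intrinsics, the second shifted one unit along
# x (a stereo baseline) -- stand-ins for a real calibration file.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

point = np.array([1.0, 2.0, 5.0])   # ground-truth 3D point
def project(P, X):
    h = P @ np.append(X, 1.0)
    return h[:2] / h[2]
est = triangulate(P1, P2, project(P1, point), project(P2, point))
```

With noisy real detections, more than two views and a nonlinear refinement would improve the estimate, but the linear solve is the core of the idea.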
- Connect AI output to control Le Robot (open-source)
- Allow AI to select and manipulate objects based on camera data and interface prompts
- Step toward full perception-actuation loop
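Since pyserial is already among the dependencies, the actuation side could start as simply as encoding AI-chosen target coordinates into line-based serial commands. The 'MOVE x y z' protocol below is entirely hypothetical -- Le Robot's actual interface will define its own message format.

```python
def format_move_command(x: float, y: float, z: float) -> bytes:
    """Encode a Cartesian move target as a line-based serial command.
    The 'MOVE x y z' protocol is hypothetical, for illustration only."""
    return f"MOVE {x:.3f} {y:.3f} {z:.3f}\n".encode("ascii")

# Hypothetical wiring (requires pyserial and a connected arm):
#
#   import serial
#   arm = serial.Serial("COM3", 115200, timeout=1)
#   arm.write(format_move_command(0.10, -0.05, 0.20))
```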
- NRP Nautilus API
- OpenAI Python SDK (custom base_url support)
- [Le Robot GitHub (TBD: insert link)]
- Tkinter Docs
- LLaVA (vision-language model)
This system is modular and open to extension:
- Add persistent memory per AI
- Integrate voice recognition and TTS
- Enable external plugin system for additional sensors or actuators
- Allow in-GUI camera toggling, layout presets, and dark mode themes
Built by Peter Shryock for a future where AI can see, speak, and act.