proceane/sylva-navigator


🌱 SYLVA-navigator

Smart Yield-Logic Vision Agent. I created this project as an entry for the Google Gemini Live Agent Challenge hackathon.

Demo Video

SYLVA-navigator Demo

Overview

SYLVA is an autonomous interactive agent developed for the Gemini Live Agent Challenge. Instead of relying on traditional DOM-based automation, SYLVA uses Gemini's multimodal vision to understand screen layouts and infers user intent through its proprietary Yield-Logic. It is designed to guide users through complex enterprise interfaces.

Project Category

UI Navigator: Autonomous navigation and user guidance through visual analysis.

Key Features

  • Yield-Logic: An advanced algorithm that evaluates the functional value (yield) of UI elements based on visual pixel data.
  • Real-time Intent Inference: Predictive analysis of cursor velocity and dwell time to anticipate user goals.
  • Enterprise-grade Security: Secure data paths between local environments and Google Cloud via ADC and Service Account roles.
  • Visual Reasoning Log: Real-time visualization of Gemini 3 Flash's thought process rendered directly onto the target browser.
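The cursor velocity and dwell time signals mentioned under Real-time Intent Inference can be illustrated with a minimal sketch. This is not SYLVA's actual implementation: the `CursorSample` type, the 10 px radius, and the thresholds in `looks_like_intent` are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class CursorSample:
    """One cursor observation: time in seconds, position in pixels."""
    t: float
    x: float
    y: float


def cursor_velocity(a: CursorSample, b: CursorSample) -> float:
    """Average speed in pixels per second between two samples."""
    dt = b.t - a.t
    if dt <= 0:
        return 0.0
    return hypot(b.x - a.x, b.y - a.y) / dt


def dwell_time(samples: list[CursorSample], radius: float = 10.0) -> float:
    """Seconds the cursor has stayed within `radius` px of its latest position.

    Walks backwards from the newest sample and stops at the first one
    that falls outside the radius.
    """
    if not samples:
        return 0.0
    last = samples[-1]
    start = last.t
    for s in reversed(samples):
        if hypot(s.x - last.x, s.y - last.y) > radius:
            break
        start = s.t
    return last.t - start


def looks_like_intent(samples: list[CursorSample],
                      max_speed: float = 50.0,
                      min_dwell: float = 0.5) -> bool:
    """Hypothetical rule: a slow cursor that has lingered signals interest."""
    if len(samples) < 2:
        return False
    speed = cursor_velocity(samples[-2], samples[-1])
    return speed <= max_speed and dwell_time(samples) >= min_dwell
```

A real system would likely smooth these signals over a window and combine them with what the vision model sees under the cursor; the fixed thresholds here only show the shape of the computation.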

Tech Stack

  • AI Model: Gemini 2.5 Flash Lite (Vertex AI)
  • Backend: FastAPI (Python 3.11+)
  • UI: Integrated Driver Overlay (HTML/CSS/JS)
  • Automation: Playwright
  • Infrastructure: Google Cloud Run, ADC

🛠 Setup and Installation

1. Prerequisites

  • Python 3.11 or higher.
  • Google Cloud SDK, installed and configured.
  • Chrome/Chromium (managed via Playwright).
  • A Google Cloud project with the Vertex AI API enabled.

2. Environment Variables

SYLVA requires specific environment variables for cloud integration. Refer to .env.example for the template.

Bridge (Cloud):

  • GCP_PROJECT_ID: Your Google Cloud Project ID.
  • GCP_LOCATION: The Google Cloud region to use (e.g., us-central1).

Driver (Local):

  • BRIDGE_URL: The URL of your deployed Cloud Run service.

# Example setup
cp .env.example .env
# Edit .env with your specific project details
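As a rough sketch of how these variables might be read and validated at startup, the snippet below fails fast when something is missing. The function names and the `BridgeConfig` type are hypothetical, and the sketch assumes the values from .env have already been exported into the process environment; how SYLVA actually loads them is not shown in this README.

```python
import os
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class BridgeConfig:
    """Hypothetical container for the Bridge (Cloud) settings."""
    project_id: str
    location: str


def load_bridge_config(env=None) -> BridgeConfig:
    """Read GCP_PROJECT_ID and GCP_LOCATION, failing fast if either is unset."""
    env = os.environ if env is None else env
    missing = [k for k in ("GCP_PROJECT_ID", "GCP_LOCATION") if not env.get(k)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return BridgeConfig(env["GCP_PROJECT_ID"], env["GCP_LOCATION"])


def load_bridge_url(env=None) -> str:
    """Read BRIDGE_URL for the Driver and check that it is an http(s) URL."""
    env = os.environ if env is None else env
    url = env.get("BRIDGE_URL", "")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise RuntimeError("BRIDGE_URL must be an http(s) URL, got: %r" % url)
    return url.rstrip("/")
```

Both functions accept an optional mapping so they can be exercised without touching the real environment, for example `load_bridge_config({"GCP_PROJECT_ID": "my-project", "GCP_LOCATION": "us-central1"})`.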

3. Installation

Bridge (Cloud Core):

cd bridge
pip install -r requirements.txt
cd ..

Driver (Local Sentinel):

cd driver
pip install -r requirements.txt
playwright install chromium
cd ..

🚀 How to Run

1. Deploy the Bridge (Cloud)

Authenticate with Application Default Credentials (ADC), then make the deploy script executable and run it:

gcloud auth application-default login
chmod +x deploy.sh
./deploy.sh

2. Run the Driver (Local)

Once the bridge is deployed, update the BRIDGE_URL in your .env and run:

python driver/main.py --headed
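For readers curious how a flag like --headed could map onto a browser launch, here is a minimal sketch. It is an assumption about the flag's meaning (show the browser window), not a copy of driver/main.py; `build_parser` and `launch_options` are hypothetical helpers. The only Playwright fact relied on is that its browser `launch()` call accepts a `headless` keyword argument.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI parser with a single --headed switch."""
    parser = argparse.ArgumentParser(prog="driver/main.py")
    parser.add_argument(
        "--headed",
        action="store_true",
        help="show the browser window instead of running headless",
    )
    return parser


def launch_options(argv: list[str]) -> dict:
    """Translate CLI arguments into keyword arguments for a browser launch.

    Headed mode is simply the inverse of Playwright's `headless` option.
    """
    args = build_parser().parse_args(argv)
    return {"headless": not args.headed}
```

The resulting dictionary could then be passed along as `chromium.launch(**launch_options(sys.argv[1:]))` inside a Playwright session.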

🎯 Sample Input & Test Scenarios

To verify SYLVA's autonomous navigation, you can use the following test cases:

  1. Wikipedia Knowledge Fetch: "Search for 'Gemini (chatbot)' on Wikipedia and find the 'History' section."
    • Initial URL: https://www.wikipedia.org
    • To change the agent's response language, modify the AGENT_LANGUAGE variable in the .env file.
  2. Google Maps Location Search: "Search for 'Googleplex' on Google Maps and find the 'Directions' button."
    • Initial URL: https://www.google.com/maps
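If you want to script these checks, the two scenarios above can be captured as plain data. The `Scenario` type and `find_scenario` helper are hypothetical conveniences for illustration; how the instruction and initial URL are actually handed to the driver is not specified in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    """One test case: a starting page and a natural-language instruction."""
    name: str
    start_url: str
    instruction: str


SCENARIOS = [
    Scenario(
        name="Wikipedia Knowledge Fetch",
        start_url="https://www.wikipedia.org",
        instruction="Search for 'Gemini (chatbot)' on Wikipedia "
                    "and find the 'History' section.",
    ),
    Scenario(
        name="Google Maps Location Search",
        start_url="https://www.google.com/maps",
        instruction="Search for 'Googleplex' on Google Maps "
                    "and find the 'Directions' button.",
    ),
]


def find_scenario(name: str) -> Scenario:
    """Look up a scenario by its exact name."""
    for scenario in SCENARIOS:
        if scenario.name == name:
            return scenario
    raise KeyError(name)
```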

📑 Documentation

About

A vision-first autonomous web agent powered by Gemini 3 Flash, navigating complex UIs through pixel-based reasoning and functional Yield-Logic.
