SYLVA: Smart Yield-Logic Vision Agent
SYLVA is an autonomous, interactive agent built for the Google Gemini Live Agent Challenge hackathon. Rather than automating through the DOM, SYLVA uses Gemini's multimodal vision to interpret screen layouts and infer user intent with its own Yield-Logic scoring. It is designed to guide users through complex enterprise interfaces while keeping navigation reliable.
UI Navigator: autonomous navigation and user guidance through visual analysis.
Key features:
- Yield-Logic: A scoring algorithm that estimates the functional value ("yield") of each UI element from visual pixel data.
- Real-time Intent Inference: Predicts user goals from cursor velocity and dwell time.
- Enterprise-grade Security: Requests between the local environment and Google Cloud are authenticated through Application Default Credentials (ADC) and Service Account roles.
- Visual Reasoning Log: Renders the model's reasoning onto the target browser page in real time.
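The README does not specify how Yield-Logic or intent inference are computed. The sketch below is purely illustrative and is not SYLVA's method. It assumes each detected element already has a yield score in [0, 1], for example one produced by Gemini's visual analysis. It then measures how long the cursor lingers slowly over each element and boosts that element's yield accordingly. The names `CursorSample`, `Element`, `dwell_times`, and `rank_targets`, the 50 px/s speed threshold, and the scoring formula are all hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class CursorSample:
    t: float  # timestamp in seconds
    x: float  # cursor position in pixels
    y: float


@dataclass
class Element:
    name: str
    box: tuple[float, float, float, float]  # (left, top, right, bottom) in pixels
    yield_score: float  # hypothetical visual "yield" in [0, 1]


def _inside(box: tuple[float, float, float, float], x: float, y: float) -> bool:
    left, top, right, bottom = box
    return left <= x <= right and top <= y <= bottom


def dwell_times(samples: list[CursorSample], elements: list[Element],
                max_speed: float = 50.0) -> dict[str, float]:
    """Seconds the cursor spends inside each element while moving at or below max_speed px/s."""
    dwell = {e.name: 0.0 for e in elements}
    for prev, cur in zip(samples, samples[1:]):
        dt = cur.t - prev.t
        if dt <= 0:
            continue  # skip duplicate or out-of-order timestamps
        speed = math.hypot(cur.x - prev.x, cur.y - prev.y) / dt  # cursor velocity
        if speed > max_speed:
            continue  # a fast pass-through is not counted as interest
        for e in elements:
            if _inside(e.box, cur.x, cur.y):
                dwell[e.name] += dt  # credit the interval to the element under the cursor
    return dwell


def rank_targets(samples: list[CursorSample], elements: list[Element],
                 dwell_weight: float = 1.0) -> list[str]:
    """Order element names by yield_score * (1 + dwell_weight * dwell seconds), highest first."""
    dwell = dwell_times(samples, elements)
    scored = [(e.yield_score * (1.0 + dwell_weight * dwell[e.name]), e.name) for e in elements]
    return [name for _, name in sorted(scored, reverse=True)]


if __name__ == "__main__":
    samples = [CursorSample(0.0, 10, 10), CursorSample(0.1, 200, 50),
               CursorSample(0.2, 202, 51), CursorSample(0.7, 203, 52)]
    elements = [Element("search", (180, 30, 260, 70), 0.5),
                Element("logo", (0, 0, 50, 50), 0.6)]
    print(rank_targets(samples, elements))  # ['search', 'logo']
```

Without the dwell boost, the logo would rank first (0.6 > 0.5). The slow hover over the search box (about 0.6 s) lifts its score to 0.8 and reverses that order.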
Tech stack:
- AI Model: Gemini 2.5 Flash Lite (Vertex AI)
- Backend: FastAPI (Python 3.11+)
- UI: Integrated Driver Overlay (HTML/CSS/JS)
- Automation: Playwright
- Infrastructure: Google Cloud Run, ADC
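In this stack, the local driver uses Playwright to control the browser, and the bridge (FastAPI on Cloud Run) calls Gemini on Vertex AI. The sketch below shows one way the driver might pass a screenshot to the bridge, packaging it as JSON with only the standard library. The `/analyze` path, the `goal` and `screenshot_b64` field names, and the helpers `build_analyze_request` and `send` are hypothetical and do not describe SYLVA's actual protocol.

```python
from __future__ import annotations

import base64
import json
import urllib.request


def build_analyze_request(bridge_url: str, screenshot_png: bytes,
                          goal: str) -> urllib.request.Request:
    """Package a screenshot and the user's goal as a JSON POST to the bridge.

    The "/analyze" path and the JSON field names are hypothetical.
    """
    payload = {
        "goal": goal,
        # Binary PNG bytes are base64-encoded so they can travel inside JSON.
        "screenshot_b64": base64.b64encode(screenshot_png).decode("ascii"),
    }
    return urllib.request.Request(
        url=bridge_url.rstrip("/") + "/analyze",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the bridge's JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp)
```

In the driver, the screenshot bytes could come from Playwright's `page.screenshot()`, which returns PNG bytes. If the Cloud Run service requires authentication, the request would also need an identity token in an `Authorization: Bearer` header, which this sketch omits.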
Prerequisites:
- Python 3.11 or higher.
- Google Cloud SDK (gcloud CLI), installed and configured.
- Chrome/Chromium (managed via Playwright).
- A Google Cloud project with the Vertex AI API enabled.
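A small preflight script offers a quick way to confirm the local prerequisites. This is a hedged sketch and is not part of the repository. `check_prerequisites` is a hypothetical helper that checks only three things: the Python version, whether the `gcloud` CLI is on `PATH`, and whether the `playwright` package is importable. It does not verify that Chromium has been installed or that the Vertex AI API is enabled in your project.

```python
from __future__ import annotations

import importlib.util
import shutil
import sys


def check_prerequisites() -> dict[str, bool]:
    """Report which local prerequisites appear to be satisfied."""
    return {
        "python>=3.11": sys.version_info >= (3, 11),
        "gcloud on PATH": shutil.which("gcloud") is not None,
        "playwright installed": importlib.util.find_spec("playwright") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"[{'ok' if ok else 'MISSING'}] {name}")
```

The `from __future__ import annotations` line keeps the `dict[str, bool]` annotation from failing on older interpreters. As a result, the script can still run and report an outdated Python version instead of crashing.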
Configuration:
SYLVA reads the following environment variables for cloud integration. Use .env.example as the template.
Bridge (Cloud):
- GCP_PROJECT_ID: Your Google Cloud project ID.
- GCP_LOCATION: The region to use (e.g., us-central1).
Driver (Local):
- BRIDGE_URL: The URL of your deployed Cloud Run service.
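Loading and validating these variables once at startup makes a missing value fail fast with a clear message. This minimal sketch assumes the .env file has already been loaded into the process environment, for example with python-dotenv's `load_dotenv()`. `BridgeConfig`, `DriverConfig`, the loader functions, and the `"en"` fallback for `AGENT_LANGUAGE` are all hypothetical.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import urlparse


@dataclass(frozen=True)
class BridgeConfig:
    project_id: str  # GCP_PROJECT_ID
    location: str    # GCP_LOCATION, e.g. "us-central1"


@dataclass(frozen=True)
class DriverConfig:
    bridge_url: str      # BRIDGE_URL of the deployed Cloud Run service
    agent_language: str  # AGENT_LANGUAGE (hypothetical default: "en")


def _require(env: Mapping[str, str], name: str) -> str:
    """Return a non-empty variable or raise an error naming the missing variable."""
    value = env.get(name, "").strip()
    if not value:
        raise RuntimeError(f"Missing required environment variable: {name}")
    return value


def load_bridge_config(env: Mapping[str, str] = os.environ) -> BridgeConfig:
    return BridgeConfig(project_id=_require(env, "GCP_PROJECT_ID"),
                        location=_require(env, "GCP_LOCATION"))


def load_driver_config(env: Mapping[str, str] = os.environ) -> DriverConfig:
    url = _require(env, "BRIDGE_URL")
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise RuntimeError(f"BRIDGE_URL is not an absolute http(s) URL: {url!r}")
    return DriverConfig(bridge_url=url,
                        agent_language=env.get("AGENT_LANGUAGE", "en").strip() or "en")
```

Passing `env` explicitly, rather than always reading `os.environ`, keeps the loaders easy to test with a plain dictionary.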
```bash
# Example setup
cp .env.example .env
# Edit .env with your specific project details
```
Installation:
Bridge (Cloud Core):
```bash
cd bridge
pip install -r requirements.txt
cd ..
```
Driver (Local Sentinel):
```bash
cd driver
pip install -r requirements.txt
playwright install chromium
cd ..
```
Ensure you are authenticated with Application Default Credentials (ADC):
```bash
gcloud auth application-default login
```
Then deploy the bridge:
```bash
chmod +x deploy.sh
./deploy.sh
```
Once the bridge is deployed, set BRIDGE_URL in your .env to the service URL and start the driver:
```bash
python driver/main.py --headed
```
To verify SYLVA's autonomous navigation, use the following test cases:
- Wikipedia Knowledge Fetch: "Search for 'Gemini (chatbot)' on Wikipedia and find the 'History' section."
  - Initial URL: https://www.wikipedia.org
- Google Maps Location Search: "Search for 'Googleplex' on Google Maps and find the 'Directions' button."
  - Initial URL: https://www.google.com/maps
To change the agent's response language, modify the AGENT_LANGUAGE variable in the .env file.
Further documentation:
- GEMINI.md: Technical architecture and Yield-Logic details.
- Architecture Details: Visual system diagram.
- Medium Blog Post: Detailed development journey.