Voyance

AI-powered visual web research agent — speak a task, watch it navigate live sites with Gemini vision, get a spoken briefing and a comparison report.

Gemini Live Agent Challenge 2026
Track: UI Navigator · Visual UI understanding & interaction

Live Demo

See Voyance research, verify, and narrate in a real end-to-end flow.

Dev.to Blog

Read the architecture and implementation decisions behind Voyance.

What it does

Voyance turns natural language into competitive intelligence in minutes:

| Step | Description |
|---|---|
| 1. You say | What you need, e.g. "Compare pricing for the top 5 CRM tools" |
| 2. The agent | Plans, visits 3–5 live websites, and "reads" pages with Gemini multimodal vision (screenshots only, no DOM scraping) |
| 3. You get | A sortable comparison table, CSV/HTML export, and Vera (ElevenLabs) reading the briefing aloud |

No DOM hacks, no site-specific APIs. Works across site redesigns. Backend is configured for deployment on Google Cloud Run.

Features

  • Natural language input — Describe your research task in plain English (e.g. compare pricing, features, or reviews).
  • Multi-site research — Agent visits 3–5 live websites per task with no DOM scraping or site-specific APIs.
  • Gemini vision — Screenshot-based page understanding; works across redesigns and any site.
  • Comparison table — Sortable results with company, segment, pricing, and key details.
  • Export — Download results as CSV or HTML.
  • Spoken briefing (Vera) — ElevenLabs TTS reads the summary aloud.
  • Interrupt + replan — During a run, you can submit a redirect instruction (text/voice); the agent queues it and replans on the next loop iteration.
  • Fact verification — Perplexity-backed claim checks where relevant.
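The interrupt-and-replan behaviour can be pictured as a queue that the input handler writes to and the agent loop drains at the start of every iteration. The sketch below is illustrative only: `RedirectQueue` and `run_agent` are hypothetical names, and replacing the remaining plan with the newest instruction stands in for a real Gemini replanning call.

```python
import asyncio
from collections import deque


class RedirectQueue:
    """Hypothetical holder for redirect instructions submitted during a run."""

    def __init__(self) -> None:
        self._pending: deque[str] = deque()

    def submit(self, instruction: str) -> None:
        # Called by the text/voice input handler; returns immediately.
        self._pending.append(instruction)

    def drain(self) -> list[str]:
        # Hand over everything queued since the previous loop iteration.
        items = list(self._pending)
        self._pending.clear()
        return items


async def run_agent(redirects: RedirectQueue, plan: list[str], max_steps: int = 10) -> list[str]:
    """Toy agent loop: check for redirects before each step and replan if any arrived."""
    plan = list(plan)  # work on a copy so the caller's list is untouched
    log: list[str] = []
    for _ in range(max_steps):
        pending = redirects.drain()
        if pending:
            latest = pending[-1]
            log.append(f"redirect received: {latest}")
            # Stand-in for replanning: drop remaining steps and follow the newest instruction.
            plan = [f"replanned: {latest}"]
        if not plan:
            break
        step = plan.pop(0)
        log.append(f"executing {step}")
        await asyncio.sleep(0)  # yield so other tasks (e.g. a socket handler) can enqueue redirects
    return log


if __name__ == "__main__":
    queue = RedirectQueue()
    queue.submit("focus on enterprise plans")
    for line in asyncio.run(run_agent(queue, ["visit site A", "visit site B"])):
        print(line)
```

Because redirects are only read at the top of an iteration, a step already in progress finishes before the new instruction takes effect, which is what "replans on the next loop iteration" implies.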

Screenshots

Hero — Enter your research query and start the agent.

Output — Comparison table, CSV/HTML export, and Listen to Vera.

Hackathon alignment

| Requirement | Voyance |
|---|---|
| Gemini model | Gemini 2.0 Flash (planning, screenshot analysis, synthesis) |
| Google GenAI SDK / ADK | Google GenAI SDK (google-generativeai): Gemini for planning, vision, and synthesis. Custom agent loop (plan → navigate → extract → verify), not the ADK library. |
| Google Cloud service | Backend deployment target is Google Cloud Run (infra/cloudbuild.yaml, infra/main.tf) |
| UI Navigator | Screenshots analyzed by Gemini vision; the agent outputs navigation and extraction actions |
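Since the loop is custom rather than ADK-based, its shape can be summarised as four pluggable steps. This is a sketch under stated assumptions: `ResearchAgent`, `Finding`, and the step signatures are hypothetical, and the lambdas in `demo()` are stubs standing in for the Gemini, Playwright, and Perplexity calls.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Finding:
    url: str
    data: dict[str, str]
    verified: bool


@dataclass
class ResearchAgent:
    """Hypothetical plan -> navigate -> extract -> verify loop with injectable steps."""

    plan: Callable[[str], list[str]]                 # task -> candidate URLs (planning model)
    navigate: Callable[[str], bytes]                 # URL -> screenshot bytes (headless browser)
    extract: Callable[[str, bytes], dict[str, str]]  # URL + screenshot -> fields (vision model)
    verify: Callable[[dict[str, str]], bool]         # fields -> did the fact check pass?
    max_sites: int = 5                               # README: 3-5 sites per task
    findings: list[Finding] = field(default_factory=list)

    def run(self, task: str) -> list[Finding]:
        self.findings = []
        for url in self.plan(task)[: self.max_sites]:
            screenshot = self.navigate(url)
            data = self.extract(url, screenshot)
            self.findings.append(Finding(url=url, data=data, verified=self.verify(data)))
        return self.findings


def demo() -> list[Finding]:
    # Stubs only: no network or browser access happens here.
    agent = ResearchAgent(
        plan=lambda task: [f"https://example.com/vendor-{i}" for i in range(8)],
        navigate=lambda url: b"PNG stub",
        extract=lambda url, shot: {"company": url.rsplit("/", 1)[-1], "pricing": "$10/user"},
        verify=lambda data: data["pricing"].startswith("$"),
    )
    return agent.run("Compare pricing for the top 5 CRM tools")


if __name__ == "__main__":
    for finding in demo():
        print(finding)
```

Injecting the steps as callables keeps the loop itself testable without API keys or a browser; the real services can be wired in behind the same signatures.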

Third-party: ElevenLabs (Vera TTS), Firecrawl (extraction), Perplexity (fact verification).

Google Cloud Deployment

Google Cloud Run logs proof


Quick start

Prerequisites

1. Clone and install

git clone https://github.com/ibtisamafzal/voyance.git
cd voyance
npm install

2. Backend

cd backend
pip install -r requirements.txt
playwright install chromium
cp .env.example .env
# Edit .env with your API keys
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

| Service | URL |
|---|---|
| Backend | http://localhost:8000 |
| API docs | http://localhost:8000/api/docs |

3. Frontend

From the repo root (new terminal):

npm run dev

Frontend: http://localhost:5173

4. Run a research task

  1. Enter a query in the hero (e.g. "Compare pricing for top 5 CRM tools").
  2. Click Research — the agent plans, navigates, extracts, and verifies.
  3. In the Output section: sort the table, export CSV or HTML, and click Listen to Vera for the spoken briefing.
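Sorting and CSV export are simple to sketch. The example below uses the columns named under Features (company, segment, pricing, key details); the row dictionaries, the `key_details` field name, and both helpers are hypothetical, and the product names are placeholders rather than real research output.

```python
import csv
import io

COLUMNS = ["company", "segment", "pricing", "key_details"]


def sort_rows(rows: list[dict[str, str]], key: str, descending: bool = False) -> list[dict[str, str]]:
    """Return a new list sorted case-insensitively by one column (text sort, not numeric)."""
    return sorted(rows, key=lambda row: row.get(key, "").lower(), reverse=descending)


def export_csv(rows: list[dict[str, str]]) -> str:
    """Serialize rows to CSV text; the csv module quotes cells that contain commas."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    rows = [
        {"company": "Beta CRM", "segment": "Enterprise", "pricing": "Contact sales", "key_details": "SSO"},
        {"company": "acme CRM", "segment": "SMB", "pricing": "$12/user", "key_details": "Free tier, API"},
    ]
    print(export_csv(sort_rows(rows, "company")))
```

Sorting a pricing column as text puts "$100" before "$12"; a real table would parse prices into numbers before sorting.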

Tech stack

| Layer | Technology |
|---|---|
| AI & vision | Gemini 2.0 Flash |
| Browser | Playwright (headless Chromium), screenshot-based only |
| Extraction | Firecrawl API → Gemini vision fallback |
| Verification | Perplexity API |
| Voice | ElevenLabs TTS (Vera) |
| Backend | FastAPI, WebSockets; Google Cloud Run deployment target |
| Frontend | React, Vite, Tailwind |
| Infra | Docker, Cloud Build, Terraform (infra/) |
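The Extraction row describes a fallback chain. A minimal sketch of that pattern, assuming the fallback triggers when the primary extractor raises or returns nothing: `extract_with_fallback` and the two stub extractors are hypothetical, and no real Firecrawl or Gemini SDK calls are shown.

```python
import logging
from typing import Callable, Optional

Extractor = Callable[[str], Optional[dict]]
logger = logging.getLogger("voyance.extraction")  # hypothetical logger name


def extract_with_fallback(url: str, primary: Extractor, fallback: Extractor) -> tuple[str, dict]:
    """Try the fast extractor first; use the vision extractor if it fails or finds nothing."""
    try:
        result = primary(url)
        if result:
            return "primary", result
        logger.info("primary extractor returned no data for %s", url)
    except Exception as exc:  # e.g. timeouts, rate limits, blocked pages
        logger.warning("primary extractor failed for %s: %s", url, exc)
    return "fallback", fallback(url) or {}


def flaky_firecrawl(url: str) -> Optional[dict]:
    # Stub standing in for a Firecrawl call that hit a rate limit.
    raise RuntimeError("429 Too Many Requests")


def vision_stub(url: str) -> Optional[dict]:
    # Stub standing in for screenshot + Gemini vision extraction.
    return {"pricing": "$25/user"}


if __name__ == "__main__":
    print(extract_with_fallback("https://example.com/pricing", flaky_firecrawl, vision_stub))
```

Catching a broad `Exception` keeps one misbehaving site from stopping the whole run; narrowing it to the client library's own error types would be preferable once those are known.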

Architecture

User and frontend → backend (Cloud Run target) → Gemini, Playwright, Firecrawl, Perplexity, ElevenLabs.

Voyance architecture


Voyance mind map

From idea to implementation at a glance.

This mind map captures the core of Voyance for the Gemini Live Agent Challenge — from the problem and solution, through key features and technical stack, to user personas and submission requirements.

Voyance mind map for Gemini Live Agent Challenge


Environment variables

Copy backend/.env.example to backend/.env and set:

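| Variable | Purpose |
|---|---|
| GEMINI_API_KEY | Gemini API key from Google AI Studio |
| ELEVENLABS_API_KEY | ElevenLabs key for Vera TTS |
| FIRECRAWL_API_KEY | Firecrawl key for fast extraction |
| PERPLEXITY_API_KEY | Perplexity key for fact verification |
| GOOGLE_CLOUD_PROJECT | Optional; enables Firestore. If unset, an in-memory fallback is used |
| CONTACT_EMAIL | Recipient address for the contact form (set in the server environment) |
| CONTACT_EMAIL_APP_PASSWORD | Gmail App Password used to send contact-form mail over SMTP |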
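A backend reading these variables might validate them once at startup. The sketch below is hypothetical (`Settings`, `load_settings`, and `storage_backend` are illustrative names), assumes the four API keys are required and the rest optional, and reads only the process environment; loading `backend/.env` into that environment is assumed to happen elsewhere.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: the four API keys are required; the remaining variables are optional.
REQUIRED = ("GEMINI_API_KEY", "ELEVENLABS_API_KEY", "FIRECRAWL_API_KEY", "PERPLEXITY_API_KEY")


@dataclass(frozen=True)
class Settings:
    gemini_api_key: str
    elevenlabs_api_key: str
    firecrawl_api_key: str
    perplexity_api_key: str
    google_cloud_project: Optional[str] = None
    contact_email: Optional[str] = None
    contact_email_app_password: Optional[str] = None

    @property
    def storage_backend(self) -> str:
        # Firestore when a project is configured, otherwise the in-memory fallback.
        return "firestore" if self.google_cloud_project else "memory"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, failing fast if a required key is missing."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required environment variables: " + ", ".join(missing))
    return Settings(
        gemini_api_key=env["GEMINI_API_KEY"],
        elevenlabs_api_key=env["ELEVENLABS_API_KEY"],
        firecrawl_api_key=env["FIRECRAWL_API_KEY"],
        perplexity_api_key=env["PERPLEXITY_API_KEY"],
        # Treat empty strings as unset so "GOOGLE_CLOUD_PROJECT=" still means in-memory.
        google_cloud_project=env.get("GOOGLE_CLOUD_PROJECT") or None,
        contact_email=env.get("CONTACT_EMAIL") or None,
        contact_email_app_password=env.get("CONTACT_EMAIL_APP_PASSWORD") or None,
    )


if __name__ == "__main__":
    settings = load_settings()
    print("storage backend:", settings.storage_backend)  # never print the secrets themselves
```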

Deployment

  • Backend: Google Cloud Run. Deploy with infra/cloudbuild.yaml from repo root:

    gcloud builds submit --config=infra/cloudbuild.yaml .

    Default: 1 GiB memory, 1 CPU (increase to 2 GiB if needed for Playwright).

  • Frontend: Host on Vercel or any static host; set VITE_API_URL to your Cloud Run URL (no trailing slash).

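Troubleshooting:

  • Stuck on "Connecting…": set VITE_API_URL on your frontend host and redeploy. Vite embeds VITE_ variables at build time, so a change only takes effect after a rebuild.
  • WebSocket 403: make sure VITE_API_URL has no trailing slash.
  • Out of memory (OOM): increase the memory setting in infra/cloudbuild.yaml.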
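A likely source of the trailing-slash problem is joining the base URL and a path by string concatenation, which turns `https://host/` plus `/ws/...` into `https://host//ws/...`. A small pre-deploy check can normalize the value and derive the matching WebSocket URL. This is a sketch: the `/ws/research` path, the `check_api_url.py` script name, and the Cloud Run hostname are placeholders, not Voyance's actual routes.

```python
import sys
from urllib.parse import urlsplit, urlunsplit


def normalize_api_url(raw: str) -> str:
    """Trim whitespace and trailing slashes; reject values that are not absolute http(s) URLs."""
    url = raw.strip().rstrip("/")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"VITE_API_URL must be an absolute http(s) URL, got {raw!r}")
    return url


def websocket_url(api_url: str, path: str) -> str:
    """Map http -> ws and https -> wss, joining exactly one slash before the path."""
    parts = urlsplit(normalize_api_url(api_url))
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, parts.path + "/" + path.lstrip("/"), "", ""))


if __name__ == "__main__":
    # Usage: python check_api_url.py https://your-service.a.run.app/
    base = sys.argv[1] if len(sys.argv) > 1 else "https://your-service.a.run.app/"
    print("VITE_API_URL =", normalize_api_url(base))
    print("WebSocket    =", websocket_url(base, "/ws/research"))
```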


Project structure

├── src/app/              # React frontend
│   ├── components/       # HeroSection, ResearchOutputSection, Navbar, etc.
│   └── context/          # ResearchContext (shared state)
├── backend/              # FastAPI backend
│   ├── app/
│   │   ├── agent.py      # Research loop (plan → navigate → extract → verify)
│   │   ├── routers/      # Research, voice, health, sessions
│   │   └── services/     # Gemini, Firecrawl, Perplexity, Playwright, ElevenLabs
│   └── main.py
└── infra/                # GCP automation
    ├── cloudbuild.yaml   # Build & deploy to Cloud Run
    └── main.tf           # Terraform

Community & write-ups

Contact

  • Contact: use the in-app contact form (/contact)
  • LinkedIn: linkedin.com/in/ibtisamafzal

Blog: How We Built Voyance (DEV) · Hackathon: Gemini Live Agent Challenge (see Devpost for current schedule)


License

MIT
