AI-powered visual web research agent — speak a task, watch it navigate live sites with Gemini vision, get a spoken briefing and a comparison report.
Track: UI Navigator · Visual UI understanding & interaction
- Demo: See Voyance research, verify, and narrate in a real end-to-end flow.
- Blog: Read the architecture and implementation decisions behind Voyance.
Voyance turns natural language into competitive intelligence in minutes:
| Step | Description |
|---|---|
| 1. You say | What you need — e.g. "Compare pricing for the top 5 CRM tools" |
| 2. The agent | Plans, visits 3–5 live websites, and “reads” pages with Gemini multimodal vision (screenshots only — no DOM scraping) |
| 3. You get | A sortable comparison table, CSV/HTML export, and Vera (ElevenLabs) reading the briefing aloud |
No DOM hacks, no site-specific APIs. Works across site redesigns. Backend is configured for deployment on Google Cloud Run.
- Natural language input — Describe your research task in plain English (e.g. compare pricing, features, or reviews).
- Multi-site research — Agent visits 3–5 live websites per task with no DOM scraping or site-specific APIs.
- Gemini vision — Screenshot-based page understanding; works across redesigns and any site.
- Comparison table — Sortable results with company, segment, pricing, and key details.
- Export — Download results as CSV or HTML.
- Spoken briefing (Vera) — ElevenLabs TTS reads the summary aloud.
- Interrupt + replan — During a run, you can submit a redirect instruction (text/voice); the agent queues it and replans on the next loop iteration.
- Fact verification — Perplexity-backed claim checks where relevant.
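The screenshot-only page understanding above can be sketched with the Google GenAI SDK. This is a minimal illustration, not Voyance's actual code: `build_vision_request` is a hypothetical helper and the prompt wording is an assumption.

```python
# Sketch: ask Gemini to "read" a page from a screenshot alone (no DOM access).
# build_vision_request is a hypothetical helper; the real prompt differs.
def build_vision_request(png_bytes: bytes, task: str) -> list:
    """Assemble multimodal parts: the screenshot plus an extraction instruction."""
    return [
        {"mime_type": "image/png", "data": png_bytes},
        f"From this screenshot, extract details relevant to: {task}. "
        "Reply as JSON with company, segment, pricing, and notes.",
    ]

# With the SDK (google-generativeai), the call would look like:
#   import google.generativeai as genai
#   genai.configure(api_key=os.environ["GEMINI_API_KEY"])
#   model = genai.GenerativeModel("gemini-2.0-flash")
#   response = model.generate_content(build_vision_request(screenshot, task))
```

Because the model only ever sees pixels, the same request works unchanged after a site redesign.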
Hero — Enter your research query and start the agent.
Output — Comparison table, CSV/HTML export, and Listen to Vera.
| Requirement | Voyance |
|---|---|
| Gemini model | Gemini 2.0 Flash (planning, screenshot analysis, synthesis) |
| Google GenAI SDK / ADK | Google GenAI SDK (google-generativeai): Gemini for planning, vision, synthesis. Custom agent loop (plan → navigate → extract → verify), not the ADK library. |
| Google Cloud service | Backend deployment target is Google Cloud Run (infra/cloudbuild.yaml, infra/main.tf) |
| UI Navigator | Screenshots analyzed by Gemini vision; agent outputs navigation and extraction actions |
Third-party: ElevenLabs (Vera TTS), Firecrawl (extraction), Perplexity (fact verification).
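The custom plan → navigate → extract → verify loop, including the redirect queue, can be sketched as follows. All function names here are illustrative stubs, not the contents of the real agent loop:

```python
# Sketch of the agent loop: replan at the top of each iteration if the user
# queued a redirect instruction mid-run. Step functions are injected stubs.
from collections import deque

def run_agent(task, plan_fn, step_fns, redirects: deque, max_iters=5):
    """Run planned sites through each step; adopt queued redirects between iterations."""
    results = []
    plan = plan_fn(task)
    for _ in range(max_iters):
        if redirects:                   # user interrupted with a new instruction
            task = redirects.popleft()  # adopt it and replan before continuing
            plan = plan_fn(task)
        if not plan:
            break
        record = plan.pop(0)            # next site to visit
        for step in step_fns:           # navigate, extract, verify
            record = step(record)
        results.append(record)
    return results
```

Checking the queue only at the loop boundary keeps each site visit atomic: a redirect never tears down a navigation in progress, it just changes what happens next.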
- Live backend URL: voyance-backend-712979751443.us-central1.run.app
- Judge artifact: Google-Cloud-Logs-Voyance.png (Cloud Run logs screenshot)
- Node.js 18+
- Python 3.10+
- API keys: Google AI Studio (Gemini), ElevenLabs, Firecrawl, Perplexity — see backend/.env.example
git clone https://github.com/ibtisamafzal/voyance.git
cd voyance
npm install
cd backend
pip install -r requirements.txt
playwright install chromium
cp .env.example .env
# Edit .env with your API keys
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
| Service | URL |
|---|---|
| Backend | http://localhost:8000 |
| API docs | http://localhost:8000/api/docs |
From the repo root (new terminal):
npm run dev
Frontend: http://localhost:5173
- Enter a query in the hero (e.g. "Compare pricing for top 5 CRM tools").
- Click Research — the agent plans, navigates, extracts, and verifies.
- In the Output section: sort the table, export CSV or HTML, and click Listen to Vera for the spoken briefing.
| Layer | Technology |
|---|---|
| AI & vision | Gemini 2.0 Flash |
| Browser | Playwright (headless Chromium), screenshot-based only |
| Extraction | Firecrawl API → Gemini vision fallback |
| Verification | Perplexity API |
| Voice | ElevenLabs TTS (Vera) |
| Backend | FastAPI, WebSockets; Google Cloud Run deployment target |
| Frontend | React, Vite, Tailwind |
| Infra | Docker, Cloud Build, Terraform (infra/) |
User and frontend → backend (Cloud Run target) → Gemini, Playwright, Firecrawl, Perplexity, ElevenLabs.
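The Firecrawl-first extraction with Gemini vision fallback can be sketched with injected callables. The function and field names are assumptions for illustration, not the actual service code:

```python
# Sketch: try fast structured extraction (Firecrawl) first; on failure or an
# empty result, fall back to screenshot analysis with Gemini vision.
def extract(url, firecrawl_fn, vision_fn):
    """Return extracted data tagged with which path produced it."""
    try:
        data = firecrawl_fn(url)
        if data:
            return {"source": "firecrawl", "data": data}
    except Exception:
        pass  # API error: fall through to the vision path
    return {"source": "vision", "data": vision_fn(url)}
```

Tagging the source makes it easy to audit how often the cheaper path succeeded versus how often the agent had to fall back to vision.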
From idea to implementation at a glance.
This mind map captures the core of Voyance for the Gemini Live Agent Challenge — from the problem and solution, through key features and technical stack, to user personas and submission requirements.
Copy backend/.env.example to backend/.env and set:
| Variable | Purpose |
|---|---|
| GEMINI_API_KEY | Google AI Studio |
| ELEVENLABS_API_KEY | Vera TTS |
| FIRECRAWL_API_KEY | Fast extraction |
| PERPLEXITY_API_KEY | Fact verification |
| GOOGLE_CLOUD_PROJECT | Optional (Firestore); in-memory fallback if unset |
| CONTACT_EMAIL | Contact form recipient email (set in server env) |
| CONTACT_EMAIL_APP_PASSWORD | Gmail App Password for SMTP contact form sending |
- Backend: Google Cloud Run. Deploy with `infra/cloudbuild.yaml` from the repo root: `gcloud builds submit --config=infra/cloudbuild.yaml .` Default: 1 GiB memory, 1 CPU (increase to 2 GiB if needed for Playwright).
- Frontend: Host on Vercel or any static host; set `VITE_API_URL` to your Cloud Run URL (no trailing slash).
Troubleshooting: Stuck on "Connecting…" → set VITE_API_URL on your host. WebSocket 403 → ensure no trailing slash in VITE_API_URL. OOM → increase memory in cloudbuild.yaml.
├── src/app/ # React frontend
│ ├── components/ # HeroSection, ResearchOutputSection, Navbar, etc.
│ └── context/ # ResearchContext (shared state)
├── backend/ # FastAPI backend
│ ├── app/
│ │ ├── agent.py # Research loop (plan → navigate → extract → verify)
│ │ ├── routers/ # Research, voice, health, sessions
│ │ └── services/ # Gemini, Firecrawl, Perplexity, Playwright, ElevenLabs
│ └── main.py
└── infra/ # GCP automation
├── cloudbuild.yaml # Build & deploy to Cloud Run
└── main.tf # Terraform
- Deep-dive blog: How We Built Voyance (DEV.to)
- Reddit build log: How we built Voyance — an AI agent that researches the web by “seeing” it
- Hackathon submission: Gemini Live Agent Challenge — UI Navigator track
- Source code: Voyance on GitHub
- GDG profile: g.dev/IbtisamAfzal
- Contact: Use the in-app contact form (/contact)
- LinkedIn: linkedin.com/in/ibtisamafzal
Blog: How We Built Voyance (DEV) · Hackathon: Gemini Live Agent Challenge (see Devpost for current schedule)
License: MIT




