Say "hey Friday" and watch AI browse the web for you.
Friday is a hands-free, wake-word-activated voice assistant that controls a real browser using Google Gemini's multimodal vision and planning capabilities. Just speak a task – Friday navigates, clicks, fills forms, and reports back when done.
Built for the Google Gemini Hackathon.
- 🎙️ Wake-word activation – always listening for "hey Friday", completely hands-free
- 🧠 Gemini Vision + Planning – Gemini sees the live browser screen and plans multi-step tasks
- 🖥️ Floating HUD – persistent GUI showing live agent status and action log
- 🔊 Voice feedback – Friday speaks its status back to you in real time
- ⏹️ Cancellable tasks – stop mid-execution with a single button press
- 🔇 Mute toggle – silence voice output without interrupting the agent
- ⌨️ Browser shortcuts – back, refresh, new tab, home, all from the GUI
- 🔁 Loop detection – automatically recovers from stuck states
```
┌────────────────────────────────────────────────────┐
│                 Main Thread (GUI)                  │
│                    Tkinter HUD                     │
│          status · log · mic · stop · mute          │
└─────────────────────────┬──────────────────────────┘
                          │ callbacks + threading.Event
┌─────────────────────────▼──────────────────────────┐
│           Background Thread (Agent Loop)           │
│                                                    │
│  VoiceListener ──▶ wake word ──▶ record command    │
│                                        │           │
│                                        ▼           │
│                               asyncio event loop   │
│                                        │           │
│                                        ▼           │
│                               browser-use Agent    │
│                                        │           │
│                                        ▼           │
│                               Google Gemini API    │
│                               (Vision + Planning)  │
│                                        │           │
│                                        ▼           │
│                               Playwright Browser   │
└────────────────────────────────────────────────────┘
```
Two threads run in parallel:

- Main thread – Tkinter GUI, never blocks
- Agent thread – owns its own `asyncio` event loop, handles voice I/O and all browser automation

Tasks are wrapped as `asyncio.Task` objects so they can be cancelled cleanly at any point via `loop.call_soon_threadsafe(task.cancel)`.
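The cross-thread cancellation handshake can be sketched in a few lines. This is a self-contained illustration of the pattern, not Friday's actual code; `run_agent_loop`, `state`, and `long_task` are names invented for the example:

```python
import asyncio
import threading

# Minimal sketch: the agent thread owns the event loop; the GUI thread
# requests cancellation via loop.call_soon_threadsafe(task.cancel).
state = {}
ready = threading.Event()

def run_agent_loop():
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)

    async def long_task():
        try:
            await asyncio.sleep(60)  # stands in for long browser automation
            return "done"
        except asyncio.CancelledError:
            state["cancelled"] = True  # clean-up (close browser, TTS) goes here
            raise

    task = loop.create_task(long_task())
    state["loop"], state["task"] = loop, task
    ready.set()
    try:
        loop.run_until_complete(task)
    except asyncio.CancelledError:
        pass
    finally:
        loop.close()

agent_thread = threading.Thread(target=run_agent_loop, daemon=True)
agent_thread.start()
ready.wait()

# GUI-thread side: what the Stop button does. task.cancel must be scheduled
# on the agent's own loop because asyncio objects are not thread-safe.
state["loop"].call_soon_threadsafe(state["task"].cancel)
agent_thread.join(timeout=5)

print(state.get("cancelled"))  # True: the task unwound cleanly
```

`call_soon_threadsafe` is the one documented thread-safe entry point into a running event loop, which is why the GUI never touches the task directly.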
| Technology | Role |
|---|---|
| Gemini API (`gemini-2.5-flash`) | Core LLM – reasoning, planning, and decision-making |
| Gemini Vision | Reads and understands the live browser screen |
| Gemini Planning | Decomposes complex tasks into multi-step execution plans |
| `ChatGoogle` (`browser_use.llm.google`) | Native Google Gemini API integration used in `agent.py` |
| Google AI Studio | API key management and usage monitoring |
| Technology | Role |
|---|---|
| browser-use | Browser agent framework |
| Playwright | Browser engine (via browser-use) |
| Vosk | Offline wake-word detection + speech recognition |
| SpeechRecognition | Microphone input |
| pyttsx3 | Text-to-speech (Friday's voice) |
| Tkinter | Floating HUD / GUI |
| pyautogui | Browser keyboard shortcut passthrough |
| python-dotenv | API key management |
| Python 3.11+ | Language |
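Vosk streams partial transcripts as JSON while audio comes in, so the wake-word check itself reduces to matching a phrase in that text. A simplified sketch of the detection logic (the `heard_wake_word` helper is illustrative, not the actual code in `voice.py`):

```python
import json

WAKE_PHRASE = "hey friday"

def heard_wake_word(result_json: str) -> bool:
    """Check a Vosk result (JSON string) for the wake phrase.

    KaldiRecognizer.PartialResult() yields {"partial": "..."} while audio
    streams; Result()/FinalResult() yield {"text": "..."} at utterance end.
    """
    data = json.loads(result_json)
    text = (data.get("partial") or data.get("text") or "").lower()
    return WAKE_PHRASE in text

print(heard_wake_word('{"partial": "uh hey friday open youtube"}'))  # True
print(heard_wake_word('{"text": "good morning"}'))                   # False
```

In the real listener this check runs on every audio chunk fed through `KaldiRecognizer.AcceptWaveform`; on a match, Friday switches from passive listening to recording the full command.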
- Python 3.11 or higher
- A Gemini API key from Google AI Studio
- A working microphone
```bash
# 1. Clone the repo
git clone https://github.com/yourusername/friday.git
cd friday

# 2. Create and activate a virtual environment
python -m venv venv
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows

# 3. Install dependencies
pip install -r requirements.txt

# 4. Install Playwright browsers
playwright install chromium

# 5. Set up your API key
cp .env.example .env
# Edit .env and add your GEMINI_API_KEY
```

Create a `.env` file in the project root:

```
GEMINI_API_KEY=your_gemini_api_key_here
```

Get your key from Google AI Studio – it's free.
```bash
python main.py
```

Friday will calibrate your microphone on startup, then say "Friday is ready."
- Say "hey Friday" to activate
- Wait for the "Listening" confirmation
- Speak your task clearly:
- "Search for flights from New York to London next month"
- "Go to YouTube and find a tutorial on Python asyncio"
- "Open Gmail and summarize my latest unread emails"
- Friday will confirm your command and start executing
- Say or press Stop to cancel at any time
| Button | Action |
|---|---|
| 🎙️ Mic | Manually trigger listening (no wake word needed) |
| ⏹️ Stop | Cancel the current task immediately |
| 🔇 Mute | Toggle voice feedback on/off |
| ← Back | Browser back (only when no task running) |
| ↻ Refresh | Browser refresh (only when no task running) |
| + New Tab | Open a new browser tab |
| 🏠 Home | Navigate to browser home |
| ✖ Quit | Exit Friday |
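Under the hood, the browser shortcut buttons boil down to standard keyboard combos sent through `pyautogui.hotkey`. A minimal sketch of that mapping (the table and `combo_for` helper are illustrative; the real bindings live in the project's GUI code):

```python
import sys

# Platform modifier: pyautogui expects "command" on macOS, "ctrl" elsewhere.
MOD = "command" if sys.platform == "darwin" else "ctrl"

# Illustrative action -> key-combo table for standard browser shortcuts.
SHORTCUTS = {
    "back":    ("command", "left") if sys.platform == "darwin" else ("alt", "left"),
    "refresh": (MOD, "r"),
    "new_tab": (MOD, "t"),
}

def combo_for(action: str) -> tuple:
    """Look up the key combo a HUD button would send via pyautogui.hotkey(*combo)."""
    return SHORTCUTS[action]
```

A button handler would then call, for example, `pyautogui.hotkey(*combo_for("refresh"))`; keeping the mapping in one table makes the "only when no task running" guard a single check before dispatch.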
```
friday/
├── main.py            # Entry point – orchestrates all components
├── agent.py           # Google Gemini LLM setup via ChatGoogle (gemini-2.5-flash)
├── browser.py         # Browser profile configuration for browser-use
├── voice.py           # Wake-word detection, recording, TTS
├── gui.py             # Tkinter floating HUD
├── .env               # Your API keys (never commit this)
├── .env.example       # Template for environment variables
└── requirements.txt   # Python dependencies
```
Key parameters in `main.py` you can tune:

```python
agent = Agent(
    use_vision=True,              # Gemini reads the screen visually
    enable_planning=True,         # Multi-step task decomposition
    max_failures=5,               # Retries before giving up
    loop_detection_enabled=True,  # Detects and escapes stuck loops
)
```

No microphone? No problem. Friday has a manual trigger button in the GUI so you can test every feature without ever saying a word.
```bash
# 1. Clone and enter the repo
git clone https://github.com/yourusername/friday.git
cd friday

# 2. Create a virtual environment
python -m venv venv
source venv/bin/activate   # macOS/Linux
venv\Scripts\activate      # Windows

# 3. Install all dependencies
pip install -r requirements.txt

# 4. Install the browser engine
playwright install chromium

# 5. Add your Gemini API key
echo "GEMINI_API_KEY=your_key_here" > .env
```

Get a free Gemini API key at aistudio.google.com – it takes 30 seconds.
```bash
python main.py
```

Wait for the floating HUD to appear and Friday to say "Friday is ready."
**Option A – Voice (recommended).** Say "hey Friday" clearly, wait for "Listening", then speak your task.

**Option B – Manual button (no microphone needed).** Click the 🎙️ mic button in the HUD, then speak your task. Identical behavior, no wake word required.
Try these in order – they go from simple to complex and cover the full range of Friday's capabilities:
1. Sanity check
"Go to google.com"
The browser should open and navigate to Google. Friday confirms when done.
2. Google Search + Gemini vision
"Search Google for the latest Gemini AI news and tell me the top result"
Friday searches, reads the results page visually using Gemini, and speaks the top headline back to you.
3. Google News summarization
"Open Google News and summarize the top 3 stories right now"
Friday navigates to news.google.com, reads the live headlines using Gemini vision, and gives you a spoken summary.
4. YouTube navigation
"Go to YouTube and find the most viewed video about Google Gemini"
Multi-step task – Friday searches, reads view counts visually, and opens the top result.
5. The fun one β Rickroll
"Hey Friday, rickroll me"
Friday interprets intent (not just literal words), navigates to YouTube, finds Rick Astley's Never Gonna Give You Up, and plays it. Tests Gemini's reasoning ability.
6. Multi-step Google task
"Open Google Maps and find the highest rated coffee shop near me"
Tests location-aware browsing, reading ratings and reviews, and multi-step decision-making – all in one command.
- Give Friday a long task: "Search Google for every country in the world and list them all"
- While it's running, press ⏹️ Stop in the HUD
- Friday should say "Stopped." and return to idle within 2 seconds – no crash, no hang
| Signal | What it means |
|---|---|
| HUD status changes in real time | Agent loop and GUI are communicating correctly |
| Browser navigates without manual input | browser-use + Playwright working |
| Friday speaks results back | Gemini vision successfully read the page |
| Stop button cancels cleanly | Async task cancellation working |
| Log panel fills with action steps | Gemini planning is decomposing tasks correctly |
**Friday doesn't hear the wake word**

- Run with a quiet background – Vosk calibrates to ambient noise on startup
- Speak clearly and at a normal pace
- Check that your microphone is set as the default input device

**Browser doesn't open**

- Make sure you ran `playwright install chromium`
- Check your `browser.py` profile configuration

**Gemini API errors**

- Verify your `GEMINI_API_KEY` in `.env` is valid and has quota
- Check Google AI Studio for usage limits

**Task gets stuck**

- Press the Stop button in the GUI – this cleanly cancels the async task
- `loop_detection_enabled=True` will also auto-recover in most cases
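browser-use's loop detection internals aren't part of this repo, but the underlying idea is straightforward: track the agent's recent actions and trigger a recovery when the same action keeps repeating. A minimal illustration of the concept (not the library's actual implementation):

```python
from collections import deque

class LoopDetector:
    """Flag a stuck agent when the same action repeats `threshold` times in a row."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.recent = deque(maxlen=threshold)  # sliding window of last actions

    def record(self, action: str) -> bool:
        """Record an action; return True if the agent looks stuck."""
        self.recent.append(action)
        return (len(self.recent) == self.threshold
                and len(set(self.recent)) == 1)

detector = LoopDetector(threshold=3)
print(detector.record("click #submit"))  # False
print(detector.record("click #submit"))  # False
print(detector.record("click #submit"))  # True: same action 3x, time to replan
```

On a `True` result, a real agent would typically discard the current plan and ask the LLM to replan with the failure context included.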
Friday's entire AI brain is powered by a direct call to Google's Gemini API via the official `ChatGoogle` client from browser-use:

```python
# agent.py
from browser_use.llm.google.chat import ChatGoogle


def create_llm(api_key: str) -> ChatGoogle:
    """
    Creates a Google Gemini LLM instance via the official Google API.

    Model: gemini-2.5-flash – Google's latest multimodal model.
    API key sourced from the GEMINI_API_KEY environment variable.
    Used by the browser-use Agent for vision, planning, and task execution.
    """
    return ChatGoogle(
        model="gemini-2.5-flash",
        api_key=api_key,
        temperature=0.1,
    )
```

Every voice command Friday receives triggers a live call to `gemini-2.5-flash` for vision-based screen reading, multi-step task planning, and action execution. No other LLM provider is used anywhere in the codebase.
API usage is tracked and visible in Google AI Studio – 500+ requests, 5M+ input tokens, and a 100% success rate logged over 28 days of development.
Pull requests are welcome! For major changes, please open an issue first.
- Fork the repo
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
MIT License – see LICENSE for details.
- browser-use – the incredible browser agent framework that makes this possible
- Google Gemini – for vision and planning capabilities
- Vosk – for fast, offline speech recognition
Built with ❤️ for the Google Gemini Hackathon