Optimize site-specific browser automation skills using Synth GEPA, Claude Code, and Kernel cloud browsers.
This demo shows how to use GEPA (Generative Evolutionary Prompt Adaptation) to iteratively improve browser automation skills that Claude Code discovers and uses. The result is a more reliable and faster browser agent.
GEPA is a prompt optimization algorithm that evolves prompts over multiple generations using LLM-guided mutations and selection. It's like genetic algorithms, but for prompts.
In this demo, GEPA optimizes a "skill file" that tells Claude Code how to automate LinkedIn:
- Navigation patterns and URLs
- Element selectors and interaction strategies
- Wait timing and error recovery
- Output formatting for answer extraction
┌─────────────────────────────────────────────────────────────────────────────┐
│ GEPA Optimization Loop │
│ │
│ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐ │
│ │ Synth │───►│ Task App │───►│ Kernel │ │
│ │ Backend │ │ (FastAPI) │ │ Browser │ │
│ │ │ │ │ │ Pool │ │
│ │ - Proposes │ │ - Writes │ │ │ │
│ │ skill │ │ skill to │ │ - Chrome │ │
│ │ mutations │ │ VM │ │ with CDP │ │
│ │ │ │ │ │ │ │
│ │ - Evaluates │ │ - Runs │ │ - agent- │ │
│ │ rollouts │ │ Claude │ │ browser │ │
│ │ │ │ Code │ │ │ │
│ │ - Selects │◄───│ │◄───│ - Claude │ │
│ │ best │ │ - LLM │ │ Code CLI │ │
│ │ variants │ │ verifies │ │ │ │
│ └──────────────┘ └──────────────┘ └──────────────┘ │
│ │
│ Repeat for N generations until skill converges to optimal performance │
└─────────────────────────────────────────────────────────────────────────────┘
- Synth Backend proposes skill variations using LLM-guided mutations
- Task App receives rollout requests and orchestrates execution
- Kernel Browser Pool provides pre-authenticated VMs with Chrome + claude + agent-browser installed
- Claude Code reads the skill file and automates the browser
- LLM Verifier scores each attempt for correctness and efficiency
- GEPA selects the best-performing skill variants for the next generation
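In code terms, the loop those components implement looks roughly like the sketch below (a conceptual Python sketch, not the Synth backend's actual API; `mutate` and `evaluate` are placeholders for the LLM-guided mutation and the task-app/Kernel rollout steps):

```python
from statistics import mean

def gepa_optimize(initial_skill, tasks, mutate, evaluate, generations=3, keep=4):
    """Conceptual GEPA loop: propose mutations, evaluate rollouts, keep the best variants.

    `mutate(skill) -> new_skill` and `evaluate(skill, task) -> reward` stand in for the
    Synth-driven mutation and the task-app/Kernel rollout described above.
    """
    population = [initial_skill]
    for _ in range(generations):
        # 1. Propose LLM-guided variations of the surviving skills.
        candidates = population + [mutate(skill) for skill in population]
        # 2. Score each candidate by its mean reward across the task set.
        scored = sorted(
            ((mean(evaluate(skill, task) for task in tasks), skill) for skill in candidates),
            key=lambda pair: pair[0],
            reverse=True,
        )
        # 3. Select the best-performing variants for the next generation.
        population = [skill for _, skill in scored[:keep]]
    return population[0]  # best skill found
```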
- Fully sandboxed - Uses Kernel's cloud browsers, which are full Linux VMs
- Browser profiles - Pre-authenticated sessions (e.g., logged into LinkedIn)
- Parallel rollouts - Browser pool supports concurrent GEPA evaluations
- Efficiency rewards - Faster execution, fewer steps, and no arbitrary sleeps = higher reward
- Browser reuse - VMs are reused across rollouts, preserving network cache
- Real-time streaming - See Claude's tool calls and reasoning as they happen
- Python 3.13+
- uv for package management
- API keys for Synth, Kernel, and Anthropic (see the `.env` setup below)
```bash
# Clone and enter the project
git clone git@github.com:kernel/browser-agent-gepa
cd browser-agent-gepa
uv sync
```

Create a `.env` file with your API keys:
```bash
SYNTH_API_KEY=sk_synth_user_...
KERNEL_API_KEY=sk_...
ANTHROPIC_API_KEY=sk-ant-api03-...
```

Next, create an authenticated LinkedIn browser profile:

```bash
# Using Kernel CLI: npm install -g @onkernel/cli
kernel profiles create --name linkedin
kernel browsers create --stealth --profile-name linkedin --save-changes
```
Then navigate to the live view URL and log in to LinkedIn. When you're finished:

```bash
kernel browsers delete <session-id>
```
You now have a logged-in profile you can use across any number of browser sessions.
Before running, create a browser pool with your authenticated profile:
```bash
kernel browser-pools create \
  --name agent-gepa \
  --profile-name linkedin \
  --stealth \
  --size 10
```

Run a single task to verify everything works:
```bash
# List available tasks
uv run python run_eval.py --list

# Run one task
uv run python run_eval.py --task own_follower_count

# Run all 3 tasks
uv run python run_eval.py --all
```

The Synth backend needs to reach your task app; the script automatically creates a SynthTunnel to expose it:
```bash
# Default: Uses SynthTunnel automatically
uv run python run_gepa.py

# Override settings
uv run python run_gepa.py --generations 5 --budget 50

# Use local Synth backend (for development)
uv run python run_gepa.py --local
```

Workaround if SynthTunnel fails (see `problems/synthtunnel.md`):

```bash
# Use ngrok instead
ngrok http 8030 --url your-subdomain.ngrok-free.app
export TASK_APP_URL=https://your-subdomain.ngrok-free.app
uv run python run_gepa.py
```

The optimized skill will be saved to `output/optimized_skill_<timestamp>.md`.
This demo includes 3 verifiable LinkedIn tasks:
| Task ID | Description | Expected Result |
|---|---|---|
| `own_follower_count` | Get my follower count | ~1218 (±5%) |
| `own_connection_count` | Get my connection count | ~653 (±5%) |
| `search_bill_gates_followers` | Find Bill Gates' follower count | ~39.8M (±5%) |
The reward function balances correctness with efficiency:
```
reward = correctness * (1.0 - time_penalty - step_penalty - sleep_penalty)
```

Where:

- `correctness` = 1.0 if the LLM judge says the answer is correct, 0.0 otherwise
- `time_penalty` = min(0.2, 0.1 × elapsed / max_time)
- `step_penalty` = min(0.2, 0.1 × steps / max_steps)
- `sleep_penalty` = min(0.15, 0.02 × num_sleeps + 0.01 × sleep_seconds)
The sleep penalty discourages arbitrary sleep or agent-browser wait commands, pushing the agent toward smarter waiting strategies (like waiting for specific elements).
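In Python, the reward computation amounts to something like this (a minimal sketch; argument names are illustrative rather than the repo's exact signature):

```python
def compute_reward(correct: bool, elapsed: float, max_time: float,
                   steps: int, max_steps: int,
                   num_sleeps: int, sleep_seconds: float) -> float:
    """Correctness is a hard gate; efficiency penalties then discount the reward."""
    correctness = 1.0 if correct else 0.0
    time_penalty = min(0.2, 0.1 * elapsed / max_time)
    step_penalty = min(0.2, 0.1 * steps / max_steps)
    sleep_penalty = min(0.15, 0.02 * num_sleeps + 0.01 * sleep_seconds)
    return correctness * (1.0 - time_penalty - step_penalty - sleep_penalty)

# e.g. correct answer, 60s of a 120s budget, 15 of 30 steps, no sleeps:
# 1.0 * (1.0 - 0.05 - 0.05 - 0.0) = 0.90
```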
Example rewards:
| Scenario | Reward |
|---|---|
| Correct, fast, no sleeps | ~0.90 |
| Correct, slow, no sleeps | ~0.75 |
| Correct, fast, 2s sleep | ~0.85 |
| Incorrect | 0.00 |
This encourages skills that are correct, efficient, and avoid arbitrary delays.
.
├── pyproject.toml # Python dependencies
├── .env # API keys (gitignored)
├── linkedin_gepa.toml # GEPA configuration
├── run_eval.py # Test tasks manually
├── run_gepa.py # Run GEPA optimization
├── skills/
│ └── linkedin.com/
│ └── SKILL.md # Initial skill (edit this!)
├── src/
│ └── linkedin_bench/
│ ├── __init__.py
│ ├── tasks.py # Task definitions
│ ├── skill_template.py # Skill loader (reads from skills/)
│ ├── verifier.py # LLM-based result verification
│ ├── kernel_runner.py # Kernel SDK wrapper
│ └── task_app.py # FastAPI task app for GEPA
└── output/ # Optimized skills saved here
The initial skill lives in skills/linkedin.com/SKILL.md so you can:
- Edit it directly as markdown
- Diff it against GEPA-optimized versions in `output/`
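For example, a quick way to see what GEPA changed (a small sketch; the optimized filename shown is hypothetical):

```python
import difflib
from pathlib import Path

# Compare the hand-written skill against a GEPA-optimized variant.
initial = Path("skills/linkedin.com/SKILL.md").read_text().splitlines()
optimized = Path("output/optimized_skill_20250101_120000.md").read_text().splitlines()  # hypothetical filename

for line in difflib.unified_diff(initial, optimized, fromfile="initial", tofile="optimized", lineterm=""):
    print(line)
```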
This repo serves as a template. To optimize browser automation skills for a different site:
```bash
kernel profiles create --name mysite
kernel browsers create --profile-name mysite --save-changes
# Navigate to the live view URL, log in, then delete the browser
kernel browsers delete <session-id>
```

Then create a browser pool for it:

```bash
kernel browser-pools create --name mysite-pool --profile-name mysite --size 5
```

Create `skills/mysite.com/SKILL.md` with instructions for automating your site. Include:
- How to connect to Chrome (`agent-browser --cdp http://127.0.0.1:9223`; see the CDP note below)
- Common navigation patterns and URLs
- How to find and extract data from the page
- Expected output format
Create a tasks module with Task definitions. Each task needs:
- A clear prompt describing what to do
- A natural language expectation the LLM judge can evaluate
- A reasonable timeout
```python
from dataclasses import dataclass

@dataclass
class Task:
    id: str
    prompt: str
    expected: str
    timeout: int = 120

TASKS = [
    Task(
        id="get_account_balance",
        prompt="Navigate to my account page and get my current balance.",
        expected="A dollar amount (e.g., $1,234.56)",
        timeout=120,
    ),
]
```

- Rename/copy `src/linkedin_bench/` to match your site (e.g., `src/mysite_bench/`)
- Update `DEFAULT_POOL_NAME` in `kernel_runner.py` to your pool name
- Update `skill_template.py` to point to your skill file path
- Update `pyproject.toml` to reference your new package
- Adjust GEPA settings in your TOML config as needed
Claude Code automatically discovers skill files at:
`~/.claude/skills/<domain>/SKILL.md`
The task app writes the skill content to this path on each rollout. Claude sees the skill in its context and follows its instructions.
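As a rough illustration of that layout (a local Python sketch; in the demo the write actually happens inside the Kernel VM via the filesystem API):

```python
from pathlib import Path

def install_skill(domain: str, skill_markdown: str) -> Path:
    """Write a skill where Claude Code discovers it: ~/.claude/skills/<domain>/SKILL.md."""
    skill_path = Path.home() / ".claude" / "skills" / domain / "SKILL.md"
    skill_path.parent.mkdir(parents=True, exist_ok=True)
    skill_path.write_text(skill_markdown)
    return skill_path

# Example: install_skill("linkedin.com", Path("skills/linkedin.com/SKILL.md").read_text())
```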
Since Claude Code runs inside the Kernel VM (same machine as Chrome), we use CDP (Chrome DevTools Protocol) for direct browser control:
```bash
agent-browser --cdp http://127.0.0.1:9223 <command>
```

Important: Use `http://127.0.0.1:9223` (not `localhost`, `ws://`, or just a port number). The HTTP URL connects to the CDP endpoint on port 9223.
This is faster and more reliable than remote connections.
Each rollout drives the Kernel browser through these steps:

- `pool.acquire()` - Get a browser session from the pool
- `ensure_ready()` - Reset browser state and install tools if needed:
  - Clean up old skill files and Claude temp data
  - Check if Claude Code and agent-browser are installed (skip if already present)
  - Close extra tabs and navigate to the new tab page
- `fs.write_file()` - Write the skill to the correct path
- `process.spawn()` - Run Claude Code with the task prompt (streaming output)
- `pool.release(reuse=true)` - Return the browser to the pool, preserving VM state for the next use
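Schematically, one rollout looks roughly like this (a pseudocode-level sketch: the method names follow the list above, but the paths, arguments, and exact signatures in `kernel_runner.py` are assumptions):

```python
def run_rollout(pool, task, skill_markdown: str) -> str:
    """One GEPA rollout: acquire a browser VM, run Claude Code on the task, release for reuse."""
    browser = pool.acquire()                                   # pre-authenticated session from the pool
    try:
        browser.ensure_ready()                                 # reset state, install tools if missing
        browser.fs.write_file(                                 # place the skill where Claude Code finds it
            "/root/.claude/skills/linkedin.com/SKILL.md",      # illustrative path inside the VM
            skill_markdown,
        )
        proc = browser.process.spawn(["claude", "-p", task.prompt])  # run Claude Code, streaming output
        return proc.wait()                                     # collected output goes to the LLM verifier
    finally:
        pool.release(browser, reuse=True)                      # keep the VM warm for the next rollout
```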
The verifier uses Claude to interpret natural language expectations:
```
TASK: Get my LinkedIn follower count
EXPECTED: Around 1218 followers (±5%)
AGENT OUTPUT: "ANSWER: 1,205"

# LLM evaluates: 1205 is within 5% of 1218 → correct=True
```

This is more flexible than exact matching and tolerates minor variations.
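A minimal sketch of this kind of LLM-judge check using the Anthropic Python SDK (the prompt wording and model choice are illustrative, not the repo's exact `verifier.py`):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def judge(task_prompt: str, expected: str, agent_output: str) -> bool:
    """Ask Claude whether the agent's answer satisfies a natural-language expectation."""
    message = client.messages.create(
        model="claude-sonnet-4-5",  # illustrative model choice
        max_tokens=16,
        messages=[{
            "role": "user",
            "content": (
                f"TASK: {task_prompt}\n"
                f"EXPECTED: {expected}\n"
                f"AGENT OUTPUT: {agent_output}\n\n"
                "Does the agent output satisfy the expectation? "
                "Reply with exactly CORRECT or INCORRECT."
            ),
        }],
    )
    return message.content[0].text.strip().upper().startswith("CORRECT")
```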
If you see `TunnelError: ws connect failed: HTTP error: 404 Not Found`, the SynthTunnel relay endpoint isn't responding. See `problems/synthtunnel.md` for details.
Workaround: Use ngrok instead:
```bash
ngrok http 8030 --url your-subdomain.ngrok-free.app
export TASK_APP_URL=https://your-subdomain.ngrok-free.app
uv run python run_gepa.py
```

The Synth backend needs to reach your task app. If SynthTunnel isn't working and you haven't set `TASK_APP_URL`:

```bash
# Expose your task app via ngrok
ngrok http 8030
export TASK_APP_URL=https://xxxxx.ngrok.io
uv run python run_gepa.py
```

If the run fails because the browser pool doesn't exist, create it first:
```bash
kernel browser-pools create --name agent-gepa --profile-name linkedin --size 10
```

The task app auto-installs Claude Code on first run. If it fails:
```bash
kernel ssh <session-id>
curl -fsSL https://claude.ai/install.sh | bash
```

To speed up rollouts:

- Disable stealth mode for faster page loads: `kernel browser-pools update agent-gepa --stealth=false`
- Increase pool size for more parallelism
- Discard idle browsers after config changes: `kernel browser-pools update agent-gepa --discard-all-idle`
MIT
Contributions welcome! Please open an issue first to discuss major changes.