An AI agent that browses websites, extracts job listings and company leads, and exports structured results — all driven by the Model Context Protocol (MCP).
Built with the OpenAI Agents SDK, Playwright, and Gradio.
Give LeadScout a target website and a description of what you want — it will:
- Navigate the site using a real browser (Playwright handles JavaScript, pagination, dynamic content)
- Extract job listings, company details, contacts, or other leads
- Save every find to a structured vault as it browses
- Export the full list to CSV or JSON in one click
Results appear in a live Gradio UI alongside a real-time log of every MCP tool call the agent makes.
The Model Context Protocol (MCP) is an open standard that defines how AI applications connect to external tools and data sources. Think of it as USB-C for AI — a single, consistent interface that lets any MCP-compatible AI host talk to any MCP-compatible server, regardless of who built either one.
Before MCP, every AI application had to build its own integrations: custom code to call a browser, custom code to read files, custom code to query a database. With MCP, those integrations are written once as MCP servers and can be reused by any AI host.
MCP defines three distinct roles:
┌─────────────────────────────────────────────────────┐
│ MCP HOST │
│ The AI application that controls the agent loop. │
│ Owns the LLM, decides which servers to connect, │
│ and manages the overall conversation. │
│ │
│ In LeadScout: AgentManager + OpenAI Agents SDK │
└──────────────────┬──────────────────────────────────┘
│ spawns & talks to
┌───────────┼───────────┐
│ │ │
┌──────▼──────┐ ┌──▼──────┐ ┌──▼──────────┐
│ MCP CLIENT │ │ MCP │ │ MCP │
│ (built in │ │ CLIENT │ │ CLIENT │
│ the SDK) │ │ │ │ │
│ │ │ │ │ │
│ ──────────▼─┤ ├────────▼─┤ ├──────────▼─┤
│ MCP SERVER │ │ MCP │ │ MCP │
│ Playwright │ │ Fetch │ │ ProspectV. │
└─────────────┘ └─────────┘ └─────────────┘
| Role | Responsibility | Example in LeadScout |
|---|---|---|
| Host | Runs the agent loop, connects to servers, sends tool results to the LLM | AgentManager using the OpenAI Agents SDK |
| Client | One connection to one server — manages the protocol session | Created automatically by the SDK per server |
| Server | Exposes tools, resources, or prompts to the host | Playwright, Fetch, ProspectVault |
The host and client usually live in the same process. The server can be local (subprocess) or remote (over the network).
MCP servers expose three types of capabilities:
**Tools:** functions the LLM can call to take actions or retrieve data. The server declares them; the LLM decides when to use them; the host executes them and returns results.
LLM decides to call add_prospect(name="Acme Corp", type="company", ...)
→ Host sends the call to ProspectVault MCP server
→ Server writes to vault, returns {"status": "saved", "id": "..."}
→ Host sends the result back to the LLM
→ LLM continues reasoning
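On the wire, each such round trip is a JSON-RPC 2.0 exchange. Here is a sketch of the two messages for an `add_prospect` call; the `id`, arguments, and result payload are illustrative, not captured from a real session:

```python
import json

# Illustrative JSON-RPC 2.0 messages for one MCP tool call.
# The id, arguments, and result payload are made up for this sketch.
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {
        "name": "add_prospect",
        "arguments": {"name": "Acme Corp", "type": "company"},
    },
}
response = {
    "jsonrpc": "2.0",
    "id": 7,  # matches the request id
    "result": {
        "content": [{"type": "text", "text": '{"status": "saved"}'}],
    },
}

# Over stdio, each message travels as a single line of JSON:
wire = json.dumps(request)
print(wire)
```

The host relays `response["result"]["content"]` back to the LLM as the tool output, and the loop continues.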
**Resources:** read-only data the host can fetch at any time, like file system reads or database queries. Each resource is identified by a URI (e.g. vault://prospects). The host, not the LLM, decides when to read them.
**Prompts:** reusable prompt templates defined by the server. These are less common, and mostly used in IDE-style integrations where the user selects a predefined workflow.
LeadScout uses tools (for actions like add_prospect, browser_navigate) and resources (for vault reads like vault://stats).
MCP messages are JSON-RPC 2.0, but the protocol itself can run over different transports. The two main options are stdio and SSE.
**stdio:** the host spawns the server as a child process. Messages travel over the process's stdin/stdout as newline-delimited JSON. The server lives and dies with the host process.
Host process
│
├── spawn: python prospect_vault_server.py
│ stdin ──────────────────────────► server reads requests
│ stdout ◄────────────────────────── server writes responses
│
├── spawn: npx @playwright/mcp
│ stdin/stdout ←──────────────────── same pattern
│
└── spawn: uvx mcp-server-fetch
stdin/stdout ←──────────────────── same pattern
When to use stdio:
- Local tools (browser automation, file system, databases on the same machine)
- Development and prototyping
- Tools that should not be exposed over the network
- Simple deployment — no server to run separately, no ports to manage
All three servers in LeadScout use stdio.
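To make the pattern concrete, here is a toy host/server pair, not a real MCP server, exchanging one newline-delimited JSON message over a spawned subprocess's stdin/stdout, the same mechanism the host uses with each server above:

```python
import json
import subprocess
import sys

# Toy "server": reads one JSON request per line from stdin,
# writes one JSON response per line to stdout. A real MCP server
# speaks full JSON-RPC 2.0 over the same channel.
child_code = (
    "import json, sys\n"
    "req = json.loads(sys.stdin.readline())\n"
    "print(json.dumps({'id': req['id'], 'result': 'pong'}))\n"
)

# The "host" spawns the server as a child process...
proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    text=True,
)
# ...sends a request on stdin, and reads the reply from stdout.
proc.stdin.write(json.dumps({"id": 1, "method": "ping"}) + "\n")
proc.stdin.flush()
reply = json.loads(proc.stdout.readline())
proc.stdin.close()
proc.wait()
print(reply)  # {'id': 1, 'result': 'pong'}
```

When the host process exits, its children exit with it, which is exactly the "server lives and dies with the host" property described above.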
**SSE:** the server runs as a standalone HTTP service. The host connects over HTTP: commands go as POST requests, responses stream back as SSE events. The server runs independently and can serve multiple hosts simultaneously.
Host process Remote / separate process
│ │
│ POST /message ─────────────►│ MCP Server (HTTP)
│ GET /sse ◄─────────────│ streams responses
│ │
│ │ can serve many hosts at once
│ │ survives host restarts
When to use SSE:
- Shared infrastructure (one server instance used by many agents or users)
- Remote tools — the server is on a different machine or in the cloud
- Long-running services that should persist independently (e.g. a company-wide knowledge base server)
- Microservice architectures where tools are deployed separately
| | stdio | SSE |
|---|---|---|
| Deployment | Subprocess, same machine | Standalone HTTP server |
| Startup | On-demand, spawned by host | Always-on, host connects to it |
| Scope | One host at a time | Many hosts simultaneously |
| Network | None needed | HTTP/HTTPS |
| Security | OS process isolation | Network auth (API keys, OAuth) |
| Best for | Local tools, dev | Shared/remote tools, prod infra |
Ready-to-use servers published to npm or PyPI. You run them with npx or uvx — no installation, no code to write.
# Playwright browser automation
npx @playwright/mcp@latest
# Fetch / HTTP retrieval
uvx mcp-server-fetch
# Filesystem access
uvx mcp-server-filesystem /path/to/dir
# GitHub
uvx mcp-server-github

The growing ecosystem means most common integrations (web search, databases, cloud services, dev tools) already have a prebuilt server. Browse the full list at modelcontextprotocol.io/servers.
Tradeoffs: Fast to get started, no maintenance burden. But you're limited to what the server exposes — no custom business logic.
You write and run your own MCP server. This is the right choice when you need:
- Custom business logic (e.g. "save a lead with our specific schema")
- Integration with proprietary internal systems
- Fine-grained control over what the LLM can and cannot do
- Domain-specific tools that don't exist in the ecosystem
The simplest way to build one in Python is FastMCP, which ships with the mcp[cli] package:
import json

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("MyServer")

@mcp.tool()
def save_record(name: str, value: str) -> dict:
    """Save a record to the database."""
    db.insert(name, value)  # `db` stands in for your own storage layer
    return {"status": "saved"}

@mcp.resource("data://records")
def list_records() -> str:
    """Return all records as JSON."""
    return json.dumps(db.all())

if __name__ == "__main__":
    mcp.run()  # stdio by default; mcp.run(transport="sse") for HTTP

The @mcp.tool() decorator auto-generates the JSON schema from type hints and docstrings — the LLM sees a clean tool definition with no extra work.
ProspectVault (src/leadscout/servers/prospect_vault_server.py) is this project's custom server. It stores leads in a JSON file, supports filtering, and exports to CSV/JSON — business logic that doesn't exist in any prebuilt server.
When you start a search, here is the full flow:
1. User submits query in Gradio UI
2. AgentManager.initialize() (first run only)
└── Spawns three MCP servers as subprocesses (stdio)
└── Each server handshakes with the host: declares its tools and resources
└── SDK caches the tool list from each server
3. Runner.run(agent, message)
└── SDK builds a system prompt that includes all available MCP tools
└── LLM receives: instructions + tool definitions + user message
4. LLM decides to use a tool (e.g. browser_navigate)
└── SDK routes the call to the correct MCP server (Playwright)
└── Server executes the action (real browser navigates to URL)
└── Server returns result as text
└── Result is fed back to the LLM as a tool response
5. LLM continues — may call more tools, read resources, reason over results
6. LLM calls add_prospect() for each find
└── ProspectVault server saves to sandbox/prospects.json
└── Gradio UI polls the file and updates the Prospects tab live
7. LLM produces a final text summary → displayed in chat
The MCPActivityTracer intercepts every span emitted by the SDK (tool calls, list-tools handshakes, LLM generation) and streams them into the Live MCP Activity tab so you can watch the agent work in real time.
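The loop in steps 3-5 can be sketched in a few lines. The "LLM" and "server" below are fakes standing in for the real model and MCP servers; the actual loop lives inside the OpenAI Agents SDK:

```python
# Stripped-down sketch of the host's agent loop. Everything here is a
# stand-in: fake_llm mimics the model, fake_server mimics an MCP server.
def fake_llm(messages):
    """First turn: request a tool call. After a tool result: finish."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "browser_navigate", "args": {"url": "https://example.com"}}
    return {"text": "Visited the page and finished."}

def fake_server(tool, args):
    """Pretend to execute a tool on an MCP server and return text."""
    return f"{tool}: opened {args['url']}"

messages = [{"role": "user", "content": "Find jobs on example.com"}]
while True:
    reply = fake_llm(messages)
    if "tool" in reply:
        # Step 4: route the call to the right server, feed the result back in.
        result = fake_server(reply["tool"], reply["args"])
        messages.append({"role": "tool", "content": result})
    else:
        # Step 7: final text answer, displayed in chat.
        messages.append({"role": "assistant", "content": reply["text"]})
        break

print(messages[-1]["content"])  # Visited the page and finished.
```

The real loop differs in the obvious ways (streamed responses, multiple servers, span tracing), but the shape, call a tool, append the result, ask again, is the same.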
┌──────────────────────────────────────────────────────────────┐
│ Gradio UI (app.py) │
│ Search input + chat │ MCP Activity │ Prospects │ Servers │
└──────────┬───────────────────────────────────────────────────┘
│
AgentManager (agent_manager.py)
│ openai-agents SDK + AsyncExitStack
│
┌──────┼──────────────┬────────────────┐
│ │ │ │
┌───▼───┐ │ ┌────▼────┐ ┌──────▼──────┐
│Fetch │ │ │Playwright│ │ProspectVault│
│Server │ │ │ Server │ │ Server │
│(uvx) │ │ │ (npx) │ │ (python) │
└───────┘ │ └─────────┘ └─────────────┘
Prebuilt │ Prebuilt Custom
Fast page │ Full browser Stores & exports
fetching │ automation discovered leads
│
MCPActivityTracer (tracing.py)
Captures every tool call, LLM span → shown in UI
| Server | Type | Transport | What it provides |
|---|---|---|---|
| Playwright | Prebuilt (npx) | stdio | Browser automation — navigate, click, scroll, snapshot, screenshot |
| Fetch | Prebuilt (uvx) | stdio | Fast page content retrieval for simpler URLs |
| ProspectVault | Custom (Python) | stdio | Save leads, filter by type/location, export to CSV/JSON |
| Tool | Description |
|---|---|
| `add_prospect` | Save a found job or company lead with structured fields |
| `list_prospects` | List all saved prospects with optional type/location filters |
| `get_prospect` | Retrieve full detail of a single prospect |
| `update_notes` | Append notes to an existing prospect |
| `export_prospects` | Write all prospects to sandbox/ as CSV or JSON |
| `clear_vault` | Wipe all prospects to start a fresh session |
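As a sketch of the CSV side of an export tool like `export_prospects` (the field names and rows here are assumed for illustration, not the server's actual schema):

```python
import csv
import io

# Illustrative prospect records; field names are assumptions, not the
# real ProspectVault schema.
prospects = [
    {"name": "Acme Corp", "type": "company", "location": "Berlin", "notes": ""},
    {"name": "Backend Engineer", "type": "job", "location": "Remote", "notes": "Python"},
]

# Write the records as CSV (to a string here; a server would write a file).
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["name", "type", "location", "notes"])
writer.writeheader()
writer.writerows(prospects)
print(buf.getvalue())
```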
| Resource URI | Description |
|---|---|
| `vault://prospects` | All prospects as a JSON array |
| `vault://stats` | Counts by type and top locations |
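Counts like those behind vault://stats could be derived from the prospect list as below; the field names and values are illustrative, not taken from the server's code:

```python
from collections import Counter

# Made-up prospect records, mirroring the kind of fields add_prospect saves.
prospects = [
    {"name": "Acme Corp", "type": "company", "location": "Berlin"},
    {"name": "Backend Engineer", "type": "job", "location": "Remote"},
    {"name": "Data Engineer", "type": "job", "location": "Remote"},
]

# Aggregate counts by type and the most common locations.
stats = {
    "total": len(prospects),
    "by_type": dict(Counter(p["type"] for p in prospects)),
    "top_locations": Counter(p["location"] for p in prospects).most_common(2),
}
print(stats)
```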
mcp/
├── .env.example
├── .gitignore
├── pyproject.toml
├── README.md
├── sandbox/ # Agent outputs — gitignored
│ ├── prospects.json # Live vault data
│ └── prospects_YYYYMMDD_*.csv # Exports
└── src/
└── leadscout/
├── __main__.py
├── app.py # Gradio UI
├── agent_manager.py # Server lifecycle + agent runner
├── config.py # Server definitions + agent prompt
├── tracing.py # MCP activity capture
└── servers/
└── prospect_vault_server.py # Custom FastMCP server
| Requirement | Notes |
|---|---|
| Python 3.13+ | |
| UV | Package manager — `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| Node.js 18+ | Required for Playwright MCP — nodejs.org |
| OpenAI API key | `gpt-4o` is the default model |
cd mcp
uv sync
cp .env.example .env
# Edit .env — set OPENAI_API_KEY=sk-...

Playwright's browser binaries are downloaded automatically on first run by @playwright/mcp.
uv run leadscout
# or
uv run python -m leadscout

- Enter a Target URL — e.g. https://jobs.lever.co/anthropic
- Describe what to look for — e.g. Remote Python backend roles paying $120k+
- Select a Prospect type (job / company / contact / lead, or Any)
- Click ▶ Start Search
The agent will navigate the site, extract listings, and save each one to the vault. Watch the MCP Activity tab to see every browser action and vault save in real time.
After a search completes, use the Follow-up chat to refine:
"Filter to remote-only roles" "Which of these companies are Series A or earlier?" "Export everything you found to CSV"
Click Export CSV or Export JSON — the agent writes the file to sandbox/ and a download link appears in the UI.
| Prompt | What happens |
|---|---|
| Browse YC's work-at-a-startup page and save all engineering roles | Playwright navigates workatastartup.com, saves each role |
| Find ML engineer openings at AI labs and note the salary ranges | Browses company career pages, extracts compensation data |
| Find B2B SaaS companies that raised in the last 6 months | Fetches news/funding pages, saves company leads |
| List all jobs you found and tell me which looks best for a senior engineer | Uses ProspectVault list_prospects, agent summarises |
| Variable | Default | Description |
|---|---|---|
| `OPENAI_API_KEY` | (required) | Your OpenAI key |
| `OPENAI_MODEL` | `gpt-4o` | Model for the agent |
| `SANDBOX_DIR` | `./sandbox` | Where vault data and exports are written |
| Package | Purpose |
|---|---|
| `openai-agents` | Agent orchestration + MCP client |
| `mcp[cli]` | FastMCP server framework |
| `gradio` | Web UI |
| `python-dotenv` | Env var loading |