LeadScout

An AI agent that browses websites, extracts job listings and company leads, and exports structured results — all driven by the Model Context Protocol (MCP).

Built with the OpenAI Agents SDK, Playwright, and Gradio.


What it does

Give LeadScout a target website and a description of what you want — it will:

  1. Navigate the site using a real browser (Playwright handles JavaScript, pagination, dynamic content)
  2. Extract job listings, company details, contacts, or other leads
  3. Save every find to a structured vault as it browses
  4. Export the full list to CSV or JSON in one click

Results appear in a live Gradio UI alongside a real-time log of every MCP tool call the agent makes.


MCP Concepts

What is MCP?

The Model Context Protocol (MCP) is an open standard that defines how AI applications connect to external tools and data sources. Think of it as USB-C for AI — a single, consistent interface that lets any MCP-compatible AI host talk to any MCP-compatible server, regardless of who built either one.

Before MCP, every AI application had to build its own integrations: custom code to call a browser, custom code to read files, custom code to query a database. With MCP, those integrations are written once as MCP servers and can be reused by any AI host.

The Three-Layer Architecture

MCP defines three distinct roles:

┌─────────────────────────────────────────────────────┐
│                      MCP HOST                       │
│  The AI application that controls the agent loop.   │
│  Owns the LLM, decides which servers to connect,    │
│  and manages the overall conversation.              │
│                                                     │
│  In LeadScout: AgentManager + OpenAI Agents SDK     │
└──────────────────┬──────────────────────────────────┘
                   │  spawns & talks to
       ┌───────────┼───────────┐
       │           │           │
┌──────▼──────┐ ┌──▼──────┐ ┌──▼──────────┐
│ MCP CLIENT  │ │ MCP     │ │ MCP         │
│ (built into │ │ CLIENT  │ │ CLIENT      │
│  the SDK)   │ │         │ │             │
├──────▼──────┤ ├──▼──────┤ ├──▼──────────┤
│ MCP SERVER  │ │ MCP     │ │ MCP         │
│ Playwright  │ │ Fetch   │ │ ProspectV.  │
└─────────────┘ └─────────┘ └─────────────┘
| Role | Responsibility | Example in LeadScout |
| --- | --- | --- |
| Host | Runs the agent loop, connects to servers, sends tool results to the LLM | AgentManager using the OpenAI Agents SDK |
| Client | One connection to one server — manages the protocol session | Created automatically by the SDK per server |
| Server | Exposes tools, resources, or prompts to the host | Playwright, Fetch, ProspectVault |

The host and client usually live in the same process. The server can be local (subprocess) or remote (over the network).
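Concretely, each server is just a command the host knows how to spawn. A minimal sketch of how LeadScout's three stdio servers could be declared (the real definitions live in src/leadscout/config.py; the dict layout here is illustrative):

```python
# Illustrative server declarations -- one subprocess (and one MCP client
# session) per entry. Commands match the ones used elsewhere in this README.
SERVER_DEFINITIONS = [
    {
        "name": "playwright",
        "command": "npx",
        "args": ["@playwright/mcp@latest"],
    },
    {
        "name": "fetch",
        "command": "uvx",
        "args": ["mcp-server-fetch"],
    },
    {
        "name": "prospect_vault",
        "command": "python",
        "args": ["src/leadscout/servers/prospect_vault_server.py"],
    },
]

for server in SERVER_DEFINITIONS:
    print(server["name"], "->", server["command"], *server["args"])
```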


MCP Primitives

MCP servers expose three types of capabilities:

Tools

Functions the LLM can call to take actions or retrieve data. The server declares them; the LLM decides when to use them; the host executes them and returns results.

LLM decides to call add_prospect(name="Acme Corp", type="company", ...)
  → Host sends the call to ProspectVault MCP server
  → Server writes to vault, returns {"status": "saved", "id": "..."}
  → Host sends the result back to the LLM
  → LLM continues reasoning
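The same round trip can be sketched with a stubbed tool registry standing in for ProspectVault (the names match the flow above; everything else is illustrative, not LeadScout's real code):

```python
import json

# Stub "server": a tool registry mapping names to handlers,
# as a real MCP server would expose them.
vault = []

def add_prospect(name: str, type: str, **fields) -> dict:
    """Stub for the ProspectVault add_prospect tool."""
    vault.append({"name": name, "type": type, **fields})
    return {"status": "saved", "id": str(len(vault))}

TOOLS = {"add_prospect": add_prospect}

def host_execute(tool_call: dict) -> str:
    """Host side: route the LLM's tool call to the right handler and
    return the JSON result that is fed back to the LLM."""
    handler = TOOLS[tool_call["name"]]
    result = handler(**tool_call["arguments"])
    return json.dumps(result)

# The LLM emits a structured tool call; the host executes and replies.
call = {"name": "add_prospect",
        "arguments": {"name": "Acme Corp", "type": "company"}}
print(host_execute(call))  # {"status": "saved", "id": "1"}
```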

Resources

Read-only data the host can fetch at any time — like file system reads or database queries. Identified by URI (e.g. vault://prospects). The host, not the LLM, decides when to read them.

Prompts

Reusable prompt templates defined by the server. Less common, mostly used in IDE-style integrations where the user selects a predefined workflow.

LeadScout uses tools (for actions like add_prospect, browser_navigate) and resources (for vault reads like vault://stats).
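A minimal sketch of the tools-vs-resources split, using the vault URIs above (the handlers are stand-ins, not LeadScout's real implementation):

```python
import json

prospects = [{"name": "Acme Corp", "type": "company"}]

# Tools: actions the LLM chooses to invoke.
def add_prospect(name, type):
    prospects.append({"name": name, "type": type})
    return {"status": "saved"}

# Resources: read-only data the *host* fetches by URI, on its own schedule.
RESOURCES = {
    "vault://prospects": lambda: json.dumps(prospects),
    "vault://stats": lambda: json.dumps({"count": len(prospects)}),
}

print(RESOURCES["vault://stats"]())  # {"count": 1}
```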


Transports: stdio vs SSE

MCP does not mandate a single transport. The protocol itself is JSON-RPC messages, and those messages can travel over different channels. The two main options are stdio and SSE.

stdio (Standard I/O)

The host spawns the server as a child process. Messages travel over the process's stdin/stdout as newline-delimited JSON. The server lives and dies with the host process.

Host process
│
├── spawn: python prospect_vault_server.py
│         stdin  ──────────────────────────► server reads requests
│         stdout ◄────────────────────────── server writes responses
│
├── spawn: npx @playwright/mcp
│         stdin/stdout ←──────────────────── same pattern
│
└── spawn: uvx mcp-server-fetch
          stdin/stdout ←──────────────────── same pattern

When to use stdio:

  • Local tools (browser automation, file system, databases on the same machine)
  • Development and prototyping
  • Tools that should not be exposed over the network
  • Simple deployment — no server to run separately, no ports to manage

All three servers in LeadScout use stdio.
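Under the hood, each stdio message is one JSON-RPC object per line. A sketch of the framing, using the real `tools/call` method name but illustrative arguments:

```python
import json

def frame(message: dict) -> bytes:
    """Serialize one JSON-RPC message for a stdio transport:
    one JSON object per line, newline-terminated."""
    return (json.dumps(message) + "\n").encode()

def unframe(line: bytes) -> dict:
    """Parse one newline-delimited JSON-RPC message."""
    return json.loads(line.decode())

# A tools/call request as the host would write it to the server's stdin.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "add_prospect",
        "arguments": {"name": "Acme Corp", "type": "company"},
    },
}
wire = frame(request)
assert unframe(wire)["method"] == "tools/call"
```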

SSE (Server-Sent Events)

The server runs as a standalone HTTP service. The host connects over HTTP — commands go as POST requests, responses stream back as SSE events. The server runs independently and can serve multiple hosts simultaneously.

Host process                    Remote / separate process
│                               │
│  POST /message  ─────────────►│  MCP Server (HTTP)
│  GET  /sse      ◄─────────────│  streams responses
│                               │
│                               │  can serve many hosts at once
│                               │  survives host restarts

When to use SSE:

  • Shared infrastructure (one server instance used by many agents or users)
  • Remote tools — the server is on a different machine or in the cloud
  • Long-running services that should persist independently (e.g. a company-wide knowledge base server)
  • Microservice architectures where tools are deployed separately

Comparison

| | stdio | SSE |
| --- | --- | --- |
| Deployment | Subprocess, same machine | Standalone HTTP server |
| Startup | On-demand, spawned by host | Always-on, host connects to it |
| Scope | One host at a time | Many hosts simultaneously |
| Network | None needed | HTTP/HTTPS |
| Security | OS process isolation | Network auth (API keys, OAuth) |
| Best for | Local tools, dev | Shared/remote tools, prod infra |

MCP Server Types

1. Prebuilt servers (community ecosystem)

Ready-to-use servers published to npm or PyPI. You run them with npx or uvx — no installation, no code to write.

# Playwright browser automation
npx @playwright/mcp@latest

# Fetch / HTTP retrieval
uvx mcp-server-fetch

# Filesystem access
uvx mcp-server-filesystem /path/to/dir

# GitHub
uvx mcp-server-github

The growing ecosystem means most common integrations (web search, databases, cloud services, dev tools) already have a prebuilt server. Browse the full list at modelcontextprotocol.io/servers.

Tradeoffs: Fast to get started, no maintenance burden. But you're limited to what the server exposes — no custom business logic.

2. Self-hosted custom servers

You write and run your own MCP server. This is the right choice when you need:

  • Custom business logic (e.g. "save a lead with our specific schema")
  • Integration with proprietary internal systems
  • Fine-grained control over what the LLM can and cannot do
  • Domain-specific tools that don't exist in the ecosystem

The simplest way to build one in Python is FastMCP, which ships with the mcp[cli] package:

import json

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("MyServer")
db = SomeStore()  # placeholder for your storage layer

@mcp.tool()
def save_record(name: str, value: str) -> dict:
    """Save a record to the database."""
    db.insert(name, value)
    return {"status": "saved"}

@mcp.resource("data://records")
def list_records() -> str:
    """Return all records as JSON."""
    return json.dumps(db.all())

if __name__ == "__main__":
    mcp.run()  # stdio by default; mcp.run(transport="sse") for HTTP

The @mcp.tool() decorator auto-generates the JSON schema from type hints and docstrings — the LLM sees a clean tool definition with no extra work.

ProspectVault (src/leadscout/servers/prospect_vault_server.py) is this project's custom server. It stores leads in a JSON file, supports filtering, and exports to CSV/JSON — business logic that doesn't exist in any prebuilt server.


How the Agent Loop Uses MCP

When you start a search, here is the full flow:

1. User submits query in Gradio UI

2. AgentManager.initialize() (first run only)
   └── Spawns three MCP servers as subprocesses (stdio)
   └── Each server handshakes with the host: declares its tools and resources
   └── SDK caches the tool list from each server

3. Runner.run(agent, message)
   └── SDK builds a system prompt that includes all available MCP tools
   └── LLM receives: instructions + tool definitions + user message

4. LLM decides to use a tool (e.g. browser_navigate)
   └── SDK routes the call to the correct MCP server (Playwright)
   └── Server executes the action (real browser navigates to URL)
   └── Server returns result as text
   └── Result is fed back to the LLM as a tool response

5. LLM continues — may call more tools, read resources, reason over results

6. LLM calls add_prospect() for each find
   └── ProspectVault server saves to sandbox/prospects.json
   └── Gradio UI polls the file and updates the Prospects tab live

7. LLM produces a final text summary → displayed in chat

The MCPActivityTracer intercepts every span emitted by the SDK (tool calls, list-tools handshakes, LLM generation) and streams them into the Live MCP Activity tab so you can watch the agent work in real time.
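Steps 3–7 can be condensed into a toy loop with a canned stand-in for the LLM (nothing here is the SDK's real API; it only shows the control flow):

```python
import json

def fake_llm(messages):
    """Stand-in for the real model: call one tool, then finish."""
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "add_prospect",
                              "arguments": {"name": "Acme Corp",
                                            "type": "company"}}}
    return {"final": "Saved 1 prospect."}

def run_agent(user_message, tools):
    messages = [{"role": "user", "content": user_message}]
    while True:
        reply = fake_llm(messages)
        if "final" in reply:                      # final text summary
            return reply["final"]
        call = reply["tool_call"]                 # LLM picks a tool
        result = tools[call["name"]](**call["arguments"])  # host routes + executes
        messages.append({"role": "tool",          # result fed back to the LLM
                         "content": json.dumps(result)})

vault = []
tools = {"add_prospect": lambda **f: (vault.append(f) or {"status": "saved"})}
print(run_agent("Find companies on example.com", tools))  # Saved 1 prospect.
```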


Architecture

┌──────────────────────────────────────────────────────────────┐
│                     Gradio UI (app.py)                       │
│  Search input + chat │ MCP Activity │ Prospects │ Servers    │
└──────────────────────────────┬───────────────────────────────┘
                               │
                 AgentManager (agent_manager.py)
                 openai-agents SDK + AsyncExitStack
                               │
          ┌────────────────────┼────────────────────┐
          │                    │                    │
   ┌──────▼──────┐      ┌──────▼──────┐      ┌──────▼──────┐
   │   Fetch     │      │ Playwright  │      │ProspectVault│
   │   Server    │      │   Server    │      │   Server    │
   │   (uvx)     │      │   (npx)     │      │  (python)   │
   └─────────────┘      └─────────────┘      └─────────────┘
    Prebuilt             Prebuilt             Custom
    Fast page            Full browser         Stores & exports
    fetching             automation           discovered leads
                               │
                 MCPActivityTracer (tracing.py)
     Captures every tool call, LLM span → shown in UI

MCP Servers

| Server | Type | Transport | What it provides |
| --- | --- | --- | --- |
| Playwright | Prebuilt (npx) | stdio | Browser automation — navigate, click, scroll, snapshot, screenshot |
| Fetch | Prebuilt (uvx) | stdio | Fast page content retrieval for simpler URLs |
| ProspectVault | Custom (Python) | stdio | Save leads, filter by type/location, export to CSV/JSON |

ProspectVault tools & resources

| Tool | Description |
| --- | --- |
| add_prospect | Save a found job or company lead with structured fields |
| list_prospects | List all saved prospects with optional type/location filters |
| get_prospect | Retrieve full detail of a single prospect |
| update_notes | Append notes to an existing prospect |
| export_prospects | Write all prospects to sandbox/ as CSV or JSON |
| clear_vault | Wipe all prospects to start a fresh session |

| Resource URI | Description |
| --- | --- |
| vault://prospects | All prospects as a JSON array |
| vault://stats | Counts by type and top locations |

Project Structure

mcp/
├── .env.example
├── .gitignore
├── pyproject.toml
├── README.md
├── sandbox/                       # Agent outputs — gitignored
│   ├── prospects.json             # Live vault data
│   └── prospects_YYYYMMDD_*.csv  # Exports
└── src/
    └── leadscout/
        ├── __main__.py
        ├── app.py                 # Gradio UI
        ├── agent_manager.py       # Server lifecycle + agent runner
        ├── config.py              # Server definitions + agent prompt
        ├── tracing.py             # MCP activity capture
        └── servers/
            └── prospect_vault_server.py  # Custom FastMCP server

Setup

Prerequisites

| Requirement | Notes |
| --- | --- |
| Python 3.13+ | |
| UV | Package manager — `curl -LsSf https://astral.sh/uv/install.sh \| sh` |
| Node.js 18+ | Required for Playwright MCP — nodejs.org |
| OpenAI API key | gpt-4o is the default model |

Install

cd mcp
uv sync
cp .env.example .env
# Edit .env — set OPENAI_API_KEY=sk-...

Playwright's browser binaries are downloaded automatically on first run by @playwright/mcp.

Run

uv run leadscout
# or
uv run python -m leadscout

Open http://localhost:7860


Usage

Basic search

  1. Enter a Target URL — e.g. https://jobs.lever.co/anthropic
  2. Describe what to look for — e.g. Remote Python backend roles paying $120k+
  3. Select a Prospect type (job / company / contact / lead, or Any)
  4. Click ▶ Start Search

The agent will navigate the site, extract listings, and save each one to the vault. Watch the MCP Activity tab to see every browser action and vault save in real time.

Follow-up questions

After a search completes, use the Follow-up chat to refine:

"Filter to remote-only roles"
"Which of these companies are Series A or earlier?"
"Export everything you found to CSV"

Export

Click Export CSV or Export JSON — the agent writes the file to sandbox/ and a download link appears in the UI.

Example prompts

| Prompt | What happens |
| --- | --- |
| Browse YC's work-at-a-startup page and save all engineering roles | Playwright navigates workatastartup.com, saves each role |
| Find ML engineer openings at AI labs and note the salary ranges | Browses company career pages, extracts compensation data |
| Find B2B SaaS companies that raised in the last 6 months | Fetches news/funding pages, saves company leads |
| List all jobs you found and tell me which looks best for a senior engineer | Uses ProspectVault list_prospects, agent summarises |

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| OPENAI_API_KEY | (required) | Your OpenAI key |
| OPENAI_MODEL | gpt-4o | Model for the agent |
| SANDBOX_DIR | ./sandbox | Where vault data and exports are written |
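A sketch of how these variables might be read, with the defaults from the table (the real app loads .env via python-dotenv first; the helper name is illustrative):

```python
import os

def load_settings(env=os.environ):
    """Read LeadScout's configuration from environment variables.
    OPENAI_API_KEY is required; the rest fall back to documented defaults."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required")
    return {
        "api_key": api_key,
        "model": env.get("OPENAI_MODEL", "gpt-4o"),
        "sandbox_dir": env.get("SANDBOX_DIR", "./sandbox"),
    }

settings = load_settings({"OPENAI_API_KEY": "sk-test"})
print(settings["model"], settings["sandbox_dir"])  # gpt-4o ./sandbox
```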

Dependencies

| Package | Purpose |
| --- | --- |
| openai-agents | Agent orchestration + MCP client |
| mcp[cli] | FastMCP server framework |
| gradio | Web UI |
| python-dotenv | Env var loading |
