harshalsp0011/Lead-Intelligence-Platform

Utility Lead Intelligence Platform

AI-assisted B2B prospecting and outreach automation for utility cost-reduction sales teams. The platform runs a full pipeline from lead discovery to personalized email delivery — with two human approval checkpoints and no email ever sent without explicit review.


Table of Contents

  1. What It Does
  2. Architecture
  3. Agent System
  4. Tech Stack
  5. Project Status
  6. Quick Start
  7. Configuration Reference
  8. Dashboard Pages
  9. Observability & Monitoring
  10. Database Tables
  11. API Reference
  12. Troubleshooting
  13. Documentation Index

1. What It Does

The platform automates the full B2B sales prospecting cycle for utility cost-reduction consultants:

Find companies  →  Enrich contacts  →  Score as leads
→  Draft personalized emails  →  Human review  →  Send
→  Schedule follow-ups  →  Track replies

Two human checkpoints — nothing moves forward without approval:

  1. After scoring: approve or skip each lead before any email is written
  2. After drafting: approve, edit, or reject each email before it is sent

What is fully automated (no human needed):

  • Daily news scan for companies with high utility spend signals
  • Contact lookup and enrichment (8-step waterfall)
  • Scoring 0–100 with LLM narrative explanation
  • Email drafting with AI Critic review loop (up to 2 rewrites)
  • Sending via SendGrid with open/click tracking
  • Follow-up scheduling (Day 3 / 7 / 14)
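The 8-step enrichment waterfall is, structurally, a fallback chain: try each provider in order and stop at the first hit. A minimal Python sketch of that pattern (the stub providers and the `enrich_email` helper are illustrative, not the platform's actual client code):

```python
from typing import Callable, Optional

def enrich_email(domain: str,
                 providers: list[tuple[str, Callable[[str], Optional[str]]]]
                 ) -> tuple[Optional[str], Optional[str]]:
    """Try each provider in order; return the first email found and its source."""
    for name, lookup in providers:
        email = lookup(domain)
        if email:
            return email, name
    return None, None

# Stubs standing in for the real Hunter.io / Apollo / scraper clients.
providers = [
    ("hunter", lambda d: None),          # no result from Hunter.io
    ("apollo", lambda d: f"cfo@{d}"),    # Apollo returns a hit
    ("scraper", lambda d: f"info@{d}"),  # never reached
]

email, source = enrich_email("acme.com", providers)
print(email, source)  # cfo@acme.com apollo
```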

2. Architecture

Full System Flow

┌─────────────────────────────────────────────────────────┐
│                    Browser (localhost:3000)             │
│                  React + Vite + Tailwind                │
│                                                         │
│  Leads  │  Email Review  │  Pipeline  │  Triggers  │ ...│
└─────────────────────────┬───────────────────────────────┘
                          │  HTTP (fetch)
                          ▼
┌──────────────────────────────────────────────────────────┐
│                   API Layer (localhost:8001)             │
│                    FastAPI + Uvicorn                     │
│                                                          │
│  GET /leads   GET /emails   POST /trigger   GET /pipeline│
└──────┬───────────────────────────────────────────────────┘
       │
       ▼
┌──────────────────────────────────────────────────────────┐
│                       Agents                             │
│                                                          │
│  Scout Agent         Analyst Agent    Writer Agent       │
│  (news → companies)  (score + enrich) (draft + critic)   │
│                                                          │
│  Outreach Agent      Tracker Agent    Orchestrator       │
│  (send + followup)   (reply monitor)  (chain runner)     │
└──────┬──────────────────────────────────────┬────────────┘
       │                                      │
       ▼                                      ▼
┌──────────────────────┐         ┌─────────────────────────┐
│   External APIs      │         │       PostgreSQL        │
│                      │         │                         │
│  Tavily (search)     │         │  companies              │
│  Apollo (contacts)   │         │  contacts               │
│  Hunter.io (email)   │         │  lead_scores            │
│  Google Maps Places  │         │  email_drafts           │
│  Yelp Business       │         │  outreach_events        │
│  SendGrid (email)    │         │  followup_schedules     │
│  Ollama / OpenAI     │         │  agent_runs             │
└──────────────────────┘         │  email_win_rate         │
                                 └─────────────────────────┘

Docker Services

docker-compose up
  ├── api       (port 8001)  FastAPI backend — all agents run inside this process
  └── frontend  (port 3000)  nginx serving the Vite-built React app

No separate agent containers. Airflow is an optional scheduling add-on (not required to run the platform). The database is external PostgreSQL and is not run in Docker.

Human-in-Loop Checkpoints

Scout finds companies from the news
        │
        ▼
Analyst enriches contacts + scores 0–100
        │
        ▼
  [HUMAN REVIEW #1]  ← Leads page: approve or skip each company
        │
        ▼
Writer drafts personalized email → Critic reviews → rewrites if needed
        │
        ▼
  [HUMAN REVIEW #2]  ← Email Review page: approve/edit/reject
        │
        ▼
SendGrid sends the approved email with open + click tracking
        │
        ▼
Follow-ups scheduled at Day 3 / 7 / 14 (cancelled if reply received)
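The follow-up schedule is plain date arithmetic. A sketch of how the Day 3 / 7 / 14 plan and reply cancellation could work (function names are illustrative, not the platform's actual code):

```python
from datetime import date, timedelta

FOLLOWUP_OFFSETS = (3, 7, 14)  # days after the initial send

def schedule_followups(sent_on: date) -> list[date]:
    """Compute the Day 3 / 7 / 14 send dates for one outbound email."""
    return [sent_on + timedelta(days=d) for d in FOLLOWUP_OFFSETS]

def split_on_reply(schedule: list[date], replied_on: date) -> tuple[list[date], list[date]]:
    """Follow-ups dated before the reply already went out; the rest are cancelled."""
    sent_already = [d for d in schedule if d < replied_on]
    cancelled = [d for d in schedule if d >= replied_on]
    return sent_already, cancelled

plan = schedule_followups(date(2025, 1, 6))
sent_already, cancelled = split_on_reply(plan, replied_on=date(2025, 1, 10))
print(cancelled)  # the Day 7 and Day 14 sends are dropped
```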

3. Agent System

Agentic Design Principle

Old (rule-based):  fixed query → fixed formula → fixed output

Agentic (current): LLM reasons about available data
                       → decides what tools to call and in what order
                       → executes tools (APIs, DB, scoring math)
                       → evaluates result quality
                       → loops if not good enough
                       → returns result

The LLM is the decision and reasoning layer only; APIs, the database, and scoring math are deterministic tools it calls. The LLM never does math directly: it classifies, infers, decides, evaluates, and generates text.


Scout Agent — Company Discovery

Reads business news daily via Tavily and extracts companies with intent signals — new facilities opening, expansions, multi-location operations — that indicate high utility spend.

Tavily news search (industry + location queries)
  ↓
LLM extracts: company name, industry, city, why it's a signal
  ↓
Deduplication against existing companies in DB
  ↓
Save new companies with source = 'news_scout'
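Deduplication typically keys on a normalized company name so that variants like "Acme Corp." and "Acme, Inc." collapse to one entry. A sketch under that assumption (the suffix list and helper names are illustrative, not the platform's actual rules):

```python
import re

LEGAL_SUFFIXES = {"inc", "llc", "ltd", "corp", "co"}  # assumed normalization list

def normalize_name(name: str) -> str:
    """Lowercase, strip punctuation and common legal suffixes to form a dedup key."""
    tokens = re.sub(r"[^\w\s]", "", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)

def dedupe(found: list[str], existing: set[str]) -> list[str]:
    """Keep only discovered companies whose key is not already in the DB."""
    seen = {normalize_name(n) for n in existing}
    fresh = []
    for name in found:
        key = normalize_name(name)
        if key not in seen:
            seen.add(key)
            fresh.append(name)
    return fresh

print(dedupe(["Acme Corp.", "Acme, Inc.", "Globex LLC"], {"ACME CORP"}))
# both Acme variants collapse to the existing "acme" key
```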

Sources used:

1. Tavily news mode   — AI-powered news search for intent signals
2. Google Maps        — Places API for local business discovery
3. Yelp               — Business Search API
4. Directory Scraper  — configured sources (Yellow Pages, local directories)

Analyst Agent — Enrichment + Scoring

For each new company, the Analyst finds the right contact and computes a fit score.

Load company from DB
  ↓
Contact enrichment waterfall (8 steps):
  Hunter.io → Apollo → website scraper → Serper →
  Snov.io → Prospeo → ZeroBounce → permutation fallback
  ↓
Phone enrichment: Google Places → Yelp → website scraper
  ↓
LLM Data Inspector:
  Infers missing industry from company name
  Detects data gaps → triggers re-enrichment if needed
  ↓
score_engine.compute_score(...)   ← deterministic math
  Score = (Recovery × 0.40) + (Industry × 0.25) + (Multisite × 0.20) + (Data × 0.15)
  ↓
LLM Score Narrator:
  Generates a plain-English explanation of why this company scores this way

Tier: ≥70 = high, 40–69 = medium, <40 = low
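The weighted formula is plain arithmetic, which is why it lives outside the LLM. A sketch of the scoring math and tier mapping, assuming each component is already a 0-100 sub-score (the platform's actual signatures may differ):

```python
WEIGHTS = {"recovery": 0.40, "industry": 0.25, "multisite": 0.20, "data": 0.15}

def compute_score(recovery: float, industry: float, multisite: float, data: float) -> int:
    """Deterministic weighted sum of four 0-100 component scores."""
    raw = (recovery * WEIGHTS["recovery"]
           + industry * WEIGHTS["industry"]
           + multisite * WEIGHTS["multisite"]
           + data * WEIGHTS["data"])
    return round(raw)

def tier(score: int, high: int = 70, medium: int = 40) -> str:
    """Map a score to a tier; defaults mirror HIGH/MEDIUM_SCORE_THRESHOLD."""
    return "high" if score >= high else "medium" if score >= medium else "low"

s = compute_score(recovery=90, industry=80, multisite=50, data=60)
print(s, tier(s))  # 75 high
```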


Writer Agent — Draft Generation + Critic Loop

Writer:
  Reads email_win_rate for the best-performing angle in this industry
  Reads company data + score narrative
  Generates full email: subject line + body (personalized, with savings figure, clear CTA)
  ↓
Critic (second LLM call):
  Scores draft 0–10 on: personalization, savings figure, CTA clarity, human tone, subject line
  Returns score + specific improvement instructions
  ↓
If score < 7: Writer rewrites using Critic feedback (max 2 rewrites)
If score ≥ 7: save draft → Email Review queue
If still < 7 after 2 rewrites: save with low_confidence=true → flagged in UI

The email angle used (e.g. "audit_offer") is saved as template_used. When a reply is received, Tracker updates email_win_rate → future Writer runs bias toward winning angles per industry.
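The Writer/Critic loop above is a bounded refine-until-pass cycle. A sketch with the two LLM calls stubbed out (the `draft_with_critic` helper and its signature are illustrative):

```python
from typing import Callable, Optional

def draft_with_critic(write: Callable[[Optional[str]], str],
                      critique: Callable[[str], tuple[float, str]],
                      pass_score: float = 7.0,
                      max_rewrites: int = 2) -> tuple[str, bool]:
    """Draft, score 0-10, rewrite on feedback up to max_rewrites.
    Returns (final_draft, low_confidence)."""
    draft = write(None)  # first pass: no feedback yet
    for _ in range(max_rewrites):
        score, feedback = critique(draft)
        if score >= pass_score:
            return draft, False
        draft = write(feedback)
    final_score, _ = critique(draft)
    return draft, final_score < pass_score

# Stubs standing in for the two LLM calls: first draft scores 5, rewrite scores 8.
scores = iter([5.0, 8.0])
draft, low_confidence = draft_with_critic(
    write=lambda feedback: "v2" if feedback else "v1",
    critique=lambda d: (next(scores), "add a concrete savings figure"),
)
print(draft, low_confidence)  # v2 False
```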


Run Tracking

Every pipeline trigger creates one agent_runs row:

agent_runs
  id, trigger_source ("dashboard" / "airflow"), status, current_stage
  companies_found, companies_scored, drafts_created, emails_sent
  started_at, completed_at, error_message

Every tool call appends one agent_run_logs row — full audit trail.


4. Tech Stack

| Layer | Technology | Purpose | Status |
|---|---|---|---|
| Frontend | React 18 + Vite + Tailwind CSS | Dashboard UI | ✅ Live |
| Routing | React Router v6 | Page navigation | ✅ Live |
| API | FastAPI + Uvicorn | REST backend, agent orchestration | ✅ Live |
| Agent framework | LangChain | LLM tool-calling and ReAct loops | ✅ Live |
| LLM (local) | Ollama + llama3.2 | Default — runs on your machine, zero cost | ✅ Live |
| LLM (cloud) | OpenAI gpt-4o-mini | Optional — set LLM_PROVIDER=openai | ✅ Live |
| Embeddings | Ollama + nomic-embed-text | 768-dim vectors for semantic search | ✅ Running |
| Vector store | PostgreSQL + pgvector | Semantic knowledge base retrieval | 🔲 Planned |
| ORM | SQLAlchemy | Database models and queries | ✅ Live |
| Database | PostgreSQL (AWS RDS) | Business data + agent memory | ✅ Live |
| Search | Tavily API | Company discovery + news signals | ✅ Live |
| Maps | Google Maps Places API | Company discovery + phone lookup | ✅ Live |
| Business search | Yelp Business API | Company discovery + phone fallback | ✅ Live |
| Enrichment | Apollo, Hunter.io, Prospeo, Snov.io | Contact email/title lookup waterfall | ✅ Live |
| Verification | ZeroBounce | Email address validation | ✅ Live |
| Email delivery | SendGrid | Pipeline outreach + tracking | ✅ Live |
| CRM integration | HubSpot | CRM lead sync + CRM email send | 🔲 Planned |
| Observability | LangSmith | Full LLM trace per run | ✅ Live |
| Containerization | Docker + nginx | 2 containers: api + frontend | ✅ Live |
| Scheduled runs | Airflow (add-on) | Optional — daily pipeline scheduling | ⚠️ Code exists, not live |

Planned: Vector Memory (Knowledge Base)

The next major agentic upgrade gives the Writer agent long-term memory about Troy & Banks services, case studies, and proof points — retrieved semantically per company before writing.

Company profile: "manufacturing, 12 sites, Ohio, overpaying on gas"
        ↓
Retrieval Agent (nomic-embed-text → pgvector cosine search)
        ↓
Top 3 relevant items:
  • Case study: Ohio manufacturer, 8 sites, cut gas by $80k → [link]
  • Service: multi-site utility contract renegotiation
  • CTA: Free 30-min audit → calendly.com/kevingibs/30min
        ↓
Writer weaves them naturally into the email

Why pgvector, not a separate vector DB: nomic-embed-text already runs in Ollama. pgvector runs inside existing PostgreSQL. No new container, no new service, no separate backup process.

Why not a knowledge graph (Neo4j etc.): Vector similarity handles the main use case (find relevant case studies for this company type). Graph reasoning is only needed for multi-hop queries like "which service performs best in deregulated manufacturing states" — add that later if needed.
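Under the hood, pgvector's `<=>` operator is cosine distance. For intuition, here is the ranking step in pure Python, with toy 3-dim vectors standing in for the 768-dim nomic-embed-text embeddings (the knowledge-base items below are invented for illustration):

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """What pgvector's <=> operator computes: 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (na * nb)

# Toy embeddings; real ones come from nomic-embed-text.
kb = {
    "ohio_manufacturer_case_study": [0.9, 0.1, 0.0],
    "hospital_water_audit":         [0.0, 0.2, 0.9],
    "multisite_gas_renegotiation":  [0.8, 0.3, 0.1],
}
query = [0.85, 0.2, 0.05]  # embedding of "manufacturing, multi-site, gas"

top = sorted(kb, key=lambda k: cosine_distance(query, kb[k]))[:2]
print(top)  # the two manufacturing/gas items rank above the hospital one
```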

See docs/AGENTIC_TRANSFORMATION_PLAN.md Phases KB-0 through KB-4 for full build plan.


5. Project Status

What Works End-to-End Right Now

You can run this full sequence today with no missing pieces:

1. Triggers page → Run Scout
   → Companies discovered from business news and saved to DB

2. Triggers page → Run Analyst
   → Contacts found, companies scored 0–100 with narrative explanation

3. Leads page → review scores → Approve high-tier leads

4. Triggers page → Run Writer
   → Personalized emails drafted, AI Critic review loop runs

5. Email Review page → read each draft → Approve & Send
   → Email sent via SendGrid, follow-ups scheduled in DB

Pipeline stages update automatically at every step.

Feature Status

| Feature | Status |
|---|---|
| Scout: news-based company discovery | ✅ Done |
| Contact enrichment (8-step waterfall) | ✅ Done |
| LLM scoring + narrative explanation | ✅ Done |
| Lead approval (Leads page) | ✅ Done |
| Writer + Critic + rewrite loop (pipeline) | ✅ Done |
| CRM Leads tab — context notes + writer path | ✅ Done |
| CRM email: human-feedback regenerate dialog | ✅ Done |
| CRM email: send confirm dialog (TO/FROM/Subject) | ✅ Done |
| Real sender signature (Kevin Gibs, Troy & Banks) | ✅ Done |
| Email approval queue (Email Review — Pipeline tab) | ✅ Done |
| SendGrid sending with open + click tracking | ✅ Done |
| Follow-up scheduling (Day 3/7/14) | ✅ Done (DB only) |
| Dashboard: Leads, Pipeline, Triggers, Email Review | ✅ Done |
| Follow-up actual sending via Airflow | ⚠️ Code exists, Airflow not live |
| Reply detection (webhook receiver built, not wired) | ⚠️ Partial |
| Knowledge Base + Vector memory (RAG) | 🔲 Planned — Phase KB |
| pgvector semantic case study retrieval | 🔲 Planned — Phase KB-2 |
| HubSpot CRM sync + HubSpot send path | 🔲 Planned — Phase CRM-5 |
| Manual lead add form | 🔲 Not built |
| Reply inbox page | 🔲 Not built |

See docs/BUILD_STATUS.md for full details, priority order, and what to build next.

Build Priority (What's Next)

1. Knowledge Base + pgvector (Phase KB)  ← writer gets case studies + proof points + links
2. Reply detection webhook               ← closes biggest gap in pipeline
3. HubSpot CRM sync + send (Phase CRM-5)  ← emails via HubSpot, no spam issues
4. Reply inbox page                      ← makes replies visible in dashboard
5. Airflow live scheduling               ← makes follow-ups actually send

6. Quick Start

Prerequisites

  • Docker Desktop running
  • Ollama installed and running locally with llama3.2 pulled (or OpenAI API key)
  • PostgreSQL instance with migrations applied
# Check prerequisites
docker --version
ollama list       # should show llama3.2

Setup

# 1. Clone
git clone <your-repo-url>
cd utility-lead-platform

# 2. Copy env file
cp .env.example .env
# Fill in: DATABASE_URL, TAVILY_API_KEY, HUNTER_API_KEY,
#          SENDGRID_API_KEY, SENDGRID_FROM_EMAIL, ALERT_EMAIL

# 3. Pull Ollama model (runs on your host machine)
ollama pull llama3.2

# 4. Run database migrations
psql $DATABASE_URL -f database/migrations/001_create_companies.sql
# ... run all migrations in order 001 through latest

# 5. Build and start containers
docker-compose up --build

Access

| Service | URL |
|---|---|
| Dashboard | http://localhost:3000 |
| API | http://localhost:8001 |
| API docs (Swagger) | http://localhost:8001/docs |

Useful Container Commands

# View API logs (see agent steps, errors)
docker-compose logs api -f

# Check which containers are running
docker-compose ps

# Rebuild API after code changes
docker-compose build api && docker-compose up -d api

# Rebuild frontend after UI changes
docker-compose build frontend && docker-compose up -d frontend

# Stop everything
docker-compose down

# Clean restart (removes orphan containers)
docker-compose down --remove-orphans && docker-compose up -d

Important: Always rebuild the relevant container after code changes. Running containers do not pick up file changes automatically.


7. Configuration Reference

All config is read from .env. Copy .env.example to .env and fill in values.

Required

| Variable | Description | Example |
|---|---|---|
| DATABASE_URL | PostgreSQL connection string | postgresql://user:pass@host:5432/dbname |
| LLM_PROVIDER | ollama (local) or openai (cloud) | ollama |
| LLM_MODEL | Model name for selected provider | llama3.2 |
| OLLAMA_BASE_URL | Ollama server URL (Docker uses host.docker.internal) | http://host.docker.internal:11434 |
| TAVILY_API_KEY | Tavily search API key | tvly-... |
| SENDGRID_API_KEY | SendGrid email delivery key | SG.xxx |
| SENDGRID_FROM_EMAIL | Verified sender email address | outreach@yourdomain.com |
| ALERT_EMAIL | Email address for system notifications | sales@yourdomain.com |
| SENDER_NAME | Name shown in outbound email From field | Jane Smith |
| SENDER_TITLE | Title shown in email signature | Energy Consultant |

Optional

| Variable | Description | Default |
|---|---|---|
| OPENAI_API_KEY | Required only if LLM_PROVIDER=openai | blank |
| GOOGLE_MAPS_API_KEY | Google Places — company discovery + phone lookup | blank |
| YELP_API_KEY | Yelp Business — company discovery + phone fallback | blank |
| HUNTER_API_KEY | Hunter.io — domain email search | blank |
| APOLLO_API_KEY | Apollo.io — contact enrichment | blank |
| SERPER_API_KEY | Serper.dev — Google search for email discovery | blank |
| PROSPEO_API_KEY | Prospeo.io — LinkedIn contact enrichment | blank |
| ZEROBOUNCE_API_KEY | ZeroBounce — email verification | blank |
| SNOV_CLIENT_ID | Snov.io client ID | blank |
| SNOV_CLIENT_SECRET | Snov.io client secret | blank |
| SCRAPERAPI_KEY | ScraperAPI — directory scraping proxy | blank |
| HIGH_SCORE_THRESHOLD | Minimum score for "high" tier | 70 |
| MEDIUM_SCORE_THRESHOLD | Minimum score for "medium" tier | 40 |
| EMAIL_DAILY_LIMIT | Max emails sent per day | 50 |
| DEPLOY_ENV | `local` disables API key auth on routes | local |
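Putting the two tables together, a minimal .env for a local run with Ollama might look like this (all values are placeholders; substitute your own keys):

```
DATABASE_URL=postgresql://user:pass@localhost:5432/leads
LLM_PROVIDER=ollama
LLM_MODEL=llama3.2
OLLAMA_BASE_URL=http://host.docker.internal:11434
TAVILY_API_KEY=tvly-xxxxxxxx
SENDGRID_API_KEY=SG.xxxxxxxx
SENDGRID_FROM_EMAIL=outreach@yourdomain.com
ALERT_EMAIL=sales@yourdomain.com
SENDER_NAME=Jane Smith
SENDER_TITLE=Energy Consultant
```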

8. Dashboard Pages

| Page | Route | What it shows |
|---|---|---|
| Leads | /leads | All companies with search + filters — approve/reject leads |
| Email Review | /emails | Pending email drafts — approve / edit+approve / reject |
| Triggers | /triggers | Manual pipeline controls: Scout, Analyst, Enrich, Writer |
| Pipeline | /pipeline | Agent health, stage counts, recent activity feed |

9. Observability & Monitoring

Option 1 — Docker Logs (always available)

docker-compose logs api -f

Shows HTTP requests, agent steps, and errors in real time.

Option 2 — LangSmith (recommended — visual trace per run)

LangSmith shows every LLM call, which tool was chosen, arguments, token counts, and latency in a visual timeline.

Setup:

  1. Sign up free at https://smith.langchain.com
  2. Go to Settings → API Keys → Create API Key
  3. Add to .env:
LANGCHAIN_TRACING_V2=true
LANGCHAIN_API_KEY=ls__your_key_here
LANGCHAIN_PROJECT=utility-lead-platform
  4. Rebuild and restart the API container.

Option 3 — Database Query (audit trail)

Every agent step is persisted in PostgreSQL:

-- Last 10 agent runs with status
SELECT id, trigger_source, status, current_stage,
       companies_found, error_message, created_at
FROM agent_runs
ORDER BY created_at DESC LIMIT 10;

-- Every step in a specific run
SELECT agent, action, status, output_summary, duration_ms
FROM agent_run_logs
WHERE run_id = '<run-id>'
ORDER BY logged_at ASC;

Or via API: GET http://localhost:8001/pipeline/run/{run_id}


10. Database Tables

All tables live in external PostgreSQL. Migrations are in database/migrations/.

Core Data Tables

| Table | Purpose | Key columns |
|---|---|---|
| companies | Every company discovered | id, name, industry, city, website, source, status |
| company_features | Enrichment signals | company_id, employee_count, location_count, utility_spend_estimate |
| lead_scores | Analyst scoring output | company_id, score, tier, approved_human, score_reason |
| contacts | Decision-maker contacts | company_id, name, email, title, phone |
| email_drafts | Writer-generated drafts | company_id, subject_line, body, critic_score, approved, sent_at |
| outreach_events | Every email sent/opened/replied | company_id, event_type, event_at, reply_sentiment |
| followup_schedules | Scheduled follow-up emails | company_id, send_date, sequence_number, status |

Run Tracking Tables

| Table | Purpose |
|---|---|
| agent_runs | One row per pipeline trigger — status, stage counts, timing |
| agent_run_logs | Step-by-step audit log per run |
| human_approval_requests | Human-in-loop queue for leads and email approvals |

Learning Tables

| Table | Purpose | Who writes | Who reads |
|---|---|---|---|
| source_performance | Quality score per Scout source per industry | Scout (after each run) | Scout (next run) |
| email_win_rate | Open/reply rate per email angle per industry | Tracker (after replies) | Writer (before drafting) |

Migration Files

database/migrations/
  001_create_companies.sql
  002_create_contacts.sql
  003_create_company_features.sql
  004_create_lead_scores.sql
  005_create_email_drafts.sql
  006_create_outreach_events.sql
  007_create_directory_sources.sql
  008_create_agent_runs.sql
  009_create_agent_run_logs.sql
  010_create_source_performance.sql
  011_create_email_win_rate.sql
  012_create_human_approval_requests.sql
  013_alter_companies_add_run_id.sql
  014_create_followup_schedules.sql
  015_alter_email_drafts_add_critic_fields.sql
  016_alter_companies_add_phone.sql

11. API Reference

Full Swagger docs at http://localhost:8001/docs. Key endpoints:

| Method | Endpoint | Description |
|---|---|---|
| GET | /leads | Fetch leads with filters (search, status, tier, industry) |
| PATCH | /leads/{id}/approve | Approve a single lead |
| PATCH | /leads/{id}/reject | Reject a single lead |
| GET | /emails/pending | Fetch unapproved email drafts |
| PATCH | /emails/{id}/approve | Approve an email draft (triggers send) |
| PATCH | /emails/{id}/edit | Edit subject/body of a draft |
| PATCH | /emails/{id}/reject | Reject a draft (resets company to approved) |
| POST | /trigger/scout | Run Scout (background) |
| POST | /trigger/analyst | Run Analyst for all unscored companies |
| POST | /trigger/writer | Run Writer for all approved companies |
| GET | /trigger/{id}/status | Poll status of a triggered run |
| GET | /pipeline/status | Current stage counts |
| GET | /pipeline/run/{run_id} | Status and logs for one agent run |
| GET | /health | API health check |

12. Troubleshooting

API not responding

docker-compose logs api --tail 30
docker-compose restart api

Ollama not responding (LLM calls fail)

ollama serve          # start Ollama if not running
ollama list           # confirm llama3.2 is pulled
curl http://localhost:11434/api/tags

Inside Docker, Ollama is reached via host.docker.internal:11434. Confirm OLLAMA_BASE_URL=http://host.docker.internal:11434 in .env.

Scout finds 0 companies

  • Check TAVILY_API_KEY is set and quota is not exceeded
  • Check API logs: docker-compose logs api -f

Database connection error

  • Confirm DATABASE_URL is correct in .env
  • Test: docker exec <api-container> python -c "from database.connection import engine; print(engine.url)"

Frontend blank page / 404 on refresh

Rebuild the frontend container:

docker-compose build frontend && docker-compose up -d frontend

SendGrid emails not delivering

  • Verify sender email is authenticated in SendGrid dashboard
  • Confirm SENDGRID_FROM_EMAIL matches the verified sender identity
  • Check SendGrid activity feed for blocked or rejected messages

Code changes not showing

Always rebuild the relevant container after editing code:

docker-compose build api && docker-compose up -d api      # after backend changes
docker-compose build frontend && docker-compose up -d frontend  # after UI changes

13. Documentation Index

| Document | What it covers |
|---|---|
| docs/BUILD_STATUS.md | Exact status of every feature — what's done, what's wired live, what's missing, build priority order |
| docs/HOW_IT_WORKS.md | Plain-English guide for business stakeholders — the journey of a lead from discovery to reply |
| docs/SYSTEM_ARCHITECTURE.md | Full technical architecture — every agent, every API, every data flow, database schema |
| docs/AGENTIC_TRANSFORMATION_PLAN.md | Agentic upgrade plan — Phases A/B/C complete, Phase D + KB (vector memory) planned |
| docs/AGENTIC_DESIGN.md | Agentic reasoning patterns used across agents |
| docs/CONTACT_ENRICHMENT_STRATEGY.md | Contact enrichment waterfall — sources, fallbacks, quality gates |
| docs/ENRICHMENT_API_GUIDE.md | API-by-API guide for enrichment integrations |
| docs/SCOUT_SOURCES_AND_SIGNALS.md | Scout news signals and source configuration |

