An end-to-end local business discovery, deep-enrichment, qualification, and multi-touchpoint AI outreach pipeline.
- Why We Need This in the Modern Age
- Project Overview
- System Architecture
- Key Features
- Complete System Setup
- Running the System
- Automated Testing
- License
In today's highly competitive digital landscape, B2B lead generation is often a manual, tedious, and error-prone process. Sales teams spend countless hours scraping directories, validating email addresses, researching company backgrounds, and writing generic outreach emails that result in low conversion rates.
The AI Lead Generation System shifts the paradigm. By leveraging Large Language Models (LLMs) and distributed background processing, this system acts as a persistent, tireless 24/7 sales development representative.
It autonomously:
- Discovers niche local businesses through the Google Places API.
- Scrapes and qualifies their websites against your Ideal Customer Profile (ICP).
- Personalizes outreach copy using the prospect's real-world business data.
- Sends the campaign with built-in open, click, and reply tracking.
The project is built on a robust, asynchronous tech stack designed to handle high-throughput network operations:
- Backend API: FastAPI + Uvicorn for high-performance RESTful operations.
- Task Scheduling: APScheduler for lightweight, in-process asynchronous task execution (optimized for free tiers).
- Database: PostgreSQL via SQLAlchemy (Async) & asyncpg for non-blocking I/O.
- Scraping Engine: Playwright & BeautifulSoup4 for deep web crawling.
- AI Brain: Groq API (Llama 3) for lightning-fast business qualification and personalized email generation.
- Monitoring & Alerting: Loguru for structured logging and Telegram Bot API for real-time pipeline notifications.
- Document Generation: Auto-populates business-specific PDF proposals and daily Excel performance reports.
The workflow follows a directed acyclic pipeline running daily via APScheduler.
graph TD;
subgraph "Schedule & Management"
Cron[APScheduler] -->|Triggers Daily| Pipeline[Daily Pipeline Task]
Pipeline -.->|Real-time System Alerts| Telegram[Telegram Bot]
end
subgraph "The AI Funnel"
Pipeline -->|1. Find Targets| GooglePlaces[Google Places API]
GooglePlaces -->|Return places| Scraper[Playwright Web Scraper]
Scraper -->|Extract Text| AI_Qualify{Groq AI Qualification}
AI_Qualify -- Rejected --> Drop[Discard Lead]
AI_Qualify -- Approved --> DB[(PostgreSQL Database)]
DB -->|Fetch Approved Leads| AI_Personalize[Groq AI Content Generator]
AI_Personalize -->|Generate Context| PDFGen[PDF Proposal Generator]
PDFGen --> EmailBuilder[Build HTML Email]
EmailBuilder -->|Attach PDF| Mailer[Email Delivery System]
end
subgraph "Delivery & Tracking"
Mailer -->|Send| ClientOutbox[SMTP / Brevo]
ClientOutbox --> Webhook[Tracking Webhooks]
Webhook -->|Open/Click Events| DB
Cron -->|Polling| Poller[IMAP Reply Poller]
Poller -->|Fetch Direct Replies| DB
end
subgraph "Reporting"
Pipeline -->|End of Day| Reporter[Daily Excel & Email Report]
Reporter --> Admin[Admin Inbox]
end
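As a rough illustration of the funnel in the diagram, the sketch below chains discovery, scraping, and qualification with `asyncio`. The function names (`discover`, `scrape`, `qualify`, `run_funnel`), the `Lead` dataclass, and the canned data are all hypothetical stand-ins and not the project's actual modules; the real stages call Google Places, Playwright, and Groq.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Lead:
    name: str
    website: str
    approved: bool = False


async def discover() -> list:
    # Stand-in for the Google Places lookup; returns canned data here.
    return [
        Lead("Acme Plumbing", "https://acme.example"),
        Lead("Modern Dental", "https://dental.example"),
    ]


async def scrape(lead: Lead) -> str:
    # Stand-in for the Playwright scraper; returns fake page text.
    pages = {
        "https://acme.example": "Site last updated 2014",
        "https://dental.example": "Site last updated 2025",
    }
    return pages[lead.website]


async def qualify(lead: Lead, page_text: str) -> Lead:
    # Stand-in for the Groq qualification call: a toy rule instead of an LLM.
    year = int(page_text.rsplit(" ", 1)[-1])
    lead.approved = year < 2020
    return lead


async def run_funnel() -> list:
    leads = await discover()
    # Scrape all candidate sites concurrently, then qualify each one.
    texts = await asyncio.gather(*(scrape(lead) for lead in leads))
    qualified = [await qualify(lead, text) for lead, text in zip(leads, texts)]
    # Rejected leads are dropped; approved ones would be persisted.
    return [lead for lead in qualified if lead.approved]


if __name__ == "__main__":
    approved = asyncio.run(run_funnel())
    print([lead.name for lead in approved])
```

Scraping is fanned out with `asyncio.gather` because it is network-bound; whether the real pipeline parallelizes this stage is an assumption.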
| Feature | Description |
|---|---|
| Multi-Radius Discovery | Uses Google Places API to search for specific niches within calculated geographical radii. |
| Deep AI Qualification | Extracts structured website signals (old copyright years, mobile responsiveness) and leverages Groq LLM to answer: Does this business need our services? |
| Social & Competitor Intel | Autonomously detects social media footprints and performs local competitor benchmarking to find unique sales angles. |
| Hyper-Personalization | Generates completely unique email bodies referencing the prospect's specific services, location, and competitor gaps. |
| Dynamic PDF Proposals | Auto-generates customized, visually polished business proposals (via ReportLab) and attaches them to outreach emails. |
| Autonomous Follow-Ups | Intelligent, tri-stage multi-touchpoint sequence that automatically pauses upon prospect engagement (clicks or replies). |
| Smart AI Inbox | Reads incoming replies, categorizes intent (e.g., interested, pricing_inquiry), and automatically drafts context-aware responses. |
| Self-Improving Prompts | Analyzes weekly campaign metrics (open/reply rates) and dynamically leverages LLM feedback loops to rewrite underperforming outreach prompts. |
| WhatsApp & Telegram Alerts | Real-time notifications for discovered leads, pipeline errors, and high-intent prospect engagements (e.g., hot replies). |
| ROI & Excel Reporting | Compiles daily and weekly outreach metrics and sends an Excel overview directly to the administrator. |
| API Key Security | All management endpoints are protected by X-API-Key headers. |
| Serverless Optimized | In-process APScheduler execution optimized for restrictive free-tiers (e.g., Render) by minimizing external connections. |
| Dynamic Pipeline Control | API-driven jobs_config.json system allows granular HOLD/RUN and schedule adjustments for individual pipeline stages without a server reboot, paired with a global .env kill-switch. |
| Enterprise-Grade Code | Fully documented, strictly typed codebase featuring professional Python docstrings, dependency injection (lru_cache), and comprehensive testing. |
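To make the "structured website signals" from the Deep AI Qualification row more concrete, here is a minimal standard-library sketch that pulls a copyright year and a mobile-viewport hint out of raw HTML. The function name `extract_signals`, the returned keys, and the "stale if older than last year" threshold are assumptions for illustration; the project itself scrapes with Playwright and BeautifulSoup4 and may compute these signals differently.

```python
import re
from datetime import date
from html.parser import HTMLParser
from typing import Optional


class _ViewportFinder(HTMLParser):
    """Records whether the page declares a <meta name="viewport"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.has_viewport = False

    def handle_starttag(self, tag, attrs):
        name = (dict(attrs).get("name") or "").lower()
        if tag == "meta" and name == "viewport":
            self.has_viewport = True


# Matches "© 2016", "&copy; 2016", "Copyright 2016", "Copyright (c) 2016".
COPYRIGHT_YEAR = re.compile(
    r"(?:©|&copy;|copyright)\s*(?:\(c\)\s*)?(\d{4})", re.IGNORECASE
)


def extract_signals(html: str, current_year: Optional[int] = None) -> dict:
    """Return a small dict of heuristic signals (hypothetical schema)."""
    current_year = current_year or date.today().year
    finder = _ViewportFinder()
    finder.feed(html)
    years = [int(y) for y in COPYRIGHT_YEAR.findall(html)]
    latest = max(years) if years else None
    return {
        "copyright_year": latest,
        # Assumed threshold: anything older than last year counts as stale.
        "copyright_stale": latest is not None and latest < current_year - 1,
        # A viewport meta tag is only a rough proxy for mobile responsiveness.
        "has_viewport_meta": finder.has_viewport,
    }
```

Signals like these could be passed to the LLM alongside the page text so the qualification prompt reasons over explicit facts instead of raw markup.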
- Python 3.11+
- PostgreSQL 15+ (Local or Cloud e.g., Supabase)
- API Keys for Google Places, Groq, and an SMTP Provider (Brevo/Sendinblue).
Copy the .env.example to .env and fill in the specifics:
cp .env.example .env
Ensure you have set DATABASE_URL, GROQ_API_KEY, your SMTP credentials, TELEGRAM_BOT_TOKEN, IMAP_SERVER credentials, and CRON_JOB_API_KEY (if you use the automated free-tier external trigger).
You may also set PRODUCTION_STATUS=HOLD to safely pause all automated daily operations globally. Granular control is managed dynamically via the API and config/jobs_config.json.
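One way the global kill-switch and the per-stage configuration could combine is sketched below. The JSON layout (`{"job_name": {"status": "RUN" | "HOLD"}}`), the function name `is_job_enabled`, and the default of RUN for unlisted jobs are assumptions; the actual schema of config/jobs_config.json is not documented here.

```python
import json
import os
from pathlib import Path


def is_job_enabled(job_name, config_path, env=None) -> bool:
    """Return True if a pipeline stage should run (hypothetical helper)."""
    env = os.environ if env is None else env
    # Global kill-switch: PRODUCTION_STATUS=HOLD pauses every stage.
    if env.get("PRODUCTION_STATUS", "RUN").upper() == "HOLD":
        return False
    # Re-read the file on each call so edits apply without a restart.
    jobs = json.loads(Path(config_path).read_text(encoding="utf-8"))
    # Assumed default: a stage missing from the file is allowed to run.
    return jobs.get(job_name, {}).get("status", "RUN").upper() == "RUN"
```

Reading the file on every check keeps the sketch simple; a missing or malformed file raises here, and a real implementation would need to decide whether that should pause or allow the stage.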
- Clone & Environment:
git clone https://github.com/your-username/ai-lead-generation.git
cd ai-lead-generation
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
- Install Dependencies:
pip install -r requirements.txt
- Run Database Migrations:
alembic upgrade head
If you prefer an isolated containerized environment, ensure Docker is installed and run:
docker-compose up -d --build
This will spin up the FastAPI app and (optionally) the PostgreSQL container. All background tasks run inside the API process.
If deploying to a platform that puts the server to sleep (e.g., Render Free Tier), this repository includes an automated GitHub Action (.github/workflows/setup_cronjob.yml) that links your deployment to cron-job.org.
Simply provide your CRON_JOB_API_KEY and APP_URL as GitHub Secrets/Variables, and a system keep-alive will be automatically configured upon deployment.
Start the FastAPI Server, which automatically instantiates the APScheduler for background operations:
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
API documentation is available at: http://localhost:8000/docs
For debugging or immediate execution, you can bypass the APScheduler and run the pipeline stages sequentially via the standalone script:
python test.py
This ensures the database tables exist and immediately triggers the configured stages (e.g., Discovery, Qualification).
We expect 100% test coverage across the system.
Our End-to-End test suite automatically spins up an asynchronous SQLite memory database (test.db) to safely perform operations. Models dynamically adapt schema bindings to ensure cross-compatibility between production PostgreSQL and local SQLite testing environments.
To run the entire suite:
pytest -v tests/
Testing coverage includes:
- Database Schema & ORM capabilities (with schema-agnostic adaptations).
- Discovery Module (mocks the Google API & DOM scrapers).
- Asynchronous Task Pipeline validation.
- System integration & error-handling cases.
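For a sense of what mocking the Google API in a discovery test might look like, here is a small sketch using only `unittest.mock` and `asyncio`. The `discover_businesses` function, the `search` method on the client, and the shape of the place dictionaries are hypothetical; the project's real discovery module and fixtures will differ.

```python
import asyncio
from unittest.mock import AsyncMock


async def discover_businesses(places_client, niche: str, radius_m: int) -> list:
    """Hypothetical discovery function: names of places that have a website."""
    results = await places_client.search(niche=niche, radius_m=radius_m)
    return [place["name"] for place in results if place.get("website")]


def test_discovery_skips_places_without_website() -> None:
    # The mocked client replaces the real Google Places call entirely.
    client = AsyncMock()
    client.search.return_value = [
        {"name": "Acme Plumbing", "website": "https://acme.example"},
        {"name": "No Site Bakery"},
    ]

    names = asyncio.run(
        discover_businesses(client, niche="plumber", radius_m=5000)
    )

    assert names == ["Acme Plumbing"]
    client.search.assert_awaited_once_with(niche="plumber", radius_m=5000)
```

Because the client is injected as an argument, the test never touches the network, which keeps the suite fast and deterministic.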
With the successful deployment of v2.0, the platform now possesses autonomous follow-ups, reply classification, deep enrichment, and self-improving prompt analytics. Our upcoming v3.0 goals focus on expanding beyond email:
- Multi-Omnichannel Sequences: Coordinated outreach traversing SMS, direct LinkedIn messaging, and automated voicemail drops.
- Continuous CRM Sync: Bi-directional sync integrations with Salesforce and HubSpot to pass qualified leads instantaneously to closing teams.
- Voice AI Handoff: Integration with voice agents to trigger immediate outbound calls the moment a prospect hits a tracking landing page.
(Note: Technical implementation models and architectural choices for these upcoming changes are managed internally).
This project is PROPRIETARY AND CONFIDENTIAL.
It is strictly licensed only to the original author. Any other person, entity, or corporation wishing to implement, deploy, or use this system must obtain explicit, prior written permission from the author.
Unauthorized copying, distribution, modification, or commercial use of this codebase is strictly prohibited. See the LICENSE file for the full End-User License Agreement (EULA) rules and regulations.
Built with ❤️ to revolutionize B2B Sales