bsat007/apollo-leads-scraper

Apollo.io Contact Scraper

A Python tool to scrape contact data from Apollo.io search results. Handles authentication, pagination, and exports to CSV/JSON.

Features

  • Session Persistence - Log in once; the session is saved and reused on later runs
  • Automatic Pagination - Scrapes every results page, with a random delay between pages
  • Incremental Saving - Data is saved after each page, so an interruption loses nothing already collected
  • Graceful Stop - Press Ctrl+C at any time to stop and keep the collected data
  • Multiple Formats - Exports to both CSV and JSON
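
As a rough illustration of incremental saving, the sketch below appends each page of rows to the CSV and rewrites the JSON file after every page. The function name `save_page` and its arguments are hypothetical and are not taken from this repository's code; only the field names come from this README.

```python
import csv
import json
from pathlib import Path

# Column names as listed under "Fields Extracted".
FIELDS = ["name", "job_title", "company", "email",
          "linkedin_url", "location", "profile_url"]


def save_page(rows, all_rows, csv_path, json_path):
    """Persist one page of results (hypothetical helper, not the project's API).

    rows      -- list of dicts scraped from the current page
    all_rows  -- running list of every row so far (mutated in place)
    """
    csv_path, json_path = Path(csv_path), Path(json_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    json_path.parent.mkdir(parents=True, exist_ok=True)

    # Append to the CSV; write the header only when the file is new.
    write_header = not csv_path.exists()
    with csv_path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, extrasaction="ignore")
        if write_header:
            writer.writeheader()
        writer.writerows(rows)  # missing fields are written as empty strings

    # JSON cannot be appended to safely, so rewrite the whole snapshot.
    all_rows.extend(rows)
    json_path.write_text(
        json.dumps(all_rows, indent=2, ensure_ascii=False), encoding="utf-8"
    )
```

Because both files are complete after every call, stopping the run between pages leaves usable output on disk.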

Installation

# Install dependencies
pip install playwright

# Install browser
playwright install chromium

Usage

1. Run the scraper

python main.py

2. First run - Login

  • The browser opens automatically
  • Log in to Apollo manually in that browser window
  • Press Enter in the terminal when you are logged in
  • The session is saved for future runs
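
One way this login flow could be built on Playwright's storage state is sketched below. It is an assumption about the approach, not the contents of auth.py: the helper names and the login URL are hypothetical, and the state file path is the one mentioned under Tips.

```python
from pathlib import Path

# Path taken from the Tips section of this README.
STATE_FILE = Path(".session/apollo_state.json")


def has_saved_session(state_file=STATE_FILE):
    """True when a saved login state file is present."""
    return Path(state_file).is_file()


def open_context(playwright, state_file=STATE_FILE):
    """Return (browser, context), reusing the saved session when one exists.

    `playwright` is the object yielded by playwright.sync_api.sync_playwright().
    Hypothetical helper; the project's own code may be organized differently.
    """
    state_file = Path(state_file)
    browser = playwright.chromium.launch(headless=False)
    if has_saved_session(state_file):
        # Restore cookies and local storage from the earlier login.
        context = browser.new_context(storage_state=str(state_file))
    else:
        context = browser.new_context()
        page = context.new_page()
        page.goto("https://app.apollo.io/")  # assumed login entry point
        input("Log in to Apollo in the browser, then press Enter here... ")
        state_file.parent.mkdir(parents=True, exist_ok=True)
        context.storage_state(path=str(state_file))  # save for future runs
    return browser, context
```

A caller would wrap this in `with sync_playwright() as p:` and pass `p` to `open_context`.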

3. Provide search URL

Option A: Save the URL to search_url.txt, then press Enter at the prompt

Option B: Paste the URL directly when prompted
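
The two options can be sketched as a single lookup: use whatever is typed at the prompt, and fall back to the file when the prompt is left empty. The function name `resolve_search_url`, the prompt text, and the error behaviour are illustrative assumptions, not the project's actual code.

```python
from pathlib import Path


def resolve_search_url(url_file="search_url.txt", prompt=input):
    """Return the search URL from the prompt, or from url_file if left blank.

    Hypothetical helper; `prompt` is injectable so it can be tested.
    """
    typed = prompt(
        "Paste the search URL (or press Enter to use search_url.txt): "
    ).strip()
    if typed:
        return typed  # Option B: URL pasted directly

    path = Path(url_file)
    if path.is_file():
        saved = path.read_text(encoding="utf-8").strip()
        if saved:
            return saved  # Option A: URL stored in the file

    raise ValueError("No search URL was pasted and search_url.txt is empty or missing")
```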

4. Scraping

  • Scraper navigates through all pages automatically
  • Random 2-5 second delays between pages
  • Press Ctrl+C anytime to stop and keep data
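
The loop described above might look roughly like the sketch below. Here `fetch_page` is a hypothetical callable standing in for the real page scraping and is assumed to return an empty list once there are no more results. The 2-5 second range comes from this README; everything else is illustrative.

```python
import random
import time


def scrape_all_pages(fetch_page, min_delay=2.0, max_delay=5.0, sleep=time.sleep):
    """Collect rows page by page until a page comes back empty.

    fetch_page(n) -- hypothetical callable returning the rows on page n
    sleep         -- injectable so the delay can be skipped in tests
    """
    contacts = []
    page_number = 1
    try:
        while True:
            rows = fetch_page(page_number)
            if not rows:
                break  # no more results
            contacts.extend(rows)
            page_number += 1
            # Random pause between pages, 2-5 seconds by default.
            sleep(random.uniform(min_delay, max_delay))
    except KeyboardInterrupt:
        # Ctrl+C: stop paging but keep everything collected so far.
        print(f"Stopped by user; keeping {len(contacts)} contacts.")
    return contacts
```

Catching `KeyboardInterrupt` around the whole loop means a Ctrl+C during either a fetch or a delay ends the run with the collected rows still returned.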

Output

Files are saved to the output/ folder, with a timestamp (YYYYMMDD_HHMMSS) in the file name:

output/
├── contacts_20251215_002202.csv
└── contacts_20251215_002202.json
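
File names in this pattern can be produced with `strftime`. A minimal sketch follows; the helper name `output_paths` is hypothetical, and the format string is inferred from the example names above.

```python
from datetime import datetime
from pathlib import Path


def output_paths(output_dir="output", now=None):
    """Return (csv_path, json_path) sharing one timestamp.

    Hypothetical helper; `now` can be passed in to get reproducible names.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    base = Path(output_dir)
    return base / f"contacts_{stamp}.csv", base / f"contacts_{stamp}.json"


csv_path, json_path = output_paths(now=datetime(2025, 12, 15, 0, 22, 2))
print(csv_path.name)   # contacts_20251215_002202.csv
print(json_path.name)  # contacts_20251215_002202.json
```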

Fields Extracted

Field          Description
name           Contact's full name
job_title      Job title/position
company        Company name
email          Email address (if visible)
linkedin_url   LinkedIn profile URL
location       City, State/Country
profile_url    Apollo profile URL
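
For readers who load the exported data into their own scripts, the fields above map naturally onto a small record type. The `Contact` dataclass below is an illustrative assumption, not a class from this repository. Every field defaults to an empty string, since values such as email may be missing.

```python
from dataclasses import asdict, dataclass, fields


@dataclass
class Contact:
    """One scraped row (hypothetical type mirroring the field table)."""
    name: str = ""
    job_title: str = ""
    company: str = ""
    email: str = ""         # empty when the address is not visible
    linkedin_url: str = ""
    location: str = ""
    profile_url: str = ""


# Column order for CSV export, derived from the dataclass itself.
CSV_COLUMNS = [f.name for f in fields(Contact)]

row = asdict(Contact(name="Jane Doe", company="Example Corp"))
```

`asdict` yields a plain dict that works with both `csv.DictWriter` and `json.dumps`.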

Project Structure

apollo-scraper/
├── main.py           # Entry point
├── scraper.py        # Scraping logic
├── auth.py           # Session management
├── utils.py          # Helper functions
├── search_url.txt    # Your search URL (optional)
├── .session/         # Saved login session
├── output/           # Scraped data
└── requirements.txt  # Dependencies

Tips

  • Long URLs: Save the URL to search_url.txt instead of pasting it (avoids terminal issues with very long input)
  • Rate Limiting: Random delays are built in, but avoid running the scraper too aggressively
  • Session Expired: Delete .session/apollo_state.json to force a fresh login

Requirements

  • Python 3.10+
  • Playwright
  • Apollo.io paid account

License

MIT
