A Python tool to scrape contact data from Apollo.io search results. Handles authentication, pagination, and exports to CSV/JSON.
- Session Persistence - Log in once, session saved for future runs
- Automatic Pagination - Scrapes all pages with random delays
- Incremental Saving - Data saved after each page (no data loss on interruption)
- Graceful Stop - Press Ctrl+C anytime to stop and keep collected data
- Multiple Formats - Exports to both CSV and JSON
```bash
# Install dependencies
pip install playwright

# Install browser
playwright install chromium
```

Run the scraper:

```bash
python main.py
```

On the first run:
- A browser window opens automatically
- Log in to Apollo manually
- Press Enter in the terminal when done
- The session is saved for future runs
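Session persistence of this kind is usually built on Playwright's storage-state support. After a manual login, `context.storage_state(path=...)` writes cookies and local storage to a JSON file, and `browser.new_context(storage_state=...)` restores them on the next run. The stdlib-only sketch below covers the file-handling side. The path `.session/apollo_state.json` comes from this README. The function names are hypothetical and are not taken from `auth.py`.

```python
import json
from pathlib import Path

# Path taken from this README; helper names below are hypothetical.
SESSION_FILE = Path(".session") / "apollo_state.json"


def has_saved_session(path: Path = SESSION_FILE) -> bool:
    """Return True if a Playwright storage-state file exists and looks valid."""
    if not path.is_file():
        return False
    try:
        state = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, ValueError):  # unreadable, bad encoding, or not JSON
        return False
    # Playwright storage state is a JSON object with "cookies" and "origins" lists.
    return isinstance(state, dict) and isinstance(state.get("cookies"), list)


def clear_session(path: Path = SESSION_FILE) -> None:
    """Delete the saved session so the next run forces a fresh login."""
    path.unlink(missing_ok=True)
```

With Playwright, `has_saved_session()` would decide between two paths. If it returns True, the scraper would call `browser.new_context(storage_state=SESSION_FILE)`. If it returns False, the scraper would open a fresh context, wait for the manual login, and then call `context.storage_state(path=SESSION_FILE)`.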
Provide your Apollo search URL:
- Option A: Save the URL to `search_url.txt` and press Enter
- Option B: Paste the URL directly when prompted
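One small function could handle the two URL options above. The sketch below assumes the prompt behaves this way: pressing Enter uses the URL saved in `search_url.txt`, and typing or pasting a URL overrides the saved one. The function name and prompt texts are hypothetical. `prompt` is injectable so that the logic can be tested without a terminal.

```python
from pathlib import Path
from typing import Callable

# File name from this README; the function name is hypothetical.
URL_FILE = Path("search_url.txt")


def resolve_search_url(
    path: Path = URL_FILE,
    prompt: Callable[[str], str] = input,
) -> str:
    """Option A: use search_url.txt if it holds a URL; Option B: ask for one."""
    if path.is_file():
        saved = path.read_text(encoding="utf-8").strip()
        if saved:
            # Option A: an empty answer (just Enter) falls back to the saved URL.
            answer = prompt(f"Press Enter to use {path}, or paste a URL: ").strip()
            return answer or saved
    # Option B: no usable file, so the URL has to be pasted.
    url = prompt("Paste your Apollo search URL: ").strip()
    if not url:
        raise ValueError("No search URL provided")
    return url
```

Reading from a file avoids the line-length and paste limits that some terminals impose on very long URLs.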
The scraper then:
- Navigates through all result pages automatically
- Waits a random 2-5 seconds between pages
- Stops when you press Ctrl+C, keeping the data collected so far
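The steps above (paginating, adding random delays, saving after each page, and stopping cleanly on Ctrl+C) can be sketched as one loop. This sketch assumes that the page-fetching step is a callable that returns an empty list when no results remain. In the real scraper, that callable would drive Playwright. `scrape_all_pages`, `fetch_page`, and `save` are hypothetical names, not the code in `scraper.py`.

```python
import random
import time
from typing import Callable

Contact = dict[str, str]


def scrape_all_pages(
    fetch_page: Callable[[int], list[Contact]],
    save: Callable[[list[Contact]], None],
    min_delay: float = 2.0,
    max_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[Contact]:
    """Fetch pages until one comes back empty, saving after every page.

    Ctrl+C (KeyboardInterrupt) ends the loop but keeps what was collected.
    """
    contacts: list[Contact] = []
    page = 1
    try:
        while True:
            rows = fetch_page(page)
            if not rows:  # assumed end-of-results signal
                break
            contacts.extend(rows)
            save(contacts)  # incremental save: nothing lost if interrupted
            sleep(random.uniform(min_delay, max_delay))  # random pause between pages
            page += 1
    except KeyboardInterrupt:
        print(f"\nStopped by user after {len(contacts)} contacts.")
    return contacts
```

Catching `KeyboardInterrupt` around the whole loop means an interrupt during a fetch or a sleep has the same effect. Everything already saved stays on disk.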
Files are saved to the `output/` folder with a timestamp in the name:

```text
output/
├── contacts_20251215_002202.csv
└── contacts_20251215_002202.json
```
| Field | Description |
|---|---|
| name | Contact's full name |
| job_title | Job title/position |
| company | Company name |
|  | Email address (if visible) |
| linkedin_url | LinkedIn profile URL |
| location | City, State/Country |
| profile_url | Apollo profile URL |
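Writing the same records to both formats takes only the standard library. This minimal sketch assumes that each contact is a flat dict keyed by field names such as those in the table. It derives the columns from the records instead of hard-coding them, so a field that is missing for some contacts (such as an email that wasn't visible) becomes an empty cell. Rewriting both files after every page is one simple way to get the incremental saving described earlier. `export_contacts` is a hypothetical name.

```python
import csv
import json
from pathlib import Path


def export_contacts(contacts: list[dict[str, str]], csv_path: Path, json_path: Path) -> None:
    """Write the same records to CSV and JSON, overwriting any previous save."""
    # Column order follows the first appearance of each key across all records.
    fieldnames = list(dict.fromkeys(key for c in contacts for key in c))
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(contacts)
    # ensure_ascii=False keeps non-ASCII names readable in the JSON file.
    json_path.write_text(json.dumps(contacts, indent=2, ensure_ascii=False), encoding="utf-8")
```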
Project structure:

```text
apollo-scraper/
├── main.py            # Entry point
├── scraper.py         # Scraping logic
├── auth.py            # Session management
├── utils.py           # Helper functions
├── search_url.txt     # Your search URL (optional)
├── .session/          # Saved login session
├── output/            # Scraped data
└── requirements.txt   # Dependencies
```
Tips:
- Long URLs: Save the URL to `search_url.txt` instead of pasting it (this avoids terminal issues)
- Rate Limiting: Random delays are built in, but avoid running the scraper too aggressively
- Session Expired: Delete `.session/apollo_state.json` to force a fresh login
Requirements:
- Python 3.10+
- Playwright
- A paid Apollo.io account
License: MIT