🔗 Scrape LinkedIn profile data at scale: names, titles, companies, locations, CSV export. Python + Playwright.

SoCloseSociety/LinkedinDataScraper

LinkedIn Data Scraper

Scrape LinkedIn search results and extract professional profile data, with Excel & CSV export and 15+ fields per profile.

License: MIT • Python 3.9+ • Playwright • Streamlit



What is LinkedIn Data Scraper?

LinkedIn Data Scraper is a free, open-source LinkedIn profile extraction tool built with Python and Playwright. Many LinkedIn scrapers break frequently because they parse HTML markup that LinkedIn changes often. This scraper instead intercepts responses from LinkedIn's internal Voyager API, the same structured JSON that LinkedIn's own frontend consumes. When a field is missing from the captured API data, it falls back to DOM parsing.

The result is extraction that is less sensitive to LinkedIn UI changes, with 15+ fields per profile, including experience, education, and skills, plus email and phone number when they are publicly visible.
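The API-first, DOM-fallback approach described above amounts to a field-level merge. The sketch below is a minimal illustration, not the project's code: it assumes both sources have already been reduced to plain dicts keyed by field name, and `merge_profile_fields`, `is_missing`, and the field names are hypothetical.

```python
from typing import Any, Dict, Mapping


def is_missing(value: Any) -> bool:
    """Treat None, blank strings, and empty collections as 'not captured'."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple, set, dict)):
        return len(value) == 0
    return False


def merge_profile_fields(
    api_data: Mapping[str, Any], dom_data: Mapping[str, Any]
) -> Dict[str, Any]:
    """Prefer the API value for each field; use the DOM value only when the API lacks it."""
    merged: Dict[str, Any] = {}
    # Union of keys from both sources (DOM keys first, then API-only keys).
    for key in {**dom_data, **api_data}:
        api_value = api_data.get(key)
        merged[key] = dom_data.get(key) if is_missing(api_value) else api_value
    return merged


# Hypothetical example data.
api = {"full_name": "Ada Lovelace", "headline": "", "skills": []}
dom = {"headline": "Analyst", "skills": ["Mathematics"], "about": "Notes on the engine"}
print(merge_profile_fields(api, dom))
# {'headline': 'Analyst', 'skills': ['Mathematics'], 'about': 'Notes on the engine', 'full_name': 'Ada Lovelace'}
```

Merging per field rather than per profile means one missing API field does not force the whole profile onto the more fragile DOM path.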

Who is this for?

  • Recruiters building candidate pipelines from LinkedIn searches
  • Sales Teams extracting lead data for CRM import
  • Market Researchers analyzing talent pools by industry and location
  • HR Departments benchmarking compensation and title distribution
  • Growth Hackers building B2B prospect lists at scale
  • Developers learning Playwright stealth and API interception

Key Features

| Feature | Description |
|---|---|
| People Search | Search by keywords, name, job title, company, location, country, industry |
| Profile Extraction | Full name, headline, company, location, about, experience, education, skills, connections |
| Contact Info | Email, phone, website (when publicly visible on the profile) |
| Excel Export | Color-coded headers, clickable links, auto-filters, frozen headers, email highlighting |
| CSV Export | Clean UTF-8 CSV for CRM, mail merge, Google Sheets |
| Anti-Detection | Playwright stealth, randomized delays, cookie sessions, adaptive rate limiting |
| Rich CLI | Progress bars, colored output, interactive prompts |
| Web UI | Streamlit browser interface with download buttons |
| API Interception | Captures LinkedIn's Voyager API responses for stable, structured data |
| Cross-Platform | macOS, Windows, Linux; Chrome, Edge, or Chromium |
| Docker | Dockerfile included for containerized deployment |

Extracted Data Fields

| Field | Description | Source |
|---|---|---|
| Full Name | Profile full name | Search + API |
| Headline | Job title / professional tagline | Search + API |
| Company | Current company | Search + API |
| Location | City, region, or country | Search + API |
| Industry | Professional sector | Profile API |
| Email | Email address (if publicly visible) | Contact Info API |
| Phone | Phone number (if publicly visible) | Contact Info API |
| Website | Personal or company website | Contact Info API |
| LinkedIn URL | Direct clickable link to the profile | Search |
| Current Title | Current job title | Profile API |
| Experience | Work history (title, company, dates) | Profile API + DOM |
| Education | Schools, degrees, fields of study | Profile API + DOM |
| Skills | Professional skills list | Skills API + DOM |
| Connections | Number of LinkedIn connections | Profile API |
| About | Profile summary / bio | Profile API + DOM |
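One way to model a row with these 15 fields is a dataclass. The project's `models.py` defines a `LinkedInProfile` model, but its actual fields are not shown here; the attribute names below and the `to_row` helper are hypothetical, chosen only to mirror the table.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class LinkedInProfile:
    """Hypothetical shape of one exported profile (field names are illustrative)."""

    full_name: str
    linkedin_url: str
    headline: str = ""
    company: str = ""
    location: str = ""
    industry: str = ""
    email: str = ""  # only filled when publicly visible
    phone: str = ""  # only filled when publicly visible
    website: str = ""
    current_title: str = ""
    experience: List[str] = field(default_factory=list)
    education: List[str] = field(default_factory=list)
    skills: List[str] = field(default_factory=list)
    connections: Optional[int] = None
    about: str = ""

    def to_row(self) -> Dict[str, str]:
        """Flatten every field to a string so the row fits a CSV or Excel cell."""
        row: Dict[str, str] = {}
        for key, value in asdict(self).items():
            if isinstance(value, list):
                row[key] = "; ".join(value)  # lists become one delimited cell
            elif value is None:
                row[key] = ""
            else:
                row[key] = str(value)
        return row
```

Keeping list fields as lists until export time lets the CSV and Excel writers choose their own delimiter.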

Prerequisites

| Requirement | Notes |
|---|---|
| Python 3.9+ | Download from python.org |
| LinkedIn account | Needed for authentication |
| Google Chrome | Recommended (or Chromium / Edge) |

Installation

Quick Setup (Mac / Linux)

```bash
git clone https://github.com/SoCloseSociety/LinkedinDataScraper.git
cd LinkedinDataScraper
make install
source .venv/bin/activate
```

Manual Setup (Windows / Any OS)

```bash
git clone https://github.com/SoCloseSociety/LinkedinDataScraper.git
cd LinkedinDataScraper
python -m venv .venv

# Activate:
# Windows:   .venv\Scripts\activate
# Mac/Linux: source .venv/bin/activate

pip install -r requirements.txt
playwright install chromium
```

Optional: Save credentials

```bash
cp .env.example .env
# Edit .env with your LinkedIn email and password
```

Quick Start

```bash
# Interactive mode: prompts for everything
python -m linkedin_scraper

# Direct search with location
python -m linkedin_scraper "software engineer" --location "San Francisco" --max-results 20

# Industry filter + Excel only
python -m linkedin_scraper "data scientist" --industry "Technology" --format excel

# Fast mode: search results only, no profile pages
python -m linkedin_scraper "CEO" --location "New York" --no-details -n 50
```

CLI Usage

```bash
python -m linkedin_scraper [keywords] [options]
```

| Option | Short | Description | Default |
|---|---|---|---|
| keywords | | Search keywords (e.g., "software engineer") | Interactive prompt |
| --location | -l | City, region, or country | Any |
| --industry | -i | Industry filter | Any |
| --max-results | -n | Max profiles to extract (cap: 80) | 50 |
| --output | -o | Output directory | output/ |
| --format | -f | csv, excel, or both | both |
| --no-details | | Skip profile pages (faster) | Extract details |
| --headless | | Run the browser in headless mode | Visible |
| --email | | LinkedIn email | Env var or manual |
| --password | | LinkedIn password | Env var or manual |
| --cookies | | Cookies file path | linkedin_cookies.json |
| --verbose | -v | -v for INFO, -vv for DEBUG | WARNING |
| --version | | Show version | |
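The options table maps naturally onto the standard library's `argparse`. This is a sketch under stated assumptions, not the project's actual `cli.py`: clamping `--max-results` into the 80-profile cap (rather than rejecting larger values) and the derived `log_level` attribute are assumptions, and the version string is a placeholder.

```python
import argparse
import logging
from typing import List, Optional

__version__ = "0.0.0"  # placeholder: the real version string is not given here
MAX_RESULTS_CAP = 80  # per-run safety cap documented in this README


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    p = argparse.ArgumentParser(prog="linkedin_scraper")
    p.add_argument("keywords", nargs="?", help="search keywords; prompt if omitted")
    p.add_argument("-l", "--location", help="city, region, or country")
    p.add_argument("-i", "--industry", help="industry filter")
    p.add_argument("-n", "--max-results", type=int, default=50)
    p.add_argument("-o", "--output", default="output/")
    p.add_argument("-f", "--format", choices=["csv", "excel", "both"], default="both")
    p.add_argument("--no-details", action="store_true", help="skip profile pages")
    p.add_argument("--headless", action="store_true")
    p.add_argument("--email", help="falls back to env var or manual login")
    p.add_argument("--password", help="falls back to env var or manual login")
    p.add_argument("--cookies", default="linkedin_cookies.json")
    p.add_argument("-v", "--verbose", action="count", default=0)
    p.add_argument("--version", action="version", version=f"%(prog)s {__version__}")
    return p


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Assumption: clamp into [1, 80] instead of rejecting out-of-range values.
    args.max_results = max(1, min(args.max_results, MAX_RESULTS_CAP))
    # 0 flags -> WARNING (documented default), -v -> INFO, -vv or more -> DEBUG.
    args.log_level = {0: logging.WARNING, 1: logging.INFO}.get(args.verbose, logging.DEBUG)
    return args
```

For example, `parse_args(["recruiter", "-l", "Berlin", "--no-details", "-n", "80"])` yields a namespace with `no_details=True` and `max_results=80`.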

Real-World Examples

```bash
# Marketing directors in France
python -m linkedin_scraper "directeur marketing" -l "France" -n 30

# Finance project managers in London → Excel
python -m linkedin_scraper "project manager" -l "London" -i "Finance" -f excel

# Recruiters in Berlin: fast scan, no detail pages
python -m linkedin_scraper "recruiter" -l "Berlin" --no-details -n 80

# Debug mode
python -m linkedin_scraper "developer" -l "Tokyo" -vv
```

Web Interface (Streamlit)

```bash
make ui
# or: streamlit run app.py
```

The web UI provides:

  • Search form with all filters (keywords, location, industry, max results)
  • Real-time progress tracking with status updates
  • Interactive results table with sorting and filtering
  • One-click download buttons for CSV and Excel
  • Metrics dashboard (total profiles, with email, with phone, data source)
  • Live logs for monitoring

Excel Output Format

The Excel file is formatted for immediate use:

| Feature | Detail |
|---|---|
| Color-coded headers | Purple (Identity), Indigo (Contact), Dark (Professional), Gray (Meta) |
| Email highlighting | Green = email found, Red = no email |
| Clickable links | LinkedIn profile URLs and websites open in the browser |
| Auto-filters | All columns sortable and filterable |
| Frozen header | Header row and name column stay visible when scrolling |
| Alternating rows | Soft tint for readability |
| Summary sheet | Stats: total profiles, % with email, % with phone, data sources |
| Bold names | Full Name column uses a larger bold font |
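The Summary sheet's statistics can be computed from the exported rows with the standard library alone. In this hypothetical sketch each row is a dict of strings, and the `source` key (for example `"api"` or `"dom"`) is an assumed way of recording where a profile's data came from; the project's real column names may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Optional


def summarize(rows: Iterable[Mapping[str, Optional[str]]]) -> Dict[str, object]:
    """Compute Summary-sheet style statistics from exported rows."""
    materialized = list(rows)
    total = len(materialized)

    def pct_filled(field: str) -> float:
        # Percentage of rows whose field is non-blank, rounded to one decimal.
        if total == 0:
            return 0.0
        filled = sum(1 for row in materialized if (row.get(field) or "").strip())
        return round(100 * filled / total, 1)

    return {
        "total_profiles": total,
        "pct_with_email": pct_filled("email"),
        "pct_with_phone": pct_filled("phone"),
        # Hypothetical "source" column; rows without it are counted as "unknown".
        "data_sources": dict(Counter(row.get("source", "unknown") for row in materialized)),
    }


rows = [
    {"full_name": "A", "email": "a@example.com", "phone": "", "source": "api"},
    {"full_name": "B", "email": "", "phone": "+1 555 0100", "source": "dom"},
]
print(summarize(rows))
# {'total_profiles': 2, 'pct_with_email': 50.0, 'pct_with_phone': 50.0, 'data_sources': {'api': 1, 'dom': 1}}
```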

Authentication

LinkedIn requires login. Three methods are supported:

| Method | When to use |
|---|---|
| Cookie sessions (recommended) | Log in once; cookies are saved for future runs |
| Auto login | Provide email/password via the CLI or .env |
| Manual login | The browser opens and you log in manually; handles 2FA and CAPTCHA |

First-run tip: omit --headless so the browser window is visible, and complete the login manually if LinkedIn presents a security challenge. Cookies are then saved automatically.
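Cookie reuse boils down to saving the browser's cookies after a successful login and reloading only those that have not expired. This is a minimal sketch, not the project's `auth/session.py`: it assumes cookies are a list of dicts with an `expires` timestamp in seconds since the epoch, where `-1` marks a session cookie (the shape Playwright's `BrowserContext.cookies()` returns). `save_cookies` and `load_valid_cookies` are hypothetical names.

```python
from __future__ import annotations

import json
import time
from pathlib import Path


def save_cookies(cookies: list[dict], path: str | Path) -> None:
    """Write cookies to a local JSON file (it grants account access: keep it private)."""
    Path(path).write_text(json.dumps(cookies, indent=2), encoding="utf-8")


def load_valid_cookies(path: str | Path, now: float | None = None) -> list[dict]:
    """Return unexpired cookies from `path`, or [] when a fresh login is needed."""
    file = Path(path)
    if not file.exists():
        return []
    try:
        cookies = json.loads(file.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        return []  # corrupt file: treat as "no saved session"
    if not isinstance(cookies, list):
        return []
    now = time.time() if now is None else now
    # Assumed format: "expires" in seconds since the epoch, -1 for session cookies.
    return [c for c in cookies if c.get("expires", -1) == -1 or c["expires"] > now]
```

An empty result is the signal to fall back to auto or manual login; a non-empty one still needs to be validated against LinkedIn, since a cookie can be revoked before it expires.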


Rate Limiting & Safety

LinkedIn aggressively detects automation. Built-in protections:

| Protection | Detail |
|---|---|
| Randomized delays | 3-7 seconds between profile visits |
| Long pauses | 15-30 seconds every 8 profiles |
| Session limit | Max 80 profiles per run |
| Adaptive backoff | Exponential delay increase on errors |
| Cookie persistence | Avoids repeated logins |
| Playwright stealth | Anti-detection plugin active |
| Auto-stop | Halts after 5 consecutive failures |

Recommendation: Keep --max-results under 50 for regular use.
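The limits above can be combined into a single throttle object. This is a hypothetical sketch of what `utils/rate_limiter.py` might do, not its actual contents: the class name, the doubling factor per consecutive failure, and the 120-second ceiling are assumptions, while the 3-7 s delay, the 15-30 s pause every 8 profiles, and the stop after 5 consecutive failures come from the table. The random generator and sleep function are injectable so the logic can be tested without waiting.

```python
import random
import time
from typing import Callable, Optional, Tuple


class AdaptiveRateLimiter:
    """Hypothetical throttle combining the protections listed above."""

    def __init__(
        self,
        base_delay: Tuple[float, float] = (3.0, 7.0),
        long_pause: Tuple[float, float] = (15.0, 30.0),
        pause_every: int = 8,
        max_consecutive_failures: int = 5,
        max_delay: float = 120.0,  # assumed ceiling, not documented
        rng: Optional[random.Random] = None,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.base_delay = base_delay
        self.long_pause = long_pause
        self.pause_every = pause_every
        self.max_consecutive_failures = max_consecutive_failures
        self.max_delay = max_delay
        self.rng = rng or random.Random()
        self.sleep = sleep
        self.completed = 0
        self.consecutive_failures = 0

    def next_delay(self) -> float:
        """Random base delay, plus a long pause whenever successes hit a multiple of N."""
        delay = self.rng.uniform(*self.base_delay)
        if self.completed and self.completed % self.pause_every == 0:
            delay += self.rng.uniform(*self.long_pause)
        # Exponential backoff: double the delay for each consecutive failure.
        delay *= 2 ** self.consecutive_failures
        return min(delay, self.max_delay)

    def wait(self) -> float:
        delay = self.next_delay()
        self.sleep(delay)
        return delay

    def record_success(self) -> None:
        self.completed += 1
        self.consecutive_failures = 0

    def record_failure(self) -> None:
        self.consecutive_failures += 1

    @property
    def should_stop(self) -> bool:
        return self.consecutive_failures >= self.max_consecutive_failures
```

In a processing loop, one would call `wait()` before each profile visit, `record_success()` or `record_failure()` after it, and stop once `should_stop` is true.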


Docker Deployment

```bash
# Build
docker build -t linkedin-scraper .

# Run web UI
docker run -p 8501:8501 linkedin-scraper

# Run CLI
docker run -it -v $(pwd)/output:/app/output linkedin-scraper \
  python -m linkedin_scraper "keywords" --location "City" --headless
```

How It Works

```text
┌──────────────────────────────────────────────────────────────┐
│  1. AUTHENTICATE                                             │
│     Load cookies → validate session → login if needed        │
├──────────────────────────────────────────────────────────────┤
│  2. SEARCH                                                   │
│     Navigate to LinkedIn People Search                       │
│     Apply filters (location, industry)                       │
│     Intercept Voyager API search responses → mini-profiles   │
│     Paginate through results                                 │
├──────────────────────────────────────────────────────────────┤
│  3. EXTRACT (optional)                                       │
│     Visit each profile page                                  │
│     Intercept Voyager API profile + contact info responses   │
│     DOM fallback for missing fields                          │
│     Rate-limited with adaptive delays                        │
├──────────────────────────────────────────────────────────────┤
│  4. EXPORT                                                   │
│     Generate formatted Excel (.xlsx) with color coding       │
│     Generate clean CSV (.csv) with UTF-8 BOM                 │
│     Summary statistics sheet                                 │
└──────────────────────────────────────────────────────────────┘
```

Project Structure

```text
LinkedinDataScraper/
├── linkedin_scraper/           # Main Python package
│   ├── __main__.py             # CLI entry point
│   ├── cli.py                  # Argument parsing
│   ├── config.py               # Constants & rate limits
│   ├── models.py               # Data models (LinkedInProfile)
│   ├── auth/
│   │   └── session.py          # Cookie-based authentication
│   ├── scraper/
│   │   ├── browser.py          # Playwright + stealth browser
│   │   ├── search.py           # LinkedIn people search
│   │   ├── profile.py          # Profile detail extraction
│   │   ├── api_interceptor.py  # Voyager API response capture
│   │   └── selectors.py        # CSS selectors (fallback)
│   ├── export/
│   │   └── exporter.py         # CSV + Excel export
│   └── utils/
│       └── rate_limiter.py     # Adaptive rate limiting
├── app.py                      # Streamlit web interface
├── assets/                     # Brand assets (logo, banner)
├── output/                     # Exported files
├── requirements.txt            # Python dependencies
├── pyproject.toml              # Package metadata
├── Makefile                    # Quick commands
├── Dockerfile                  # Container deployment
├── .env.example                # Credentials template
├── LICENSE                     # MIT License
└── README.md
```

Upgrading from v1

| Feature | v1 (old) | v2 (new) |
|---|---|---|
| Browser | Selenium | Playwright + stealth |
| Extraction | DOM parsing only | Voyager API interception + DOM fallback |
| Output | 2 separate CSVs | Formatted Excel + CSV |
| Fields | 2 (link, title) | 15+ fields (email, phone, experience...) |
| Error handling | None | Retry + graceful degradation |
| Rate limiting | sleep(10) | Adaptive with backoff |
| Platform | Chrome only | Chrome, Edge, Chromium (Mac/Win/Linux) |
| Interface | input() terminal | Rich CLI + Streamlit Web UI |

FAQ

Q: Is this free? A: Yes. LinkedIn Data Scraper is 100% free and open source under the MIT license.

Q: Do I need a LinkedIn API key? A: No. This tool uses Playwright browser automation with Voyager API interception, so no official API key is needed.

Q: How many profiles can I scrape? A: The built-in safety cap is 80 profiles per run, to reduce the risk of hitting LinkedIn's rate limits. Keep --max-results under 50 for regular use.

Q: Are my credentials safe? A: Credentials are stored in a local .env file that is gitignored. Cookie sessions are saved locally for future runs.

Q: Does it work without a LinkedIn account? A: No. LinkedIn requires authentication to view search results and profiles.

Q: Does it work on Mac / Linux? A: Yes. Fully cross-platform on Windows, macOS, and Linux with Chrome, Edge, or Chromium.

Q: Can I run it without a browser window? A: Yes, pass --headless. For the first run, however, use visible mode so you can handle any security challenge.


Alternatives Comparison

| Feature | LinkedIn Data Scraper | LinkedIn API | Manual Copy-Paste | Paid Tools |
|---|---|---|---|---|
| Price | Free | Free (limited) | Free | $50-300/mo |
| Voyager API interception | Yes | N/A | N/A | Varies |
| 15+ data fields | Yes | Rate limited | Manual | Yes |
| Excel with formatting | Yes | No | No | Basic |
| Open source | Yes | N/A | N/A | No |
| Web UI (Streamlit) | Yes | N/A | N/A | Yes |
| Docker ready | Yes | N/A | N/A | Varies |

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Disclaimer

This tool is provided for educational and research purposes only. You are solely responsible for how you use this software. Scraping LinkedIn may violate LinkedIn's Terms of Service. By using this tool, you agree to:

  • Use it responsibly and ethically
  • Comply with all applicable laws and regulations
  • Not hold the authors liable for any consequences
  • Respect LinkedIn's rate limits and user privacy

License

MIT License - Copyright (c) 2022 Enzo Day


If this project helps you, please give it a star!
It helps others discover this tool.



Built with purpose by SoClose: Digital Innovation Through Automation & AI
Website • LinkedIn • Twitter • Contact
