
πŸ₯ Scrape doctor data from Doctolib β€” names, specialties, addresses, availability. Python + Selenium. Free & open source.


SoCloseSociety/DoctolibDataScraper


Doctolib Data Scraper

Extract doctor profiles, skills & contacts from Doctolib.fr with automated scraping and clean CSV export.

License: MIT • Python 3.9+ • Selenium



What is Doctolib Data Scraper?

Doctolib Data Scraper is a free, open-source Doctolib.fr web scraper built with Python and Selenium. It automates the extraction of doctor profiles from any Doctolib search URL into clean, analysis-ready CSV files.

Manually collecting doctor information from Doctolib is time-consuming. This scraper handles the entire process: give it a search URL, and it crawls all paginated results, then visits each profile to extract structured data (names, addresses, skills, degrees, and contact information).

Who is this for?

  • Healthcare Recruiters looking to build prospect lists of medical professionals
  • Market Researchers studying the healthcare landscape in France
  • Data Analysts collecting public health data for analysis
  • Startup Founders building healthcare-related products and services
  • Researchers studying medical specialization distribution
  • Developers learning web scraping with Selenium and Python

Key Features

  • Two-Phase Extraction - Phase 1 crawls paginated results, Phase 2 scrapes each profile
  • Full Pagination - Automatically navigates all search result pages
  • Multi-Location - Extracts every practice location per doctor
  • VPN Rotation - Built-in NordVPN CLI support to avoid rate limiting (optional)
  • Progressive Saving - Data is written to CSV every 5 profiles, so a crash loses at most the last unsaved batch
  • Auto-Recovery - Handles connection drops by retrying instead of aborting the run
  • Cross-Platform - Works on Windows, macOS, and Linux
  • Clean CSV Output - Ready for Excel, Google Sheets, or any data tool
  • Free & Open Source - MIT license, no API key required
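The progressive-saving behaviour can be sketched as a small buffered writer. This is a minimal illustration under assumptions, not the code in main.py: the class name `ProgressiveCsvWriter` and the column names are hypothetical, and it uses the standard `csv` module instead of Pandas so it runs without extra dependencies.

```python
import csv
from pathlib import Path


class ProgressiveCsvWriter:
    """Buffer scraped rows and append them to a CSV every `batch_size` rows.

    Hypothetical helper: batch_size=5 mirrors the "every 5 profiles"
    behaviour described above; names and columns are illustrative only.
    """

    def __init__(self, path, fieldnames, batch_size=5):
        self.path = Path(path)
        self.fieldnames = list(fieldnames)
        self.batch_size = batch_size
        self._buffer = []

    def add(self, row):
        self._buffer.append(row)
        if len(self._buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        if not self._buffer:
            return
        # Write the header only when the file is new or still empty.
        write_header = not self.path.exists() or self.path.stat().st_size == 0
        with self.path.open("a", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=self.fieldnames)
            if write_header:
                writer.writeheader()
            writer.writerows(self._buffer)
        self._buffer.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Save the partial last batch even if scraping raised an exception.
        self.flush()
        return False
```

Using it as a context manager means the final, partial batch is still written if the loop raises; a hard kill (power loss, `SIGKILL`) can still lose up to `batch_size - 1` buffered rows.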

Quick Start

Prerequisites

| Requirement | Details |
|---|---|
| Python | Version 3.9 or higher (Download) |
| Google Chrome | Latest version (Download) |
| NordVPN | Optional, for IP rotation during large scrapes |

Installation

```bash
# 1. Clone the repository
git clone https://github.com/SoCloseSociety/DoctolibDataScraper.git
cd DoctolibDataScraper

# 2. (Recommended) Create a virtual environment
python -m venv venv

# Activate it:
# Windows:
venv\Scripts\activate
# macOS / Linux:
source venv/bin/activate

# 3. Install dependencies
pip install -r requirements.txt
```

Usage

```bash
python main.py
```

Enter a Doctolib search URL when prompted:

```text
============================================================
  DoctolibDataScraper
  by SoClose Society - https://soclose.co
============================================================

Enter Doctolib search URL: https://www.doctolib.fr/medecin-generaliste/paris
```
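Before starting a long crawl, it can help to reject input that is not a Doctolib URL. A hypothetical pre-check (`is_doctolib_search_url` and `prompt_search_url` are illustrative names, not functions from main.py), assuming search URLs live on `doctolib.fr` or `www.doctolib.fr` like the example above:

```python
from urllib.parse import urlparse

# ASSUMPTION: accepted hosts, based on the doctolib.fr example URL.
ALLOWED_HOSTS = {"doctolib.fr", "www.doctolib.fr"}


def is_doctolib_search_url(url: str) -> bool:
    """Return True if `url` looks like a Doctolib search URL.

    Only checks the scheme, the host, and that there is at least one
    path segment (e.g. a specialty such as /medecin-generaliste).
    """
    parsed = urlparse(url.strip())
    if parsed.scheme not in {"http", "https"}:
        return False
    if (parsed.hostname or "") not in ALLOWED_HOSTS:
        return False
    segments = [s for s in parsed.path.split("/") if s]
    return len(segments) >= 1


def prompt_search_url() -> str:
    """Keep asking until the user enters something that passes the check."""
    while True:
        url = input("Enter Doctolib search URL: ").strip()
        if is_doctolib_search_url(url):
            return url
        print("That does not look like a doctolib.fr search URL, please try again.")
```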

How It Works

```text
Doctolib Search URL
        │
        ▼
┌──────────────────────┐
│  Phase 1: Crawl      │──→ doctolib_profile_link.csv
│  Paginated results   │
└──────────────────────┘
        │
        ▼
┌──────────────────────┐
│  Phase 2: Scrape     │──→ doctolib_profile_details.csv
│  Each doctor profile │
└──────────────────────┘
```
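Phase 1 boils down to collecting profile links from every results page and de-duplicating them before writing doctolib_profile_link.csv. A rough sketch: it parses HTML with the standard library's `html.parser` rather than BeautifulSoup, and the rule that a profile path has exactly three segments (specialty/city/doctor) is an assumption about Doctolib's URL layout, not something the project documents.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# ASSUMPTION: only these hosts serve doctor profiles.
DOCTOLIB_HOSTS = {"doctolib.fr", "www.doctolib.fr"}


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag on one results page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def looks_like_profile(path: str) -> bool:
    # ASSUMPTION: profile paths have three segments, e.g.
    # /medecin-generaliste/paris/marie-dupont, while a search path has two.
    return len([s for s in path.split("/") if s]) == 3


def extract_profile_links(page_html: str, base_url: str) -> list[str]:
    """Return absolute, query-free profile URLs found on one results page."""
    parser = LinkCollector()
    parser.feed(page_html)
    parser.close()
    links = []
    for href in parser.hrefs:
        parsed = urlparse(urljoin(base_url, href))
        if parsed.hostname in DOCTOLIB_HOSTS and looks_like_profile(parsed.path):
            # Normalise to https and drop query strings and fragments
            # so the same profile is not stored twice.
            links.append(f"https://{parsed.hostname}{parsed.path}")
    return links


def collect_unique_links(pages_html, base_url: str) -> list[str]:
    """Merge links from all paginated pages, keeping first-seen order."""
    unique = {}
    for page_html in pages_html:
        for link in extract_profile_links(page_html, base_url):
            unique.setdefault(link, None)
    return list(unique)
```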

What It Extracts

| Data Field | What is captured |
|---|---|
| Name | Doctor's name, e.g. Dr. Marie Dupont |
| Addresses | All practice locations with full addresses |
| Skills | Medical specializations, competencies |
| Degrees | Diplomas, certifications, education history |
| Contacts | Phone numbers, additional contact details |
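Because a doctor can have several practice locations, skills, and degrees, each profile has to be flattened into a single CSV row. One way to do that, with hypothetical field names and a `" | "` separator chosen for illustration (the real column layout of doctolib_profile_details.csv may differ):

```python
from dataclasses import asdict, dataclass, field


@dataclass
class DoctorProfile:
    """One doctor as it might appear in doctolib_profile_details.csv.

    Field names are hypothetical; they follow the data fields listed above.
    """

    name: str
    url: str
    addresses: list[str] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)
    degrees: list[str] = field(default_factory=list)
    contacts: list[str] = field(default_factory=list)

    def to_row(self, sep: str = " | ") -> dict[str, str]:
        """Flatten list fields (e.g. several practice locations) into one cell each."""
        row = {}
        for key, value in asdict(self).items():
            if isinstance(value, list):
                # Skip blank entries and trim whitespace before joining.
                value = sep.join(item.strip() for item in value if item.strip())
            row[key] = value
        return row
```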

Output Files

| File | Content |
|---|---|
| doctolib_profile_link.csv | All unique doctor profile links |
| doctolib_profile_details.csv | Full structured profile data |
| scraper.log | Timestamped execution log |
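A timestamped log like scraper.log can be produced with Python's standard `logging` module. This sketch writes to both the file and the console; the logger name and format string are assumptions rather than what main.py uses.

```python
import logging


def setup_logger(log_path="scraper.log", name="doctolib_scraper"):
    """Send timestamped messages to `log_path` and to the console."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers if called twice
        fmt = logging.Formatter(
            "%(asctime)s [%(levelname)s] %(message)s",
            datefmt="%Y-%m-%d %H:%M:%S",
        )
        for handler in (
            logging.FileHandler(log_path, encoding="utf-8"),
            logging.StreamHandler(),
        ):
            handler.setFormatter(fmt)
            logger.addHandler(handler)
    return logger
```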

Configuration

NordVPN Setup (Optional)

VPN support helps avoid rate limiting during large scraping sessions.

Windows
  1. Install NordVPN
  2. Add NordVPN to your PATH: C:\Program Files\NordVPN\
  3. Verify in Command Prompt: nordvpn -c
macOS
  1. Install NordVPN via nordvpn.com or brew install nordvpn
  2. Verify in Terminal: nordvpn connect
Linux
  1. Install: sh <(curl -sSf https://downloads.nordcdn.com/apps/linux/install.sh)
  2. Login: nordvpn login
  3. Verify: nordvpn connect
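VPN rotation amounts to calling the NordVPN CLI between batches of profiles. A hedged sketch using only the commands listed above (`nordvpn -c` on Windows, `nordvpn connect` on macOS/Linux); the rotation interval of 50 profiles and the helper names are hypothetical, and whether repeating `connect` lands you on a new server depends on the NordVPN client.

```python
import platform
import subprocess


def nordvpn_connect_command(system=None):
    """Return the NordVPN CLI connect command for this OS (from the steps above)."""
    system = system or platform.system()
    if system == "Windows":
        return ["nordvpn", "-c"]
    # macOS reports "Darwin"; Linux reports "Linux".
    return ["nordvpn", "connect"]


def should_rotate(profiles_done, every=50):
    # Hypothetical policy: reconnect after every `every` scraped profiles.
    return profiles_done > 0 and profiles_done % every == 0


def rotate_vpn(timeout=60):
    """Reconnect via the NordVPN CLI; return False instead of crashing the scrape."""
    try:
        result = subprocess.run(
            nordvpn_connect_command(),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # CLI not on PATH (see the Windows step) or the connection hung.
        return False
    return result.returncode == 0
```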

Tech Stack

| Technology | Role |
|---|---|
| Python 3.9+ | Core language |
| Selenium 4.15+ | Browser automation & page interaction |
| BeautifulSoup4 | HTML parsing & data extraction |
| Pandas | Data structuring & CSV export |
| webdriver-manager | Automatic ChromeDriver management |

Project Structure

```text
DoctolibDataScraper/
├── main.py              # Main scraper application
├── requirements.txt     # Python dependencies
├── assets/
│   └── banner.svg       # Project banner
├── LICENSE              # MIT License
├── README.md            # This file
├── CONTRIBUTING.md      # Contribution guidelines
└── .gitignore           # Git ignore rules
```

Troubleshooting

Chrome driver issues

The scraper uses webdriver-manager to download a ChromeDriver that matches your installed Chrome. If you run into driver errors, upgrade it:

```bash
pip install --upgrade webdriver-manager
```
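For reference, this is the usual way webdriver-manager is wired into Selenium 4, assuming the standard `selenium` and `webdriver-manager` packages; it is not necessarily how main.py builds its driver. Selenium 4.6 and later also ship Selenium Manager, so a plain `webdriver.Chrome()` can usually locate a driver on its own if webdriver-manager keeps failing.

```python
def make_chrome_driver(headless=False):
    """Create a Chrome WebDriver whose driver binary is fetched by webdriver-manager.

    Imports live inside the function so this module can be imported even
    where Selenium is not installed (e.g. for unit tests).
    """
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless=new")
    # install() downloads (or reuses a cached) ChromeDriver matching the
    # local Chrome version and returns the path to the binary.
    service = Service(ChromeDriverManager().install())
    return webdriver.Chrome(service=service, options=options)
```

Call it as `driver = make_chrome_driver(headless=True)` and release the browser with `driver.quit()` in a `finally` block.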

Rate limiting / IP blocks

If Doctolib blocks your requests:

  1. Enable NordVPN rotation (see Configuration)
  2. Increase delays between requests
  3. Reduce the number of profiles per session
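Increasing delays and retrying after dropped connections can be combined in two small helpers. The 2-5 second range, the retry count, and the exponential backoff schedule are illustrative assumptions, not values from main.py; `sleep` is injectable so the helpers can be tested without waiting.

```python
import random
import time


def polite_delay(min_s=2.0, max_s=5.0, sleep=time.sleep):
    """Sleep for a random interval between requests and return its length."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)
    return delay


def with_retries(func, attempts=3, base_delay=5.0, sleep=time.sleep,
                 retry_on=(ConnectionError, TimeoutError)):
    """Call func(); on a retryable error wait base_delay * 2**n, then try again."""
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller handle it
            sleep(base_delay * 2 ** attempt)
```

Selenium raises its own exceptions (for example `selenium.common.exceptions.TimeoutException`, which does not inherit from the built-in `TimeoutError`), so in a Selenium scraper you would pass those via `retry_on`.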

Doctolib UI changes

Doctolib occasionally updates its web interface. If the scraper stops working:

  1. Check the Issues page for known problems
  2. Open a new issue with the error message

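Permission denied errors (macOS/Linux)

If you run the script directly as ./main.py, make it executable first (running it as python main.py does not require this):

```bash
chmod +x main.py
```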

FAQ

Q: Is this free? A: Yes. Doctolib Data Scraper is 100% free and open source under the MIT license.

Q: Do I need an API key? A: No. This tool uses browser automation (Selenium), so no API key or developer account is needed.

Q: How many profiles can I scrape at once? A: There is no hard limit. The scraper processes profiles one by one with progressive saving. Just be mindful of Doctolib's usage policies and use VPN rotation for large scrapes.

Q: Does it comply with GDPR? A: The tool extracts publicly available data. You are responsible for handling any collected data in compliance with GDPR and applicable laws.

Q: Does it work on Mac / Linux? A: Yes. The scraper is fully cross-platform and works on Windows, macOS, and Linux.


Alternatives Comparison

| Feature | Doctolib Data Scraper | Manual Copy-Paste | Paid Scraping APIs |
|---|---|---|---|
| Price | Free | Free | $50-200/mo |
| Automated pagination | Yes | No | Yes |
| Multi-location scraping | Yes | Manual | Varies |
| Open source | Yes | N/A | No |
| API key required | No | No | Yes |
| VPN rotation | Built-in | N/A | Varies |
| Cross-platform | Yes | Yes | Web only |

Contributing

Contributions are welcome! Please read the Contributing Guide before submitting a pull request.


License

This project is licensed under the MIT License.


Disclaimer

This tool is provided for educational and research purposes only. Use it responsibly and in compliance with Doctolib's Terms of Service and applicable data protection laws (GDPR). The authors are not responsible for any misuse or consequences arising from the use of this software.


If this project helps you, please give it a star!
It helps others discover this tool.



Built with purpose by SoClose - Digital Innovation Through Automation & AI
Website • LinkedIn • Twitter • Contact
