Extract doctor profiles, skills & contacts from Doctolib.fr: automated scraping with clean CSV export.
Quick Start • Features • Configuration • FAQ • Contributing
Doctolib Data Scraper is a free, open-source Doctolib.fr web scraper built with Python and Selenium. It automates the extraction of doctor profiles from any Doctolib search URL into clean, analysis-ready CSV files.
Manually collecting doctor information from Doctolib is time-consuming. This scraper handles the entire process: give it a search URL, and it crawls all paginated results, then visits each profile to extract structured data (names, addresses, skills, degrees, and contact information).
- Healthcare Recruiters looking to build prospect lists of medical professionals
- Market Researchers studying the healthcare landscape in France
- Data Analysts collecting public health data for analysis
- Startup Founders building healthcare-related products and services
- Researchers studying medical specialization distribution
- Developers learning web scraping with Selenium and Python
- Two-Phase Extraction - Phase 1 crawls paginated results, Phase 2 scrapes each profile
- Full Pagination - Automatically navigates all search result pages
- Multi-Location - Extracts every practice location per doctor
- VPN Rotation - Built-in NordVPN CLI support to avoid rate limiting (optional)
- Progressive Saving - Data saved every 5 profiles, so a crash loses at most the current unsaved batch
- Auto-Recovery - Handles connection drops with smart retry logic
- Cross-Platform - Works on Windows, macOS, and Linux
- Clean CSV Output - Ready for Excel, Google Sheets, or any data tool
- Free & Open Source - MIT license, no API key required
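The progressive-saving feature can be illustrated with a minimal sketch. This is not the project's actual code: the function name `save_progressively`, the `scrape_one` callback, and the column names are hypothetical, and only the batch size of 5 comes from the feature list.

```python
import csv
from pathlib import Path
from typing import Callable, Iterable

BATCH_SIZE = 5  # from the feature list: data is saved every 5 profiles


def save_progressively(
    links: Iterable[str],
    scrape_one: Callable[[str], dict],  # hypothetical per-profile scraper
    out_path: str,
    fieldnames: list[str],
    batch_size: int = BATCH_SIZE,
) -> int:
    """Scrape each link and append rows to a CSV in small batches.

    Returns the number of rows written.
    """
    path = Path(out_path)
    write_header = not path.exists() or path.stat().st_size == 0
    buffer: list[dict] = []
    written = 0

    def flush() -> None:
        nonlocal write_header, written
        if not buffer:
            return
        with path.open("a", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            if write_header:
                writer.writeheader()
                write_header = False
            writer.writerows(buffer)
        written += len(buffer)
        buffer.clear()

    try:
        for link in links:
            buffer.append(scrape_one(link))
            if len(buffer) >= batch_size:
                flush()
    finally:
        flush()  # persist the partial batch even if scraping raised
    return written
```

Appending in batches keeps disk writes infrequent, while the `finally` clause means an exception in `scrape_one` still leaves every completed row on disk.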
| Requirement | Details |
|---|---|
| Python | Version 3.9 or higher |
| Google Chrome | Latest version |
| NordVPN | Optional, for IP rotation during large scrapes |
# 1. Clone the repository
git clone https://github.com/SoCloseSociety/DoctolibDataScraper.git
cd DoctolibDataScraper
# 2. (Recommended) Create a virtual environment
python -m venv venv
# Activate it:
# Windows:
venv\Scripts\activate
# macOS / Linux:
source venv/bin/activate
# 3. Install dependencies
pip install -r requirements.txt
python main.py
Enter a Doctolib search URL when prompted:
============================================================
DoctolibDataScraper
by SoClose Society - https://soclose.co
============================================================
Enter Doctolib search URL: https://www.doctolib.fr/medecin-generaliste/paris
  Doctolib Search URL
           │
           ▼
┌─────────────────────┐
│  Phase 1: Crawl     │──▶ doctolib_profile_link.csv
│  Paginated results  │
└─────────────────────┘
           │
           ▼
┌─────────────────────┐
│  Phase 2: Scrape    │──▶ doctolib_profile_details.csv
│ Each doctor profile │
└─────────────────────┘
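The two-phase flow (crawl paginated results, then scrape each profile) can be sketched roughly as follows. The callables `fetch_page_links` and `scrape_profile` are hypothetical stand-ins for the Selenium logic, and both the `page` query parameter and the "stop when a page yields no new links" rule are assumptions for illustration, not confirmed details of the scraper.

```python
from typing import Callable, Iterable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def page_url(search_url: str, page: int) -> str:
    """Return search_url with its `page` query parameter set (assumed scheme)."""
    parts = urlsplit(search_url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "page"]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def crawl_links(
    search_url: str,
    fetch_page_links: Callable[[str], Iterable[str]],  # hypothetical page fetcher
    max_pages: int = 1000,
) -> list[str]:
    """Phase 1: walk result pages until one yields no new links."""
    seen: dict[str, None] = {}  # dict keeps insertion order and uniqueness
    for page in range(1, max_pages + 1):
        links = fetch_page_links(page_url(search_url, page))
        new = [link for link in links if link not in seen]
        if not new:
            break
        seen.update(dict.fromkeys(new))
    return list(seen)


def scrape_profiles(
    links: Iterable[str],
    scrape_profile: Callable[[str], dict],  # hypothetical profile scraper
) -> list[dict]:
    """Phase 2: visit each profile link and collect one record per doctor."""
    return [scrape_profile(link) for link in links]
```

Keeping the two phases separate means the link list can be written to disk first, so a failure during profile scraping does not require crawling the search results again.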
| Data Field | Example |
|---|---|
| Name | Dr. Marie Dupont |
| Addresses | All practice locations with full addresses |
| Skills | Medical specializations, competencies |
| Degrees | Diplomas, certifications, education history |
| Contacts | Phone numbers, additional contact details |
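One way to picture a single output record is a small data class whose list fields are flattened into one CSV row. This is a sketch under stated assumptions: the class name `DoctorProfile`, the `" | "` separator, and the exact column headers are illustrative and may not match the real CSV layout.

```python
from dataclasses import dataclass, field

SEPARATOR = " | "  # assumption: delimiter used inside multi-valued cells


@dataclass
class DoctorProfile:
    """One scraped profile; list fields hold zero or more entries."""

    name: str
    addresses: list[str] = field(default_factory=list)
    skills: list[str] = field(default_factory=list)
    degrees: list[str] = field(default_factory=list)
    contacts: list[str] = field(default_factory=list)

    def to_row(self) -> dict[str, str]:
        """Flatten list fields so the profile fits on a single CSV row."""
        return {
            "Name": self.name,
            "Addresses": SEPARATOR.join(self.addresses),
            "Skills": SEPARATOR.join(self.skills),
            "Degrees": SEPARATOR.join(self.degrees),
            "Contacts": SEPARATOR.join(self.contacts),
        }
```

A doctor with several practice locations then still occupies one row, with all addresses joined in the same cell.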
| File | Content |
|---|---|
| doctolib_profile_link.csv | All unique doctor profile links |
| doctolib_profile_details.csv | Full structured profile data |
| scraper.log | Timestamped execution log |
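A timestamped log file like scraper.log can be produced with Python's standard `logging` module. The sketch below is only one possible setup: the logger name and the line format are assumptions, not the format the scraper actually writes.

```python
import logging


def build_logger(log_path: str = "scraper.log") -> logging.Logger:
    """Return a logger that writes timestamped lines to log_path."""
    logger = logging.getLogger("doctolib_scraper")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid duplicate handlers on repeated calls
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s [%(levelname)s] %(message)s")
        )
        logger.addHandler(handler)
    return logger
```

Calling `build_logger().info("Phase 1 finished")` would then append one line that starts with the current date and time.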
VPN support helps avoid rate limiting during large scraping sessions.
Windows
- Install NordVPN
- Add NordVPN to your PATH: C:\Program Files\NordVPN\
- Verify in Command Prompt: nordvpn -c
macOS
- Install NordVPN via nordvpn.com or brew install nordvpn
- Verify in Terminal: nordvpn connect
Linux
- Install: sh <(curl -sSf https://downloads.nordcdn.com/apps/linux/install.sh)
- Login: nordvpn login
- Verify: nordvpn connect
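Once the CLI responds, IP rotation amounts to disconnecting and reconnecting between batches of requests. The sketch below assumes the Linux-style subcommands `nordvpn disconnect` and `nordvpn connect` (the Windows client uses flag-style commands such as `nordvpn -c`), and the function name `rotate_vpn` and the 10-second settle time are illustrative choices rather than the scraper's actual behaviour.

```python
import subprocess
import time
from typing import Callable, Sequence


def rotate_vpn(
    run: Callable[[Sequence[str]], int] = subprocess.call,
    settle_seconds: float = 10.0,  # assumed wait for the tunnel to come up
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Reconnect through the NordVPN CLI to obtain a new exit IP.

    Returns True when `nordvpn connect` exits with status 0.
    """
    run(["nordvpn", "disconnect"])  # ignore the result: may already be offline
    connected = run(["nordvpn", "connect"]) == 0
    if connected:
        sleep(settle_seconds)
    return connected
```

The `run` and `sleep` parameters exist so the function can be exercised without a VPN installed, for example by passing a stub that records the commands.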
| Technology | Role |
|---|---|
| Python 3.9+ | Core language |
| Selenium 4.15+ | Browser automation & page interaction |
| BeautifulSoup4 | HTML parsing & data extraction |
| Pandas | Data structuring & CSV export |
| webdriver-manager | Automatic ChromeDriver management |
DoctolibDataScraper/
├── main.py              # Main scraper application
├── requirements.txt     # Python dependencies
├── assets/
│   └── banner.svg       # Project banner
├── LICENSE              # MIT License
├── README.md            # This file
├── CONTRIBUTING.md      # Contribution guidelines
└── .gitignore           # Git ignore rules
The bot uses webdriver-manager to automatically download the correct ChromeDriver. If you encounter issues:
pip install --upgrade webdriver-manager
If Doctolib blocks your requests:
- Enable NordVPN rotation (see Configuration)
- Increase delays between requests
- Reduce the number of profiles per session
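Increasing delays and retrying after a dropped connection could look like the following sketch. The helper names `polite_delay` and `with_retries`, the default timings, and the exponential backoff strategy are assumptions for illustration; the scraper's own retry logic may differ.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def polite_delay(base: float = 2.0, jitter: float = 1.0) -> float:
    """Return a randomized pause in [base, base + jitter] seconds."""
    return base + random.uniform(0.0, jitter)


def with_retries(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action, retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return action()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A typical use would be `time.sleep(polite_delay())` between page loads and `with_retries(lambda: driver.get(url))` around each navigation, where `driver` is the Selenium WebDriver instance.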
Doctolib occasionally updates its web interface. If the scraper stops working:
- Check the Issues page for known problems
- Open a new issue with the error message
chmod +x main.py
Q: Is this free? A: Yes. Doctolib Data Scraper is 100% free and open source under the MIT license.
Q: Do I need an API key? A: No. This tool uses browser automation (Selenium), so no API key or developer account is needed.
Q: How many profiles can I scrape at once? A: There is no hard limit. The scraper processes profiles one by one with progressive saving. Just be mindful of Doctolib's usage policies and use VPN rotation for large scrapes.
Q: Does it comply with GDPR? A: The tool extracts publicly available data. You are responsible for handling any collected data in compliance with GDPR and applicable laws.
Q: Does it work on Mac / Linux? A: Yes. The scraper is fully cross-platform and works on Windows, macOS, and Linux.
| Feature | Doctolib Data Scraper | Manual Copy-Paste | Paid Scraping APIs |
|---|---|---|---|
| Price | Free | Free | $50-200/mo |
| Automated pagination | Yes | No | Yes |
| Multi-location scraping | Yes | Manual | Varies |
| Open source | Yes | N/A | No |
| API key required | No | No | Yes |
| VPN rotation | Built-in | N/A | Varies |
| Cross-platform | Yes | Yes | Web only |
Contributions are welcome! Please read the Contributing Guide before submitting a pull request.
This project is licensed under the MIT License.
This tool is provided for educational and research purposes only. Use it responsibly and in compliance with Doctolib's Terms of Service and applicable data protection laws (GDPR). The authors are not responsible for any misuse or consequences arising from the use of this software.
If this project helps you, please give it a star!
It helps others discover this tool.
Built with purpose by SoClose: Digital Innovation Through Automation & AI
Website • LinkedIn • Twitter • Contact