A Python script that scrapes participant information from event websites and exports it to CSV. Built with Selenium and BeautifulSoup, with built-in support for n8n automation.
- Automatic participant extraction from web pages using `.card` selectors
- Interactive field selection – choose which data fields to export
- Selective participant filtering – include specific participants or all
- CSV export – clean, formatted output ready for spreadsheets or databases
- Headless browser support – runs without opening a visible browser window
- n8n ready – can be integrated into automated workflows (see below)
- Python 3.6+
- Chrome or Chromium browser
- ChromeDriver (matching your browser version)
```bash
pip install selenium beautifulsoup4
```

- Clone or download this script to your machine
- Install ChromeDriver (if not already installed):

  ```bash
  # Ubuntu/Debian
  sudo apt install chromium-chromedriver
  # Or download manually from: https://chromedriver.chromium.org/
  ```

- Update paths in the script if needed:
  - `options.binary_location` – path to your Chrome/Chromium executable
  - `Service("/usr/local/bin/chromedriver")` – path to your ChromeDriver
Run the script and follow the interactive prompts:
```bash
python datacol.py
```

You will be guided through:
- Enter URL – the event page containing participant cards
- Select participants – choose specific numbers or type `all`
- Choose fields – select which data fields to export
- CSV generated – results saved to `participants_selected.csv`
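The final CSV step can be pictured with the standard library's `csv` module. This is a minimal illustration, assuming participants are collected as dicts and `Name`/`Title` were the selected fields; the variable names here are illustrative, not the script's actual ones:

```python
import csv

# Illustrative data: in the real script these come from the scraped page
# and from the interactive participant/field selection.
selected_fields = ["Name", "Title"]
participants = [
    {"Name": "Dr. Sarah Chen", "Title": "Keynote Speaker", "Company": "Acme"},
    {"Name": "Dr. Emily Watson", "Title": "Panelist", "Company": "Initech"},
]

with open("participants_selected.csv", "w", newline="") as f:
    # extrasaction="ignore" silently drops fields the user did not select
    writer = csv.DictWriter(f, fieldnames=selected_fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(participants)
```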
```
Enter the full event participants page URL: https://example.com/event/speakers
Loading https://example.com/event/speakers ...
✅ Found 24 participant cards.

1. Dr. Sarah Chen
2. Prof. Michael Rodriguez
3. Dr. Emily Watson
...

Enter participant numbers to include (comma separated, or 'all'): 1,3,5-7
Selected 5 participants.

Detected available fields:
1. Name
2. Title
3. Company
4. Country

Enter field number to include (press Enter to finish): 1
Added: Name
Enter field number to include (press Enter to finish): 2
Added: Title
Enter field number to include (press Enter to finish):
✅ Extraction complete: participants_selected.csv
```
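The `1,3,5-7` selection syntax shown above can be handled by a small helper along these lines — a sketch, with an illustrative function name; the actual script's parsing may differ:

```python
def parse_selection(text, total):
    """Turn '1,3,5-7' into [1, 3, 5, 6, 7]; 'all' selects everything."""
    if text.strip().lower() == "all":
        return list(range(1, total + 1))
    numbers = set()
    for part in text.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-")
            numbers.update(range(int(start), int(end) + 1))
        else:
            numbers.add(int(part))
    # Keep only numbers that actually exist on the page, in ascending order
    return sorted(n for n in numbers if 1 <= n <= total)

print(parse_selection("1,3,5-7", total=24))  # [1, 3, 5, 6, 7]
```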
The script is designed for pages where participant cards use the `.card` CSS class. To adapt it for other websites:
| Element | Where to Change | Example |
|---|---|---|
| Card selector | `soup.select(".card")` | Change `.card` to `.participant`, `.speaker-item`, etc. |
| Name selector | `card.find("h3")` | Change to `h2`, `.name`, `[class*="name"]` |
| Field extraction | `card.find_all("p")` | Change to `.details`, `div.info`, etc. |
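Putting the selectors from the table together, the extraction loop looks roughly like this — a sketch using the default `.card`/`h3`/`p` selectors on inlined sample HTML; in the real script the HTML comes from Selenium's `driver.page_source`:

```python
from bs4 import BeautifulSoup

# Sample markup standing in for driver.page_source
html = """
<div class="card"><h3>Dr. Sarah Chen</h3><p>Keynote Speaker</p><p>Acme Corp</p></div>
<div class="card"><h3>Prof. Michael Rodriguez</h3><p>Panelist</p><p>Initech</p></div>
"""

soup = BeautifulSoup(html, "html.parser")
for card in soup.select(".card"):                  # card selector
    name = card.find("h3").get_text(strip=True)    # name selector
    details = [p.get_text(strip=True) for p in card.find_all("p")]  # field extraction
    print(name, details)
```

Swapping any of the three selectors per the table above adapts the same loop to a different site's markup.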
This script can be integrated into n8n workflows for fully automated data collection. Here are three approaches:
Run the script directly from n8n using the Execute Command node:
Workflow:
[Schedule Trigger] → [Execute Command] → [Read File] → [Google Drive/Send Email]
Execute Command Node Configuration:
- Command: `python`
- Arguments: `/path/to/datacol.py`
- Note: for non-interactive use, you'll need to modify the script to accept arguments (see Approach 2)
Create a modified version of the script that accepts command-line arguments:
```python
# datacol_cli.py - Modified for n8n
import argparse
import csv

from bs4 import BeautifulSoup
from selenium import webdriver
# ... (rest of imports)

parser = argparse.ArgumentParser()
parser.add_argument("--url", required=True, help="Event page URL")
parser.add_argument("--participants", default="all", help="Participant numbers or 'all'")
parser.add_argument("--fields", required=True, help="Comma-separated field names")
parser.add_argument("--output", default="participants_selected.csv", help="Output file")
args = parser.parse_args()

# ... (rest of script logic)
```

Then in n8n, use:
```bash
python datacol_cli.py --url "https://example.com/event" --fields "Name,Title,Company" --output "/tmp/participants.csv"
```

Create a simple Flask API wrapper that n8n can call via the HTTP Request node:
```python
# datacol_api.py
from flask import Flask, request, jsonify
import subprocess

app = Flask(__name__)

@app.route('/scrape', methods=['POST'])
def scrape():
    data = request.json
    url = data.get('url')
    fields = data.get('fields', [])
    result = subprocess.run(
        ['python', 'datacol_cli.py', '--url', url, '--fields', ','.join(fields)],
        capture_output=True, text=True
    )
    if result.returncode != 0:
        # Surface scraper failures instead of returning a stale CSV
        return jsonify({'status': 'error', 'message': result.stderr}), 500
    with open('participants_selected.csv', 'r') as f:
        csv_data = f.read()
    return jsonify({'status': 'success', 'data': csv_data})

if __name__ == '__main__':
    app.run(port=5000)
```

n8n HTTP Request Node:
- Method: POST
- URL: `http://localhost:5000/scrape`
- Body (JSON):

  ```json
  { "url": "https://example.com/event", "fields": ["Name", "Title", "Company"] }
  ```
As an alternative, you can orchestrate the entire pipeline inside n8n, using the CLI version of the script only for the scraping step. This keeps scheduling, file handling, and delivery in n8n's built-in nodes:
```
[Schedule Trigger]
   ↓ (runs every Monday at 9 AM)
[Set Node]
   - Set URL: "https://example.com/event/participants"
   - Set fields: ["Name", "Title", "Company"]
   ↓
[Execute Command]
   - Run: python /path/to/datacol_cli.py
   - Arguments: --url "{{$json.url}}" --fields "{{$json.fields.join(',')}}"
   ↓
[Read File]
   - File path: /tmp/participants.csv
   ↓
[Convert to JSON]
   - Parse CSV to structured data
   ↓
[Google Sheets]
   - Append data to spreadsheet
```
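The Read File → Convert to JSON step amounts to parsing the CSV into structured rows. In Python terms it is equivalent to the following (illustrative; n8n performs this conversion with its own node):

```python
import csv
import io
import json

# Stand-in for the file n8n reads from /tmp/participants.csv
csv_text = "Name,Title\nDr. Sarah Chen,Keynote Speaker\nDr. Emily Watson,Panelist\n"

# Each CSV row becomes a dict keyed by the header row
rows = list(csv.DictReader(io.StringIO(csv_text)))
print(json.dumps(rows, indent=2))
# Each object, e.g. {"Name": "Dr. Sarah Chen", "Title": "Keynote Speaker"},
# maps directly onto the columns n8n appends to the Google Sheet.
```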
If you get errors about ChromeDriver:
```bash
# Check ChromeDriver version
chromedriver --version

# Update if needed
sudo apt update && sudo apt upgrade chromium-chromedriver
```

If the script finds 0 cards:
- Check that the page uses the `.card` class for participant elements
- Try opening the page manually and inspecting element selectors
- The page may require login or may load content dynamically with JavaScript
If pages don't load properly in headless mode:
- Remove `options.add_argument("--headless")` to see what's happening
- Add `time.sleep(5)` to give more time for JavaScript to execute
```
datacol.py                   # Main script
participants_selected.csv    # Generated output file
datacol_cli.py               # CLI version for n8n (optional)
datacol_api.py               # API wrapper for n8n (optional)
```
MIT License
Feel free to submit issues or pull requests for:
- Additional website selectors
- Better error handling
- Performance improvements
- Enhanced n8n integration examples
- n8n Documentation – For building automation workflows
- Selenium Documentation – For web automation
- BeautifulSoup Documentation – For HTML parsing