datacol.py - Event Participant Data Collector

A Python script that scrapes participant information from event websites and exports it to CSV. Built with Selenium and BeautifulSoup, and ready to drop into n8n automations (integration examples below).

Features

  • Automatic participant extraction from web pages using .card selectors
  • Interactive field selection – choose which data fields to export
  • Selective participant filtering – include specific participants or all
  • CSV export – clean, formatted output ready for spreadsheets or databases
  • Headless browser support – runs without opening a visible browser window
  • n8n ready – can be integrated into automated workflows (see below)

Requirements

System Dependencies

  • Python 3.6+
  • Chrome or Chromium browser
  • ChromeDriver (matching your browser version)

Python Packages

pip install selenium beautifulsoup4

Installation

  1. Clone or download this script to your machine

  2. Install ChromeDriver (if not already installed):

    # Ubuntu/Debian
    sudo apt install chromium-chromedriver
    
    # Or download manually from: https://chromedriver.chromium.org/
  3. Update paths in the script if needed:

    • options.binary_location – path to your Chrome/Chromium executable
    • Service("/usr/local/bin/chromedriver") – path to your ChromeDriver

Usage

Basic Usage

Run the script and follow the interactive prompts:

python datacol.py

You will be guided through:

  1. Enter URL – the event page containing participant cards
  2. Select participants – choose specific numbers or type all
  3. Choose fields – select which data fields to export
  4. CSV generated – results saved to participants_selected.csv

Example Output

Enter the full event participants page URL: https://example.com/event/speakers
Loading https://example.com/event/speakers ...

✅ Found 24 participant cards.

1. Dr. Sarah Chen
2. Prof. Michael Rodriguez
3. Dr. Emily Watson
...

Enter participant numbers to include (comma separated, or 'all'): 1,3,5-7
Selected 5 participants.

Detected available fields:
1. Name
2. Title
3. Company
4. Country

Enter field number to include (press Enter to finish): 1
Added: Name
Enter field number to include (press Enter to finish): 2
Added: Title
Enter field number to include (press Enter to finish): 

✅ Extraction complete: participants_selected.csv
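
The participant prompt above accepts both single numbers and ranges (e.g. 1,3,5-7). A minimal sketch of how such a selection string might be parsed; parse_selection is a hypothetical helper for illustration, not a function from the actual script:

```python
def parse_selection(spec, total):
    """Parse a selection like '1,3,5-7' (or 'all') into sorted 1-based indices."""
    if spec.strip().lower() == "all":
        return list(range(1, total + 1))
    chosen = set()
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            lo, hi = part.split("-", 1)
            chosen.update(range(int(lo), int(hi) + 1))
        else:
            chosen.add(int(part))
    # Keep only indices that actually exist on the page
    return sorted(i for i in chosen if 1 <= i <= total)

print(parse_selection("1,3,5-7", 24))  # [1, 3, 5, 6, 7]
```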

Customizing for Different Websites

The script is designed for pages where participant cards use the .card CSS class. To adapt it for other websites:

Element          Where to Change         Example
Card selector    soup.select(".card")    Change .card to .participant, .speaker-item, etc.
Name selector    card.find("h3")         Change to h2, .name, or [class*="name"]
Field extraction card.find_all("p")      Change to .details, div.info, etc.
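
For example, adapting the extraction to a page that uses .speaker-item cards with h2 names might look like this. The same BeautifulSoup calls the script relies on are used here, but the sample HTML is invented for illustration:

```python
from bs4 import BeautifulSoup

# Invented sample markup standing in for a real event page
html = """
<div class="speaker-item"><h2>Dr. Sarah Chen</h2><p>CTO</p><p>Acme</p></div>
<div class="speaker-item"><h2>Prof. Michael Rodriguez</h2><p>Dean</p><p>State U</p></div>
"""

soup = BeautifulSoup(html, "html.parser")
cards = soup.select(".speaker-item")                # was soup.select(".card")
for card in cards:
    name = card.find("h2").get_text(strip=True)    # was card.find("h3")
    details = [p.get_text(strip=True) for p in card.find_all("p")]
    print(name, details)
```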

n8n Integration

This script can be integrated into n8n workflows for fully automated data collection. Here are four approaches:

Approach 1: Execute Command Node (Simple)

Run the script directly from n8n using the Execute Command node:

Workflow:

[Schedule Trigger] → [Execute Command] → [Read File] → [Google Drive/Send Email]

Execute Command Node Configuration:

  • Command: python
  • Arguments: /path/to/datacol.py
  • Note: For non-interactive use, you'll need to modify the script to accept arguments (see Approach 2)

Approach 2: Modified Version with CLI Arguments

Create a modified version of the script that accepts command-line arguments:

# datacol_cli.py - Modified for n8n
import argparse
import csv
from bs4 import BeautifulSoup
from selenium import webdriver
# ... (rest of imports)

parser = argparse.ArgumentParser()
parser.add_argument("--url", required=True, help="Event page URL")
parser.add_argument("--participants", default="all", help="Participant numbers or 'all'")
parser.add_argument("--fields", required=True, help="Comma-separated field names")
parser.add_argument("--output", default="participants_selected.csv", help="Output file")

args = parser.parse_args()
# ... (rest of script logic)

Then in n8n, use:

python datacol_cli.py --url "https://example.com/event" --fields "Name,Title,Company" --output "/tmp/participants.csv"
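
The CLI version still ends by writing the selected fields to CSV. A minimal sketch of that export step using the standard csv module; the field names and rows below are illustrative, not taken from the script:

```python
import csv

# Illustrative scraped rows; the real script builds these from the page
participants = [
    {"Name": "Dr. Sarah Chen", "Title": "CTO", "Company": "Acme"},
    {"Name": "Prof. Michael Rodriguez", "Title": "Dean", "Company": "State U"},
]
fields = ["Name", "Title"]  # e.g. from --fields "Name,Title"

with open("participants_selected.csv", "w", newline="") as f:
    # extrasaction="ignore" drops any fields the user did not select
    writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(participants)
```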

Approach 3: HTTP API Wrapper (Most Flexible)

Create a simple Flask API wrapper that n8n can call via HTTP Request node:

# datacol_api.py
from flask import Flask, request, jsonify
import subprocess

app = Flask(__name__)

@app.route('/scrape', methods=['POST'])
def scrape():
    data = request.json
    url = data.get('url')
    fields = data.get('fields', [])

    if not url:
        return jsonify({'status': 'error', 'message': 'url is required'}), 400

    result = subprocess.run(
        ['python', 'datacol_cli.py', '--url', url, '--fields', ','.join(fields)],
        capture_output=True, text=True
    )

    if result.returncode != 0:
        return jsonify({'status': 'error', 'message': result.stderr}), 500

    with open('participants_selected.csv', 'r') as f:
        csv_data = f.read()

    return jsonify({'status': 'success', 'data': csv_data})

if __name__ == '__main__':
    app.run(port=5000)

n8n HTTP Request Node:

  • Method: POST
  • URL: http://localhost:5000/scrape
  • Body (JSON):
    {
      "url": "https://example.com/event",
      "fields": ["Name", "Title", "Company"]
    }
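
Outside n8n, the same call can be exercised from Python's standard library. This sketch builds the request the HTTP Request node would send; the actual send is commented out because it only works with the Flask wrapper running:

```python
import json
import urllib.request

payload = {
    "url": "https://example.com/event",
    "fields": ["Name", "Title", "Company"],
}
body = json.dumps(payload).encode("utf-8")

req = urllib.request.Request(
    "http://localhost:5000/scrape",
    data=body,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# resp = urllib.request.urlopen(req)          # uncomment with the wrapper running
# print(json.loads(resp.read())["status"])
print(req.method, req.full_url)
```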

Approach 4: End-to-End n8n Workflow

Finally, you can wrap the entire pipeline, from scheduled trigger to spreadsheet, in a single n8n workflow. The example below still calls the CLI script from Approach 2 via the Execute Command node, but n8n handles the scheduling, file reading, and delivery, so no manual steps remain.

Example n8n Workflow Structure

[Schedule Trigger] 
  ↓ (runs every Monday at 9 AM)
[Set Node] 
  - Set URL: "https://example.com/event/participants"
  - Set fields: ["Name", "Title", "Company"]
  ↓
[Execute Command] 
  - Run: python /path/to/datacol_cli.py
  - Arguments: --url "{{$json.url}}" --fields "{{$json.fields.join(',')}}"
  ↓
[Read File] 
  - File path: /tmp/participants.csv
  ↓
[Convert to JSON] 
  - Parse CSV to structured data
  ↓
[Google Sheets] 
  - Append data to spreadsheet
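
The "Convert to JSON" step above maps each CSV row to a structured record. In Python terms it amounts to this (stdlib only; the CSV contents are invented sample data):

```python
import csv
import io

# Sample CSV as produced by the scraper (contents invented)
csv_text = "Name,Title\nDr. Sarah Chen,CTO\nProf. Michael Rodriguez,Dean\n"

# DictReader uses the header row as keys, one dict per participant
records = list(csv.DictReader(io.StringIO(csv_text)))
print(records[0])  # {'Name': 'Dr. Sarah Chen', 'Title': 'CTO'}
```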

Troubleshooting

ChromeDriver Issues

If you get errors about ChromeDriver:

# Check ChromeDriver version
chromedriver --version

# Update if needed
sudo apt update && sudo apt upgrade chromium-chromedriver

No Cards Found

If the script finds 0 cards:

  • Check that the page uses .card class for participant elements
  • Try opening the page manually and inspecting element selectors
  • The page may require login or may load content dynamically with JavaScript

Headless Mode Problems

If pages don't load properly in headless mode:

  • Remove options.add_argument("--headless") to see what's happening
  • Add time.sleep(5) to give more time for JavaScript to execute

File Structure

datacol.py                 # Main script
participants_selected.csv  # Generated output file
datacol_cli.py             # CLI version for n8n (optional)
datacol_api.py             # API wrapper for n8n (optional)

License

MIT License

Contributing

Feel free to submit issues or pull requests for:

  • Additional website selectors
  • Better error handling
  • Performance improvements
  • Enhanced n8n integration examples
