Automated pipeline for generating property leads from Pennsylvania obituaries and public notices.
- Scrape: Daily monitoring of Legacy.com and PA Public Notice sites
- Validate: Automated search across county tax assessor portals (DevNet, GIS, ArcGIS)
- Enrich: Skip tracing via BatchData API
- Deliver: Automatic sync to Google Sheets
```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
playwright install chromium
cp .env.example .env
# Edit .env with your API keys
```

- Create a Google Cloud project
- Enable the Google Sheets API
- Create a service account and download JSON credentials
- Save as `config/google_credentials.json`
- Share your target spreadsheet with the service account email
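The address to share the spreadsheet with is the `client_email` field inside the downloaded credentials JSON. A small helper to read it (the function name is ours, not part of the project):

```python
import json

def service_account_email(path: str) -> str:
    # Google service-account credential files carry the account's
    # address under the "client_email" key.
    with open(path) as f:
        return json.load(f)["client_email"]

# e.g. service_account_email("config/google_credentials.json")
```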
Edit `config/counties.yaml` to add or modify county portal configurations.
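A hypothetical entry might look like the following; the keys are illustrative, and the actual schema is whatever the validators under `validators/` expect:

```yaml
counties:
  allegheny:
    portal_type: devnet        # assumed values: devnet | gis | arcgis
    search_url: https://example.com/allegheny/search   # placeholder URL
    rate_limit_seconds: 2
```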
```bash
# Run the full pipeline
python main.py run

# Scrape obituaries and public notices
python main.py scrape

# Validate scraped leads against tax portals
python main.py validate

# Enrich validated leads with skip tracing
python main.py enrich

# Sync to Google Sheets
python main.py sync
```

Common flags:

```bash
python main.py run --county allegheny  # Run for specific county
python main.py run --days 7            # Process last 7 days
python main.py run --dry-run           # Preview without changes
```

Project layout:

```
pa-property-pipeline/
├── config/      # Configuration files
├── scrapers/    # Web scrapers
├── cleaners/    # Data cleaning utilities
├── validators/  # County portal validators
├── enrichers/   # Skip trace integration
├── delivery/    # Google Sheets sync
├── models/      # Data models
├── utils/       # Shared utilities
├── logs/        # Runtime logs
├── data/        # Local data storage
└── tests/       # Test suite
```
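The subcommands and flags shown in the usage examples could be wired with `argparse`; a minimal sketch under the assumption that all subcommands share the same flags (the real `main.py` may structure its CLI differently):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="main.py")
    sub = parser.add_subparsers(dest="command", required=True)
    # Each subcommand from the usage section gets the same shared flags
    for name in ("run", "scrape", "validate", "enrich", "sync"):
        cmd = sub.add_parser(name)
        cmd.add_argument("--county", default=None, help="Limit to one county")
        cmd.add_argument("--days", type=int, default=1, help="Look-back window in days")
        cmd.add_argument("--dry-run", action="store_true", help="Preview without changes")
    return parser

args = build_parser().parse_args(["run", "--county", "allegheny", "--days", "7"])
```

Registering the flags per-subcommand (rather than on the top-level parser) is what allows `python main.py run --county allegheny` to parse with the flag after the command name.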