Centralized, searchable schedules for San Francisco public swimming pools. This app scrapes official pool schedule PDFs from SF Rec & Park, uses an LLM to extract structured schedules, and provides a clean UI to browse by program, day, time, and pool.
- Next.js (App Router), React 19, TypeScript
- Tailwind CSS v4 (via `@tailwindcss/postcss` and `@import "tailwindcss"`)
- Vercel AI SDK (`ai`) with Google Generative AI provider (`@ai-sdk/google`)
- Zod for strict schema validation
- GitHub Actions for automated weekly schedule updates
- Node.js 22+
- npm
- A Google Generative AI API key
- Install dependencies: `npm install`
- Create `.env.local` in the project root and add: `GOOGLE_GENERATIVE_AI_API_KEY=your_key_here`
- Run the dev server: `npm run dev`
- Generate schedules: `npm run build-schedules`
This will scrape PDF URLs, download PDFs, extract schedules, and write `public/data/all_schedules.json`. View at `/schedules`.
- `data/pools.json` — Source of truth for static pool metadata (id, name, shortName, address, pageUrl). Does not change frequently.
- `public/data/discovered_pool_schedules.json` — Scraped PDF URLs (poolId → pdfUrl mapping). Regenerated on each scrape.
- `data/pdf-manifest.json` — Tracks downloaded PDFs by hash to detect changes.
- `data/extracted/<poolId>.json` — Cached LLM extractions per PDF.
- `public/data/all_schedules.json` — Aggregated schedule data for the UI.
- `data/changelog/` — Change history between schedule updates.
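The data files above can be modeled with rough TypeScript shapes. Only the fields this README names (id, name, shortName, address, pageUrl, and the poolId → pdfUrl mapping) come from the project; the rest is illustrative:

```typescript
// Illustrative shapes for the pipeline's JSON files. Field names beyond
// those listed in the README are assumptions, not the project's actual types.

// data/pools.json entries
interface Pool {
  id: string;
  name: string;
  shortName: string;
  address: string;
  pageUrl: string;
}

// public/data/discovered_pool_schedules.json: poolId → pdfUrl
type DiscoveredSchedules = Record<string, string>;

const discovered: DiscoveredSchedules = {
  balboa: "https://sfrecpark.org/example/balboa-schedule.pdf", // hypothetical URL
};

console.log(Object.keys(discovered)); // [ 'balboa' ]
```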
```
scrape       →   download-pdfs     →   process-all-pdfs
  ↓                    ↓                      ↓
validates        checks hash            preserves data
pool count       downloads if           fails on large
& URLs           content changed        changes
```
- Scrape: Discovers pool pages and PDF URLs from SF Rec & Park. Validates against `pools.json` — fails if pool count or page URLs change unexpectedly.
- Download: Fetches PDFs, checks content hash against manifest. Only downloads if content actually changed (handles URL changes gracefully).
- Process: Extracts schedules via LLM, preserves unchanged pool schedules, detects large changes.
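The download step's change detection might look roughly like this (the manifest shape and function names are assumptions for illustration, not the project's actual scripts):

```typescript
import { createHash } from "node:crypto";

// Sketch of hash-based change detection: a PDF is only re-downloaded and
// re-processed when its content hash differs from the one in the manifest.
// Manifest shape and helper names are illustrative.

type PdfManifest = Record<string, { sha256: string }>;

function sha256(content: Buffer): string {
  return createHash("sha256").update(content).digest("hex");
}

function hasChanged(manifest: PdfManifest, poolId: string, content: Buffer): boolean {
  const previous = manifest[poolId]?.sha256;
  return previous !== sha256(content);
}

const manifest: PdfManifest = {
  balboa: { sha256: sha256(Buffer.from("old pdf bytes")) },
};

console.log(hasChanged(manifest, "balboa", Buffer.from("old pdf bytes"))); // false
console.log(hasChanged(manifest, "balboa", Buffer.from("new pdf bytes"))); // true
```

Hashing the content rather than tracking URLs is what lets the pipeline tolerate URL changes without redundant re-extraction.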
The pipeline computes a changelog comparing old vs new schedules:
- `none`/`minor`: Normal updates, build succeeds
- `major`/`wholesale`: Large changes detected, build fails in CI (requires manual review)

Set `FAIL_ON_LARGE_CHANGES=false` locally to bypass this check during development.
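A minimal sketch of such a severity gate, assuming ratio-based thresholds (the cutoffs and function names are illustrative, not the project's actual logic):

```typescript
// Sketch of a changelog severity gate. The severity names mirror the
// README's none/minor/major/wholesale levels; the cutoffs are assumptions.

type Severity = "none" | "minor" | "major" | "wholesale";

function classifyChange(totalEntries: number, changedEntries: number): Severity {
  if (changedEntries === 0) return "none";
  const ratio = changedEntries / totalEntries;
  if (ratio < 0.25) return "minor";
  if (ratio < 0.75) return "major";
  return "wholesale";
}

function shouldFailBuild(severity: Severity): boolean {
  // Mirrors FAIL_ON_LARGE_CHANGES: large changes fail CI unless disabled.
  const gate = process.env.FAIL_ON_LARGE_CHANGES !== "false";
  return gate && (severity === "major" || severity === "wholesale");
}

console.log(classifyChange(100, 0));  // "none"
console.log(classifyChange(100, 10)); // "minor"
console.log(classifyChange(100, 90)); // "wholesale"
```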
- `npm run dev` — start Next.js dev server
- `npm run build` — build for production
- `npm run start` — start production build
- `npm run lint` — run ESLint
- `npm run test` — run tests
- `npm run scrape` — scrape pool pages, validate against `pools.json`, discover PDF URLs
- `npm run download-pdfs` — download changed PDFs into `data/pdfs/`
- `npm run process-all-pdfs` — extract schedules from PDFs, preserve unchanged, write changelog
- `npm run build-schedules` — full pipeline: scrape → download → process
- `npm run scrape-alerts` — scrape pool alerts from SF Rec & Park
- `npm run analyze-programs` — analyze raw vs canonical program names
- For each PDF, send content to the LLM with a strict Zod schema
- The model extracts:
  - Pool metadata (name, season, date range)
  - Program entries with day, time, lanes, notes
- Pipeline normalizes program names to canonical labels (e.g., "LAP SWIM" → "Lap Swim")
- Enriches with static metadata from `pools.json` (address, URLs)
GitHub Actions runs weekly to:
- Scrape and download new PDFs
- Extract schedules from changed PDFs
- Commit changes to `public/data/` and `data/changelog/`
- Send push notifications via Pushover
If large changes are detected, the build fails and a notification is sent for manual review.
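A notification of that kind might be sent roughly as follows (the Pushover message endpoint and its token/user/message parameters are real; the helper and its wiring are illustrative):

```typescript
// Sketch of a Pushover notification, as the CI job might send on failure.
// Uses the PUSHOVER_* variables listed among the environment variables.
function buildPushoverBody(message: string, title?: string): URLSearchParams {
  const body = new URLSearchParams({
    token: process.env.PUSHOVER_API_TOKEN ?? "",
    user: process.env.PUSHOVER_USER_KEY ?? "",
    message,
  });
  if (title) body.set("title", title);
  return body;
}

export async function notify(message: string): Promise<void> {
  await fetch("https://api.pushover.net/1/messages.json", {
    method: "POST",
    body: buildPushoverBody(message, "Pool schedules"),
  });
}

console.log(buildPushoverBody("Large schedule change detected").get("message"));
```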
```bash
# Required: Google AI API key for schedule extraction
GOOGLE_GENERATIVE_AI_API_KEY=your_key_here

# Optional: Disable build failure on large changes (for local dev)
FAIL_ON_LARGE_CHANGES=false

# Optional: Force re-extraction even if cache exists
REFRESH_EXTRACT=1

# For CI notifications (GitHub Actions secrets)
PUSHOVER_USER_KEY=...
PUSHOVER_API_TOKEN=...
```

MIT