A PGA Tour data exploration + prediction terminal. Five tabs:
- Learning — interactive math: shot dispersion, strokes-gained breakdowns, expected-value tradeoffs
- PGA Tour — course timeline, k-means course clusters, vol cone for shot dispersion, walk-forward predictive backtests
- Predictions — cut model + finish-position model trained on multi-season data; live what-if scoring
- 3D — Three.js course visualization: flythroughs, ball-flight curvature simulations, club-selection scenes (1300 lines of WebGL)
- Career Simulator — Path-to-Tour Monte Carlo: Korn Ferry → PGA promotion probabilities given a starting skill profile
Standalone version of the Golf Data Lab app from w1zz7/willos-98-portfolio. Same code, no Win98 desktop chrome — just the lab, fullscreen.
A teaching interface aimed at curious golfers + analysts who want to understand why a 6-handicap should pick the safe line and a tour pro should pick the aggressive one. Visualizations:
- Shot dispersion ellipses — variance + skewness of where shots actually land vs intended target, by club + skill level
- Strokes-gained breakdown — Mark Broadie's 4-bucket SG framework (off-the-tee, approach, around-the-green, putting) with live tooltips on every concept
- Expected-value calculator — pin-vs-fat-of-green decision under uncertainty: integrate dispersion ellipse over the green's penalty surface, get the EV in shots per round
- Risk-reward curve — for every dispersion radius (your hands' precision), what's the optimal target line? At what handicap does the optimum flip?
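The expected-value idea above can be sketched as a Monte Carlo integration: sample landing points from an elliptical Gaussian around the aim point and average the penalty at each landing spot. Everything here (the sigma values, the toy penalty surface, the function names) is illustrative, not the app's actual data or code.

```typescript
// Monte Carlo EV sketch: average penalty strokes over a dispersion ellipse.
// Sigmas and the penalty surface are made-up illustrations.

type PenaltySurface = (x: number, y: number) => number; // extra strokes at a landing point

/** Standard-normal sample via Box-Muller. */
function randn(): number {
  const u = 1 - Math.random();
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

/** Expected strokes lost when aiming at (aimX, aimY) with an elliptical dispersion. */
function expectedPenalty(
  aimX: number, aimY: number,
  sigmaLong: number, sigmaLat: number,
  penalty: PenaltySurface,
  trials = 20_000,
): number {
  let total = 0;
  for (let i = 0; i < trials; i++) {
    // Landing point = aim + Gaussian noise along the lateral and long axes.
    total += penalty(aimX + randn() * sigmaLat, aimY + randn() * sigmaLong);
  }
  return total / trials;
}

// Toy penalty surface: water short-left of the green costs ~1.5 strokes.
const surface: PenaltySurface = (x, y) => (x < -10 && y < -5 ? 1.5 : 0);

// Safe line (aim away from the water) vs aggressive line (at the pin):
const safeEV = expectedPenalty(8, 0, 10, 7, surface);
const aggressiveEV = expectedPenalty(0, 0, 10, 7, surface);
```

The "risk-reward flip" falls out of running this for a grid of sigmas: as the ellipse shrinks, the aggressive line's penalty term vanishes faster than its upside.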
Multi-season aggregation of PGA Tour course data (data/golfdata/pga_*.json):
- Course timeline — every PGA event 2018–present with course rating, scoring average, cut line, winner score
- K-means course clusters — 6-cluster solution over course features (length, par-3 difficulty, putting surface speed, water/sand hazard ratios). Each cluster has a "vibe" (links style, parkland, desert, classic, modern, manufactured)
- Vol cone — per-cluster shot dispersion as a function of distance, computed from millions of ShotLink rows. Reveals the "scoring distance" sweet spot (~110-140 yds for tour pros)
- Walk-forward backtest — train a finish-position model on 2018-2022, test on 2023, retrain through 2023, test on 2024, etc. Reports per-window MAE / R² so you can see the model degrading or improving over time
- K-means diagnostics — silhouette scores, within-cluster sum of squares, elbow analysis to justify k=6
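A vol cone of the kind described above is, at heart, distance-binned dispersion: group shots by starting distance and compute the standard deviation of the lateral miss per bin. A minimal sketch, with made-up field names (the real pipeline's schema may differ):

```typescript
// Vol-cone sketch: per-distance-bin standard deviation of lateral miss.
// `Shot` fields are illustrative, not the pipeline's actual schema.

interface Shot { startDistYds: number; lateralMissYds: number }

function volCone(shots: Shot[], binWidth = 20): Map<number, number> {
  // Bucket lateral misses by starting-distance bin.
  const bins = new Map<number, number[]>();
  for (const s of shots) {
    const bin = Math.floor(s.startDistYds / binWidth) * binWidth;
    let arr = bins.get(bin);
    if (!arr) { arr = []; bins.set(bin, arr); }
    arr.push(s.lateralMissYds);
  }
  // Per-bin standard deviation = the "cone" value at that distance.
  const cone = new Map<number, number>();
  for (const [bin, misses] of bins) {
    const mean = misses.reduce((a, b) => a + b, 0) / misses.length;
    const variance = misses.reduce((a, b) => a + (b - mean) ** 2, 0) / misses.length;
    cone.set(bin, Math.sqrt(variance));
  }
  return cone;
}
```

Running this per course cluster gives the cluster-specific cones; the "scoring distance" sweet spot shows up as the flattest region of the curve.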
Two production-grade models trained offline (scripts/prep-golf-data.mjs builds the training data):
| Model | Architecture | Output |
|---|---|---|
| Cut model | Logistic regression with engineered features (course-cluster fixed effects, recent form decay, head-to-head SG vs field) | P(makes cut) ∈ [0, 1] |
| Finish model | Multinomial Naive Bayes over discretized SG buckets, calibrated against historical finish distribution | P(finish in top 5) / top 10 / top 25 / cuts |
Both models are exported as plain JSON (weights + scaling parameters) at data/golfdata/cut_model.json and data/golfdata/finish_model.json. The frontend loads them at boot and runs predictions client-side — no inference server.
Live what-if — slide a player's input SG values, watch the predicted cut probability + finish distribution update in real time.
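Because the cut model ships as plain JSON weights, client-side scoring reduces to standardizing the features, taking a dot product, and applying a sigmoid. The schema below (`means`/`stds`/`weights`/`bias`) is an assumed shape for illustration, not necessarily the repo's actual file format:

```typescript
// Sketch of client-side logistic-regression scoring from exported JSON.
// The CutModel schema is an assumption, not the repo's actual cut_model.json layout.

interface CutModel {
  means: number[];   // per-feature standardization means
  stds: number[];    // per-feature standardization std devs
  weights: number[]; // logistic regression coefficients
  bias: number;      // intercept
}

function predictCutProbability(model: CutModel, features: number[]): number {
  let z = model.bias;
  for (let i = 0; i < features.length; i++) {
    // Standardize with the training-time scaling, then apply the weight.
    z += model.weights[i] * ((features[i] - model.means[i]) / model.stds[i]);
  }
  return 1 / (1 + Math.exp(-z)); // sigmoid → P(makes cut) ∈ [0, 1]
}
```

Dragging a what-if slider just calls a function like this with the new SG vector, which is why no inference server is needed.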
A 1300-line WebGL view rendering:
- Course flythrough — programmatically generated terrain (heightmap noise + texture splatting) styled as a Pebble-Beach-esque oceanside par-4. Free-camera mode (orbit / pan / zoom) and a guided "tour the hole" mode that camera-paths from tee to green
- Ball-flight simulation — physics integrator over launch conditions (clubhead speed, attack angle, spin, wind). Renders the shot trajectory as a parametric curve in 3D space. You can move sliders and watch the trajectory bend in real time
- Club selection scene — overhead view of the hole with dispersion ellipses for every club, scaled to the player's skill profile. Pick a club → see if your ellipse covers the green or bleeds into hazards
Built with @react-three/fiber + @react-three/drei + tween.js. Performant on integrated GPUs (60 fps on M1 Air, 30+ fps on a 5-year-old MacBook Pro).
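The ball-flight simulation's core loop is a numerical integrator. A minimal forward-Euler sketch with gravity, quadratic drag, and a simplified Magnus-style lift term, using made-up coefficients (the app's actual integrator and constants may differ):

```typescript
// Ball-flight sketch: forward-Euler over gravity, quadratic drag, and upward lift.
// Coefficients (drag, spinLift) are illustrative, not the app's tuned values.

interface Launch { speed: number; launchDeg: number; spinLift: number } // m/s, degrees, lift coeff

function simulateFlight({ speed, launchDeg, spinLift }: Launch, dt = 0.01): { carry: number; apex: number } {
  const g = 9.81, drag = 0.003;
  const rad = (launchDeg * Math.PI) / 180;
  let x = 0, y = 0;
  let vx = speed * Math.cos(rad), vy = speed * Math.sin(rad);
  let apex = 0;
  while (y >= 0) {
    const v = Math.hypot(vx, vy);
    // Drag opposes velocity; lift is simplified to act straight up, scaled by speed.
    const ax = -drag * v * vx;
    const ay = -g - drag * v * vy + spinLift * v;
    vx += ax * dt; vy += ay * dt;
    x += vx * dt; y += vy * dt;
    apex = Math.max(apex, y);
  }
  return { carry: x, apex };
}
```

In the real scene the same state vector is sampled each frame to draw the parametric trajectory curve, so slider changes re-run the integrator and the curve bends in real time.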
A Monte Carlo simulator for the Korn Ferry → PGA Tour promotion path:
- Input a starting skill profile: SG-OTT, SG-APP, SG-ARG, SG-PUTT (relative to PGA average)
- 5,000 simulated careers; each simulated season rolls 25 events with random course assignments
- For each event: sample dispersion from cluster-specific vol cones, predict finish via the trained model, accumulate FedExCup / Korn Ferry points
- Aggregate: P(make Korn Ferry top 25) → P(promote to PGA) → P(stay on PGA) → 5-year career trajectory tree
- Visualization: stacked area chart of "what fraction of the 5,000 simulated careers are at career stage X by year Y"
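The simulation loop above can be reduced to three nested loops: trials × seasons × events, with a points function standing in for "predict finish, look up points." This is a structural sketch only; the stubbed function and threshold are placeholders for the trained model and official points tables:

```typescript
// Career Monte Carlo sketch: trials × seasons × events with a stubbed points function.
// `pointsForEvent` and `promotionThreshold` are placeholders, not the real model/rules.

type Rng = () => number;

function simulateCareers(
  trials: number,
  seasons: number,
  eventsPerSeason: number,
  pointsForEvent: (rng: Rng) => number, // stub for "sample course, predict finish, award points"
  promotionThreshold: number,
  rng: Rng = Math.random,
): number {
  let promoted = 0;
  for (let t = 0; t < trials; t++) {
    let onPga = false;
    for (let s = 0; s < seasons && !onPga; s++) {
      let points = 0;
      for (let e = 0; e < eventsPerSeason; e++) points += pointsForEvent(rng);
      if (points >= promotionThreshold) onPga = true; // season-ending cutoff, e.g. top 25
    }
    if (onPga) promoted++;
  }
  return promoted / trials; // P(reach the PGA Tour within `seasons` years)
}
```

The stacked-area chart is just this loop with per-year stage tallies recorded instead of a single promotion flag.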
The data layer is built offline by scripts/prep-golf-data.mjs (1000-line pipeline). It:
- Pulls historical PGA event/round/shot data from public sources
- Joins ShotLink-style metrics with course metadata
- Runs k-means clustering on course features
- Computes vol cones (dispersion as a function of distance, per cluster)
- Trains cut + finish models with regularized cross-validation
- Walks forward through the holdout windows to verify out-of-sample performance
- Exports everything to small JSON files in data/golfdata/
The frontend is a pure consumer — it reads the JSON, runs predictions client-side, renders the visualizations. No inference server, no API calls, no rate limits. The whole site is fully static after the initial JSON load.
| File | Size | Content |
|---|---|---|
| pga_tour.json | ~30 KB | High-level event metadata, 2018–present |
| pga_courses_deep.json | ~80 KB | Per-course feature vectors (length, par-3 difficulty, hazard ratios) |
| pga_courses_timeline.json | ~40 KB | Per-course year-over-year scoring trends |
| pga_kmeans_diagnostics.json | ~5 KB | Silhouette / WCSS / elbow data justifying k=6 |
| pga_cluster_timeline.json | ~20 KB | Cluster popularity over time |
| pga_vol_cone.json | ~15 KB | Distance-binned dispersion stats per cluster |
| pga_walkforward.json | ~8 KB | Per-window MAE/R² for the rolling backtest |
| pga_strategy_panel.json | ~25 KB | Cross-tab: skill profile × course type → recommended strategy |
| pga_majors.json | ~10 KB | Majors-specific subset for special handling |
| pga_player_course.json | ~120 KB | Player × course historical performance for the live what-if scoring |
| pga_career_paths.json | ~50 KB | Korn Ferry → PGA transition probabilities (the Monte Carlo input) |
| pga_analysis.json | ~20 KB | Aggregate stats for the Learning tab |
| cut_model.json | ~8 KB | Logistic regression weights + scaling |
| finish_model.json | ~12 KB | Multinomial NB parameters |
| model.json | ~6 KB | Catch-all model registry |
| play_surface.json | ~3 KB | Per-course green type / Stimpmeter speed |
| eda.json | ~15 KB | Exploratory data analysis results |
| pca.json | ~10 KB | PCA loadings for course-feature reduction |
| scatter3d.json | ~50 KB | Pre-computed 3D point cloud for the cluster visualization |
Total: ~530 KB JSON, all loaded once at boot. No external API calls.
```bash
npm install
npm run dev
```

Open http://localhost:3000. The lab renders fullscreen.
| Script | What it does |
|---|---|
| npm run dev | Next.js dev server (port 3000) |
| npm run build | Production build |
| npm run start | Run the production build locally |
| npm run typecheck | tsc --noEmit (0 errors expected) |
| node scripts/prep-golf-data.mjs | Re-build the JSON data pack from raw sources (offline) |
Everything is static after the JSON load. No API keys, no third-party SaaS, nothing to configure.
netlify.toml pins Node 20 LTS and adds long-cache headers on /_next/static/* (the JSON data pack is content-hashed by Next, so it's safe to cache aggressively).
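A plausible shape for that config is sketched below; the repo's actual netlify.toml is authoritative, and every value here is illustrative:

```toml
# Illustrative only — see the repo's netlify.toml for the real values.
[build]
  command = "npm run build"
  publish = ".next"

[build.environment]
  NODE_VERSION = "20"

[[headers]]
  for = "/_next/static/*"
  [headers.values]
    Cache-Control = "public, max-age=31536000, immutable"
```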
```bash
npx netlify-cli login
npx netlify-cli init          # link this folder to a new or existing site
npx netlify-cli deploy --prod
```

Or via the Netlify dashboard:
- Add new site → Import from Git → choose this repo
- Build settings auto-detect from netlify.toml (build command npm run build, publish .next)
- No environment variables needed — the lab is fully static after the JSON pack loads
- Deploy. Subsequent pushes to main auto-deploy.
```bash
npx vercel --prod
```

No vercel.json needed — Next.js conventions are auto-detected.
Standard Next.js — Railway, Render, Fly all work without changes.
- First Load JS: 659 KB (gzipped)
- ~400 KB of that is Three.js + r3f + drei (the 3D tab)
- ~150 KB of that is the React + Next.js runtime
- The rest is component code + the embedded JSON data
The 3D tab is lazy-loaded — initial page load doesn't pay the Three.js cost until the user clicks the 3D sub-tab.
golf-data-lab/
├── app/
│ ├── layout.tsx # root layout (no Win98 chrome)
│ ├── page.tsx # mounts <GolfDataLab /> fullscreen
│ └── globals.css # tailwind + base styles
│
├── components/apps/golfdatalab/
│ ├── GolfDataLab.tsx # 5-tab container + tab bar
│ ├── LearningTab.tsx # interactive math & SG framework
│ ├── PgaTourTab.tsx # course timeline + clusters + walk-forward
│ ├── PredictionsTab.tsx # cut + finish model live scoring
│ ├── ThreeDTab.tsx # Three.js course scenes (1300 lines)
│ ├── CareerSimulator.tsx # Korn Ferry → PGA Monte Carlo
│ ├── StrategyLab.tsx # course-cluster × skill-profile strategy panel
│ └── SgGlossaryModal.tsx # Mark Broadie's SG framework explainer
│
├── data/golfdata/ # 19 JSON files, ~530 KB total
│ ├── cut_model.json # logistic regression weights
│ ├── finish_model.json # multinomial NB parameters
│ ├── pga_tour.json # event metadata
│ ├── pga_courses_deep.json # course feature vectors
│ ├── pga_kmeans_diagnostics.json
│ ├── pga_vol_cone.json # dispersion-by-distance
│ ├── pga_walkforward.json # per-window backtest results
│ ├── pga_career_paths.json # Monte Carlo transition probabilities
│ └── ... (13 more)
│
├── scripts/
│ └── prep-golf-data.mjs # 1000-line offline data pipeline
│
└── lib/wm/types.ts # minimal WindowState stub
All the analytics use Mark Broadie's Strokes Gained framework (Columbia, "Every Shot Counts"). The 4-bucket attribution:
- SG-OTT (off-the-tee) — drives + tee shots on par-4s and par-5s
- SG-APP (approach) — shots toward the green from outside 30 yds, excluding tee shots on par-4s and par-5s
- SG-ARG (around-the-green) — chips, pitches, bunker shots inside 30 yds
- SG-PUTT — every putt
The benchmark is the field's expected score from each starting point — improvement over benchmark = strokes gained vs the field. This is the same framework the PGA Tour has published officially since 2014.
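For a single shot this reduces to: expected strokes from the start position, minus expected strokes from the end position, minus the one stroke taken. The baseline numbers below are commonly cited tour averages in the spirit of Broadie's tables; treat them as illustrative:

```typescript
// Single-shot strokes gained: baseline(before) − baseline(after) − 1 − penalties.
// Baseline values in the examples are illustrative tour-average figures.

function strokesGained(expBefore: number, expAfter: number, penalty = 0): number {
  return expBefore - (expAfter + 1 + penalty);
}

// 150-yd fairway approach (field averages ~2.98 from there) finishing
// as a 20-ft putt (field averages ~1.87 putts from 20 ft):
const sgApproach = strokesGained(2.98, 1.87); // ≈ +0.11, better than the field

// Holing out means expAfter = 0:
const sgHoled = strokesGained(2.98, 0); // ≈ +1.98
```

Summing per-shot values within each of the four buckets yields the SG-OTT / SG-APP / SG-ARG / SG-PUTT attribution.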
K-means on 6 features (post-PCA): course length normalized to par, par-3 difficulty index, water hazard ratio, sand hazard ratio, green speed (Stimpmeter), elevation change. K chosen at 6 via elbow + silhouette analysis (in pga_kmeans_diagnostics.json).
Standard time-series CV. Train window 2018–2022, test 2023. Then re-fit including 2023, test 2024. Reports per-window R² + MAE so you can verify the model isn't overfitting to a single regime.
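The window scheme described above (expanding train set, one-season test set) can be generated mechanically. A sketch, with hypothetical names:

```typescript
// Walk-forward window generation: expanding train window, single-season test window.
// Function and type names are illustrative, not the pipeline's actual identifiers.

interface Window { train: number[]; test: number }

function walkForwardWindows(firstYear: number, firstTestYear: number, lastYear: number): Window[] {
  const windows: Window[] = [];
  for (let test = firstTestYear; test <= lastYear; test++) {
    const train: number[] = [];
    for (let y = firstYear; y < test; y++) train.push(y); // everything before the test year
    windows.push({ train, test });
  }
  return windows;
}

// walkForwardWindows(2018, 2023, 2024) yields
// { train: 2018–2022, test: 2023 } and { train: 2018–2023, test: 2024 }.
```

Fitting the model once per window and scoring on the held-out season produces the per-window MAE / R² series reported in pga_walkforward.json.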
5,000 trials × 25 events × 5 years = 625K simulated tournaments. Per-event randomness:
- Sample course assignment from the actual season schedule
- Sample dispersion from the cluster-specific vol cone (heteroscedastic)
- Predict finish via the trained finish model, conditional on the player's SG profile
- Accumulate Korn Ferry / FedExCup points per the official scoring rules
- End-of-year promotion / demotion thresholds applied
Output: stacked-area chart of "what fraction of trials are at career-stage X by year Y" (Korn Ferry full status / partial status / promoted to PGA / demoted / off-Tour).
MIT. Use this code freely.
The PGA Tour event data in data/golfdata/*.json was assembled from public sources for educational purposes. Replace with your own pipeline output for production use.
Designed + engineered by Will Zhang as part of the WillOS 98 Portfolio.
Methodology: Mark Broadie's Every Shot Counts (Strokes Gained framework). Course geometry / WebGL: Three.js + react-three-fiber + drei.
Built with Next.js 15, React 19, TypeScript 5, Tailwind CSS 4, Three.js r170.