
w1zz7/golf-data-lab


Golf Data Lab

Next.js 15 · React 19 · TypeScript · Three.js · License: MIT

A PGA Tour data exploration + prediction terminal. Five tabs:

  • Learning — interactive math: shot dispersion, strokes-gained breakdowns, expected-value tradeoffs
  • PGA Tour — course timeline, k-means course clusters, vol cone for shot dispersion, walk-forward predictive backtests
  • Predictions — cut model + finish-position model trained on multi-season data; live what-if scoring
  • 3D — Three.js course visualization: flythroughs, ball-flight curvature simulations, club-selection scenes (1300 lines of WebGL)
  • Career Simulator — Path-to-Tour Monte Carlo: Korn Ferry → PGA promotion probabilities given a starting skill profile

Standalone version of the Golf Data Lab app from w1zz7/willos-98-portfolio. Same code, no Win98 desktop chrome — just the lab, fullscreen.


What it does

1. Learning (interactive math)

A teaching interface aimed at curious golfers + analysts who want to understand why a 6-handicap should pick the safe line and a tour pro should pick the aggressive one. Visualizations:

  • Shot dispersion ellipses — variance + skewness of where shots actually land vs intended target, by club + skill level
  • Strokes-gained breakdown — Mark Broadie's 4-bucket SG framework (off-the-tee, approach, around-the-green, putting) with live tooltips on every concept
  • Expected-value calculator — pin-vs-fat-of-green decision under uncertainty: integrate dispersion ellipse over the green's penalty surface, get the EV in shots per round
  • Risk-reward curve — for every dispersion radius (your hands' precision), what's the optimal target line? At what handicap does the optimum flip?
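
The expected-value comparison above can be sketched as a small Monte Carlo integration. This is an illustrative sketch, not the app's code: the circular green, the water line, the stroke costs in `strokesFrom`, and the isotropic Gaussian dispersion are all assumptions made for the example, and it reports expected strokes for a single shot rather than per round.

```typescript
// Sketch only: geometry and stroke costs below are illustrative assumptions.
type Point = { x: number; y: number };

// Small seeded PRNG (mulberry32) so the estimate is repeatable.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via Box-Muller.
function gaussian(rand: () => number): number {
  const u1 = 1 - rand(); // in (0, 1], avoids log(0)
  const u2 = rand();
  return Math.sqrt(-2 * Math.log(u1)) * Math.cos(2 * Math.PI * u2);
}

const PIN: Point = { x: 7, y: 0 };    // tucked near the right edge (yards)
const CENTER: Point = { x: 0, y: 0 }; // fat of the green
const GREEN_RADIUS = 10;              // circular green centred at (0, 0)
const WATER_X = 12;                   // everything right of this line is water

// Hypothetical penalty surface: expected strokes to hole out from a landing spot.
function strokesFrom(p: Point): number {
  if (p.x > WATER_X) return 3.8; // penalty, drop, then get down
  const onGreen = Math.sqrt(p.x * p.x + p.y * p.y) <= GREEN_RADIUS;
  if (!onGreen) return 2.6; // chip plus putts
  const dx = p.x - PIN.x;
  const dy = p.y - PIN.y;
  return 1 + 0.05 * Math.sqrt(dx * dx + dy * dy); // longer putt, more strokes
}

// Average the penalty surface over sampled landing spots around the target.
function expectedStrokes(target: Point, sigmaYds: number, samples = 20000, seed = 1): number {
  const rand = mulberry32(seed);
  let total = 0;
  for (let i = 0; i < samples; i++) {
    total += strokesFrom({
      x: target.x + sigmaYds * gaussian(rand),
      y: target.y + sigmaYds * gaussian(rand),
    });
  }
  return total / samples;
}

// Tight dispersion (1 yd) vs wide dispersion (8 yd), pin vs centre of green.
const tight = { pin: expectedStrokes(PIN, 1), center: expectedStrokes(CENTER, 1) };
const wide = { pin: expectedStrokes(PIN, 8), center: expectedStrokes(CENTER, 8) };
```

Under these made-up costs, the tight-dispersion player does better aiming at the pin while the wide-dispersion player does better aiming at the centre, which is the flip the risk-reward curve is meant to show.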

2. PGA Tour (descriptive analytics)

Multi-season aggregation of PGA Tour course data (data/golfdata/pga_*.json):

  • Course timeline — every PGA event 2018–present with course rating, scoring average, cut line, winner score
  • K-means course clusters — 6-cluster solution over course features (length, par-3 difficulty, putting surface speed, water/sand hazard ratios). Each cluster has a "vibe" (links style, parkland, desert, classic, modern, manufactured)
  • Vol cone — per-cluster shot dispersion as a function of distance, computed from millions of ShotLink rows. Reveals the "scoring distance" sweet spot (~110-140 yds for tour pros)
  • Walk-forward backtest — train a finish-position model on 2018-2022, test on 2023, retrain through 2023, test on 2024, etc. Reports per-window MAE / R² so you can see the model degrading or improving over time
  • K-means diagnostics — silhouette scores, within-cluster sum of squares, elbow analysis to justify k=6
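
A vol cone of the kind described above can be sketched as a group-by over cluster and distance bin. The `Shot` row shape, the 10-yard bin width, and the use of root-mean-square miss as the dispersion measure are assumptions for illustration; the pipeline's actual schema and statistic are not shown in this README.

```typescript
// Hypothetical shot row; the real ShotLink-style schema is not documented here.
interface Shot {
  cluster: number;     // k-means course cluster id
  distanceYds: number; // distance to the target before the shot
  missYds: number;     // signed miss from the intended target
}

interface ConeBin {
  cluster: number;
  binStartYds: number;
  shots: number;
  rmsMissYds: number; // root-mean-square miss, used here as the dispersion measure
}

function volCone(shots: Shot[], binWidthYds = 10): ConeBin[] {
  const cells: { [key: string]: { cluster: number; binStartYds: number; n: number; sumSq: number } } = {};
  for (const s of shots) {
    const binStartYds = Math.floor(s.distanceYds / binWidthYds) * binWidthYds;
    const key = s.cluster + ":" + binStartYds;
    if (!cells[key]) cells[key] = { cluster: s.cluster, binStartYds, n: 0, sumSq: 0 };
    cells[key].n += 1;
    cells[key].sumSq += s.missYds * s.missYds;
  }
  return Object.keys(cells)
    .map((key) => {
      const c = cells[key];
      return { cluster: c.cluster, binStartYds: c.binStartYds, shots: c.n, rmsMissYds: Math.sqrt(c.sumSq / c.n) };
    })
    .sort((a, b) => a.cluster - b.cluster || a.binStartYds - b.binStartYds);
}

// Four made-up rows: two clusters, two distance bins.
const cone = volCone([
  { cluster: 0, distanceYds: 104, missYds: -2 },
  { cluster: 0, distanceYds: 108, missYds: 2 },
  { cluster: 0, distanceYds: 151, missYds: 6 },
  { cluster: 1, distanceYds: 104, missYds: 3 },
]);
```

Plotting `rmsMissYds` against `binStartYds` for each cluster gives one cone per cluster.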

3. Predictions (ML models)

Two production-grade models trained offline (scripts/prep-golf-data.mjs builds the training data):

| Model | Architecture | Output |
| --- | --- | --- |
| Cut model | Logistic regression with engineered features (course-cluster fixed effects, recent form decay, head-to-head SG vs field) | P(makes cut) ∈ [0, 1] |
| Finish model | Multinomial Naive Bayes over discretized SG buckets, calibrated against historical finish distribution | P(finish in top 5) / top 10 / top 25 / cuts |

Both models are exported as plain JSON (weights + scaling parameters) at data/golfdata/cut_model.json and data/golfdata/finish_model.json. The frontend loads them at boot and runs predictions client-side — no inference server.

Live what-if — slide a player's input SG values, watch the predicted cut probability + finish distribution update in real time.
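
Client-side scoring from exported weights might look roughly like the sketch below. The `CutModel` shape (an intercept plus per-feature weight, mean, and standard deviation) and the feature names are assumptions; the real layout of cut_model.json is not documented in this README.

```typescript
// Assumed JSON shape; the actual cut_model.json schema may differ.
interface CutModel {
  intercept: number;
  features: { name: string; weight: number; mean: number; std: number }[];
}

function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z));
}

// Standardise each input with the stored scaling parameters, then apply the
// logistic link. Missing inputs fall back to the training mean (contribute 0).
function predictCutProbability(
  model: CutModel,
  inputs: { [name: string]: number | undefined },
): number {
  let z = model.intercept;
  for (const f of model.features) {
    const value = inputs[f.name];
    const raw = value === undefined ? f.mean : value;
    z += f.weight * ((raw - f.mean) / f.std);
  }
  return sigmoid(z);
}

// Hypothetical two-feature model, for illustration only.
const demoModel: CutModel = {
  intercept: 0,
  features: [
    { name: "sgApproach", weight: 1.0, mean: 0, std: 1 },
    { name: "sgPutting", weight: 0.5, mean: 0, std: 1 },
  ],
};

const fieldAverage = predictCutProbability(demoModel, { sgApproach: 0, sgPutting: 0 });
const strongApproach = predictCutProbability(demoModel, { sgApproach: 1.5, sgPutting: 0 });
```

Because the function is pure and cheap, a what-if slider can call it on every input change without debouncing.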

4. 3D (Three.js course scenes)

A 1300-line WebGL view rendering:

  • Course flythrough — programmatically generated terrain (heightmap noise + texture splatting) styled as a Pebble-Beach-esque oceanside par-4. Free-camera mode (orbit / pan / zoom) and a guided "tour the hole" mode that camera-paths from tee to green
  • Ball-flight simulation — physics integrator over launch conditions (clubhead speed, attack angle, spin, wind). Renders the shot trajectory as a parametric curve in 3D space. You can move sliders and watch the trajectory bend in real time
  • Club selection scene — overhead view of the hole with dispersion ellipses for every club, scaled to the player's skill profile. Pick a club → see if your ellipse covers the green or bleeds into hazards

Built with @react-three/fiber + @react-three/drei + tween.js. Performant on integrated GPUs (60 fps on M1 Air, 30+ fps on a 5-year-old MacBook Pro).
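
The ball-flight integrator described above could be sketched as a fixed-step loop over a simple force model. This is a minimal 2D sketch under stated assumptions, not the code in ThreeDTab.tsx: it takes ball speed and launch angle directly (rather than clubhead speed and attack angle), folds spin into a single lift constant, ignores sidespin, and uses illustrative values for `dragK` and `liftK`.

```typescript
interface Launch {
  ballSpeed: number;      // m/s, speed of the ball just after impact
  launchAngleDeg: number; // degrees above horizontal
  dragK: number;          // lumped drag constant (1/m); 0 disables drag
  liftK: number;          // lumped lift constant (1/m), a stand-in for backspin
  windX: number;          // m/s along the target line; positive is a tailwind
}

interface FlightPoint { x: number; y: number }

function simulateFlight(launch: Launch, dt = 0.001): FlightPoint[] {
  const g = 9.81;
  const theta = (launch.launchAngleDeg * Math.PI) / 180;
  let x = 0;
  let y = 0;
  let vx = launch.ballSpeed * Math.cos(theta);
  let vy = launch.ballSpeed * Math.sin(theta);
  const path: FlightPoint[] = [{ x, y }];

  for (let step = 0; step < 60000; step++) { // hard cap: 60 s of flight
    // Aerodynamic forces depend on velocity relative to the air.
    const rx = vx - launch.windX;
    const ry = vy;
    const speed = Math.sqrt(rx * rx + ry * ry);
    // Drag opposes relative velocity; lift acts perpendicular to it.
    const ax = -launch.dragK * speed * rx - launch.liftK * speed * ry;
    const ay = -g - launch.dragK * speed * ry + launch.liftK * speed * rx;

    // Semi-implicit Euler: update velocity first, then position.
    vx += ax * dt;
    vy += ay * dt;
    const prevX = x;
    const prevY = y;
    x += vx * dt;
    y += vy * dt;

    if (y <= 0) {
      // Interpolate the last segment so the path ends exactly on the ground.
      const f = prevY > 0 ? prevY / (prevY - y) : 0;
      path.push({ x: prevX + f * (x - prevX), y: 0 });
      break;
    }
    path.push({ x, y });
  }
  return path;
}

function carry(path: FlightPoint[]): number {
  return path[path.length - 1].x;
}

// Sanity checks against the closed-form vacuum range v^2 sin(2θ) / g.
const vacuumRange = (70 * 70 * Math.sin((2 * 12 * Math.PI) / 180)) / 9.81;
const noAirCarry = carry(simulateFlight({ ballSpeed: 70, launchAngleDeg: 12, dragK: 0, liftK: 0, windX: 0 }));
const dragOnlyCarry = carry(simulateFlight({ ballSpeed: 70, launchAngleDeg: 12, dragK: 0.003, liftK: 0, windX: 0 }));
```

The returned points can be decimated and passed to a line or tube geometry; re-running the function when a slider moves is what makes the trajectory bend in real time.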

5. Career Simulator

A Monte Carlo simulator for the Korn Ferry → PGA Tour promotion path:

  • Input a starting skill profile: SG-OTT, SG-APP, SG-ARG, SG-PUTT (relative to PGA average)
  • 5,000 simulated careers (trials), each running 5 seasons of 25 events with random course assignments
  • For each event: sample dispersion from cluster-specific vol cones, predict finish via the trained model, accumulate FedExCup / Korn Ferry points
  • Aggregate: P(make Korn Ferry top 25) → P(promote to PGA) → P(stay on PGA) → 5-year career trajectory tree
  • Visualization: stacked area chart of "what fraction of the 5,000 simulated trials are at career stage X by year Y"

Data backbone

The data layer is built offline by scripts/prep-golf-data.mjs (1000-line pipeline). It:

  1. Pulls historical PGA event/round/shot data from public sources
  2. Joins ShotLink-style metrics with course metadata
  3. Runs k-means clustering on course features
  4. Computes vol cones (dispersion as a function of distance, per cluster)
  5. Trains cut + finish models with regularization, tuned by cross-validation
  6. Walks forward through the holdout windows to verify out-of-sample performance
  7. Exports everything to small JSON files in data/golfdata/

The frontend is a pure consumer — it reads the JSON, runs predictions client-side, renders the visualizations. No inference server, no API calls, no rate limits. The whole site is fully static after the initial JSON load.

JSON files (data/golfdata/)

| File | Size | Content |
| --- | --- | --- |
| pga_tour.json | ~30 KB | High-level event metadata 2018-present |
| pga_courses_deep.json | ~80 KB | Per-course feature vectors (length, par-3 difficulty, hazard ratios) |
| pga_courses_timeline.json | ~40 KB | Per-course year-over-year scoring trends |
| pga_kmeans_diagnostics.json | ~5 KB | Silhouette / WCSS / elbow data justifying k=6 |
| pga_cluster_timeline.json | ~20 KB | Cluster popularity over time |
| pga_vol_cone.json | ~15 KB | Distance-binned dispersion stats per cluster |
| pga_walkforward.json | ~8 KB | Per-window MAE/R² for the rolling backtest |
| pga_strategy_panel.json | ~25 KB | Cross-tab: skill profile × course type → recommended strategy |
| pga_majors.json | ~10 KB | Majors-specific subset for special handling |
| pga_player_course.json | ~120 KB | Player×course historical performance for the live what-if scoring |
| pga_career_paths.json | ~50 KB | Korn Ferry → PGA transition probabilities (the Monte Carlo input) |
| pga_analysis.json | ~20 KB | Aggregate stats for the Learning tab |
| cut_model.json | ~8 KB | Logistic regression weights + scaling |
| finish_model.json | ~12 KB | Multinomial NB parameters |
| model.json | ~6 KB | Catch-all model registry |
| play_surface.json | ~3 KB | Per-course green type / Stimpmeter speed |
| eda.json | ~15 KB | Exploratory data analysis results |
| pca.json | ~10 KB | PCA loadings for course-feature reduction |
| scatter3d.json | ~50 KB | Pre-computed 3D point cloud for the cluster visualization |

Total: ~530 KB JSON, all loaded once at boot. No external API calls.


Local development

npm install
npm run dev

Open http://localhost:3000. The lab renders fullscreen.

| Script | What it does |
| --- | --- |
| npm run dev | Next.js dev server (port 3000) |
| npm run build | Production build |
| npm run start | Run the production build locally |
| npm run typecheck | tsc --noEmit (0 errors expected) |
| node scripts/prep-golf-data.mjs | Re-build the JSON data pack from raw sources (offline) |

No environment variables required

Everything is static after the JSON load. No API keys, no third-party SaaS, nothing to configure.


Deployment

Netlify (recommended — netlify.toml is included)

netlify.toml pins Node 20 LTS and adds long-cache headers on /_next/static/* (the JSON data pack is content-hashed by Next, so it's safe to cache aggressively).

npx netlify-cli login
npx netlify-cli init       # link this folder to a new or existing site
npx netlify-cli deploy --prod

Or via the Netlify dashboard:

  1. Add new site → Import from Git → choose this repo
  2. Build settings auto-detect from netlify.toml (build cmd npm run build, publish .next)
  3. No environment variables needed — the lab is fully static after the JSON pack loads
  4. Deploy. Subsequent pushes to main auto-deploy.

Vercel

npx vercel --prod

No vercel.json needed — Next.js conventions are auto-detected.

Other Node hosts

Standard Next.js — Railway, Render, Fly all work without changes.


Bundle size

  • First Load JS: 659 KB (gzipped)
  • ~400 KB of that is Three.js + r3f + drei (the 3D tab)
  • ~150 KB of that is the React + Next.js runtime
  • The rest is component code + the embedded JSON data

The 3D tab is lazy-loaded — initial page load doesn't pay the Three.js cost until the user clicks the 3D sub-tab.


Project structure

golf-data-lab/
├── app/
│   ├── layout.tsx         # root layout (no Win98 chrome)
│   ├── page.tsx           # mounts <GolfDataLab /> fullscreen
│   └── globals.css        # tailwind + base styles
│
├── components/apps/golfdatalab/
│   ├── GolfDataLab.tsx        # 5-tab container + tab bar
│   ├── LearningTab.tsx        # interactive math & SG framework
│   ├── PgaTourTab.tsx         # course timeline + clusters + walk-forward
│   ├── PredictionsTab.tsx     # cut + finish model live scoring
│   ├── ThreeDTab.tsx          # Three.js course scenes (1300 lines)
│   ├── CareerSimulator.tsx    # Korn Ferry → PGA Monte Carlo
│   ├── StrategyLab.tsx        # course-cluster × skill-profile strategy panel
│   └── SgGlossaryModal.tsx    # Mark Broadie's SG framework explainer
│
├── data/golfdata/             # 19 JSON files, ~530 KB total
│   ├── cut_model.json         # logistic regression weights
│   ├── finish_model.json      # multinomial NB parameters
│   ├── pga_tour.json          # event metadata
│   ├── pga_courses_deep.json  # course feature vectors
│   ├── pga_kmeans_diagnostics.json
│   ├── pga_vol_cone.json      # dispersion-by-distance
│   ├── pga_walkforward.json   # per-window backtest results
│   ├── pga_career_paths.json  # Monte Carlo transition probabilities
│   └── ... (11 more)
│
├── scripts/
│   └── prep-golf-data.mjs     # 1000-line offline data pipeline
│
└── lib/wm/types.ts            # minimal WindowState stub

Methodology notes

Strokes Gained framework

All the analytics use Mark Broadie's Strokes Gained framework (Columbia Business School; described in his book "Every Shot Counts"). The 4-bucket attribution:

  • SG-OTT (off-the-tee) — drives + tee shots on par-4s and par-5s
  • SG-APP (approach) — every shot from the fairway/rough that's not from <30 yds
  • SG-ARG (around-the-green) — chips, pitches, bunker shots inside 30 yds
  • SG-PUTT — every putt

The benchmark is the field's expected number of strokes to hole out from each starting point; improvement over that benchmark is strokes gained vs the field. This is the same framework the PGA Tour has published officially since 2014.
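
Written out, the per-shot definition in Broadie's framework is the drop in benchmark strokes-to-hole-out minus the one stroke spent. Here $E(s)$ is the benchmark expected strokes from situation $s$ (with $E = 0$ once the ball is holed), and the numbers in the worked example are made-up round figures, not PGA Tour benchmarks.

```latex
% Strokes gained on shot i, which moves the ball from situation s_i to s_{i+1}:
\mathrm{SG}_i = E(s_i) - E(s_{i+1}) - 1

% Illustrative example: an approach from a spot with E = 2.9
% that finishes at a spot with E = 1.6:
\mathrm{SG} = 2.9 - 1.6 - 1 = 0.3

% Summed over the n shots of a hole, the terms telescope to
% benchmark-from-the-tee minus actual strokes taken:
\sum_{i=1}^{n} \mathrm{SG}_i = E(s_1) - n
```

Grouping the per-shot values by shot type gives the four buckets listed above.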

Course clustering

K-means on 6 features (post-PCA): course length normalized to par, par-3 difficulty index, water hazard ratio, sand hazard ratio, green speed (Stimpmeter), elevation change. K chosen at 6 via elbow + silhouette analysis (in pga_kmeans_diagnostics.json).
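
The elbow part of that analysis can be sketched with a plain Lloyd's-algorithm k-means that reports within-cluster sum of squares (WCSS) for each candidate k. This is a minimal sketch on toy 2D points, not the pipeline's implementation: the deterministic farthest-point initialisation is an assumption chosen to keep the example repeatable (k-means++ is the more common choice), and silhouette scoring is omitted.

```typescript
type Vec = number[];

function sqDist(a: Vec, b: Vec): number {
  let s = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    s += d * d;
  }
  return s;
}

// Index of the closest centroid (ties go to the lower index).
function nearest(p: Vec, centroids: Vec[]): number {
  let best = 0;
  for (let j = 1; j < centroids.length; j++) {
    if (sqDist(p, centroids[j]) < sqDist(p, centroids[best])) best = j;
  }
  return best;
}

// Deterministic farthest-point initialisation (an assumption for this sketch).
function initCentroids(points: Vec[], k: number): Vec[] {
  const centroids: Vec[] = [points[0].slice()];
  while (centroids.length < k) {
    let far = points[0];
    let farDist = -1;
    for (const p of points) {
      const d = sqDist(p, centroids[nearest(p, centroids)]);
      if (d > farDist) { farDist = d; far = p; }
    }
    centroids.push(far.slice());
  }
  return centroids;
}

function kMeans(points: Vec[], k: number, maxIter = 100): { labels: number[]; centroids: Vec[]; wcss: number } {
  let centroids = initCentroids(points, k);
  let labels: number[] = [];
  for (let iter = 0; iter < maxIter; iter++) {
    // Assignment step.
    const next = points.map((p) => nearest(p, centroids));
    const unchanged = labels.length === next.length && next.every((l, i) => l === labels[i]);
    labels = next;
    if (unchanged) break;
    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((c, j) => {
      const members = points.filter((_, i) => labels[i] === j);
      if (members.length === 0) return c; // keep an empty cluster's centroid in place
      return c.map((_, d) => members.reduce((s, m) => s + m[d], 0) / members.length);
    });
  }
  const wcss = points.reduce((s, p, i) => s + sqDist(p, centroids[labels[i]]), 0);
  return { labels, centroids, wcss };
}

// Toy data: two well-separated blobs of four points each.
const toyCourses: Vec[] = [
  [0, 0], [0, 1], [1, 0], [1, 1],
  [10, 10], [10, 11], [11, 10], [11, 11],
];
const elbow = [1, 2, 3].map((k) => ({ k, wcss: kMeans(toyCourses, k).wcss }));
```

On this toy data the WCSS collapses between k=1 and k=2 and barely moves afterwards, which is the "elbow" shape the diagnostics file is described as recording for the real course features.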

Walk-forward backtest

Expanding-window time-series CV. Train window 2018–2022, test 2023. Then re-fit including 2023, test 2024. Reports per-window R² + MAE so you can verify the model isn't overfitting to a single regime.
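
The fold layout and the two reported metrics can be sketched as follows. The function names are hypothetical and the model-fitting step is left out; only the split logic and the standard MAE and R² formulas are shown.

```typescript
interface Fold {
  trainYears: number[];
  testYear: number;
}

// Each fold trains on every season before the test year, so the window grows.
function walkForwardFolds(firstYear: number, firstTestYear: number, lastTestYear: number): Fold[] {
  const folds: Fold[] = [];
  for (let testYear = firstTestYear; testYear <= lastTestYear; testYear++) {
    const trainYears: number[] = [];
    for (let y = firstYear; y < testYear; y++) trainYears.push(y);
    folds.push({ trainYears, testYear });
  }
  return folds;
}

// Mean absolute error.
function mae(actual: number[], predicted: number[]): number {
  let sum = 0;
  for (let i = 0; i < actual.length; i++) sum += Math.abs(actual[i] - predicted[i]);
  return sum / actual.length;
}

// Coefficient of determination: 1 - SS_res / SS_tot.
function rSquared(actual: number[], predicted: number[]): number {
  const mean = actual.reduce((s, v) => s + v, 0) / actual.length;
  let ssRes = 0;
  let ssTot = 0;
  for (let i = 0; i < actual.length; i++) {
    ssRes += (actual[i] - predicted[i]) * (actual[i] - predicted[i]);
    ssTot += (actual[i] - mean) * (actual[i] - mean);
  }
  return 1 - ssRes / ssTot;
}

// The two windows named in the text: train 2018-2022 / test 2023, then train 2018-2023 / test 2024.
const folds = walkForwardFolds(2018, 2023, 2024);
```

Scoring each fold's test season separately, rather than pooling them, is what lets the per-window numbers show drift over time.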

Monte Carlo career simulator

5,000 trials × 25 events × 5 years = 625K simulated tournaments. Per-event randomness:

  1. Sample course assignment from the actual season schedule
  2. Sample dispersion from the cluster-specific vol cone (heteroscedastic)
  3. Predict finish via the trained finish model, conditional on the player's SG profile
  4. Accumulate Korn Ferry / FedExCup points per the official scoring rules
  5. End-of-year promotion / demotion thresholds applied

Output: stacked-area chart of "what fraction of trials are at career-stage X by year Y" (Korn Ferry full status / partial status / promoted to PGA / demoted / off-Tour).
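
The overall loop structure (trials × years × events, with an end-of-year status check) can be sketched as below. Everything numeric here is a placeholder assumption: a single scalar skill instead of four SG inputs, a Gaussian performance draw instead of vol-cone sampling plus the finish model, a made-up points rule and thresholds, and only two stages instead of the five listed above.

```typescript
type Stage = "kornFerry" | "pga";

// Small seeded PRNG (mulberry32) so runs are repeatable.
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Standard normal draw via Box-Muller.
function gaussian(rand: () => number): number {
  return Math.sqrt(-2 * Math.log(1 - rand())) * Math.cos(2 * Math.PI * rand());
}

function simulateCareers(skill: number, trials = 5000, years = 5, eventsPerSeason = 25, seed = 42) {
  const rand = mulberry32(seed);
  const pgaCountByYear: number[] = [];
  for (let y = 0; y < years; y++) pgaCountByYear.push(0);
  let eventsSimulated = 0;

  for (let t = 0; t < trials; t++) {
    let stage: Stage = "kornFerry";
    for (let y = 0; y < years; y++) {
      // Assumption: deeper PGA fields make the same skill worth 0.5 less.
      const effectiveSkill = stage === "pga" ? skill - 0.5 : skill;
      let points = 0;
      for (let e = 0; e < eventsPerSeason; e++) {
        const performance = effectiveSkill + gaussian(rand);
        points += Math.max(0, performance) * 100; // placeholder points rule
        eventsSimulated++;
      }
      // Placeholder end-of-year promotion / demotion thresholds.
      if (stage === "kornFerry" && points >= 1500) stage = "pga";
      else if (stage === "pga" && points < 1000) stage = "kornFerry";
      if (stage === "pga") pgaCountByYear[y]++;
    }
  }
  return {
    pgaFractionByYear: pgaCountByYear.map((c) => c / trials),
    eventsSimulated,
  };
}

const strongPlayer = simulateCareers(1.0);
const averagePlayer = simulateCareers(0.0);
```

Each entry of `pgaFractionByYear` is one band of the stacked-area chart for one year; adding more stages means tracking one counter array per stage.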


License

MIT. Use this code freely.

The PGA Tour event data in data/golfdata/*.json was assembled from public sources for educational purposes. Replace with your own pipeline output for production use.


Credits

Designed + engineered by Will Zhang as part of the WillOS 98 Portfolio.

Methodology: Mark Broadie's Every Shot Counts (Strokes Gained framework). Course geometry / WebGL: Three.js + react-three-fiber + drei.

Built with Next.js 15, React 19, TypeScript 5, Tailwind CSS 4, Three.js r170.

