A PGA Tour data exploration + prediction terminal. Five tabs:
- Learning — interactive math: shot dispersion, strokes-gained breakdowns, expected-value tradeoffs
- PGA Tour — course timeline, k-means course clusters, vol cone for shot dispersion, walk-forward predictive backtests
- Predictions — cut model + finish-position model trained on multi-season data; live what-if scoring
- 3D — Three.js course visualization: flythroughs, ball-flight curvature simulations, club-selection scenes (1300 lines of WebGL)
- Career Simulator — Path-to-Tour Monte Carlo: Korn Ferry → PGA promotion probabilities given a starting skill profile
Standalone version of the Golf Data Lab app from w1zz7/willos-98-portfolio. Same code, no Win98 desktop chrome — just the lab, fullscreen.
A teaching interface aimed at curious golfers + analysts who want to understand why a 6-handicap should pick the safe line and a tour pro should pick the aggressive one. Visualizations:
- Shot dispersion ellipses — variance + skewness of where shots actually land vs intended target, by club + skill level
- Strokes-gained breakdown — Mark Broadie's 4-bucket SG framework (off-the-tee, approach, around-the-green, putting) with live tooltips on every concept
- Expected-value calculator — pin-vs-fat-of-green decision under uncertainty: integrate dispersion ellipse over the green's penalty surface, get the EV in shots per round
- Risk-reward curve — for every dispersion radius (your hands' precision), what's the optimal target line? At what handicap does the optimum flip?
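The expected-value idea above can be sketched as a Monte Carlo integration: sample landing points from an elliptical Gaussian around the aim point and average the penalty at each landing spot. Everything here (the sigma values, the toy penalty surface, the function names) is illustrative, not the app's actual data or code.

```typescript
// Monte Carlo EV sketch: average penalty strokes over a dispersion ellipse.
// Sigmas and the penalty surface are made-up illustrations.

type PenaltySurface = (x: number, y: number) => number; // extra strokes at a landing point

/** Standard-normal sample via Box-Muller. */
function randn(): number {
  const u = 1 - Math.random();
  const v = Math.random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

/** Expected strokes lost when aiming at (aimX, aimY) with an elliptical dispersion. */
function expectedPenalty(
  aimX: number, aimY: number,
  sigmaLong: number, sigmaLat: number,
  penalty: PenaltySurface,
  trials = 20_000,
): number {
  let total = 0;
  for (let i = 0; i < trials; i++) {
    // Landing point = aim + Gaussian noise along the lateral and long axes.
    total += penalty(aimX + randn() * sigmaLat, aimY + randn() * sigmaLong);
  }
  return total / trials;
}

// Toy penalty surface: water short-left of the green costs ~1.5 strokes.
const surface: PenaltySurface = (x, y) => (x < -10 && y < -5 ? 1.5 : 0);

// Safe line (aim away from the water) vs aggressive line (at the pin):
const safeEV = expectedPenalty(8, 0, 10, 7, surface);
const aggressiveEV = expectedPenalty(0, 0, 10, 7, surface);
```

The "risk-reward flip" falls out of running this for a grid of sigmas: as the ellipse shrinks, the aggressive line's penalty term vanishes faster than its upside.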
Multi-season aggregation of PGA Tour course data (data/golfdata/pga_*.json):
- Course timeline — every PGA event 2018–present with course rating, scoring average, cut line, winner score
- K-means course clusters — 6-cluster solution over course features (length, par-3 difficulty, putting surface speed, water/sand hazard ratios). Each cluster has a "vibe" (links style, parkland, desert, classic, modern, manufactured)
- Vol cone — per-cluster shot dispersion as a function of distance, computed from millions of ShotLink rows. Reveals the "scoring distance" sweet spot (~110-140 yds for tour pros)
- Walk-forward backtest — train a finish-position model on 2018-2022, test on 2023, retrain through 2023, test on 2024, etc. Reports per-window MAE / R² so you can see the model degrading or improving over time
- K-means diagnostics — silhouette scores, within-cluster sum of squares, elbow analysis to justify k=6
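A vol cone of the kind described above is, at heart, distance-binned dispersion: group shots by starting distance and compute the standard deviation of the lateral miss per bin. A minimal sketch, with made-up field names (the real pipeline's schema may differ):

```typescript
// Vol-cone sketch: per-distance-bin standard deviation of lateral miss.
// `Shot` fields are illustrative, not the pipeline's actual schema.

interface Shot { startDistYds: number; lateralMissYds: number }

function volCone(shots: Shot[], binWidth = 20): Map<number, number> {
  // Bucket lateral misses by starting-distance bin.
  const bins = new Map<number, number[]>();
  for (const s of shots) {
    const bin = Math.floor(s.startDistYds / binWidth) * binWidth;
    let arr = bins.get(bin);
    if (!arr) { arr = []; bins.set(bin, arr); }
    arr.push(s.lateralMissYds);
  }
  // Per-bin standard deviation = the "cone" value at that distance.
  const cone = new Map<number, number>();
  for (const [bin, misses] of bins) {
    const mean = misses.reduce((a, b) => a + b, 0) / misses.length;
    const variance = misses.reduce((a, b) => a + (b - mean) ** 2, 0) / misses.length;
    cone.set(bin, Math.sqrt(variance));
  }
  return cone;
}
```

Running this per course cluster gives the cluster-specific cones; the "scoring distance" sweet spot shows up as the flattest region of the curve.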
Two production-grade models trained offline (scripts/prep-golf-data.mjs builds the training data):
| Model | Architecture | Output |
|---|---|---|
| Cut model | Logistic regression with engineered features (course-cluster fixed effects, recent form decay, head-to-head SG vs field) | P(makes cut) ∈ [0, 1] |
| Finish model | Multinomial Naive Bayes over discretized SG buckets, calibrated against historical finish distribution | P(finish in top 5) / top 10 / top 25 / cuts |
Both models are exported as plain JSON (weights + scaling parameters) at data/golfdata/cut_model.json and data/golfdata/finish_model.json. The frontend loads them at boot and runs predictions client-side — no inference server.
Live what-if — slide a player's input SG values, watch the predicted cut probability + finish distribution update in real time.
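Because the cut model ships as plain JSON weights, client-side scoring reduces to standardizing the features, taking a dot product, and applying a sigmoid. The schema below (`means`/`stds`/`weights`/`bias`) is an assumed shape for illustration, not necessarily the repo's actual file format:

```typescript
// Sketch of client-side logistic-regression scoring from exported JSON.
// The CutModel schema is an assumption, not the repo's actual cut_model.json layout.

interface CutModel {
  means: number[];   // per-feature standardization means
  stds: number[];    // per-feature standardization std devs
  weights: number[]; // logistic regression coefficients
  bias: number;      // intercept
}

function predictCutProbability(model: CutModel, features: number[]): number {
  let z = model.bias;
  for (let i = 0; i < features.length; i++) {
    // Standardize with the training-time scaling, then apply the weight.
    z += model.weights[i] * ((features[i] - model.means[i]) / model.stds[i]);
  }
  return 1 / (1 + Math.exp(-z)); // sigmoid → P(makes cut) ∈ [0, 1]
}
```

Dragging a what-if slider just calls a function like this with the new SG vector, which is why no inference server is needed.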
A 1300-line WebGL view rendering:
- Course flythrough — programmatically generated terrain (heightmap noise + texture splatting) styled as a Pebble-Beach-esque oceanside par-4. Free-camera mode (orbit / pan / zoom) and a guided "tour the hole" mode that camera-paths from tee to green
- Ball-flight simulation — physics integrator over launch conditions (clubhead speed, attack angle, spin, wind). Renders the shot trajectory as a parametric curve in 3D space. You can move sliders and watch the trajectory bend in real time
- Club selection scene — overhead view of the hole with dispersion ellipses for every club, scaled to the player's skill profile. Pick a club → see if your ellipse covers the green or bleeds into hazards
Built with @react-three/fiber + @react-three/drei + tween.js. Performant on integrated GPUs (60 fps on M1 Air, 30+ fps on a 5-year-old MacBook Pro).
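The ball-flight simulation's core loop is a numerical integrator. A minimal forward-Euler sketch with gravity, quadratic drag, and a simplified Magnus-style lift term, using made-up coefficients (the app's actual integrator and constants may differ):

```typescript
// Ball-flight sketch: forward-Euler over gravity, quadratic drag, and upward lift.
// Coefficients (drag, spinLift) are illustrative, not the app's tuned values.

interface Launch { speed: number; launchDeg: number; spinLift: number } // m/s, degrees, lift coeff

function simulateFlight({ speed, launchDeg, spinLift }: Launch, dt = 0.01): { carry: number; apex: number } {
  const g = 9.81, drag = 0.003;
  const rad = (launchDeg * Math.PI) / 180;
  let x = 0, y = 0;
  let vx = speed * Math.cos(rad), vy = speed * Math.sin(rad);
  let apex = 0;
  while (y >= 0) {
    const v = Math.hypot(vx, vy);
    // Drag opposes velocity; lift is simplified to act straight up, scaled by speed.
    const ax = -drag * v * vx;
    const ay = -g - drag * v * vy + spinLift * v;
    vx += ax * dt; vy += ay * dt;
    x += vx * dt; y += vy * dt;
    apex = Math.max(apex, y);
  }
  return { carry: x, apex };
}
```

In the real scene the same state vector is sampled each frame to draw the parametric trajectory curve, so slider changes re-run the integrator and the curve bends in real time.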
A Monte Carlo simulator for the Korn Ferry → PGA Tour promotion path:
- Input a starting skill profile: SG-OTT, SG-APP, SG-ARG, SG-PUTT (relative to PGA average)
- 5,000 simulated careers; each simulated season rolls 25 events with random course assignments
- For each event: sample dispersion from cluster-specific vol cones, predict finish via the trained model, accumulate FedExCup / Korn Ferry points
- Aggregate: P(make Korn Ferry top 25) → P(promote to PGA) → P(stay on PGA) → 5-year career trajectory tree
- Visualization: stacked area chart of "what fraction of the 5,000 simulated careers are at career stage X by year Y"
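The simulation loop above can be reduced to three nested loops: trials × seasons × events, with a points function standing in for "predict finish, look up points." This is a structural sketch only; the stubbed function and threshold are placeholders for the trained model and official points tables:

```typescript
// Career Monte Carlo sketch: trials × seasons × events with a stubbed points function.
// `pointsForEvent` and `promotionThreshold` are placeholders, not the real model/rules.

type Rng = () => number;

function simulateCareers(
  trials: number,
  seasons: number,
  eventsPerSeason: number,
  pointsForEvent: (rng: Rng) => number, // stub for "sample course, predict finish, award points"
  promotionThreshold: number,
  rng: Rng = Math.random,
): number {
  let promoted = 0;
  for (let t = 0; t < trials; t++) {
    let onPga = false;
    for (let s = 0; s < seasons && !onPga; s++) {
      let points = 0;
      for (let e = 0; e < eventsPerSeason; e++) points += pointsForEvent(rng);
      if (points >= promotionThreshold) onPga = true; // season-ending cutoff, e.g. top 25
    }
    if (onPga) promoted++;
  }
  return promoted / trials; // P(reach the PGA Tour within `seasons` years)
}
```

The stacked-area chart is just this loop with per-year stage tallies recorded instead of a single promotion flag.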
The data layer is built offline by scripts/prep-golf-data.mjs (1000-line pipeline). It:
- Pulls historical PGA event/round/shot data from public sources
- Joins ShotLink-style metrics with course metadata
- Runs k-means clustering on course features
- Computes vol cones (dispersion as a function of distance, per cluster)
- Trains cut + finish models with regularized cross-validation
- Walks forward through the holdout windows to verify out-of-sample performance
- Exports everything to small JSON files in data/golfdata/
The frontend is a pure consumer — it reads the JSON, runs predictions client-side, renders the visualizations. No inference server, no API calls, no rate limits. The whole site is fully static after the initial JSON load.
| File | Size | Content |
|---|---|---|
| pga_tour.json | ~30 KB | High-level event metadata, 2018–present |
| pga_courses_deep.json | ~80 KB | Per-course feature vectors (length, par-3 difficulty, hazard ratios) |
| pga_courses_timeline.json | ~40 KB | Per-course year-over-year scoring trends |
| pga_kmeans_diagnostics.json | ~5 KB | Silhouette / WCSS / elbow data justifying k=6 |
| pga_cluster_timeline.json | ~20 KB | Cluster popularity over time |
| pga_vol_cone.json | ~15 KB | Distance-binned dispersion stats per cluster |
| pga_walkforward.json | ~8 KB | Per-window MAE/R² for the rolling backtest |
| pga_strategy_panel.json | ~25 KB | Cross-tab: skill profile × course type → recommended strategy |
| pga_majors.json | ~10 KB | Majors-specific subset for special handling |
| pga_player_course.json | ~120 KB | Player × course historical performance for the live what-if scoring |
| pga_career_paths.json | ~50 KB | Korn Ferry → PGA transition probabilities (the Monte Carlo input) |
| pga_analysis.json | ~20 KB | Aggregate stats for the Learning tab |
| cut_model.json | ~8 KB | Logistic regression weights + scaling |
| finish_model.json | ~12 KB | Multinomial NB parameters |
| model.json | ~6 KB | Catch-all model registry |
| play_surface.json | ~3 KB | Per-course green type / Stimpmeter speed |
| eda.json | ~15 KB | Exploratory data analysis results |
| pca.json | ~10 KB | PCA loadings for course-feature reduction |
| scatter3d.json | ~50 KB | Pre-computed 3D point cloud for the cluster visualization |
Total: ~530 KB JSON, all loaded once at boot. No external API calls.
```bash
npm install
npm run dev
```

Open http://localhost:3000. The lab renders fullscreen.
| Script | What it does |
|---|---|
| npm run dev | Next.js dev server (port 3000) |
| npm run build | Production build |
| npm run start | Run the production build locally |
| npm run typecheck | tsc --noEmit (0 errors expected) |
| node scripts/prep-golf-data.mjs | Re-build the JSON data pack from raw sources (offline) |
Everything is static after the JSON load. No API keys, no third-party SaaS, nothing to configure.
netlify.toml pins Node 20 LTS and adds long-cache headers on /_next/static/* (the JSON data pack is content-hashed by Next, so it's safe to cache aggressively).
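A plausible shape for that config is sketched below; the repo's actual netlify.toml is authoritative, and every value here is illustrative:

```toml
# Illustrative only — see the repo's netlify.toml for the real values.
[build]
  command = "npm run build"
  publish = ".next"

[build.environment]
  NODE_VERSION = "20"

[[headers]]
  for = "/_next/static/*"
  [headers.values]
    Cache-Control = "public, max-age=31536000, immutable"
```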
```bash
npx netlify-cli login
npx netlify-cli init          # link this folder to a new or existing site
npx netlify-cli deploy --prod
```

Or via the Netlify dashboard:
- Add new site → Import from Git → choose this repo
- Build settings auto-detect from netlify.toml (build command npm run build, publish .next)
- No environment variables needed — the lab is fully static after the JSON pack loads
- Deploy. Subsequent pushes to main auto-deploy.
```bash
npx vercel --prod
```

No vercel.json needed — Next.js conventions are auto-detected.
Standard Next.js — Railway, Render, Fly all work without changes.
- First Load JS: 659 KB (gzipped)
- ~400 KB of that is Three.js + r3f + drei (the 3D tab)
- ~150 KB of that is the React + Next.js runtime
- The rest is component code + the embedded JSON data
The 3D tab is lazy-loaded — initial page load doesn't pay the Three.js cost until the user clicks the 3D sub-tab.
golf-data-lab/
├── app/
│ ├── layout.tsx # root layout (no Win98 chrome)
│ ├── page.tsx # mounts <GolfDataLab /> fullscreen
│ └── globals.css # tailwind + base styles
│
├── components/apps/golfdatalab/
│ ├── GolfDataLab.tsx # 5-tab container + tab bar
│ ├── LearningTab.tsx # interactive math & SG framework
│ ├── PgaTourTab.tsx # course timeline + clusters + walk-forward
│ ├── PredictionsTab.tsx # cut + finish model live scoring
│ ├── ThreeDTab.tsx # Three.js course scenes (1300 lines)
│ ├── CareerSimulator.tsx # Korn Ferry → PGA Monte Carlo
│ ├── StrategyLab.tsx # course-cluster × skill-profile strategy panel
│ └── SgGlossaryModal.tsx # Mark Broadie's SG framework explainer
│
├── data/golfdata/ # 19 JSON files, ~530 KB total
│ ├── cut_model.json # logistic regression weights
│ ├── finish_model.json # multinomial NB parameters
│ ├── pga_tour.json # event metadata
│ ├── pga_courses_deep.json # course feature vectors
│ ├── pga_kmeans_diagnostics.json
│ ├── pga_vol_cone.json # dispersion-by-distance
│ ├── pga_walkforward.json # per-window backtest results
│ ├── pga_career_paths.json # Monte Carlo transition probabilities
│ └── ... (13 more)
│
├── scripts/
│ └── prep-golf-data.mjs # 1000-line offline data pipeline
│
└── lib/wm/types.ts # minimal WindowState stub
All the analytics use Mark Broadie's Strokes Gained framework (Columbia, "Every Shot Counts"). The 4-bucket attribution:
- SG-OTT (off-the-tee) — drives + tee shots on par-4s and par-5s
- SG-APP (approach) — shots toward the green from outside 30 yds, excluding tee shots on par-4s and par-5s
- SG-ARG (around-the-green) — chips, pitches, bunker shots inside 30 yds
- SG-PUTT — every putt
The benchmark is the field's expected score from each starting point — improvement over benchmark = strokes gained vs the field. This is the same framework the PGA Tour has published officially since 2014.
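For a single shot this reduces to: expected strokes from the start position, minus expected strokes from the end position, minus the one stroke taken. The baseline numbers below are commonly cited tour averages in the spirit of Broadie's tables; treat them as illustrative:

```typescript
// Single-shot strokes gained: baseline(before) − baseline(after) − 1 − penalties.
// Baseline values in the examples are illustrative tour-average figures.

function strokesGained(expBefore: number, expAfter: number, penalty = 0): number {
  return expBefore - (expAfter + 1 + penalty);
}

// 150-yd fairway approach (field averages ~2.98 from there) finishing
// as a 20-ft putt (field averages ~1.87 putts from 20 ft):
const sgApproach = strokesGained(2.98, 1.87); // ≈ +0.11, better than the field

// Holing out means expAfter = 0:
const sgHoled = strokesGained(2.98, 0); // ≈ +1.98
```

Summing per-shot values within each of the four buckets yields the SG-OTT / SG-APP / SG-ARG / SG-PUTT attribution.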
K-means on 6 features (post-PCA): course length normalized to par, par-3 difficulty index, water hazard ratio, sand hazard ratio, green speed (Stimpmeter), elevation change. K chosen at 6 via elbow + silhouette analysis (in pga_kmeans_diagnostics.json).
Standard time-series CV. Train window 2018–2022, test 2023. Then re-fit including 2023, test 2024. Reports per-window R² + MAE so you can verify the model isn't overfitting to a single regime.
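The window scheme described above (expanding train set, one-season test set) can be generated mechanically. A sketch, with hypothetical names:

```typescript
// Walk-forward window generation: expanding train window, single-season test window.
// Function and type names are illustrative, not the pipeline's actual identifiers.

interface Window { train: number[]; test: number }

function walkForwardWindows(firstYear: number, firstTestYear: number, lastYear: number): Window[] {
  const windows: Window[] = [];
  for (let test = firstTestYear; test <= lastYear; test++) {
    const train: number[] = [];
    for (let y = firstYear; y < test; y++) train.push(y); // everything before the test year
    windows.push({ train, test });
  }
  return windows;
}

// walkForwardWindows(2018, 2023, 2024) yields
// { train: 2018–2022, test: 2023 } and { train: 2018–2023, test: 2024 }.
```

Fitting the model once per window and scoring on the held-out season produces the per-window MAE / R² series reported in pga_walkforward.json.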
5,000 trials × 25 events × 5 years = 625K simulated tournaments. Per-event randomness:
- Sample course assignment from the actual season schedule
- Sample dispersion from the cluster-specific vol cone (heteroscedastic)
- Predict finish via the trained finish model, conditional on the player's SG profile
- Accumulate Korn Ferry / FedExCup points per the official scoring rules
- End-of-year promotion / demotion thresholds applied
Output: stacked-area chart of "what fraction of trials are at career-stage X by year Y" (Korn Ferry full status / partial status / promoted to PGA / demoted / off-Tour).
MIT. Use this code freely.
The PGA Tour event data in data/golfdata/*.json was assembled from public sources for educational purposes. Replace with your own pipeline output for production use.
Designed + engineered by Will Zhang as part of the WillOS 98 Portfolio.
Methodology: Mark Broadie's Every Shot Counts (Strokes Gained framework). Course geometry / WebGL: Three.js + react-three-fiber + drei.
Built with Next.js 15, React 19, TypeScript 5, Tailwind CSS 4, Three.js r170.