Companion code for *The Spectral Structure of Natural Data* and *Learner Geometry Meets Data Geometry*.
Interactive visualization of the spectral relationships between three fundamental operators on data: the Gram matrix (kernel), the graph Laplacian (manifold geometry), and the empirical Neural Tangent Kernel (what a network "sees"). Under idealized conditions these share the same eigenfunctions (Laplace-Beltrami eigenfunctions on the data manifold) and differ only in how their eigenvalues are transformed.
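A minimal numpy sketch (illustrative, not from this repo) of the shared-eigenfunction claim: for points sampled evenly from a circle, the leading non-constant eigenvectors of an RBF Gram matrix and the lowest non-constant eigenvectors of the graph Laplacian built from it span nearly the same subspace. The bandwidth `0.1` is an arbitrary choice for this toy example.

```python
import numpy as np

n = 200
theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)]          # points on the unit circle

# RBF Gram matrix (kernel) and the unnormalized graph Laplacian built from it
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-d2 / 0.1)
L = np.diag(K.sum(1)) - K

wK, vK = np.linalg.eigh(K)
wL, vL = np.linalg.eigh(L)
UK = vK[:, -3:-1]   # two leading non-constant Gram eigenvectors
UL = vL[:, 1:3]     # two lowest non-constant Laplacian eigenvectors

# Cosines of the principal angles between the two 2-D subspaces;
# values near 1 mean the operators share these eigenfunctions
cos_angles = np.linalg.svd(UK.T @ UL, compute_uv=False)
print(cos_angles)   # both close to 1
```

The two operators order these eigenfunctions oppositely (largest kernel eigenvalues vs. smallest Laplacian eigenvalues), which is the eigenvalue-transform point in miniature.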
Visualization notebooks. Unlike the experiment repositories in this series, this project is primarily an interactive exploration tool: it computes operators on MNIST subsets in real time and visualizes their spectral properties. It is useful for building intuition about the connections discussed in the blog posts. Developed during MATS 9.0.
```sh
uv sync              # Phase 1-2 (diffusion geometry + wavelets)
uv sync --extra gpu  # Phase 3 (eNTK, requires PyTorch)
```
```sh
# Phase 1-2: diffusion eigenmaps + graph wavelets
marimo edit app.py

# Phase 3: Gram vs Laplacian vs eNTK three-way comparison
marimo edit app_entk.py
```

Phase 1-2 (app.py): Gram matrix and three Laplacian variants (unnormalized, random-walk, symmetric) on MNIST subsets. Verifies the expected eigenvalue relationships between the variants.
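A small numpy sketch (illustrative, not the repo's implementation) of the three Laplacian variants and one of the relationships the app checks: the random-walk and symmetric Laplacians are similar matrices (L_rw = D^{-1/2} L_sym D^{1/2}), so their spectra coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
W = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))   # dense RBF affinities
D = W.sum(1)                                         # degree vector
I = np.eye(50)

L_un = np.diag(D) - W                           # unnormalized: D - W
L_rw = I - W / D[:, None]                       # random-walk: I - D^-1 W
L_sym = I - W / np.sqrt(np.outer(D, D))         # symmetric: I - D^-1/2 W D^-1/2

# L_rw and L_sym are related by a similarity transform, so spectra match
ev_rw = np.sort(np.linalg.eigvals(L_rw).real)
ev_sym = np.sort(np.linalg.eigvalsh(L_sym))
print(np.allclose(ev_rw, ev_sym))   # True
```

The unnormalized Laplacian generally has a different spectrum from the other two unless the graph is regular (all degrees equal).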
Phase 3 (app_entk.py): Computes the empirical NTK of a small MLP at initialization and after training, and compares its spectrum to the data Gram matrix and graph Laplacian. Tests whether training aligns the NTK with the data geometry (the "lazy-to-rich" transition from Kumar et al. 2024).
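The repo computes the eNTK with `torch.func`; the kernel itself is just the Gram matrix of per-sample parameter gradients, K(x, x') = ∇_θ f(x) · ∇_θ f(x'). A hand-derived numpy sketch for a one-hidden-layer scalar MLP (illustrative only; all shapes and the tanh architecture are arbitrary choices for this example):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, h = 6, 4, 16
X = rng.normal(size=(n, d))
W = rng.normal(size=(h, d)) / np.sqrt(d)   # hidden weights
v = rng.normal(size=h) / np.sqrt(h)        # output weights

# f(x) = v . tanh(W x); closed-form per-sample gradients w.r.t. (W, v)
H = np.tanh(X @ W.T)            # (n, h) hidden activations = df/dv
G = (1 - H ** 2) * v            # (n, h) backprop through tanh
# Jacobian rows: [vec(df/dW), df/dv], with df/dW_ij = G_i * x_j
J = np.concatenate([(G[:, :, None] * X[:, None, :]).reshape(n, -1), H], axis=1)

K_ntk = J @ J.T                 # empirical NTK at these parameters, (n, n)
print(K_ntk.shape)              # (6, 6)
```

Recomputing `K_ntk` before and after training and comparing its eigenvectors to those of the data Gram matrix is the alignment test in miniature.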
```
app.py          # Phase 1-2: diffusion eigenmaps + wavelets
app_entk.py     # Phase 3: Gram vs Laplacian vs eNTK comparison
src/
├── data.py       # MNIST stratified subsampling
├── diffusion.py  # Gram matrix, Laplacian variants, diffusion maps
├── wavelets.py   # Graph wavelets via PyGSP heat kernels
├── entk.py       # eNTK via torch.func, MLP training
└── viz.py        # Spectral visualization helpers
tests/
├── test_data.py
├── test_diffusion.py
├── test_wavelets.py
└── test_entk.py
```
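`wavelets.py` uses PyGSP heat kernels; the underlying construction is simple enough to sketch in numpy (illustrative only; the path graph and scale t=2.0 are arbitrary choices). A heat-kernel wavelet centered at node i is psi_{t,i} = U exp(-t Lambda) U^T delta_i, where (Lambda, U) is the Laplacian eigendecomposition.

```python
import numpy as np

# Path graph on 30 nodes: adjacency, Laplacian, eigendecomposition
n = 30
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(A.sum(1)) - A
lam, U = np.linalg.eigh(L)

def heat_wavelet(t, i):
    # psi_{t,i} = U exp(-t Lambda) U^T delta_i: heat kernel centered at node i
    return U @ (np.exp(-t * lam) * U[i])

psi = heat_wavelet(2.0, n // 2)
print(psi.argmax())   # mass remains concentrated at the source node
```

Larger `t` spreads the wavelet over more of the graph; a bank of such filters at several scales gives the multiscale decomposition of Hammond et al. (2011).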
- Coifman & Lafon, "Diffusion Maps," ACHA (2006)
- Smola & Kondor, "Kernels and Regularization on Graphs," COLT (2003)
- Bordelon, Canatar & Pehlevan, "Spectrum Dependent Learning Curves," ICML (2020)
- Hammond, Vandergheynst & Gribonval, "Wavelets on Graphs," ACHA (2011)
- Kumar et al., "Grokking as Lazy to Rich Transition," ICLR (2024)
MIT
Written with Claude.