CharlesR-W/spectral-geometry

Spectral Geometry on MNIST

Companion code for the blog posts "The Spectral Structure of Natural Data" and "Learner Geometry Meets Data Geometry".

Interactive visualization of the spectral relationships between three fundamental operators on data: the Gram matrix (kernel), the graph Laplacian (manifold geometry), and the empirical Neural Tangent Kernel (what a network "sees"). Under idealized conditions these operators share the same eigenfunctions, the Laplace-Beltrami eigenfunctions of the data manifold, and differ only in how their eigenvalues are transformed.
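One concrete instance of a shared eigenbasis with transformed eigenvalues can be written out for the graph operators. The notation here is an assumption for illustration: $K$ is the kernel (affinity) matrix, $D$ is the diagonal degree matrix with $D_{ii} = \sum_j K_{ij}$, and $t > 0$ is a diffusion time. It is a sketch of the finite-sample algebra, not a statement about the continuum limit.

$$
P = D^{-1}K, \qquad L_{\mathrm{rw}} = I - P, \qquad
P\phi_i = \mu_i\phi_i \;\Longrightarrow\; L_{\mathrm{rw}}\phi_i = (1-\mu_i)\,\phi_i = \lambda_i\phi_i, \qquad
e^{-tL_{\mathrm{rw}}}\phi_i = e^{-t\lambda_i}\phi_i .
$$

The normalized kernel, the random-walk Laplacian, and the heat kernel therefore share the eigenvectors $\phi_i$; only the eigenvalues change, through the maps $\mu \mapsto 1-\mu$ and $\lambda \mapsto e^{-t\lambda}$.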

Visualization notebooks. Unlike the experiment repositories in this series, this project is primarily an interactive exploration tool - it computes operators on MNIST subsets in real time and visualizes their spectral properties. Useful for building intuition about the connections discussed in the blog posts. Developed during MATS 9.0.

Running

uv sync              # Phase 1-2 (diffusion geometry + wavelets)
uv sync --extra gpu  # Phase 3 (eNTK, requires PyTorch)

# Phase 1-2: Diffusion eigenmaps + graph wavelets
marimo edit app.py

# Phase 3: Gram vs Laplacian vs eNTK three-way comparison
marimo edit app_entk.py

What it explores

Phase 1-2 (app.py): Builds the Gram matrix and three Laplacian variants (unnormalized, random-walk, symmetric) on MNIST subsets. Verifies the eigenvalue relationship $\mu_i = 1 - \lambda_i$ between the row-normalized kernel (eigenvalues $\mu_i$) and the random-walk Laplacian (eigenvalues $\lambda_i$). Visualizes eigenvectors as images, diffusion embeddings, and graph wavelets built from PyGSP heat kernels.
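The eigenvalue relationship can be checked numerically on synthetic data. The following is a minimal NumPy sketch, not the code in src/diffusion.py: the Gaussian kernel, the bandwidth `sigma`, the random point cloud, and all function names are assumptions made for illustration.

```python
import numpy as np


def gaussian_kernel(X, sigma):
    """Dense Gaussian (RBF) kernel matrix; sigma is an illustrative bandwidth."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negatives from round-off
    return np.exp(-sq_dists / (2.0 * sigma**2))


def kernel_and_rw_laplacian_spectra(K):
    """Eigenvalues of P = D^{-1} K (descending) and of L_rw = I - P (ascending)."""
    degrees = K.sum(axis=1)
    P = K / degrees[:, None]        # row-normalized kernel (Markov matrix)
    L_rw = np.eye(len(K)) - P       # random-walk Laplacian
    # P is similar to a symmetric matrix, so its eigenvalues are real;
    # .real drops any round-off imaginary parts from the general solver.
    mu = np.sort(np.linalg.eigvals(P).real)[::-1]
    lam = np.sort(np.linalg.eigvals(L_rw).real)
    return mu, lam


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))        # stand-in for a small MNIST subset
K = gaussian_kernel(X, sigma=2.0)
mu, lam = kernel_and_rw_laplacian_spectra(K)

print("max |lam - (1 - mu)| =", np.max(np.abs(lam - (1.0 - mu))))
```

The two spectra are computed from separate eigendecompositions, so the printed gap is a real check of $\mu_i = 1 - \lambda_i$ and not an identity by construction. The largest $\mu_i$ equals 1 because $P$ is row-stochastic, which corresponds to the zero eigenvalue of the Laplacian.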

Phase 3 (app_entk.py): Computes the empirical NTK of a small MLP at initialization and after training, and compares its spectrum to those of the data Gram matrix and the Laplacian. Tests whether training aligns the NTK with data geometry (the "lazy-to-rich" transition described by Kumar et al., 2024).
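For intuition, the empirical NTK is the Gram matrix of per-example parameter gradients, $\Theta_{ij} = \langle \nabla_\theta f(x_i), \nabla_\theta f(x_j) \rangle$. The sketch below computes it at initialization for a two-layer tanh MLP with hand-written gradients in NumPy. It is an illustrative stand-in, not the repository's implementation (which uses torch.func); the architecture, the scaling, the function names, and the kernel-alignment score are all assumptions.

```python
import numpy as np


def init_mlp(d_in, width, rng):
    """Hypothetical model: f(x) = w2 . tanh(W1 x) / sqrt(width), scalar output."""
    W1 = rng.normal(size=(width, d_in)) / np.sqrt(d_in)
    w2 = rng.normal(size=width)
    return W1, w2


def forward(X, W1, w2):
    return np.tanh(X @ W1.T) @ w2 / np.sqrt(len(w2))


def per_example_gradients(X, W1, w2):
    """Rows are df/dtheta for each input, with theta = (W1 flattened, w2)."""
    width = len(w2)
    H = np.tanh(X @ W1.T)                          # (n, width) activations
    grad_w2 = H / np.sqrt(width)                   # df/dw2
    delta = (1.0 - H**2) * w2 / np.sqrt(width)     # df/d(pre-activation)
    grad_W1 = delta[:, :, None] * X[:, None, :]    # (n, width, d_in)
    return np.concatenate([grad_W1.reshape(len(X), -1), grad_w2], axis=1)


def empirical_ntk(X, W1, w2):
    J = per_example_gradients(X, W1, w2)
    return J @ J.T


def kernel_alignment(A, B):
    """Normalized Frobenius inner product between two kernel matrices."""
    return np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B))


rng = np.random.default_rng(0)
X = rng.normal(size=(40, 8))                       # stand-in for MNIST inputs
W1, w2 = init_mlp(d_in=8, width=64, rng=rng)

K_ntk = empirical_ntk(X, W1, w2)
G = X @ X.T                                        # linear data Gram matrix
ntk_spectrum = np.linalg.eigvalsh(K_ntk)[::-1]     # descending eigenvalues

print("top eNTK eigenvalues:", ntk_spectrum[:5])
print("alignment with Gram matrix:", kernel_alignment(K_ntk, G))
```

Recomputing `K_ntk` after training the same parameters and comparing the two alignment scores is one simple way to ask whether training moved the kernel toward the data geometry.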

Project structure

app.py              # Phase 1-2: diffusion eigenmaps + wavelets
app_entk.py         # Phase 3: Gram vs Laplacian vs eNTK comparison
src/
├── data.py         # MNIST stratified subsampling
├── diffusion.py    # Gram matrix, Laplacian variants, diffusion maps
├── wavelets.py     # Graph wavelets via PyGSP heat kernels
├── entk.py         # eNTK via torch.func, MLP training
└── viz.py          # Spectral visualization helpers
tests/
├── test_data.py
├── test_diffusion.py
├── test_wavelets.py
└── test_entk.py
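As an illustration of the stratified subsampling step listed for src/data.py, here is a minimal sketch that draws the same number of examples from every class. The function name, its signature, and the choice of equal (not proportional) class counts are assumptions; the labels are random stand-ins for MNIST labels.

```python
import numpy as np


def stratified_subsample(labels, n_per_class, rng):
    """Return sorted indices containing exactly n_per_class examples per label."""
    labels = np.asarray(labels)
    chosen = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        if len(idx) < n_per_class:
            raise ValueError(f"class {c} has only {len(idx)} examples")
        chosen.append(rng.choice(idx, size=n_per_class, replace=False))
    return np.sort(np.concatenate(chosen))


rng = np.random.default_rng(0)
fake_labels = rng.integers(0, 10, size=5000)   # stand-in for MNIST labels
subset = stratified_subsample(fake_labels, n_per_class=20, rng=rng)

print(np.bincount(fake_labels[subset], minlength=10))
```

Balanced subsets keep the dense operators small enough to recompute interactively while ensuring every digit class contributes to the spectrum.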

References

  • Coifman & Lafon, "Diffusion Maps," ACHA (2006)
  • Smola & Kondor, "Kernels and Regularization on Graphs," COLT (2003)
  • Bordelon, Canatar & Pehlevan, "Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks," ICML (2020)
  • Hammond, Vandergheynst & Gribonval, "Wavelets on Graphs via Spectral Graph Theory," ACHA (2011)
  • Kumar et al., "Grokking as the Transition from Lazy to Rich Training Dynamics," ICLR (2024)

License

MIT


Developed during MATS 9.0. Written with Claude.
