Companion code for *The Spectral Structure of Natural Data* and *Learner Geometry Meets Data Geometry*.
Interactive visualization of the spectral relationships between three fundamental operators on data: the Gram matrix (kernel), the graph Laplacian (manifold geometry), and the empirical Neural Tangent Kernel (what a network "sees"). Under idealized conditions these share the same eigenfunctions (Laplace-Beltrami eigenfunctions on the data manifold) and differ only in how their eigenvalues are transformed.
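A minimal numpy sketch (illustrative, not from this repo) of the shared-eigenfunction claim: for points sampled evenly from a circle, the leading non-constant eigenvectors of an RBF Gram matrix and the lowest non-constant eigenvectors of the graph Laplacian built from it span nearly the same subspace. The bandwidth `0.1` is an arbitrary choice for this toy example.

```python
import numpy as np

n = 200
theta = np.linspace(0, 2 * np.pi, n, endpoint=False)
X = np.c_[np.cos(theta), np.sin(theta)]          # points on the unit circle

# RBF Gram matrix (kernel) and the unnormalized graph Laplacian built from it
d2 = ((X[:, None] - X[None]) ** 2).sum(-1)
K = np.exp(-d2 / 0.1)
L = np.diag(K.sum(1)) - K

wK, vK = np.linalg.eigh(K)
wL, vL = np.linalg.eigh(L)
UK = vK[:, -3:-1]   # two leading non-constant Gram eigenvectors
UL = vL[:, 1:3]     # two lowest non-constant Laplacian eigenvectors

# Cosines of the principal angles between the two 2-D subspaces;
# values near 1 mean the operators share these eigenfunctions
cos_angles = np.linalg.svd(UK.T @ UL, compute_uv=False)
print(cos_angles)   # both close to 1
```

The two operators order these eigenfunctions oppositely (largest kernel eigenvalues vs. smallest Laplacian eigenvalues), which is the eigenvalue-transform point in miniature.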
Visualization notebooks. Unlike the experiment repositories in this series, this project is primarily an interactive exploration tool: it computes operators on MNIST subsets in real time and visualizes their spectral properties. It is useful for building intuition about the connections discussed in the blog posts. Developed during MATS 9.0.
```sh
uv sync              # Phase 1-2 (diffusion geometry + wavelets)
uv sync --extra gpu  # Phase 3 (eNTK, requires PyTorch)
```
```sh
# Phase 1-2: diffusion eigenmaps + graph wavelets
marimo edit app.py

# Phase 3: Gram vs Laplacian vs eNTK three-way comparison
marimo edit app_entk.py
```

Phase 1-2 (app.py): Gram matrix and three Laplacian variants (unnormalized, random-walk, symmetric) on MNIST subsets. Verifies the expected eigenvalue relationships between the variants.
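A small numpy sketch (illustrative, not the repo's implementation) of the three Laplacian variants and one of the relationships the app checks: the random-walk and symmetric Laplacians are similar matrices (L_rw = D^{-1/2} L_sym D^{1/2}), so their spectra coincide.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 3))
W = np.exp(-((X[:, None] - X[None]) ** 2).sum(-1))   # dense RBF affinities
D = W.sum(1)                                         # degree vector
I = np.eye(50)

L_un = np.diag(D) - W                           # unnormalized: D - W
L_rw = I - W / D[:, None]                       # random-walk: I - D^-1 W
L_sym = I - W / np.sqrt(np.outer(D, D))         # symmetric: I - D^-1/2 W D^-1/2

# L_rw and L_sym are related by a similarity transform, so spectra match
ev_rw = np.sort(np.linalg.eigvals(L_rw).real)
ev_sym = np.sort(np.linalg.eigvalsh(L_sym))
print(np.allclose(ev_rw, ev_sym))   # True
```

The unnormalized Laplacian generally has a different spectrum from the other two unless the graph is regular (all degrees equal).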
Phase 3 (app_entk.py): Computes the empirical NTK of a small MLP at initialization and after training, and compares its spectrum to the data Gram matrix and graph Laplacian. Tests whether training aligns the NTK with the data geometry (the "lazy-to-rich" transition from Kumar et al. 2024).
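The repo computes the eNTK with `torch.func`; the kernel itself is just the Gram matrix of per-sample parameter gradients, K(x, x') = ∇_θ f(x) · ∇_θ f(x'). A hand-derived numpy sketch for a one-hidden-layer scalar MLP (illustrative only; all shapes and the tanh architecture are arbitrary choices for this example):

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, h = 6, 4, 16
X = rng.normal(size=(n, d))
W = rng.normal(size=(h, d)) / np.sqrt(d)   # hidden weights
v = rng.normal(size=h) / np.sqrt(h)        # output weights

# f(x) = v . tanh(W x); closed-form per-sample gradients w.r.t. (W, v)
H = np.tanh(X @ W.T)            # (n, h) hidden activations = df/dv
G = (1 - H ** 2) * v            # (n, h) backprop through tanh
# Jacobian rows: [vec(df/dW), df/dv], with df/dW_ij = G_i * x_j
J = np.concatenate([(G[:, :, None] * X[:, None, :]).reshape(n, -1), H], axis=1)

K_ntk = J @ J.T                 # empirical NTK at these parameters, (n, n)
print(K_ntk.shape)              # (6, 6)
```

Recomputing `K_ntk` before and after training and comparing its eigenvectors to those of the data Gram matrix is the alignment test in miniature.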
```
app.py          # Phase 1-2: diffusion eigenmaps + wavelets
app_entk.py     # Phase 3: Gram vs Laplacian vs eNTK comparison
src/
├── data.py       # MNIST stratified subsampling
├── diffusion.py  # Gram matrix, Laplacian variants, diffusion maps
├── wavelets.py   # Graph wavelets via PyGSP heat kernels
├── entk.py       # eNTK via torch.func, MLP training
└── viz.py        # Spectral visualization helpers
tests/
├── test_data.py
├── test_diffusion.py
├── test_wavelets.py
└── test_entk.py
```
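`wavelets.py` uses PyGSP heat kernels; the underlying construction is simple enough to sketch in numpy (illustrative only; the path graph and scale t=2.0 are arbitrary choices). A heat-kernel wavelet centered at node i is psi_{t,i} = U exp(-t Lambda) U^T delta_i, where (Lambda, U) is the Laplacian eigendecomposition.

```python
import numpy as np

# Path graph on 30 nodes: adjacency, Laplacian, eigendecomposition
n = 30
A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
L = np.diag(A.sum(1)) - A
lam, U = np.linalg.eigh(L)

def heat_wavelet(t, i):
    # psi_{t,i} = U exp(-t Lambda) U^T delta_i: heat kernel centered at node i
    return U @ (np.exp(-t * lam) * U[i])

psi = heat_wavelet(2.0, n // 2)
print(psi.argmax())   # mass remains concentrated at the source node
```

Larger `t` spreads the wavelet over more of the graph; a bank of such filters at several scales gives the multiscale decomposition of Hammond et al. (2011).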
- Coifman & Lafon, "Diffusion Maps," ACHA (2006)
- Smola & Kondor, "Kernels and Regularization on Graphs," COLT (2003)
- Bordelon, Canatar & Pehlevan, "Spectrum Dependent Learning Curves," ICML (2020)
- Hammond, Vandergheynst & Gribonval, "Wavelets on Graphs," ACHA (2011)
- Kumar et al., "Grokking as Lazy to Rich Transition," ICLR (2024)
MIT
Written with Claude.