Interactive exploration of the union of manifolds hypothesis.
Status: Work in progress. The code is functional and produces results, but it should be treated as a personal research and exploration tool rather than a polished library.
Loads standard ML datasets (MNIST, Fashion-MNIST, AG News), computes geometric properties of their data manifolds, and visualizes the results interactively in a marimo notebook. The goal is to make the "manifold hypothesis" concrete: you can directly observe that different classes occupy manifolds with distinct intrinsic dimensions, that eigenvalue spectra follow power laws with class-dependent exponents, and that text embeddings cluster by topic in low-dimensional subspaces.
Key analyses:
- Intrinsic dimension estimation via TwoNN (Facco et al. 2017) and MLE (Levina & Bickel 2004)
- Eigenvalue spectrum of the k-NN graph Laplacian, with power-law fits
- Per-class comparison showing that different data manifolds have different geometry
- UMAP projections colored by class, intrinsic dimension, or local density
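For readers unfamiliar with the two intrinsic dimension estimators listed above, here is a minimal NumPy sketch of both. It is an illustration written for this README, not the code in analysis/intrinsic_dim.py: the function names, the brute-force neighbor search (the project itself uses pynndescent), the 10% discard fraction, and the default k=20 are all assumptions. It also assumes the data contains no duplicate points, since a zero nearest-neighbor distance would break the ratios.

```python
import numpy as np


def knn_distances(X, k):
    """Sorted distances from each point to its k nearest neighbors (self excluded)."""
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)  # squared Euclidean distances
    np.maximum(d2, 0.0, out=d2)                       # clip tiny negative round-off
    np.fill_diagonal(d2, np.inf)                      # a point is not its own neighbor
    nearest = np.sort(np.partition(d2, k - 1, axis=1)[:, :k], axis=1)
    return np.sqrt(nearest)


def twonn_dimension(X, discard_fraction=0.1):
    """TwoNN: fit -log(1 - F(mu)) against log(mu) with a line through the origin.

    mu is the ratio of second to first nearest-neighbor distance. The largest
    ratios are discarded before the fit; discard_fraction must be > 0 so that
    the empirical CDF stays below 1.
    """
    r = knn_distances(X, 2)
    mu = np.sort(r[:, 1] / r[:, 0])
    n = mu.size
    F = np.arange(1, n + 1) / n                       # empirical CDF of mu
    keep = int(np.floor(n * (1.0 - discard_fraction)))
    x = np.log(mu[:keep])
    y = -np.log(1.0 - F[:keep])
    return float(np.sum(x * y) / np.sum(x * x))       # least-squares slope through 0


def mle_dimension(X, k=20):
    """Levina-Bickel MLE: per-point estimates from k neighbors, then averaged."""
    r = knn_distances(X, k)
    log_ratios = np.log(r[:, -1:] / r[:, :-1])        # log(T_k / T_j), j = 1..k-1
    per_point = 1.0 / log_ratios.mean(axis=1)
    return float(per_point.mean())


# Toy check: a 2-D sheet linearly embedded in 10-D ambient space.
rng = np.random.default_rng(0)
sheet = rng.uniform(size=(1000, 2)) @ rng.normal(size=(2, 10))
print("TwoNN:", twonn_dimension(sheet))
print("MLE:  ", mle_dimension(sheet))
```

On this toy sheet both estimates should come out at roughly 2, even though the ambient dimension is 10. Running the same functions on the samples of a single class is one way to reproduce a per-class comparison.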
marimo run app.py --sandbox
Requires Python >= 3.10. Dependencies are declared in the marimo script header and resolved automatically with --sandbox.
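For context, the script header that --sandbox reads is inline script metadata in the PEP 723 comment format, placed at the top of the notebook file. The fragment below only illustrates the shape of such a header. The package list is an assumption based on the libraries named in this README and is not copied from app.py, which may pin versions or list additional packages.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "marimo",
#     "numpy",
#     "pynndescent",
#     "umap-learn",
#     "sentence-transformers",
# ]
# ///
```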
app.py                  # Main marimo notebook (all visualization)
data/
    loaders.py          # MNIST, Fashion-MNIST, AG News dataset loaders
    embedders.py        # TF-IDF, sentence-transformers for text
analysis/
    neighbors.py        # k-NN graph construction (pynndescent)
    intrinsic_dim.py    # TwoNN, MLE, participation ratio estimators
    spectra.py          # Graph Laplacian eigenvalues, power-law fitting
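To make the spectral analysis concrete, the following NumPy sketch builds a symmetric k-NN graph, computes the eigenvalues of its normalized Laplacian, and fits a power law to the low end of the spectrum by linear regression in log-log space. It is a hypothetical illustration, not the contents of analysis/spectra.py: the function names, the choice of the symmetric normalized Laplacian, the fit range (ranks 1 to 49), and the brute-force neighbor search are all assumptions.

```python
import numpy as np


def knn_adjacency(X, k):
    """Symmetric 0/1 adjacency matrix of the k-NN graph (brute force)."""
    n = X.shape[0]
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    np.fill_diagonal(d2, np.inf)                      # exclude self-loops
    idx = np.argpartition(d2, k - 1, axis=1)[:, :k]   # indices of k nearest points
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), idx.ravel()] = 1.0
    return np.maximum(A, A.T)                         # edge if either side picks it


def normalized_laplacian_eigenvalues(A):
    """Ascending eigenvalues of L = I - D^{-1/2} A D^{-1/2}."""
    inv_sqrt_deg = 1.0 / np.sqrt(A.sum(axis=1))
    L = np.eye(A.shape[0]) - inv_sqrt_deg[:, None] * A * inv_sqrt_deg[None, :]
    return np.linalg.eigvalsh(L)


def fit_power_law(eigvals, first=1, last=50, tol=1e-10):
    """Fit lambda_i ~ C * i**alpha over ranks first..last-1; returns (alpha, log C).

    Rank 0 is the trivial zero eigenvalue, so first must be >= 1. Eigenvalues
    at or below tol (extra zeros from disconnected components) are skipped.
    """
    lam = eigvals[first:last]
    ranks = np.arange(first, first + lam.size)
    mask = lam > tol
    slope, intercept = np.polyfit(np.log(ranks[mask]), np.log(lam[mask]), 1)
    return float(slope), float(intercept)


# Toy usage on points sampled uniformly from the unit square.
rng = np.random.default_rng(0)
points = rng.uniform(size=(300, 2))
spectrum = normalized_laplacian_eigenvalues(knn_adjacency(points, k=10))
alpha, _ = fit_power_law(spectrum)
print("fitted exponent:", alpha)
```

The fitted exponent depends on the fit range and on k, so comparisons between classes are only meaningful when both are held fixed.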
- Facco, E., d'Errico, M., Rodriguez, A., & Laio, A. (2017). Estimating the intrinsic dimension of datasets by a minimal neighborhood information. Scientific Reports, 7, 12140.
- Levina, E., & Bickel, P. J. (2004). Maximum likelihood estimation of intrinsic dimension. Advances in Neural Information Processing Systems, 17.
MIT. See LICENSE.
Developed during MATS 9.0. Written with Claude.