CharlesR-W/manifold-explorer

Manifold Explorer

Interactive exploration of the union of manifolds hypothesis.

Status: Work in progress. The tool is functional and produces results, but treat it as a personal research and exploration tool rather than a polished library.

What it does

Loads standard ML datasets (MNIST, Fashion-MNIST, AG News), computes geometric properties of their data manifolds, and visualizes the results interactively in a marimo notebook. The goal is to make the "manifold hypothesis" concrete: you can directly observe that different classes occupy manifolds with distinct intrinsic dimensions, that eigenvalue spectra follow power laws with class-dependent exponents, and that text embeddings cluster by topic in low-dimensional subspaces.

Key analyses:

  • Intrinsic dimension estimation via TwoNN (Facco et al. 2017) and MLE (Levina & Bickel 2004)
  • Eigenvalue spectrum of the k-NN graph Laplacian, with power-law fits
  • Per-class comparison showing that different data manifolds have different geometry
  • UMAP projections colored by class, intrinsic dimension, or local density
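As a rough illustration of the two intrinsic dimension estimators named above, here is a minimal NumPy sketch. It is not the code in analysis/intrinsic_dim.py: the function names are hypothetical, neighbors are found by brute force rather than with pynndescent, and the TwoNN variant shown is the closed-form maximum-likelihood estimate rather than the linear fit described in Facco et al. It also assumes the data contains no duplicate points.

```python
import numpy as np


def sorted_neighbor_distances(points: np.ndarray, k: int) -> np.ndarray:
    """Distances from each point to its k nearest neighbors, ascending.

    Brute force (O(N^2) memory), so only suitable for small samples.
    """
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    return np.sort(dist, axis=1)[:, :k]


def twonn_dimension(points: np.ndarray) -> float:
    """Maximum-likelihood form of TwoNN: d = N / sum(log(r2 / r1))."""
    r = sorted_neighbor_distances(points, k=2)
    mu = r[:, 1] / r[:, 0]  # ratio of 2nd to 1st neighbor distance
    return len(mu) / float(np.sum(np.log(mu)))


def mle_dimension(points: np.ndarray, k: int = 10) -> float:
    """Levina-Bickel estimate, averaged over all points.

    Per point: (k - 1) / sum_j log(T_k / T_j) for j = 1 .. k-1,
    where T_j is the distance to the j-th nearest neighbor.
    """
    r = sorted_neighbor_distances(points, k=k)
    log_ratios = np.log(r[:, -1:] / r[:, :-1])  # shape (N, k - 1)
    per_point = (k - 1) / np.sum(log_ratios, axis=1)
    return float(np.mean(per_point))


if __name__ == "__main__":
    # A 2-D sheet embedded in 5-D: both estimates should land near 2.
    rng = np.random.default_rng(0)
    sheet = np.zeros((800, 5))
    sheet[:, :2] = rng.uniform(size=(800, 2))
    print("TwoNN:", twonn_dimension(sheet))
    print("MLE:  ", mle_dimension(sheet, k=10))
```

The (k - 1) normalization in the MLE estimate is slightly biased upward for small k; using a larger k or a (k - 2) normalization reduces that bias.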

Running

marimo run app.py --sandbox

Requires Python >= 3.10. Dependencies are declared in the marimo script header and resolved automatically with --sandbox.
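For readers unfamiliar with how --sandbox finds dependencies: marimo reads inline script metadata (PEP 723) from a comment block at the top of the notebook file. The fragment below only illustrates that format. The package list is an assumption based on the tools this README mentions, so the header in app.py remains the authoritative list.

```python
# /// script
# requires-python = ">=3.10"
# dependencies = [
#     "marimo",
#     "numpy",
#     "pynndescent",
#     "umap-learn",
#     "scikit-learn",
#     "sentence-transformers",
# ]
# ///
```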

Project structure

app.py                      # Main marimo notebook (all visualization)
data/
    loaders.py              # MNIST, FMNIST, AG News dataset loaders
    embedders.py            # TF-IDF, sentence-transformers for text
analysis/
    neighbors.py            # k-NN graph construction (pynndescent)
    intrinsic_dim.py        # TwoNN, MLE, participation ratio estimators
    spectra.py              # Graph Laplacian eigenvalues, power-law fitting
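
The spectral analysis can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the contents of analysis/spectra.py: the function names are hypothetical, the k-NN graph is built by brute force and symmetrized, the Laplacian is the unnormalized L = D - A, and the power law is fitted by least squares on log(eigenvalue) against log(rank). The repository may use a different normalization or fitting procedure.

```python
import numpy as np


def knn_adjacency(points: np.ndarray, k: int) -> np.ndarray:
    """Symmetric 0/1 adjacency matrix of the k-NN graph (brute force)."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)  # no self-loops
    idx = np.argsort(dist, axis=1)[:, :k]
    n = len(points)
    adj = np.zeros((n, n))
    rows = np.repeat(np.arange(n), k)
    adj[rows, idx.ravel()] = 1.0
    # Keep an edge if either endpoint lists the other as a neighbor.
    return np.maximum(adj, adj.T)


def laplacian_eigenvalues(adj: np.ndarray) -> np.ndarray:
    """Eigenvalues of the unnormalized Laplacian D - A, in ascending order."""
    degree = adj.sum(axis=1)
    lap = np.diag(degree) - adj
    return np.linalg.eigvalsh(lap)


def fit_power_law(eigenvalues: np.ndarray, tol: float = 1e-8):
    """Least-squares fit of log(lambda_i) = slope * log(i) + intercept.

    Eigenvalues at or below tol (the zero modes, one per connected
    component) are dropped before taking logarithms.
    """
    lam = eigenvalues[eigenvalues > tol]
    rank = np.arange(1, len(lam) + 1)
    slope, intercept = np.polyfit(np.log(rank), np.log(lam), 1)
    return float(slope), float(intercept)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = rng.normal(size=(300, 3))
    eigs = laplacian_eigenvalues(knn_adjacency(points, k=8))
    slope, intercept = fit_power_law(eigs)
    print("fitted exponent:", slope)
```

A dense eigendecomposition like this scales cubically with the number of points, so it is only practical for subsamples of a few thousand points.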

References

  • Facco, E., d'Errico, M., Rodriguez, A., & Laio, A. (2017). Estimating the intrinsic dimension of datasets by a minimal neighborhood information. Scientific Reports, 7, 12140.
  • Levina, E., & Bickel, P. J. (2004). Maximum likelihood estimation of intrinsic dimension. Advances in Neural Information Processing Systems, 17.

License

MIT. See LICENSE.


Developed during MATS 9.0. Written with Claude.
