StaMojo

A statistical computing library for Mojo, inspired by scipy.stats and statsmodels in Python.

Overview

StaMojo (Statistics + Mojo) brings comprehensive statistical computing to the Mojo ecosystem. The library is organized into two parts:

Part	Scope	Dependencies
Part I — Statistical Computing Foundation	Special functions, distributions, descriptive statistics, hypothesis tests, correlation	Mojo stdlib only
Part II — Statistical Modeling	OLS, GLM, logistic regression, and model diagnostics	NuMojo + MatMojo

Part I is available now with zero external dependencies. Part II will begin once the upstream linear-algebra ecosystem stabilizes on a compatible Mojo release.

Why a separate library?

In the Python ecosystem, scipy bundles statistics, optimization, signal processing, integration, interpolation, and more into one giant package. For Mojo, a modular approach is more appropriate:

Python package	Mojo equivalent	Focus
`numpy`	NuMojo	N-dimensional arrays, basic math
`decimal` / `mpmath`	DeciMojo	Arbitrary-precision arithmetic
`numpy.linalg` / `scipy.linalg`	MatMojo	Linear algebra
`scipy.stats`	StaMojo (distributions + tests)	Statistical distributions & tests
`statsmodels`	StaMojo (models)	Statistical models & econometrics

Placing scipy.stats-like functionality and statsmodels-like regression in one library is intentional: regression models inherently depend on distribution functions (for p-values, confidence intervals, etc.), so co-locating them avoids circular dependencies, simplifies versioning, and provides a cohesive API.

Status

v0.1 — Part I is complete and ready for use. The current release provides:

Category	Functions
Special functions	`gammainc`, `gammaincc`, `beta`, `lbeta`, `betainc`, `erfinv`, `ndtri`
Distributions	`Normal`, `StudentT`, `ChiSquared`, `FDist` — each with PDF, log-PDF, CDF, SF, PPF, `rvs`
Descriptive stats	`mean`, `variance`, `std`, `median`, `quantile`, `skewness`, `kurtosis`, `data_min`, `data_max`
Correlation	`pearsonr`, `spearmanr`, `kendalltau` (with p-values)
Hypothesis tests	`ttest_1samp`, `ttest_ind`, `ttest_rel`, `chi2_gof`, `chi2_ind`, `ks_1samp`, `f_oneway`

All 30 functions are self-contained (Mojo stdlib only) and covered by unit tests validated against SciPy reference values.

What about Part II (statistical models)? OLS regression and GLMs require matrix operations that depend on NuMojo and MatMojo. These upstream libraries are still fast evolving. Part II will resume once the ecosystem catches up. See the full roadmap for details.

Background

Due to my academic and professional background, I work extensively with hypothesis testing and regression models on a daily basis, and have been a long-time user of Stata and statsmodels. It has been two years since Mojo first appeared, and NuMojo now has its core functionality in place. Driven by my enthusiasm for Mojo, I felt it was time to start migrating some of my personal research projects to the Mojo ecosystem — and that is precisely how StaMojo was born.

The library is designed around two pillars:

Part I — Statistical computing foundation (self-contained) — special functions, probability distributions, descriptive statistics, hypothesis tests, and correlation.
Part II — Statistical modeling (depends on NuMojo and MatMojo) — OLS, GLM, logistic regression, and related diagnostics.

At the moment I am still building out the project scaffolding and solidifying the core functionality. Because Mojo has not yet reached v1.0, breaking changes are frequent across compiler releases, so pull requests are not preferred at this time. If you have any suggestions, questions, or feedback, please feel free to open an issue, start a discussion, or reach out on our Discord channel. Thank you for your understanding!

Installation

StaMojo is available in the modular-community https://repo.prefix.dev/modular-community package repository. To access this repository, add it to your channels list in your pixi.toml file:

channels = ["https://conda.modular.com/max", "https://repo.prefix.dev/modular-community", "conda-forge"]

Then, you can install StaMojo using any of these methods:

From the pixi CLI, run the command pixi add stamojo. This fetches the latest version and makes it immediately available for import.
In the mojoproject.toml file of your project, add the following dependency:
```
stamojo = "==0.1.0"
```
Then run pixi install to download and install the package.
For the latest development version in the main branch, clone this GitHub repository and build the package locally using the command pixi run package.
```
git clone https://github.com/mojomath/stamojo.git
cd stamojo
pixi install
pixi run package
```

The following table summarizes the package versions and their corresponding Mojo versions:

`stamojo`	`mojo`	package manager
v0.1.0	==0.26.1	pixi

Examples

The file examples/examples.mojo demonstrates the key APIs available in Part I. Run it with:

mojo run -I src examples/examples.mojo

from stamojo.special import gammainc, gammaincc, beta, lbeta, betainc, erfinv, ndtri
from stamojo.distributions import Normal, StudentT, ChiSquared, FDist
from stamojo.stats import (
    mean, variance, std, median, quantile, skewness, kurtosis,
    pearsonr, spearmanr, kendalltau,
    ttest_1samp, ttest_ind, ttest_rel,
    chi2_gof, chi2_ind, ks_1samp, f_oneway,
)


fn main() raises:
    # --- Special functions ---------------------------------------------------
    print("gammainc(1, 2) =", gammainc(1.0, 2.0))           # 0.8646647167628346
    print("gammaincc(1, 2) =", gammaincc(1.0, 2.0))         # 0.13533528323716537
    print("beta(2, 3)     =", beta(2.0, 3.0))               # 0.08333333333323925
    print("betainc(2, 3, 0.5) =", betainc(2.0, 3.0, 0.5))   # 0.6875000000000885
    print("erfinv(0.5)    =", erfinv(0.5))                  # 0.4769362762044701
    print("ndtri(0.975)   =", ndtri(0.975))                 # 1.9599639845400543

    # --- Distributions -------------------------------------------------------
    var n = Normal(0.0, 1.0)
    print("Normal(0,1).pdf(0)   =", n.pdf(0.0))             # 0.3989422804014327
    print("Normal(0,1).cdf(1.96)=", n.cdf(1.96))            # 0.9750021048517795
    print("Normal(0,1).ppf(0.975)=", n.ppf(0.975))          # 1.9599639845400543
    print("Normal(0,1).sf(1.96) =", n.sf(1.96))             # 0.02499789514822043

    var t = StudentT(10.0)
    print("StudentT(10).cdf(2.0)=", t.cdf(2.0))             # 0.9633059826444078
    print("StudentT(10).ppf(0.975)=", t.ppf(0.975))         # 2.2281388540534057

    var c = ChiSquared(5.0)
    print("ChiSquared(5).cdf(11.07)=", c.cdf(11.07))        # 0.9499903814759155

    var f = FDist(5.0, 10.0)
    print("FDist(5,10).cdf(3.33)=", f.cdf(3.33))            # 0.9501687242532277

    # --- Descriptive statistics ----------------------------------------------
    var data: List[Float64] = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
    print("mean    =", mean(data))              # 5.0
    print("variance=", variance(data, ddof=0))  # 4.0
    print("std     =", std(data, ddof=0))       # 2.0
    print("median  =", median(data))            # 4.5
    print("Q(0.25) =", quantile(data, 0.25))    # 4.0
    print("skewness=", skewness(data))          # 0.8184875533567997
    print("kurtosis=", kurtosis(data))          # 0.940625

    # --- Correlation ---------------------------------------------------------
    var x: List[Float64] = [1.0, 2.0, 3.0, 4.0, 5.0]
    var y: List[Float64] = [2.1, 3.8, 6.0, 7.9, 10.1]
    var pr = pearsonr(x, y)
    print("pearsonr  r=", pr[0], " p=", pr[1])    # 0.9991718425080479, 2.8605484175113625e-05
    var sr = spearmanr(x, y)
    print("spearmanr ρ=", sr[0], " p=", sr[1])    # 1.0, 0.0
    var kt = kendalltau(x, y)
    print("kendalltau τ=", kt[0], " p=", kt[1])   # 1.0, 0.014305878435429659

    # --- Hypothesis tests ----------------------------------------------------
    var sample: List[Float64] = [5.1, 4.8, 5.3, 5.0, 4.9, 5.2]
    var res = ttest_1samp(sample, 5.0)
    print("ttest_1samp  t=", res[0], " p=", res[1])   # 0.654653670707975, 0.5416045608507769

    var a: List[Float64] = [1.0, 2.0, 3.0, 4.0, 5.0]
    var b: List[Float64] = [4.0, 5.0, 6.0, 7.0, 8.0]
    var res2 = ttest_ind(a, b)
    print("ttest_ind    t=", res2[0], " p=", res2[1])  # -3.0, 0.0170716812337895

    var obs: List[Float64] = [16.0, 18.0, 16.0, 14.0, 12.0, 14.0]
    var exp: List[Float64] = [15.0, 15.0, 15.0, 15.0, 15.0, 15.0]
    var res3 = chi2_gof(obs, exp)
    print("chi2_gof   χ²=", res3[0], " p=", res3[1])  # 1.4666666666666666, 0.9168841203537823

    var g1: List[Float64] = [3.0, 4.0, 5.0]
    var g2: List[Float64] = [6.0, 7.0, 8.0]
    var g3: List[Float64] = [9.0, 10.0, 11.0]
    var groups = List[List[Float64]]()
    groups.append(g1^)
    groups.append(g2^)
    groups.append(g3^)
    var res4 = f_oneway(groups)
    print("f_oneway   F=", res4[0], " p=", res4[1])   # 27.0, 0.0010000000005315757

Architecture

src/stamojo/
├── __init__.mojo              # Package root (re-exports distributions & stats)
├── prelude.mojo               # Convenient re-exports
├── special/                   # Special mathematical functions (cf. scipy.special)
│   ├── __init__.mojo
│   ├── _gamma.mojo            # gammainc, gammaincc
│   ├── _beta.mojo             # beta, lbeta, betainc
│   └── _erf.mojo              # erfinv, ndtri
├── distributions/             # Probability distributions
│   ├── __init__.mojo
│   ├── normal.mojo            # Normal (Gaussian) — PDF, logPDF, CDF, SF, PPF, rvs
│   ├── t.mojo                 # Student's t
│   ├── chi2.mojo              # Chi-squared
│   └── f.mojo                 # F-distribution
├── stats/                     # Descriptive stats & hypothesis tests
│   ├── __init__.mojo
│   ├── descriptive.mojo       # mean, variance, std, median, quantile, skewness, kurtosis
│   ├── correlation.mojo       # pearsonr, spearmanr, kendalltau
│   └── tests.mojo             # ttest_1samp, ttest_ind, ttest_rel, chi2_gof, chi2_ind, ks_1samp, f_oneway
└── models/                    # Statistical models (planned)
    ├── __init__.mojo
    └── ols.mojo               # Ordinary Least Squares (stub)
tests/
├── test_all.sh                # Run all test suites
├── test_special.mojo          # tests — special functions
├── test_distributions.mojo    # tests — Normal, t, χ², F
├── test_stats.mojo            # tests — descriptive statistics
└── test_hypothesis.mojo       # tests — hypothesis tests, correlation, ANOVA

Roadmap

The project is organized into Part I (scipy.stats-equivalent, no external dependencies) and Part II (statsmodels-equivalent, requires NuMojo + MatMojo). Phases 0–2 are complete; see the full roadmap for all planned phases.

	Phase	Status
Part I	Phase 0 — Special Functions	✓
	Phase 1 — Core Distributions & Descriptive Stats	✓
	Phase 2 — Hypothesis Testing & Correlation	✓
	Phase 3 — Extended Distributions	planned
	Phase 4 — Extended Tests & Utilities	planned
Part II	Phase 5 — OLS Regression	awaiting NuMojo / MatMojo
	Phase 6 — Generalized Linear Models	awaiting NuMojo / MatMojo
	Phase 7 — Extended Models	planned
	Phase 8 — Advanced Topics	planned

License

This repository and its contributions are licensed under the Apache License v2.0.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.github/workflows		.github/workflows
docs		docs
examples		examples
src/stamojo		src/stamojo
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
LICENSE		LICENSE
README.md		README.md
pixi.lock		pixi.lock
pixi.toml		pixi.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

StaMojo

Overview

Why a separate library?

Status

Background

Installation

Examples

Architecture

Roadmap

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

StaMojo

Overview

Why a separate library?

Status

Background

Installation

Examples

Architecture

Roadmap

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages