High-performance Python package for balanced k-means clustering using optimal transport and entropic regularization

kuslavicek/ballot

Ballot: Balanced k-means clustering with optimal transport

Ballot (Balanced Lloyd with Optimal Transport) is a high-performance Python package for balanced clustering. It builds equal-sized clusters (or clusters with prescribed capacities) by treating the assignment of points to clusters as an optimal transport problem and solving its entropically regularized form with the Sinkhorn algorithm.
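The core computation can be sketched in plain NumPy. The snippet below is a minimal illustration under stated assumptions, not ballot's implementation: the helper name `sinkhorn_balanced`, the squared Euclidean cost, uniform point masses $1/n$, equal cluster capacities $1/k$, and the defaults for `eps`, `n_iter` and `tol` are all hypothetical. It computes the entropic transport plan $P = \operatorname{diag}(u)\,K\,\operatorname{diag}(v)$ with Gibbs kernel $K_{ij} = e^{-c_{ij}/\varepsilon}$ by alternately rescaling rows and columns.

```python
import numpy as np


def sinkhorn_balanced(X, C, eps=1.0, n_iter=1000, tol=1e-9):
    """Entropic OT between n points (mass 1/n each) and k centroids (mass 1/k each).

    Hypothetical helper for illustration only; not part of the ballot API.
    Returns the n x k transport plan P (a soft, balanced assignment).
    """
    n, k = X.shape[0], C.shape[0]
    # Squared Euclidean cost between every point and every centroid: shape (n, k).
    cost = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    a = np.full(n, 1.0 / n)  # each point carries equal mass
    b = np.full(k, 1.0 / k)  # each cluster must receive equal mass (balance constraint)
    # Gibbs kernel; subtracting the minimum only rescales K by a constant,
    # which the scaling vectors absorb, and keeps exponents in a safer range.
    K = np.exp(-(cost - cost.min()) / eps)
    u, v = np.ones(n), np.ones(k)
    for _ in range(n_iter):
        # Two matrix-vector products per iteration: O(nk) work.
        u = a / (K @ v)    # match row sums to a
        v = b / (K.T @ u)  # match column sums to b (exact after this step)
        # Stop once the row marginals are also close to a.
        if np.abs(u * (K @ v) - a).sum() < tol:
            break
    return u[:, None] * K * v[None, :]
```

Row $i$ of `P` is a soft assignment of point $i$; `P.argmax(axis=1)` yields hard labels that are approximately, not exactly, balanced. For very small `eps` the kernel can underflow, and a log-domain Sinkhorn is the usual remedy.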

Features

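  • Speed: Uses Sinkhorn iterations (E-BalLOT). Each iteration needs only matrix-vector products with the $n \times k$ kernel ($n$ points, $k$ clusters), i.e. $O(nk)$ work, so the cost grows linearly in $n$ and the method stays usable for large datasets ($n > 100,000$).
  • Simplicity: A compact, math-driven implementation without complex C++ dependencies.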
  • Scikit-learn Compatible: Designed to fit seamlessly into existing ML pipelines.
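Scikit-learn compatibility usually means an estimator with `fit` / `predict` / `fit_predict` methods and fitted attributes such as `labels_` and `cluster_centers_`. The class below is a hypothetical sketch of that shape, not ballot's actual estimator: `BalancedKMeansSketch` and all of its parameters are invented for illustration, and it does not inherit from scikit-learn's `BaseEstimator` so that it depends only on NumPy. It alternates a Sinkhorn balanced assignment with a centroid update (each centroid becomes the transport-weighted mean of its points), which is one plausible reading of a "balanced Lloyd" iteration.

```python
import numpy as np


class BalancedKMeansSketch:
    """Hypothetical scikit-learn-style balanced k-means (not the ballot API)."""

    def __init__(self, n_clusters=3, eps=1.0, n_iter=20, sinkhorn_iter=500, random_state=None):
        # scikit-learn convention: store constructor arguments unchanged.
        self.n_clusters = n_clusters
        self.eps = eps
        self.n_iter = n_iter
        self.sinkhorn_iter = sinkhorn_iter
        self.random_state = random_state

    def _plan(self, X, centers):
        # Entropic OT plan with uniform point masses and equal cluster capacities.
        n, k = X.shape[0], centers.shape[0]
        cost = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        K = np.exp(-(cost - cost.min()) / self.eps)
        a, b = np.full(n, 1.0 / n), np.full(k, 1.0 / k)
        u, v = np.ones(n), np.ones(k)
        for _ in range(self.sinkhorn_iter):
            u = a / (K @ v)
            v = b / (K.T @ u)
        return u[:, None] * K * v[None, :]

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=float)
        rng = np.random.default_rng(self.random_state)
        # Initialise centroids at k distinct random data points.
        centers = X[rng.choice(X.shape[0], self.n_clusters, replace=False)]
        for _ in range(self.n_iter):
            P = self._plan(X, centers)
            # Each centroid becomes the transport-weighted mean of its points.
            centers = (P.T @ X) / P.sum(axis=0)[:, None]
        # Fitted state uses a trailing underscore, as in scikit-learn.
        self.cluster_centers_ = centers
        self.labels_ = self._plan(X, centers).argmax(axis=1)
        return self

    def predict(self, X):
        # New points go to the nearest fitted centroid (no balance constraint).
        X = np.asarray(X, dtype=float)
        d = ((X[:, None, :] - self.cluster_centers_[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    def fit_predict(self, X, y=None):
        return self.fit(X).labels_
```

`predict` falls back to plain nearest-centroid assignment because a balance constraint is only meaningful over a whole batch of points, not one new point at a time.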

Installation

Install via pip:

pip install ballot

Usage

import numpy as np
from ballot.core import solve_entropic_kantorovich

# Example usage (API subject to change in v0.1)
# Create random data and centroids...
# Run balanced clustering...
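Because the usage example above leaves the actual call unspecified, the following library-independent sketch shows what "balanced" means for fixed random centroids. It is a swapped-in exact baseline, not ballot's API or method: the hypothetical helper `exact_balanced_assignment` solves the unregularized problem (exactly $n/k$ points per cluster, assuming $k$ divides $n$) by replicating each centroid $n/k$ times and calling SciPy's `linear_sum_assignment`. The resulting $n \times n$ assignment problem has quadratic memory and cubic worst-case time, which is the cost the entropic Sinkhorn approach is designed to avoid on large datasets.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def exact_balanced_assignment(X, C):
    """Assign n points to k centroids with exactly n // k points each.

    Hypothetical helper: exact, unregularized baseline, not the ballot API.
    """
    n, k = X.shape[0], C.shape[0]
    if n % k != 0:
        raise ValueError("this sketch assumes n is divisible by k")
    cap = n // k
    # Squared Euclidean cost, shape (n, k).
    cost = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
    # Repeat each centroid column `cap` times: column j * cap + s is slot s of centroid j.
    slots = np.repeat(cost, cap, axis=1)  # shape (n, n)
    rows, cols = linear_sum_assignment(slots)
    labels = np.empty(n, dtype=int)
    labels[rows] = cols // cap  # map each slot back to its centroid
    return labels


rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2))  # 60 random 2-D points
C = rng.normal(size=(3, 2))   # 3 random centroids
labels = exact_balanced_assignment(X, C)
print(np.bincount(labels))    # [20 20 20]
```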

Development

To install in editable mode for development:

git clone https://github.com/kuslavicek/ballot.git
cd ballot
pip install -e .

Run tests:

pytest

References

This project incorporates research from the following paper:

  • Wenyan Luo and Dustin G. Mixon. BalLOT: Balanced k-means clustering with optimal transport. arXiv:2512.05926.
