StreamMat is a Python library designed for high-performance, out-of-core sparse matrix operations. It uses TileDB as its storage backend to efficiently handle matrices that are too large to fit into memory. The library provides a FastAPI-based web server to expose matrix operations as a RESTful API.
- Out-of-Core Computation: Process massive sparse matrices that exceed available RAM.
- High Performance: Asynchronous, chunked operations to maximize throughput.
- TileDB Backend: Leverages the power of the TileDB array database for efficient sparse data storage and retrieval.
- RESTful API: Easy-to-use API for loading matrices, performing vector multiplications, and more.
- On-the-Fly Data Conversion: Automatically converts matrices from popular formats (like Matrix Market) and downloads from URLs.
Once published, you can install StreamMat directly from PyPI:
pip install streammat-
Clone the repository:
git clone https://github.com/Versalife/streamat.git cd streamat -
Build and install the package:
uv build pip install dist/streammat-*.whl
StreamMat can automatically convert matrices on the fly. However, you can still pre-convert a standard Matrix Market (.mtx) file to the TileDB format using the streammat-convert CLI tool.
streammat-convert <input_mm_file.mtx> <output_tiledb_uri> [--dtype <data_type>] [--overwrite]streammat-serverThe server will be available at http://127.0.0.1:8000.
The server provides interactive API documentation (via Swagger UI) at http://127.0.0.1:8000/docs.
Here are some curl examples for interacting with the API:
This endpoint loads a matrix into the server's memory. It can be a path to a local TileDB array, a local file that needs conversion (e.g., .mtx), or a URL to a file.
The endpoint returns a stream of server-sent events to track progress.
# Load from a local TileDB array (or a file that needs conversion)
curl -N -X PUT "http://127.0.0.1:8000/api/v1/matrix/my_matrix" \
-H "Content-Type: application/json" \
-d '{
"uri": "my_matrix_tiledb"
}'
# Example Output Stream:
# data: {"status": "converting", "progress": 50.00}
# data: {"status": "converting", "progress": 100.00}
# data: {"status": "loading_complete", "uri": "/path/to/cache/....tiledb"}Check the server status, see which matrices are loaded, and discover the expected shapes for operations.
curl -X GET "http://127.0.0.1:8000/api/v1/status"The response will include an operations field for each matrix, detailing the input_shape and output_shape for matvec and rmatvec.
Calculates y = A * x. You can provide the vector as a dense list or a sparse object.
# Dense vector
curl -X POST "http://127.0.0.1:8000/api/v1/matvec/my_matrix" \
-H "Content-Type: application/json" \
-d '{
"vector": [1, 2, 3, 4]
}'
# Sparse vector (equivalent to [1, 0, 0, 4])
curl -X POST "http://127.0.0.1:8000/api/v1/matvec/my_matrix" \
-H "Content-Type: application/json" \
-d '{
"vector": {
"indices": [0, 3],
"values": [1.0, 4.0]
}
}'Remove a matrix from the server and delete its data from the disk cache.
curl -X DELETE "http://127.0.0.1:8000/api/v1/matrix/my_matrix"-
Clone the repository to run the example:
git clone https://github.com/Versalife/streamat.git cd streamat -
Install the development dependencies:
uv pip install --system '.[dev]' -
Run the example script:
python streamat/example.py