this software is still in an experimental/unstable state, and further testing is required before it can be recommended for general use. notably, serious bugs might still be present (use has so far been entirely internal) and no versioning policy is followed.
regular changes to the repo are being made without corresponding changes to the version string in pyproject.toml (i.e. releases are not cut and versioning is based entirely on git commits).
it is expected that once umat reaches an adequate/stable state, releases will start being generated (either via PyPI or github releases), and SemVer compliance will be enforced.
if you are interested in using umat and/or encounter any issues, please feel free to file issues on github and/or reach out through other channels.
umat provides a set of tools which allow for initial processing and quantification of outputs generated by MERSCOPE imaging following mosaic file generation and transcript detection.
all tools are exposed as subcommands to the provided umat program entrypoint, and each provides a -h flag describing necessary arguments.
using uv, umat can be installed as follows:
uv tool install git+ssh://git@github.com/namemcguffin/umat
umat -h
a Dockerfile is also provided, which can be used to generate an OCI image and subsequently a SIF file for use on HPC clusters via the following commands:
docker build --platform=linux/amd64 -t umat:latest . # build docker image
docker save umat:latest > umat.tar # save docker image to disk as archive
apptainer build umat.sif docker-archive://umat.tar # convert docker archive to sif
the image generated will have the cpsam cellpose model pre-downloaded.
if you plan on using other cellpose models in an HPC environment where compute nodes do not have an internet connection, you should change the Dockerfile accordingly to ensure that the models are cached in the image.
umat segd provides a way to segment MERSCOPE-generated mosaic files using cellpose, specifically utilizing the distributed_eval function provided in cellpose.contrib to split work into chunks for distribution across multiple workers.
the output label array can be saved to disk in either the npy (ingestible via numpy.load) or zarr (ingestible via zarr.open) formats.
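as a minimal sketch, the label array could be read back in by dispatching on the file suffix. the load_labels helper and the idea of choosing by suffix are assumptions made for illustration; only numpy.load and zarr.open come from the description above.

```python
from pathlib import Path

import numpy as np


def load_labels(path):
    """Load a label array saved as .npy or .zarr (hypothetical helper)."""
    path = Path(path)
    if path.suffix == ".npy":
        # mmap_mode="r" maps the file instead of reading it all into RAM
        return np.load(path, mmap_mode="r")
    if path.suffix == ".zarr":
        # imported lazily so the npy branch works without zarr installed
        import zarr

        return zarr.open(str(path), mode="r")
    raise ValueError(f"unsupported label format: {path.suffix!r}")
```

in both branches the array is opened lazily, which matters for whole-mosaic label arrays that may not fit in memory.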
it is recommended to run umat segd on HPC infrastructure as it is extremely compute and memory intensive.
only linux x86-64 environments are supported for segmentation, and the presence of a CUDA-compatible GPU is assumed.
umat segd is able to utilize multiple GPUs simultaneously, with the recommended allocation being N+1 CPUs allocated with N GPUs.
the chunk side length parameters (lx, ly, lz) should be tuned to the specifics of the node so as to maximally utilize available memory.
for example, on a node allocation with 400GB of RAM, 5 CPUs and 4 A100 GPUs, the chunk dimensions were set to (4096, 4096, 7) to avoid OOM-death while using close to the maximum available resources.
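to get a feel for how chunk size translates into workload, the arithmetic can be sketched as below. the mosaic shape is a made-up example, the helper names are hypothetical, the axis order is assumed to match (lx, ly, lz), and any overlap between neighbouring chunks is ignored, so treat the result as a rough estimate only.

```python
import math


def count_chunks(shape, chunk):
    """Number of chunks needed to tile `shape` with blocks of size `chunk`."""
    return math.prod(math.ceil(s / c) for s, c in zip(shape, chunk))


def recommended_cpus(n_gpus):
    """N+1 CPUs for N GPUs, per the recommendation above."""
    return n_gpus + 1


# hypothetical mosaic of 60000 x 80000 pixels across 7 z-planes
mosaic_shape = (60000, 80000, 7)
chunk_shape = (4096, 4096, 7)  # (lx, ly, lz) from the A100 example above

print(count_chunks(mosaic_shape, chunk_shape))  # 15 * 20 * 1 = 300
print(recommended_cpus(4))  # 5
```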
umat preview provides a way to generate a preview of the segmentations generated by umat segd.
umat boundary generates cell boundary polygons using the masks generated by umat segd, saving them as a geopandas-generated feather file.
these can be read in using geopandas.read_feather in python and sfarrow::st_read_feather in R.
umat assign generates a cell by gene matrix using the cell boundary polygons generated by umat boundary, saving it as an anndata h5ad file.
for very large datasets, umat assign might use a lot of RAM; running it on HPC resources might therefore be advisable.
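the idea behind assigning transcripts to cells can be sketched with a brute-force point-in-polygon test. this is not umat's implementation (which is not described here and most likely relies on a spatial index); the function names and the input layout are assumptions made for illustration.

```python
from collections import Counter


def point_in_polygon(x, y, polygon):
    """Ray-casting test; `polygon` is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # does a horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def assign_transcripts(transcripts, cells):
    """Count (cell_id, gene) pairs; transcripts outside every cell are skipped."""
    counts = Counter()
    for x, y, gene in transcripts:
        for cell_id, polygon in cells.items():
            if point_in_polygon(x, y, polygon):
                counts[(cell_id, gene)] += 1
                break  # assume cells do not overlap
    return counts
```

the resulting counts map onto the entries of a cell by gene matrix: one row per cell_id, one column per gene.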
umat signals computes per-cell properties from mosaic images (e.g. average intensity, area, etc.).
this can be useful for determining signal of DAPI/PolyT for each cell, or for getting metrics for "side channel" probes.
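as an illustration of what such per-cell properties look like, area and mean intensity can be computed from a label array and a matching image with numpy.bincount. this is a sketch under assumptions (integer labels with a signed dtype, label 0 as background, image and labels of identical shape), not a description of how umat signals is implemented; per_cell_signal is a hypothetical name.

```python
import numpy as np


def per_cell_signal(labels, image):
    """Return (cell_ids, area, mean_intensity) for each non-zero label."""
    flat_labels = labels.ravel()
    flat_image = image.ravel().astype(np.float64)
    # area = number of pixels per label; index i holds the count for label i
    area = np.bincount(flat_labels)
    # summed intensity per label, using pixel values as weights
    total = np.bincount(flat_labels, weights=flat_image)
    cell_ids = np.nonzero(area)[0]
    cell_ids = cell_ids[cell_ids != 0]  # label 0 is assumed to be background
    return cell_ids, area[cell_ids], total[cell_ids] / area[cell_ids]
```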
umat sample is used to generate an HDF5 file of random sub-selections of a provided image, useful for creating a training dataset.
each sample is represented as a top-level dataset in the resulting HDF5 file.
umat addlab takes an ImageJ-generated ROI file and adds it to a specified sample HDF5 group from a umat sample-generated file.
umat retrain uses a umat sample-generated file with added labels (using umat addlab) to fine-tune an existing cellpose model.
umat spot generates a cell by gene matrix without using any prior cell segmentation, instead binning all transcripts into "pseudo-spots".
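one simple way to picture pseudo-spot binning is a regular square grid laid over the transcript coordinates. the grid shape, the spot_size parameter and the bin_transcripts helper are all assumptions made for this sketch; the actual binning scheme used by umat spot is not specified here.

```python
from collections import Counter


def bin_transcripts(transcripts, spot_size):
    """Count genes per square bin; returns {(col, row): Counter({gene: n})}."""
    spots = {}
    for x, y, gene in transcripts:
        # integer grid coordinates of the bin containing (x, y)
        key = (int(x // spot_size), int(y // spot_size))
        spots.setdefault(key, Counter())[gene] += 1
    return spots
```

each key then plays the role of a "cell" in the resulting matrix, with the per-gene counts as its row.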
umat fromproseg generates a umat-compatible masks file (in either npy or zarr format) from the cell-polygons-layer GeoJSON output file generated by proseg, allowing for further processing by umat (e.g. umat signals or umat preview) of proseg-generated outputs.
to facilitate use of the segmentation pipeline on HPC infrastructure (assuming SLURM is used for scheduling), a set of scripts implementing a complete segmentation pipeline is provided under the scripts/slurm subdirectory.
the recommended manner of use is to invoke one of the scripts under scripts/slurm/run, which will use the scripts under scripts/slurm/batch to set up a series of SLURM jobs to run cell segmentation.