Elembio · rosibaj · Apr 30, 2026
diff --git a/src/python/examples/segmentation_workbook/cpsam_segmentation.ipynb b/src/python/examples/segmentation_workbook/cpsam_segmentation.ipynb
@@ -0,0 +1,315 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "07ed918f",
+   "metadata": {},
+   "source": [
+    "# CellposeSAM segmentation across an AVITI24 cytoprofiling run\n",
+    "\n",
+    "Run [CellposeSAM (CPSAM)](https://github.com/mouseland/cellpose) across every tile in a Teton or Teton Atlas run and write the segmentation masks Cells2Stats expects. CPSAM is a general-purpose model that integrates the [Segment Anything Model](https://segment-anything.com/) architecture; use it when your cell type is not represented in the Element Biosciences model library or when the General Element Biosciences model produces poor results even after diameter tuning.\n",
+    "\n",
+    "This notebook is the companion to the [Custom segmentation tutorial](https://docs.elembio.io/docs/tutorials/cytoprofiling/custom-segmentation/). Read the tutorial first for the full context: how to identify your run type, when to use CPSAM, and how to re-run Cells2Stats after segmentation."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "6fd24dd9",
+   "metadata": {},
+   "source": [
+    "## Prerequisites\n",
+    "\n",
+    "Confirm the following before running this notebook:\n",
+    "\n",
+    "- Your run is a **Teton** or **Teton Atlas** run. CPSAM requires the actin channel and fails on Cell Paint only runs because no actin `.tif` file exists. For Cell Paint only runs, use an Element Biosciences 2-channel model and follow [Run a tile evaluation](https://docs.elembio.io/docs/tutorials/cytoprofiling/custom-segmentation/#tile-evaluation) and [Run full segmentation](https://docs.elembio.io/docs/tutorials/cytoprofiling/custom-segmentation/#full-segmentation).\n",
+    "- You created the separate `cpsam` Python environment with Cellpose 4.x installed from the MouseLand GitHub HEAD. See [Set up the CPSAM environment](https://docs.elembio.io/docs/tutorials/cytoprofiling/custom-segmentation/#cpsam-setup-env). Do not install Cellpose 4.x into your `cytoprofiling-seg` environment.\n",
+    "- The `cpsam` environment is selected as this notebook's kernel (top menu: **Kernel → Change kernel → CellposeSAM**).\n",
+    "- A GPU is available. CPSAM on CPU is prohibitively slow for full-run processing.\n",
+    "\n",
+    "> **First run downloads ~1.15 GB.** The first time you initialize the CPSAM model, Cellpose automatically downloads the model weights from Hugging Face to `~/.cellpose/models/`. Subsequent runs use the cached weights and do not require an internet connection."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "06e16ef5",
+   "metadata": {},
+   "source": [
+    "## Step 1 — Import packages\n",
+    "\n",
+    "Load the imaging, numerics, and Cellpose packages used throughout the rest of the notebook."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "id": "48e24551",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import json\n",
+    "import os\n",
+    "\n",
+    "import numpy as np\n",
+    "import skimage.io\n",
+    "from cellpose import core, models, transforms\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "9ce6d73e",
+   "metadata": {},
+   "source": [
+    "## Step 2 — Provide input and output paths\n",
+    "\n",
+    "Set the two required paths. Use a fresh `output_location` per re-segmentation pass so CPSAM masks do not overwrite Element Biosciences masks."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "24ab308f",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Edit both paths before running the rest of the notebook.\n",
+    "\n",
+    "# Path to your AVITI24 run output folder\n",
+    "run_directory   = r\"/path/to/your/Run/Output/Folder\"\n",
+    "\n",
+    "# Where to write the CPSAM segmentation masks (use a new folder for each pass so existing masks are not overwritten)\n",
+    "output_location = r\"/path/to/your/Run/Output/Folder/Segmentation_Output\""
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "65c4954e",
+   "metadata": {},
+   "source": [
+    "## Step 3 — Confirm GPU and load CPSAM\n",
+    "\n",
+    "Verify that a GPU is available, then load the CPSAM model. The first time this cell runs, Cellpose downloads ~1.15 GB of model weights from Hugging Face to `~/.cellpose/models/`. Subsequent runs use the cached weights."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b567dd74",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "use_gpu = core.use_gpu()\n",
+    "if not use_gpu:\n",
+    "    print(\"WARNING: No GPU detected. Runtime will be very long (8+ hours for 12-well).\")\n",
+    "    print(\"Consider running on a GPU-equipped machine for production use.\")\n",
+    "else:\n",
+    "    print(\"GPU confirmed. Proceeding with CPSAM segmentation.\")\n",
+    "# Load the model on the detected device. Downloads on first run (~1.15 GB).\n",
+    "model = models.CellposeModel(gpu=use_gpu)\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "32c8be23",
+   "metadata": {},
+   "source": [
+    "## Step 4 — Define the normalization helper\n",
+    "\n",
+    "`normalize_image` applies Cellpose's per-region normalization across 1824-pixel sub-tiles, matching the preprocessing used by the Element Biosciences segmentation workflow."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "id": "8b14c6cf",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def normalize_image(image, region_size=1824):\n",
+    "    image_norm = np.zeros_like(image, np.single)\n",
+    "    # Only full region_size blocks are normalized; leftover edge pixels stay 0.\n",
+    "    for xi in range(image.shape[1] // region_size):\n",
+    "        for yi in range(image.shape[0] // region_size):\n",
+    "            cropped = image[\n",
+    "                yi * region_size:(yi + 1) * region_size,\n",
+    "                xi * region_size:(xi + 1) * region_size,\n",
+    "            ]\n",
+    "            cropped = transforms.normalize_img(\n",
+    "                cropped.reshape(cropped.shape[0], cropped.shape[1], 1)\n",
+    "            ).reshape(cropped.shape[0], cropped.shape[1])\n",
+    "            image_norm[\n",
+    "                yi * region_size:(yi + 1) * region_size,\n",
+    "                xi * region_size:(xi + 1) * region_size,\n",
+    "            ] = cropped\n",
+    "\n",
+    "    return image_norm\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "18fa08aa",
+   "metadata": {},
+   "source": [
+    "## Step 5 — Build the tile list from `RunParameters.json`\n",
+    "\n",
+    "Read `RunParameters.json` to enumerate every well and tile in the run and build the `tile2well` map used by the segmentation loop. The cell prints the total tile count so you can confirm the workload before committing to Step 6."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d4e628cb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "with open(os.path.join(run_directory, \"RunParameters.json\")) as f:\n",
+    "    run_parameters = json.load(f)\n",
+    "\n",
+    "tile2well, tiles = {}, []\n",
+    "for well in run_parameters[\"Wells\"]:\n",
+    "    for tile in well[\"Tiles\"]:\n",
+    "        tile2well[tile[\"Name\"]] = well[\"WellLocation\"]\n",
+    "        tiles.append(tile[\"Name\"])\n",
+    "\n",
+    "print(f\"Total tiles to process: {len(tiles)}\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "39ec5de5",
+   "metadata": {},
+   "source": [
+    "## Step 6 — Segment every tile and write masks\n",
+    "\n",
+    "Run CPSAM on each tile and write the cell and nuclear masks to `output_location/Well{well}/`. Unlike the Element Biosciences workflow, CPSAM uses a single model for every well, so no per-well model lookup is needed.\n",
+    "\n",
+    "To monitor progress, watch for the rolling `Done: ...` lines. Each line corresponds to one tile fully processed and saved. See the **Runtime expectations** table at the bottom of the notebook for typical wall-clock times."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "733015fe",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "flow_threshold      = 0.4\n",
+    "cellprob_threshold  = 0.0\n",
+    "tile_norm_blocksize = 0\n",
+    "\n",
+    "print(f\"Beginning segmentation across {len(tiles)} tiles\")\n",
+    "\n",
+    "for tile in tiles:\n",
+    "    well = tile2well[tile]\n",
+    "    os.makedirs(os.path.join(output_location, f\"Well{well}\"), exist_ok=True)\n",
+    "\n",
+    "    cell_image    = skimage.io.imread(\n",
+    "        os.path.join(run_directory, \"Projection\", f\"Well{well}\", f\"CP01_{tile}_Cell-Membrane.tif\")\n",
+    "    )\n",
+    "    nuclear_image = skimage.io.imread(\n",
+    "        os.path.join(run_directory, \"Projection\", f\"Well{well}\", f\"CP01_{tile}_Nucleus.tif\")\n",
+    "    )\n",
+    "    actin_image   = skimage.io.imread(\n",
+    "        os.path.join(run_directory, \"Projection\", f\"Well{well}\", f\"CP01_{tile}_Actin.tif\")\n",
+    "    )\n",
+    "\n",
+    "    cell_image    = normalize_image(cell_image)\n",
+    "    nuclear_image = normalize_image(nuclear_image)\n",
+    "    actin_image   = normalize_image(actin_image)\n",
+    "\n",
+    "    composite = np.zeros((cell_image.shape[0], cell_image.shape[1], 3))\n",
+    "    composite[:, :, 0] = cell_image\n",
+    "    composite[:, :, 1] = nuclear_image\n",
+    "    composite[:, :, 2] = actin_image\n",
+    "\n",
+    "    print(f\"Segmenting cell membrane \\u2014 tile: {tile}\")\n",
+    "    cell_mask, _, _ = model.eval(\n",
+    "        composite,\n",
+    "        batch_size=2,\n",
+    "        flow_threshold=flow_threshold,\n",
+    "        cellprob_threshold=cellprob_threshold,\n",
+    "        normalize={\"tile_norm_blocksize\": tile_norm_blocksize},\n",
+    "        resample=False,\n",
+    "    )\n",
+    "    assert cell_mask.max() <= np.iinfo(np.uint16).max, \"More than 65535 cells in one tile; a uint16 mask cannot hold them\"\n",
+    "\n",
+    "    print(f\"Segmenting nuclei \\u2014 tile: {tile}\")\n",
+    "    nuclear_mask, _, _ = model.eval(\n",
+    "        nuclear_image,\n",
+    "        batch_size=2,\n",
+    "        flow_threshold=flow_threshold,\n",
+    "        cellprob_threshold=cellprob_threshold,\n",
+    "        normalize={\"tile_norm_blocksize\": tile_norm_blocksize},\n",
+    "        resample=False,\n",
+    "    )\n",
+    "    binary_nuclei = nuclear_mask.copy()\n",
+    "    binary_nuclei[nuclear_mask > 0] = 1\n",
+    "\n",
+    "    skimage.io.imsave(\n",
+    "        os.path.join(output_location, f\"Well{well}\", f\"{tile}_Cell.tif\"),\n",
+    "        cell_mask.astype(np.uint16),\n",
+    "    )\n",
+    "    skimage.io.imsave(\n",
+    "        os.path.join(output_location, f\"Well{well}\", f\"{tile}_Nuclear.tif\"),\n",
+    "        binary_nuclei.astype(np.uint8),\n",
+    "    )\n",
+    "    print(f\"Done: {tile}\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8dba8d14",
+   "metadata": {},
+   "source": [
+    "## Reference\n",
+    "\n",
+    "### Runtime expectations\n",
+    "\n",
+    "CPSAM processes every tile in the run. CPU runtimes are prohibitive; the table below assumes a GPU.\n",
+    "\n",
+    "| Plate format | Approximate tiles | GPU estimate |\n",
+    "| ------------ | ----------------- | ------------ |\n",
+    "| 1-well       | ~18 tiles         | ~30 minutes  |\n",
+    "| 12-well      | ~216 tiles        | ~6–10 hours |\n",
+    "| 48-well      | ~864 tiles        | ~24–36 hours |\n",
+    "\n",
+    "### Output files\n",
+    "\n",
+    "For each tile, the loop writes two files to `output_location/Well{well}/`:\n",
+    "\n",
+    "- `{tile}_Cell.tif`: a `uint16` label mask in which `0` is background and each positive integer identifies one segmented cell.\n",
+    "- `{tile}_Nuclear.tif`: a `uint8` binary mask where `0` indicates no nucleus and `1` indicates a nucleus is present.\n",
+    "\n",
+    "### Validation\n",
+    "\n",
+    "CPSAM does not produce a built-in quality metrics table. Verify outputs visually or by comparing cell and nucleus counts against the baseline you established in [Interpret results and choose a model](https://docs.elembio.io/docs/tutorials/cytoprofiling/custom-segmentation/#interpret-results).\n",
+    "\n",
+    "### Third-party tool disclaimer\n",
+    "\n",
+    "CellposeSAM is provided by the [MouseLand open-source project](https://github.com/mouseland/cellpose) and is not affiliated with or endorsed by Element Biosciences. CPSAM has not been formally validated against AVITI24 cytoprofiling runs and results may vary. For CPSAM-specific issues, installation support, or model updates, refer to the [official MouseLand repository](https://github.com/mouseland/cellpose).\n",
+    "\n",
+    "After all tiles finish, re-run Cells2Stats with `--segmentation` pointing at `output_location` to regenerate the cell table. See [Re-run Cells2Stats for cell assignment](https://docs.elembio.io/docs/tutorials/cytoprofiling/custom-segmentation/#cell-assignment)."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "CellposeSAM",
+   "language": "python",
+   "name": "cpsam"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.2"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}