diff --git a/README.md b/README.md index adc056b..a3b29db 100644 --- a/README.md +++ b/README.md @@ -21,12 +21,15 @@ Please refer to the [**documentation**](https://coscialab.github.io/openDVP/), p ## Installation -You will need at least Python 3.10 (or newer) installed on your system. +You will need Python 3.11 or 3.12 installed on your system. If you are new to creating Python environments, we suggest you use [uv](https://docs.astral.sh/uv/) or [pixi](https://pixi.sh/latest/). -Installation took 4 seconds (excluding download time). You can install openDVP via pip: +```bash +conda create --name opendvp -y python=3.12 +``` + ```bash pip install opendvp ``` diff --git a/docs/ContributionGuide.md b/docs/ContributionGuide.md new file mode 100644 index 0000000..0ea7478 --- /dev/null +++ b/docs/ContributionGuide.md @@ -0,0 +1,147 @@ +# Contribution Guide + +First off, thank you for considering contributing to openDVP! It's people like you that make open source so great. We welcome contributions of all kinds, from bug reports to new features. + +This guide will walk you through the process of setting up your development environment and submitting your first contribution. + +## Development workflow summary + +1. Fork the opendvp repository to your own GitHub account +2. Create a development environment +3. Create a new branch for your PR +4. Add your feature or bugfix to the codebase +5. Make sure all tests are passing +6. Ensure code style, and/or ensure documentation looks good. +7. Open a PR back to the main repository + +## Step 1: Fork the opendvp repository to your own GitHub account + +First, [fork the repository](https://github.com/CosciaLab/openDVP/fork) to your own GitHub account. 
+For more context on what forking means, see: [Github Manual: Forking](https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/fork-a-repo)
+
+## Step 2: Create a development environment
+
+Clone your fork to your local machine:
+
+```bash
+git clone https://github.com/YOUR_USERNAME/openDVP.git
+cd openDVP
+
+# Add the main repository as a remote
+git remote add upstream https://github.com/CosciaLab/opendvp.git
+# this allows you to easily update your opendvp copy if new changes come to the main opendvp
+```
+
+Next, set up `pre-commit`, a tool that checks your code every time you commit. Briefly, `pre-commit` runs the checks defined in `.pre-commit-config.yaml`.
+
+```bash
+# run to install pre-commit
+uv run pre-commit install
+```
+
+## Step 3: Create a new branch for your PR
+
+All development should occur in branches of your forked opendvp, with each branch dedicated to a particular purpose: a feature, a bugfix, etc.
+You can create a branch with:
+
+```bash
+$ git checkout main # Starting from the main branch
+$ git pull # Syncing with the repo
+$ git switch -c {your-branch-name} # Making and changing to the new branch
+```
+
+<details>
+Detailed command explanation + +The provided commands guide you through the process of creating a new branch to work on your feature or bug fix. + +`$ git checkout main` +This command changes your current location in the repository to the main branch. You're making sure you start from the most up-to-date, stable version of the code. + +`$ git pull` +This command downloads the latest changes from the remote repository to your local copy. This is a crucial step to ensure your local main branch is synchronized with the project's main branch before you start your work. This prevents merge conflicts later. + +`$ git switch -c {your-branch-name}` +This is a shorthand command that does two things: +switch -c: creates a new branch with the name you provide (in this case, {your-branch-name}). +switch: immediately changes to that new branch. +
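The branch workflow above can be practiced safely in a disposable repository before touching your openDVP clone. A minimal sketch (the branch name `my-feature-branch` is just an example; `git pull` is omitted because the throwaway repo has no remote):

```bash
# Practice the branch workflow in a disposable repository
tmp=$(mktemp -d) && cd "$tmp"
git init -q demo && cd demo
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial commit"
git branch -M main                # make sure the default branch is called main

git checkout main                 # starting from the main branch
git switch -c my-feature-branch   # create and switch to the new branch
git branch --show-current         # prints: my-feature-branch
```

`git switch -c` requires git 2.23 or newer; on older versions, `git checkout -b` does the same thing.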
+
+## Step 4: Add your feature or bugfix to the codebase
+
+Code away!
+
+## Step 5: Make sure all tests are passing
+
+Testing is very important. We write tests to ensure that all functions work and that our users won't be surprised by a nasty bug. Look inside `opendvp/tests/` to see how they are organized. We use [pytest](https://docs.pytest.org/en/stable/) to test opendvp.
+
+After you have added your magnificent code, you should also add some tests in their respective directories. If you have never written tests before, don't worry. I suggest you look at the other tests, look at the ideas in the dropdown, and perhaps ask your trusty LLM how to get started.
+
+<details>
What to test in a new feature
+
+If you're not sure what to test about your function, some ideas include:
+
+- Are there arguments which conflict with each other? Check that if they are both passed, the function throws an error (see pytest.raises docs).
+- Are there input values which should cause your function to error?
+- Did you add a helpful error message that suggests a fix? Check that that error message is actually raised.
+- Can you place bounds on the values returned by your function?
+- Are there different input values which should generate equivalent output (e.g. if an array is sparse or dense)?
+- Do you have arguments which should have orthogonal effects on the output? Check that they are independent. For example, if there is a flag for extended output, the base output should remain the same either way.
+- Are you optimizing a method? Check that its results are the same as a gold-standard implementation.
+</details>
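The ideas above can be made concrete with a small pytest file. The function below, `normalize`, is purely hypothetical (it is not part of the opendvp API); it only exists to show the pattern of testing conflicting arguments and output bounds:

```python
import pytest

# Hypothetical function for illustration -- not part of the opendvp API
def normalize(values, scale=None, unit_range=False):
    """Scale values by `scale`, or rescale them to [0, 1] if `unit_range` is set."""
    if scale is not None and unit_range:
        raise ValueError("Pass either `scale` or `unit_range`, not both.")
    if unit_range:
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    return [v * (scale if scale is not None else 1.0) for v in values]

def test_conflicting_arguments_raise():
    # Conflicting arguments should raise a helpful error
    with pytest.raises(ValueError, match="not both"):
        normalize([1.0, 2.0], scale=2.0, unit_range=True)

def test_output_bounds():
    # Bounds on returned values: unit_range output must stay within [0, 1]
    out = normalize([3.0, 7.0, 5.0], unit_range=True)
    assert min(out) == 0.0 and max(out) == 1.0
```

Saved under `tests/`, such a file is picked up automatically by `uv run pytest`.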
+
+Before pushing your efforts online you should run all tests. For quick feedback, you can also run pytest on a single test file:
+
+```bash
+# this runs all tests
+uv run pytest
+
+# this runs tests for a particular function
+uv run pytest tests/io/test_DIANN_to_adata.py
+```
+
+## Step 6: Code style check, documentation check
+
+Since you already installed `pre-commit`, `ruff` should have run every time you committed.
+We use `ruff` for code formatting and linting to ensure a consistent code style throughout the project.
+
+### Manual checks
+
+You can also run the formatter and linter manually:
+
+To format your code:
+```bash
+uv run ruff format .
+```
+
+To check for linting errors:
+```bash
+uv run ruff check .
+```
+
+The settings for these checks live inside `pyproject.toml`.
+
+### Building the Documentation
+
+Our documentation is built using [Sphinx](https://www.sphinx-doc.org/en/master/) and is located in the `docs/` directory.
+
+To build the documentation locally, run:
+
+```bash
+uv run --group docs sphinx-build -b html docs/ docs/_build/html/
+```
+
+The generated HTML files will be in the `docs/_build/html` directory. You can open `index.html` in your browser to view the documentation.
+
+## Step 7: Open a PR back to the main repository
+
+Once you've made your changes and are happy with them, you're ready to submit a pull request.
+
+Go to your fork on GitHub and click the big green `Compare & pull request` button.
+
+Ensure that the **pull request** is from your fork to the `main` branch of the `CosciaLab/openDVP` repository.
+
+In your pull request description, please explain the changes you've made and why you've made them. If your pull request addresses an open issue, please link to it. Once you've submitted your pull request, our continuous integration (CI) system will automatically run the tests to make sure everything is working as expected. We will then review your contribution and provide feedback.
+ +Thank you for contributing to openDVP!! diff --git a/docs/Tutorials/T4_Segmask_to_shapes.ipynb b/docs/Tutorials/T4_Segmask_to_shapes.ipynb new file mode 100644 index 0000000..91c9d59 --- /dev/null +++ b/docs/Tutorials/T4_Segmask_to_shapes.ipynb @@ -0,0 +1,33 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "e265477f", + "metadata": {}, + "source": [ + "# T4: Segmask to shapes" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "openDVP", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/Tutorials/T5_Thresholding_tutorial.ipynb b/docs/Tutorials/T5_Thresholding_tutorial.ipynb new file mode 100644 index 0000000..6d2627b --- /dev/null +++ b/docs/Tutorials/T5_Thresholding_tutorial.ipynb @@ -0,0 +1,33 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "e265477f", + "metadata": {}, + "source": [ + "# T5: Proteomics Integration" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "openDVP", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.11.13" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} diff --git a/docs/Workflows/Computational.md b/docs/Workflows/Computational.md deleted file mode 100644 index e69de29..0000000 diff --git a/docs/Workflows/openDVP_guide.md b/docs/Workflows/Computational/GettingStartedWithMCMICRO.md similarity index 53% rename from docs/Workflows/openDVP_guide.md rename to docs/Workflows/Computational/GettingStartedWithMCMICRO.md index 
fc712ed..6a328a6 100644 --- a/docs/Workflows/openDVP_guide.md +++ b/docs/Workflows/Computational/GettingStartedWithMCMICRO.md @@ -1,9 +1,16 @@ -# Setup imaging processing pipelines for the MDC +# Getting started with MCMICRO + +## MCMICRO + +MCMICRO is a wonderful pipeline developed by a group of image analysts. It runs on [Nextflow](https://www.nextflow.io/) which allows it to be run on any computational environment. It is powerful, modular, and scalable. Once setup it can be ran in parallel with as many files as your compute allows. + +For a conceptual overview read the paper [MCMICRO paper](https://www.nature.com/articles/s41592-021-01308-y), for a more technical overview I suggest you first read the documentation of the [MCMICRO website ](https://mcmicro.org/overview/) and define what steps are important for your workflow. ### Setting Expectations If you want to run an image processing and analyis pipeline, it will take some time to setup. Depending on your familiarity to things like HPC, Nextflow, Snakemake, Python; the amount of effort will vary. +This guide is for users that are not experienced with MCMICRO or HPC, so it goes step-by-step. In the end going through the documentation of the various softwares is the only way to explore all the possibilities. ### General tips @@ -11,11 +18,7 @@ Depending on your familiarity to things like HPC, Nextflow, Snakemake, Python; t - Asking LLMs for help is great, just be careful, some of these pipelines and packages are not very popular and there are tendencies to hallucinate. - Ask humans for help, but help with log files, tracebacks, and context. Just like LLMs the more context the better we understand your problem. -## MCMICRO - -MCMICRO is a wonderful pipeline developed by a group of image analysts. It runs on [Nextflow](https://www.nextflow.io/) which allows it to be run on any computational environment. It is powerful, modular, and scalable. 
Once setup it can be ran in parallel with as many files as your compute allows. - -For a conceptual overview read the paper [MCMICRO paper](https://www.nature.com/articles/s41592-021-01308-y), for a more technical overview I suggest you first read the documentation of the [MCMICRO website ](https://mcmicro.org/overview/) and define what steps are important for your workflow. +## How to setup in MacOS/Linux environment ## How to setup in the BIH HPC @@ -26,9 +29,13 @@ Please take the time to read and go through the documentation at [Getting Access
Check access is working, you see this message: -```bash +```console +ssh @hpc-login-1.cubi.bihealth.org +``` + +Output: -❯ ssh jnimoca_m@hpc-login-1.cubi.bihealth.org +```bash Welcome to the BIH HPC 4 Research Cluster! You are on a login node. @@ -43,8 +50,6 @@ HPC Access Portal: https://hpc-access.cubi.bihealth.org Last login: Thu Aug 7 12:54:25 2025 from 141.80.221.57 -bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8) -bash: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8) -[jnimoca_m@hpc-login-1 ~]$ - ```
@@ -54,6 +59,21 @@ Last login: Thu Aug 7 12:54:25 2025 from 141.80.221.57 Follow [BIH HPC instructions](https://hpc-docs.cubi.bihealth.org/how-to/service/file-exchange/) for using software to transfer files. Technically you should be able to connect from virtual machines. Sometimes there are issues with IP adresses, please contact IT for help. +Download FileZilla, WinSCP, or CyberDuck for easy file transfer. + +- MacOS: https://filezilla-project.org/ +- Windows: https://winscp.net/eng/download.php +- Both: https://cyberduck.io/ + +#### Important Settings + +- Use SFTP + +#### MDC Users login nodes + +- username_m@hpc-login-1.cubi.bihealth.org +- username_m@hpc-login-2.cubi.bihealth.org +
Check if it is working You are able to drag and drop files, and you see them in the command line. @@ -66,9 +86,12 @@ Familiarize yourself with how [storage works in the BIH HPC](https://hpc-docs.cu In the terminal, connect to the HPC. Go to your home directory and check what is there with `ls` +```console +ls +``` + ```bash # Your home directory should look something like this: -[jnimoca_m@hpc-login-1 ~]$ ls bin ondemand scratch work ``` @@ -78,33 +101,50 @@ If you need more storage space, bring it up in meeting or Mattermost. We might h ### Step 4: Create environment in the HPC to run nextflow pipelines -```python -# these are the problems +We will use **miniforge**(Conda analogue) to install these, please read the [Software Installation with Conda](https://hpc-docs.cubi.bihealth.org/best-practice/software-installation-with-conda/). + +#### Step 4.1: Activate interactive session -# (1) Java -[jnimoca_m@hpc-login-1 ~]$ java -version --bash: java: command not found -#(2) Nextflow -[jnimoca_m@hpc-login-1 ~]$ nextflow --bash: /data/cephfs-1/home/users/jnimoca_m/bin/nextflow: No such file or directory +```console +srun --mem=5G --pty bash -i ``` -We will use **miniforge**(Conda analogue) to install these, please read the [Software Installation with Conda](https://hpc-docs.cubi.bihealth.org/best-practice/software-installation-with-conda/). 
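A common pitfall is accidentally continuing on a login node after this step. A quick self-check; the `hpc-login-*` / `hpc-cpu-*` name patterns are inferred from the login and prompt examples in this guide, so adjust them if your site names nodes differently:

```bash
# Login nodes here are named hpc-login-1 / hpc-login-2;
# compute nodes allocated by srun have other names (e.g. hpc-cpu-123)
case "$(hostname)" in
  hpc-login-*) echo "Still on a login node - run srun first" ;;
  *)           echo "OK: running on $(hostname)" ;;
esac
```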
+#### Step 4.2: Download miniforge
-```python
-# ensure you are running an interactive session in the HPC
-#this commands creates and interactive session with more computational resources
-hpc-login-1:~$ srun --mem=5G --pty bash -i
-hpc-cpu-123:~$ wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
-hpc-cpu-123:~$ bash Miniforge3-Linux-x86_64.sh -b -f -p $HOME/work/miniforge
-hpc-cpu-123:~$ eval "$(/$HOME/work/miniforge/bin/conda shell.bash hook)"
-hpc-cpu-123:~$ conda init
-hpc-cpu-123:~$ conda config --set auto_activate_base false
+```console
+wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
+```
-# these commands to ensure conda channel is not used
-hpc-cpu-123:~$ conda config --add channels bioconda
-hpc-cpu-123:~$ conda config --add channels conda-forge
-hpc-cpu-123:~$ conda config --set channel_priority strict
+#### Step 4.3: Install miniforge
+
+```console
+bash Miniforge3-Linux-x86_64.sh -b -f -p $HOME/work/miniforge
+```
+
+#### Step 4.4: Make conda command accessible
+
+```console
+eval "$($HOME/work/miniforge/bin/conda shell.bash hook)"
+```
+
+#### Step 4.5: Initialize conda
+
+```console
+conda init
+```
+
+#### Step 4.6: Change conda configuration
+
+```console
+conda config --set auto_activate_base false
+```
+
+```console
+conda config --add channels bioconda
+conda config --add channels conda-forge
+```
+
+```console
+conda config --set channel_priority strict
 ```

@@ -112,9 +152,13 @@ Conda hopefully is properly installed.
Check conda is working: -```bash +```console +conda +``` + +Should output the following: -[jnimoca_m@hpc-cpu-61 ~]$ conda +```bash usage: conda [-h] [-v] [--no-plugins] [-V] COMMAND ... conda is a tool for managing and deploying applications, environments and packages. @@ -152,60 +196,119 @@ commands: run Run an executable in a conda environment. search Search for packages and display associated information using the MatchSpec format. update (upgrade) Update conda packages to the latest compatible version. - ```
-Now we will install the necessary packages
+#### Step 4.7: Install the necessary packages

-```python
-# Create a conda environment called 'mcmicro'
-hpc-cpu-123:~$ conda create -n mcmicro -y
+Create a conda environment called 'mcmicro':

-# this activates the environment
-hpc-cpu-123:~$ conda activate mcmicro
+```console
+conda create -n mcmicro -y
+```
+
+This activates the environment:

-# this install java and singularity
-hpc-cpu-123:~$ conda install openjdk singularity -y
+```console
+conda activate mcmicro
 ```

-Now we proceed to install **Nextflow**
+This installs java and singularity:

-```python
-# Download and run nextflow executable
-hpc-cpu-123:~$ curl -s https://get.nextflow.io | bash
+```console
+conda install openjdk singularity -y
+```
+
+#### Step 4.8: Install **Nextflow**

-# Trust and make the file executable
-hpc-cpu-123:~$ chmod +x nextflow
+Download the nextflow executable:

-# move the file somewhere your environment can find it
-hpc-cpu-123:~$ mkdir -p $HOME/.local/bin/
-hpc-cpu-123:~$ mv nextflow $HOME/.local/bin/
+```console
+curl -s https://get.nextflow.io | bash
 ```
-</details>
-Check nextflow is working part 1 +Trust and make the file executable + +```console +chmod +x nextflow +``` + +Move the file somewhere your environment can find it + +```console +mkdir -p ~/.local/bin/ +``` + +Move it to the local bin + +```console +mv nextflow ~/.local/bin/ +``` + +#### Step 4.9: Check Nextflow is working + +```console +nextflow info +``` + +Output: ```bash -(mcmicro) [jnimoca_m@hpc-cpu-213 ~]$ nextflow info - Version: 25.04.7 build 5955 - Created: 08-09-2025 13:29 UTC (15:29 CEST) - System: Linux 5.14.0-570.21.1.el9_6.x86_64 - Runtime: Groovy 4.0.26 on OpenJDK 64-Bit Server VM 22.0.1-internal-adhoc.conda.src - Encoding: UTF-8 (UTF-8) +Version: 25.04.7 build 5955 +Created: 08-09-2025 13:29 UTC (15:29 CEST) +System: Linux 5.14.0-570.21.1.el9_6.x86_64 +Runtime: Groovy 4.0.26 on OpenJDK 64-Bit Server VM 22.0.1-internal-adhoc.conda.src +Encoding: UTF-8 (UTF-8) # this means nextflow is found in your PATH and can be run ``` -
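If `nextflow info` instead prints `command not found`, the usual cause is that `~/.local/bin` (where Step 4.8 moved the binary) is not on your `PATH`. A quick check:

```bash
# Is the nextflow install directory on PATH?
case ":$PATH:" in
  *":$HOME/.local/bin:"*) echo "~/.local/bin is on your PATH" ;;
  *) echo 'Not on PATH - add export PATH="$HOME/.local/bin:$PATH" to ~/.bashrc' ;;
esac
```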
+#### Step 4.10: Tell Nextflow where to place files
+To prevent nextflow from scattering files in the wrong places, we set environment variables (which nextflow looks for) to the paths we want. For example, we tell nextflow that NXF_SINGULARITY_CACHEDIR (where it stores singularity images) is `/data/cephfs-1/work/groups/coscia/Singularity_Cache`, so it will place those files there.
-<details>
-Check nextflow is working part 2:
+First open your `.bashrc` file with:

-```bash
-(mcmicro) [jnimoca_m@hpc-cpu-213 ~]$ nextflow run nextflow-io/hello
+```console
+nano ~/.bashrc
+```
+
+This will open a terminal text editor; add the following five lines, replacing `$USERNAME` with your HPC username.
+Windows users: you may not be able to copy-paste, so please be careful with typos.
+
+```console
+export SINGULARITY_CACHEDIR=/data/cephfs-1/work/groups/coscia/Singularity_Cache
+export NXF_SINGULARITY_CACHEDIR=/data/cephfs-1/work/groups/coscia/Singularity_Cache
+export NXF_WORK=/data/cephfs-1/home/users/$USERNAME/scratch/
+export NXF_TEMP=/data/cephfs-1/home/users/$USERNAME/scratch/
+export NXF_HOME=/data/cephfs-1/home/users/$USERNAME/work/.nextflow
+```
+
+Reload the `.bashrc` file to apply those changes:
+
+```console
+source ~/.bashrc
+```
+
+### Step 6: Run Nextflow
+
+In case something went wrong before, remember that you must have all of these (redo any if needed):
+
+- interactive session activated (no login nodes) : `srun --mem=5G --pty bash -i`
+- mcmicro conda environment activated : `conda activate mcmicro`
+- the `.bashrc` variables set and loaded
+
+#### Step 6.1: Demo pipeline
+Run the nextflow hello (demo) pipeline:
+
+```console
+nextflow run nextflow-io/hello
+```
+
+The output should look like this:
+
+```console
 N E X T F L O W ~ version 25.04.7
NOTE: Your local project version looks outdated - a different revision is available in the remote repository [2ce0b0e294]
@@ -220,23 +323,27 @@ Hello world!
Ciao world!
Hola world!
-
-# this means you have connection to download pipelines and run them
-
 ```
-### Step 5: Run Nextflow with test data +#### Step 6.2: Download demo data (Exemplar-001) -From each code block, please run the commands one at a time. +Create a folder for demo data: + +```console +mkdir -p ~/work/test1/ +``` + +Download data using nextflow pipeline: ```bash -# create directory for test data -(mcmicro) [jnimoca_m@hpc-cpu-213 ~]$ mkdir -p ~/work/test1/ -# download test data from internet to test directory -(mcmicro) [jnimoca_m@hpc-cpu-213 ~]$ nextflow run labsyspharm/mcmicro/exemplar.nf --name exemplar-001 --path ~/work/test1/ +nextflow run labsyspharm/mcmicro/exemplar.nf --name exemplar-001 --path ~/work/test1/ +``` + +Output should look like this: +```console N E X T F L O W ~ version 25.04.7 NOTE: Your local project version looks outdated - a different revision is available in the remote repository [b0175102db] @@ -253,11 +360,16 @@ CPU hours : 0.2 Succeeded : 7 ``` -```bash -# Run with -profile singularity, otherwise it defaults to Docker and it will fail. -(mcmicro) [jnimoca_m@hpc-cpu-213 ~]$ nextflow run labsyspharm/mcmicro --in ~/work/test1/exemplar-001 -profile singularity +#### Step 6.3: Run MCMICRO with demo data - N E X T F L O W ~ version 25.04.7 +```console +nextflow run labsyspharm/mcmicro --in ~/work/test1/exemplar-001 -profile singularity +``` + +Output should look like this: + +```console +N E X T F L O W ~ version 25.04.7 NOTE: Your local project version looks outdated - a different revision is available in the remote repository [b0175102db] Launching `https://github.com/labsyspharm/mcmicro` [scruffy_aryabhata] DSL2 - revision: 9122980d88 [master] @@ -281,9 +393,13 @@ CPU hours : 0.3 Succeeded : 4 ``` +We can check the outputs of the pipeline: + +```console +tree ~/work/test1/exemplar-001 +``` + ```python -# Let's check the outputs -(mcmicro) [jnimoca_m@hpc-cpu-213 ~]$ tree ~/work/test1/exemplar-001 . └── exemplar-001 ├── illumination @@ -331,67 +447,15 @@ Succeeded : 4
-
- Reasoning question: Why did it take so long to run that small (400mb) dataset? - -Because you ran that entire analysis in that single node, with the default resources. -When you run your dataset nextflow will dispatch jobs based on the requirements of each process. - -
- -### Step 6: How to run MCMICRO with HPC job - -Let's rerun the same dataset with a script - -```bash -# let's create the script file - -# go to demo data -(mcmicro) [jnimoca_m@hpc-cpu-213 ~]$ cd work/test1/exemplar-001/ -# create script file -(mcmicro) [jnimoca_m@hpc-cpu-213 exemplar-001]$ touch script.sh -# edit script -(mcmicro) [jnimoca_m@hpc-cpu-213 exemplar-001]$ nano script.sh -# then you type everything you need and press (CTRL+X) to leave -# consider creating this file in a text editor (VSCode) and copy paste into it -``` - -Here is the demo script: - -```bash -#!/bin/bash -#SBATCH --job-name=test_job # Job name -#SBATCH --time=4:00:00 # Time limit hrs:min:sec -#SBATCH --mem=10G # Memory for orchestrating node -#SBATCH --cpus-per-task=2 # Number of CPU cores for orchestrating node - -eval "$(conda shell.bash hook)" # This exposes conda hook to node -conda activate mcmicro # This activates environment in node -PATH_TO_DATA="/data/cephfs-1/home/users/jnimoca_m/work/test1/exemplar-001" -nextflow run labsyspharm/mcmicro --in $PATH_TO_DATA -profile singularity -``` +## Running MCMICRO with a script for distributed computing -Then run your script: +MCMICRO works best when we distribute all the tasks to the HPC. 
-```bash
-#the location of the script doesnt matter as long as the path in it directs to your data
-(mcmicro) [jnimoca_m@hpc-cpu-213 exemplar-001]$ sbatch demo_script.sh
-sbatch: routed your job to partition short
-Submitted batch job 18622303
-```
+
+You will need:

-### Checklist
-
-- Ensure access to HPC
-- Ensure you can move files into HPC environment (consider image sizes)
-- Ensure enough storage space is available for your images and processing steps
-- Ensure HPC can run **java**
-- Ensure HPC can run **apptainer/singularity** images
-- Ensure HPC can run **nextflow**
-- Run **MCMICRO** [demo data](https://mcmicro.org/datasets/) directly on interactive session
-- Run **MCMICRO** [demo data](https://mcmicro.org/datasets/) with a HPC script
-
-SUCCESS, you managed to install everything, now let's dig into the biology!
+- raw image files
+- params.yml file
+- script.sh file

## How to prepare your images for MCMICRO

@@ -399,18 +463,22 @@ Please familiarize yourself with [MCMICRO's Input/Output expectactions](https://

### Project organization

-Minimum requiremets are the following:
+Minimum requirements are the following:

```bash
myproject/
├── markers.csv
-├── params.yml
+├── params.yml # actually not needed, but strongly recommended
└── raw/
```
+
+Important: if a previous run of mcmicro already created output directories and you do not want to start again from the first step, add the `-resume` option to your `nextflow run` command. Nextflow will check what has already been done and carry on from the last completed step.
+
### markers.csv

-- cannot be renamed
+- **cannot** be renamed.
- column names must be: `cycle_number`, `channel_number`, and `marker_name`
+- optional column names are: `background`, `exposure`, and `remove` (for backsub).
- `marker_name` values must all be unique
- Separator **must** be a *comma* (,)
-- Be careful with excel and german defaults, use libreOffice if needed.
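The markers.csv rules above can be sanity-checked from the command line before launching a run. A sketch using a toy file (the marker names are made up, and the check assumes the column order shown in the bullets):

```bash
# Toy markers.csv for illustration
cat > markers.csv <<'EOF'
cycle_number,channel_number,marker_name
1,1,DNA_1
1,2,CD45
1,3,panCK
EOF

# Required columns, comma-separated, in the expected order
head -n1 markers.csv | grep -qx 'cycle_number,channel_number,marker_name' \
  && echo "header OK"

# marker_name values must all be unique
dups=$(cut -d, -f3 markers.csv | tail -n +2 | sort | uniq -d)
[ -z "$dups" ] && echo "marker names are unique"
```

If your file has optional extra columns, relax the header check accordingly; the uniqueness check on column 3 still applies.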
@@ -475,320 +543,66 @@ options: ashlar: --flip-y --align-channel 4 -m 50 --filter-sigma 1 ``` -### Considerations, ideas, and tips - -- You do not have to run everything all the time, sometimes I like to run MCMICRO just for (1) Illumination correction and (2) Stitching and registration and the move on to another software. Make use of the workflow options like `stop-at`. -- Inside the [nextflow.config](https://github.com/labsyspharm/mcmicro/blob/master/nextflow.config) file, MCMICRO developers link the default computational requirements for each module. They have set `profiles`. For example here you can find the [WSI defaults](https://github.com/labsyspharm/mcmicro/blob/master/config/nf/wsi.config) profile optimized for Whole slide imaging, and here the `profile` for [TMA defaults](https://github.com/labsyspharm/mcmicro/blob/master/config/nf/tma.config). If you feel you need your own computational requirements, I suggest you fork the MCMICRO github repository, modify it, and then run mcmicro like this `nextflow run josenimo/mcmicro`. Nextflow is smart and will use that github repository instead. -- For MDC peeps, feel free to use [josenimo/mcmicro](https://github.com/josenimo/mcmicro/) with `-profile singularity` in your `nextflow run` command. -- Nextflow has a nice `-resume` parameter to automatically check what already ran, and go on from there. - -## SOPA - -SOPA is a wonderful pipeline, similar to MCMICRO but has some fundamental differences. - -Similarly, for a overview of SOPA read the [paper](https://www.nature.com/articles/s41467-024-48981-z), for a more technical overview visit the [website](https://gustaveroussy.github.io/sopa/). - -- SOPA runs on [Snakemake](https://snakemake.github.io/), nextflow version is in [development](https://nf-co.re/sopa/usage). -- SOPA is easier to customize than MCMICRO. For adding a function quickly to your pipeline use this (Python required). 
-- SOPA can be used in four flavours: jupyter notebooks with python API, snakemake, nextflow, and CLI. -- SOPA uses the Spatialdata object natively, experience with SpatialData is recommended. -- SOPA is compatible with various modalities: Xenium, Visium HD, MERSCOPE, CosMx, PhenoCycler, MACSima, Molecular Cartography. - -### Run SOPA locally - -- Familiarize yourself with the [Snakemake framework](https://snakemake.github.io/). -- Follow SOPA [Getting Started](https://gustaveroussy.github.io/sopa/getting_started/) to install necessary packages. -- Follow SOPA's [snakemake tutorial](https://gustaveroussy.github.io/sopa/tutorials/snakemake/) +### script.sh -### Run SOPA on HPC +- The `.sh` suffix denotes a script file. +- The script will tell the HPC how to process your images. +- I suggest you create a script in your preferred IDE, like VSCode, and then import it to the HPC when it is time to run your script. -- Create environment with `snakemake`, `snakemake-slurm`, and `snakemake-executor-plugin-slurm`. -- Use profile that will use the executor plugin -- For MDC peeps check [Snakemake with Slurm](https://hpc-docs.cubi.bihealth.org/slurm/snakemake/) - -
- Example bash script for single snakemake run +Demo script: ```bash #!/bin/bash +#SBATCH --job-name=test_job # Job name +#SBATCH --time=16:00:00 # Time limit hrs:min:sec +#SBATCH --mem=10G # Memory for orchestrating node +#SBATCH --cpus-per-task=2 # Number of CPU cores for orchestrating node -#SBATCH --partition=medium -#SBATCH --job-name=smk_main_${job_prefix} -#SBATCH --ntasks=1 -#SBATCH --nodes=1 -#SBATCH --time=24:00:00 -#SBATCH --mem-per-cpu=250M -#SBATCH --output=slurm_logs/%x-%j.log - -FILE_PATH=$1 # The file path is passed as an argument to the script -filename=$(basename "$FILE_PATH") -job_prefix="${filename:0:4}" - -export SBATCH_DEFAULTS="--output=slurm_logs/%x-%j.log" +eval "$(conda shell.bash hook)" # This exposes conda hook to node +conda activate mcmicro # This activates environment in node +PATH_TO_DATA="/data/cephfs-1/home/users//work/test1/exemplar-001" +PATH_TO_PARAMS="/data/cephfs-1/home/users//work/test1/exemplar001_params.yml" -date +nextflow run labsyspharm/mcmicro --in $PATH_TO_DATA --params $PATH_TO_PARAMS -profile singularity +``` -source /data/cephfs-1/home/users/jnimoca_m/work/miniconda/etc/profile.d/conda.sh +Then run your script from the terminal -echo "Activate sopa env" -conda activate sopa +```console +sbatch demo_script.sh +``` -echo "Running snakemake with file $FILE_PATH" -srun snakemake \ ---config data_path="$FILE_PATH" \ ---use-conda -j 100 --profile=cubi-v1 +Output will look something like this: -date +```console +sbatch: routed your job to partition short +Submitted batch job 18622303 ``` -
-
- Example array script to run many images +Conceptually: +This script will start an orchestrating job, which will manage all the other jobs that have to be created and managed. +Every daughter job will be managed from it. If the orchestrating job runs out of time, everything collapses. +There are also ways one can run a batch script that can run one script per dataset. In that way processing multiple datasets from a single script. This is excellent for reproducibility. -```bash -files=( - "992_backsub.ome.tif" - "993_backsub.ome.tif" - "994_backsub.ome.tif" - "997_backsub.ome.tif" -) - -for file in "${files[@]}"; do - echo "Submitting job for file $file" - sbatch snakemake_run.sh "data/input/${file}" -done -``` -
+Looking for details?
+This part can get very complex and depends highly on your needs, so I refrain from overexplaining. Look at your HPC documentation for how to run scripts and what their recommended workflows are. Asking your friendly bioinformatician is a great first point of help.

### Considerations, ideas, and tips

- You do not have to run everything all the time, sometimes I like to run MCMICRO just for (1) Illumination correction and (2) Stitching and registration and then move on to another software. Make use of the workflow options like `stop-at`.
- Inside the [nextflow.config](https://github.com/labsyspharm/mcmicro/blob/master/nextflow.config) file, MCMICRO developers link the default computational requirements for each module. They have set `profiles`. For example here you can find the [WSI defaults](https://github.com/labsyspharm/mcmicro/blob/master/config/nf/wsi.config) profile optimized for Whole slide imaging, and here the `profile` for [TMA defaults](https://github.com/labsyspharm/mcmicro/blob/master/config/nf/tma.config). If you feel you need your own computational requirements, I suggest you fork the MCMICRO github repository, modify it, and then run mcmicro like this: `nextflow run josenimo/mcmicro`. Nextflow is smart and will use that github repository instead.
- For MDC peeps, feel free to use [josenimo/mcmicro](https://github.com/josenimo/mcmicro/) with `-profile singularity` in your `nextflow run` command.
- Nextflow has a nice `-resume` parameter to automatically check what already ran, and go on from there.

### Checklist

- Ensure access to HPC
- Ensure you can move files into HPC environment (consider image sizes)
- Ensure enough storage space is available for your images and processing steps
- Ensure HPC can run **java**
- Ensure HPC can run **apptainer/singularity** images
- Ensure HPC can run **nextflow**
- Run **MCMICRO** [demo data](https://mcmicro.org/datasets/) directly on interactive session
- Run **MCMICRO** [demo data](https://mcmicro.org/datasets/) with a HPC script
-
- Example Snakemake file; pipeline with every step and parameters +- Ensure access to HPC +- Ensure you can move files into HPC environment (consider image sizes) +- Ensure enough storage space is available for your images and processing steps +- Ensure HPC can run **java** +- Ensure HPC can run **apptainer/singularity** images +- Ensure HPC can run **nextflow** +- Run **MCMICRO** [demo data](https://mcmicro.org/datasets/) directly on interactive session +- Run **MCMICRO** [demo data](https://mcmicro.org/datasets/) with a HPC script -```python -configfile: "ome_tif.yaml" - -# version 2.1.0 -# date: 09.09.2024 -# comments: -# v2.0.0: increased time of quantification rule to 6 hours and changed partition to medium -# v2.1.0: added pyrimidize - -from utils import WorkflowPaths, Args - -paths = WorkflowPaths(config) -args = Args(paths, config) - -localrules: all - -rule all: - input: - f"data/quantification/{paths.sample_name}_quantification.csv", - f"data/processed_images/{paths.sample_name}_8bit_pyrimidized.ome.tif" - shell: - """ - echo 🎉 Successfully run sopa - echo → SpatialData output directory: {paths.sdata_path} - """ - -rule downscale_image: - input: - paths.data_path, - output: - path_downscaled_image = f"data/processed_images/{paths.sample_name}_8bit.tif" - conda: - "sopa" - resources: - mem_mb=256_000, - partition="short", - runtime="3h", - threads: 4 - shell: - """ - mkdir -p ./data/processed_images - - python scripts/downscale_image_to8bit.py \ - --input {paths.data_path} \ - --output {output.path_downscaled_image} \ - """ - -rule pyrimidize: - input: - processed_imaged = f"data/processed_images/{paths.sample_name}_8bit.tif" - output: - pyrimidized_image = f"data/processed_images/{paths.sample_name}_8bit_pyrimidized.ome.tif" - conda: - "sopa" - resources: - mem_mb=128_000, - partition="medium", - runtime="6h", - threads: 4 - params: - tilesize = config["pyramid"]["tile-size"], - shell: - """ - python scripts/pyramidize.py \ - --input 
{input.processed_imaged} \ - --output {output.pyrimidized_image} \ - --tile-size {params.tilesize} - """ - -rule to_spatialdata: - input: - paths.data_path if config["read"]["technology"] != "uniform" else [], - output: - paths.sdata_zgroup if paths.data_path else [], - conda: - "sopa" - resources: - mem_mb=128_000, - partition="short", - runtime="3h", - threads: 2 - params: - args_reader = str(args['read']) - shell: - """ - sopa read {paths.data_path} --sdata-path {paths.sdata_path} {params.args_reader} - """ - -checkpoint patchify_cellpose: - input: - paths.sdata_zgroup, - output: - patches_file = paths.smk_patches_file_image, - patches = touch(paths.smk_patches), - params: - args_patchify = str(args["patchify"].where(contains="pixel")), - conda: - "sopa" - resources: - mem_mb=32_000, - partition="short", - runtime="1h", - threads: 4 - shell: - """ - sopa patchify image {paths.sdata_path} {params.args_patchify} - """ - -rule patch_segmentation_cellpose: - input: - paths.smk_patches_file_image, - paths.smk_patches, - output: - paths.smk_cellpose_temp_dir / "{index}.parquet", - conda: - "sopa" - resources: - mem_mb=32_000, - partition="short", - runtime="1h", - threads: 4 - params: - args_cellpose = str(args["segmentation"]["cellpose"]), - shell: - """ - sopa segmentation cellpose {paths.sdata_path} --patch-dir {paths.smk_cellpose_temp_dir} --patch-index {wildcards.index} {params.args_cellpose} - """ - - -def get_input_resolve(name, dirs=False): - def _(wilcards): - with getattr(checkpoints, f"patchify_{name}").get(**wilcards).output.patches_file.open() as f: - return paths.cells_paths(f.read(), name, dirs=dirs) - return _ - -rule resolve_cellpose: - input: - get_input_resolve("cellpose"), - output: - touch(paths.smk_cellpose_boundaries), - conda: - "sopa" - resources: - mem_mb=32_000, - partition="short", - runtime="1h", - threads: 4 - shell: - """ - sopa resolve cellpose {paths.sdata_path} --patch-dir {paths.smk_cellpose_temp_dir} - """ - -rule rasterize: - 
input: - paths.sdata_zgroup, - paths.smk_cellpose_boundaries - output: - mask_tif=f"data/masks/{paths.sample_name}_mask.tif" - conda: - "sopa" - resources: - mem_mb=256_000, - partition="short", - runtime="1h", - threads: 4 - shell: - """ - mkdir -p ./data/masks - - python scripts/rasterize.py \ - --input {paths.sdata_path} \ - --output {output.mask_tif} - """ - -rule expand_markers: - input: - mask = f"data/masks/{paths.sample_name}_mask.tif" - output: - exp_mask = f"data/masks/{paths.sample_name}_mask_expanded.tif" - resources: - mem_mb=256_000, - partition="short", - runtime="1h", - threads: 4 - params: - pixels = config["expand"]["pixels"] - shell: - """ - python scripts/expand_mask_singlemask.py \ - --input {input.mask} \ - --output {output.exp_mask} \ - --pixels {params.pixels} \ - """ - -rule quantify: - input: - image = paths.data_path, - mask = f"data/masks/{paths.sample_name}_mask_expanded.tif", - markers = "data/input/markers.csv" - output: - quantification = f"data/quantification/{paths.sample_name}_quantification.csv" - conda: - "sopa" - resources: - mem_mb=256_000, - partition="medium", - runtime="6h", - threads: 4 - params: - math = config["quantify"]["math"], - quantile = config["quantify"]["quantile"], - shell: - """ - mkdir -p ./data/quantification - - python scripts/quant.py \ - --image {input.image} \ - --label {input.mask} \ - --markers {input.markers} \ - --output {output.quantification} \ - --math {params.math} \ - --quantile {params.quantile} - """ - -``` - -
\ No newline at end of file
+SUCCESS! You managed to install everything; now let's dig into the biology!
diff --git a/docs/Workflows/openDVP_guide_backup_b4_MDC.md b/docs/Workflows/Computational/GettingStartedWithSOPA.md
similarity index 55%
rename from docs/Workflows/openDVP_guide_backup_b4_MDC.md
rename to docs/Workflows/Computational/GettingStartedWithSOPA.md
index 305be4d..a952c6d 100644
--- a/docs/Workflows/openDVP_guide_backup_b4_MDC.md
+++ b/docs/Workflows/Computational/GettingStartedWithSOPA.md
@@ -1,125 +1,184 @@
-# Setup imaging processing pipelines for the MDC
-## Setting Expectations
+# Getting started with SOPA
+## SOPA
-If you want to run an image processing and analyis pipeline, it will take some time to setup. Depending on your familiarity to things like HPC, Nextflow, Snakemake, Python; the amount of effort will vary.
+SOPA is a wonderful pipeline, similar to MCMICRO but with some fundamental differences.
-### General tips
+Similarly, for an overview of SOPA read the [paper](https://www.nature.com/articles/s41467-024-48981-z); for a more technical overview visit the [website](https://gustaveroussy.github.io/sopa/).
-- Before running your full data, try things out with very small data(<100MB). Where the feedback loop is less than 2 minutes. This will allow you to play with settings, parameters, and lower the cost of failure. You will fail, and you will learn.
-- Asking LLMs for help is great, just be careful, some of these pipelines and packages are not very popular and the tendency to hallucinate is greater.
-- Ask humans for help, but help with log files, tracebacks, and context. Just like LLMs the more context the better we understand your problem.
+### General:
-## MCMICRO
+- SOPA runs on [Snakemake](https://snakemake.github.io/); a Nextflow version is in [development](https://nf-co.re/sopa/usage).
+- SOPA uses the SpatialData object natively; some experience with SpatialData is preferred.
-MCMICRO is a wonderful pipeline developed by a group of image analysts. It runs on [Nextflow](https://www.nextflow.io/) which allows it to be run on any computational environment. It is powerful, modular, and scalable. Once setup it can be ran in parallel with as many files as your compute allows.
+### Advantages:
-For a conceptual overview read the paper [MCMICRO paper](https://www.nature.com/articles/s41592-021-01308-y), for a more technical overview I suggest you first read the documentation of the [MCMICRO website ](https://mcmicro.org/overview/) and define what steps are important for your workflow.
+- SOPA can run on Windows.
+- SOPA is easier to customize than MCMICRO; adding a function to your pipeline is quick (Python required).
+- SOPA can be used in four flavours: Jupyter notebooks with the Python API, Snakemake, Nextflow, and the CLI.
+- SOPA is compatible with various modalities: Xenium, Visium HD, MERSCOPE, CosMx, PhenoCycler, MACSima, Molecular Cartography.
-## How to setup: simplified
+### Disadvantages:
-### Windows users
+- Broader modality compatibility makes setup more complex.
+- SOPA cannot perform **Illumination correction**, **Stitching and Registration**, or **Background Subtraction**. (Jose is considering helping the SOPA devs with these.)
+- Output is a SpatialData object, which is not very friendly for getting data out.
+- Segmentation output is `.geojson`, not `.tif` (you can convert easily with a spatialdata function).
-- This will be a challenge because you will need [WSL subsystem](https://learn.microsoft.com/en-us/windows/wsl/install). This is not simple, consider running in your cluster, or friendly bioinformatician with MacOS or Linux.
+## Set up SOPA locally on your computer (for testing subsets)
-### MacOS and Linux
+- Familiarize yourself with the [Snakemake framework](https://snakemake.github.io/).
+- Follow SOPA [Getting Started](https://gustaveroussy.github.io/sopa/getting_started/). 
+- Follow SOPA's [snakemake tutorial](https://gustaveroussy.github.io/sopa/tutorials/snakemake/). -- Follow instructions in [MCMICRO Installation](https://mcmicro.org/tutorial/installation.html). -- You will need to install **Java** and **Docker** on your machine. +### Step 1: Ensure you have conda/mamba installed -### High Performance Cluster (HPC) +- Download micromamba from [Micromamba Releases](https://github.com/mamba-org/micromamba-releases/releases). You might have to click `show all 27 assets` to see the version you need. +- Check that you can create environments and download packages -- You need an environment with **Java**, follow HPC admin suggestions (conda works for me). -- Install [Singularity/Apptainer](https://docs.sylabs.io/guides/3.0/user-guide/quick_start.html), a Docker analogue for the HPC. -- Install [Nextflow](https://www.nextflow.io/), the pipeline manager. -- Familiarize yourself how to run scripts in the HPC. -- For example, in SLURM I would use `$ sbatch myscript.sh` -
- Bash Script Example
+
+### Step 2: Create sopa environment
-    ```bash
-    #!/bin/bash
-    #SBATCH --job-name=P30E01_Tonsil        # Job name
-    #SBATCH --output=P30E01.%j.out          # Output file
-    #SBATCH --error=P30E01.%j.err           # Error file
-    #SBATCH --time=24:00:00                 # Time limit hrs:min:sec
-    #SBATCH --mem=10G                       # Memory per node
-    #SBATCH --cpus-per-task=2               # Number of CPU cores per task
-    #SBATCH --partition=medium              # Partition/queue name (check with your HPC)
+```console
+conda create --name sopa python=3.12
+```
-    eval "$(conda shell.bash hook)"         # This exposes conda hook to node
-    conda activate mcmicro                  # This activates environment in node
+```console
+conda activate sopa
+```
-    PATH_TO_DATA="/data/cephfs-1/home/users/jnimoca_m/work/P30_SF_HNSCC/P30E01_Tonsil"
+Install sopa (with the cellpose extension) and snakemake:
-    nextflow run labsyspharm/mcmicro --in $PATH_TO_DATA -profile singularity --params $PATH_TO_DATA/P30E01_params.yml
+```console
+pip install 'sopa[cellpose]' snakemake
+```
-    echo "--end--"
+For some reason Cellpose 4 breaks SOPA, so we have to install Cellpose 3:
-    ```
+```console
+pip install 'cellpose <4'
+```
-
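The `'cellpose <4'` pin simply keeps you on the 3.x release series. As a tiny plain-Python sketch of the comparison the pin enforces (the version strings below are hypothetical examples, not real releases):

```python
def major_version(version_string: str) -> int:
    """Return the leading (major) component of a dotted version string."""
    return int(version_string.split(".")[0])

# The pip pin 'cellpose <4' accepts any 3.x release and rejects 4.x:
assert major_version("3.1.1") < 4   # a 3.x release satisfies the pin
assert major_version("4.0.2") >= 4  # a 4.x release would be excluded
```

If SOPA later gains Cellpose 4 support, simply drop the pin and reinstall.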
+### Step 3: Download sopa defaults
-### Checklist
+SOPA has many processes, and many of them are modality specific. You can chain them together as you want (not simple, but not overly complex either). To simplify things for now, we will use the SOPA-provided defaults for dealing with `ome.tif` files.
-- Ensure access to HPC
-- Ensure you can move files into HPC environment (consider image sizes)
-- Ensure enough storage space is available for your images and processing steps
-- Ensure HPC can run **java**
-- Ensure HPC can run **apptainer/singularity** images
-- Ensure HPC can run **nextflow**
-- Run **MCMICRO** [demo data](https://mcmicro.org/datasets/), I suggest Exemplar001. You can download it with nextflow command.
+In the terminal, go to a directory of your choice; the following command will clone the entire GitHub repository.
-### Optimize MCMICRO for your images
+```bash
+git clone https://github.com/gustaveroussy/sopa.git
+```
-MCMICRO has many steps, each step has many parameters that can be optimized to your images.
-All of these parameters should be passed to MCMICRO by means of a `params.yml` file. This is the file that you pass on to your `nextflow run` command.
-
- Example params.yml file
+
+### Step 4: Edit default parameters
+
+Open the following file inside the downloaded `sopa` directory:
+`sopa/workflow/configs/misc/ome_tif.yaml`
+
+This is the file that sets the parameters for all the processes.
+For a deeper look into what each of these does, check the [parameter_guide](https://github.com/gustaveroussy/sopa/blob/main/workflow/config/example_commented.yaml)
+
+We will change the cellpose parameters to the following to make it work with our demo image.
-```yml
-workflow:
-start-at: illumination
-stop-at: registration
+```yaml
+# these are the settings you should have (ensure you save the file after you are done).
+read:
+  technology: ome_tif
-options:
-  ashlar: --flip-y --align-channel 1 -m 50 --filter-sigma 1
+patchify:
+  patch_width_pixel: 400
+  patch_overlap_pixel: 40
+
+segmentation:
+  cellpose:
+    model_type: "nuclei"
+    diameter: 25
+    channels: [ "DAPI_bg" ]
+    flow_threshold: 2
+    cellprob_threshold: -6
+    min_area: 25
+    gaussian_sigma: 1
+
+aggregate:
+  aggregate_channels: true
+  min_intensity_ratio: 0.1
+  expand_radius_ratio: 0.1
+
+explorer:
+  ram_threshold_gb: 4
+  pixel_size: 1
+```
-
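Before launching the pipeline, it can help to sanity-check that your edited YAML parses and holds the values you expect. A minimal sketch, assuming PyYAML is installed; the embedded string mirrors a subset of the config above rather than reading your actual file:

```python
import yaml

# A fragment mirroring the ome_tif config above (illustrative subset only).
config_text = """
read:
  technology: ome_tif

patchify:
  patch_width_pixel: 400
  patch_overlap_pixel: 40

segmentation:
  cellpose:
    model_type: "nuclei"
    diameter: 25
"""

config = yaml.safe_load(config_text)

# The overlap must be smaller than the patch width, or patches cannot tile the image.
assert config["patchify"]["patch_overlap_pixel"] < config["patchify"]["patch_width_pixel"]
assert config["segmentation"]["cellpose"]["model_type"] == "nuclei"
```

In practice you would `yaml.safe_load(open("workflow/config/misc/ome_tif.yaml"))` and run the same checks; a YAML indentation mistake will surface here rather than minutes into a Snakemake run.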
-Please check the [MCMICRO website](https://mcmicro.org/parameters/) for all potential parameters. -I suggest for modules such as segmentation to run your segmentation locally with a subset of your images. +### Step 5: Run sopa with defaults -### Considerations, ideas, and tips +There are three paths you need to pass to the snakemake command: -- You do not have to run everything all the time, sometimes I like to run MCMICRO just for (1) Illumination correction and (2) Stitching and registration and the move on to another software. Make use of the workflow options like `stop-at`. -- Inside the [nextflow.config](https://github.com/labsyspharm/mcmicro/blob/master/nextflow.config) file, MCMICRO developers link the default computational requirements for each module. They have set `profiles`. For example here you can find the [WSI defaults](https://github.com/labsyspharm/mcmicro/blob/master/config/nf/wsi.config) profile optimized for Whole slide imaging, and here the `profile` for [TMA defaults](https://github.com/labsyspharm/mcmicro/blob/master/config/nf/tma.config). If you feel you need your own computational requirements, I suggest you fork the MCMICRO github repository, modify it, and then run mcmicro like this `nextflow run josenimo/mcmicro`. Nextflow is smart and will use that github repository instead. -- For MDC peeps, feel free to use [josenimo/mcmicro](https://github.com/josenimo/mcmicro/) with `-profile singularity` in your `nextflow run` command. -- Nextflow has a nice `-resume` parameter to automatically check what already ran, and go on from there. +1. path to your image +2. path to the edited config file +3. path to the workflow profile (provided by sopa download) -## SOPA +For the demo image copy: +`RD_Coscia/Jose/__TestDatasets/TD_01_verysmall_mIF.ome.tif` +and place it in here: +`sopa/data/TD_01_verysmall_mIF.ome.tif` -SOPA is a wonderful pipeline, similar to MCMICRO but has some fundamental differences. 
+SOPA command template
-Similarly, for a overview of SOPA read the [paper](https://www.nature.com/articles/s41467-024-48981-z), for a more technical overview visit the [website](https://gustaveroussy.github.io/sopa/).
+```bash
+snakemake \
+    --config data_path=$PATH_TO_IMAGE \
+    --configfile=$PATH_TO_YAML \
+    --workflow-profile ./workflow/profile/local \
+    --cores 4
+```
-- SOPA runs on [Snakemake](https://snakemake.github.io/), nextflow version is in [development](https://nf-co.re/sopa/usage).
-- SOPA is easier to customize than MCMICRO. For adding a function quickly to your pipeline use this (Python required).
-- SOPA can be used in four flavours: jupyter notebooks with python API, snakemake, nextflow, and CLI.
-- SOPA uses the Spatialdata object natively, experience with SpatialData is recommended.
-- SOPA is compatible with various modalities: Xenium, Visium HD, MERSCOPE, CosMx, PhenoCycler, MACSima, Molecular Cartography.
+I suggest you run the command from the sopa directory; it will look something like this:
+
+```bash
+snakemake \
+    --config data_path=./data/TD_01_verysmall_mIF.ome.tif \
+    --configfile=./config/misc/ome_tif.yaml \
+    --workflow-profile ./workflow/profile/local \
+    --cores 6
+```
-### Run SOPA locally
+Note: on Windows you must replace the line-continuation character `\`:
+
+- Use a caret `^` for the traditional Command Prompt (cmd.exe) and a backtick `` ` `` for PowerShell.
+- Alternatively, remove them to create a single-line command in an editor, and then copy-paste it.
-- Familiarize yourself with the [Snakemake framework](https://snakemake.github.io/).
-- Follow SOPA [Getting Started](https://gustaveroussy.github.io/sopa/getting_started/) to install necessary packages.
-- Follow SOPA's [snakemake tutorial](https://gustaveroussy.github.io/sopa/tutorials/snakemake/)
-### Run SOPA on HPC
+### Step 6: Check post-run
-- Create environment with `snakemake`, `snakemake-slurm`, and `snakemake-executor-plugin-slurm`. 
-- Use profile that will use the executor plugin
-- For MDC peeps check [Snakemake with Slurm](https://hpc-docs.cubi.bihealth.org/slurm/snakemake/)
+Before:
+
+```console
+.
+├── command.txt
+├── config
+│   └── ome_tif.yaml
+└── data
+    └── TD_01_verysmall_mIF.ome.tif
+```
+
+After:
+
+```console
+.
+├── command.txt
+├── config
+│   └── ome_tif.yaml
+└── data
+    ├── TD_01_verysmall_mIF.ome.explorer
+    ├── TD_01_verysmall_mIF.ome.tif
+    └── TD_01_verysmall_mIF.ome.zarr
+```
+
+The `.explorer` file can be opened with the Xenium explorer software (free).
+The `.zarr` file can be opened with `spatialdata`.
+
+## Run SOPA on HPC
+
+- Create a conda environment with `snakemake`, `snakemake-slurm`, and `snakemake-executor-plugin-slurm`.
+- Use a profile that uses the executor plugin; check [Snakemake with Slurm](https://hpc-docs.cubi.bihealth.org/slurm/snakemake/)
Example bash script for single snakemake run @@ -173,12 +232,11 @@ for file in "${files[@]}"; do sbatch snakemake_run.sh "data/input/${file}" done ``` -
- +
- Example Snakemake file; pipeline with every step and parameters + Example custom snakemake file; pipeline with every step and parameters ```python configfile: "ome_tif.yaml" @@ -406,4 +464,6 @@ rule quantify: ``` -
\ No newline at end of file
+
+
+Thank you for your interest!
diff --git a/docs/Workflows/Computational/InstallOpenDVP.md b/docs/Workflows/Computational/InstallOpenDVP.md
new file mode 100644
index 0000000..a61e63d
--- /dev/null
+++ b/docs/Workflows/Computational/InstallOpenDVP.md
@@ -0,0 +1,150 @@
+# Install opendvp
+
+## Quick start
+
+```console
+$ conda create --name opendvp -y python=3.12
+$ conda activate opendvp
+$ pip install opendvp
+```
+
+or
+
+```bash
+$ uv add opendvp
+```
+
+## Installing with conda/mamba
+
+Mamba is a fast, drop-in replacement for the conda package manager. It significantly speeds up installing packages and resolving environment dependencies, making it a great tool for any data scientist or Python developer. 🐍
+
+This tutorial will guide you through installing Mamba on your system.
+
+If you are new, here is a nice post explaining the main [Concepts](https://mamba.readthedocs.io/en/latest/user_guide/concepts.html#concepts) (<5 min read).
+
+### Install conda/mamba environment manager
+
+1. Check and download the most recent `Conda-forge Installer` release for your OS here: [Downloads](https://conda-forge.org/download/).
+2. Follow the instructions on the website for your OS
+3. For Windows only: use the Miniforge Prompt
+4. Run `conda init`
+5. Run `conda install mamba -n base -c conda-forge` to install mamba
+
+### Install opendvp with conda
+
+```console
+$ conda create --name opendvp -y python=3.12
+$ conda activate opendvp
+$ pip install opendvp
+```
+
+### Test install
+
+```console
+$ python
+>>> import opendvp
+>>> print(opendvp.__version__)
+0.7.1
+```
+
+Make sure you always activate the environment to use opendvp.
+
+## Install with uv
+
+Assuming that most proteomics analysts use R, I have made this small tutorial to get you started with environment creation in Python. `uv` is an extremely fast Python package and project manager; it has many great features, and it is a great skill to have if you need Python for anything. 
Check their [documentation](https://docs.astral.sh/uv/).
+
+### Installing uv in Windows
+
+Use this line to download the latest stable `uv` version:
+
+```powershell
+powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
+```
+
+Unfortunately, there are many things that can go wrong in this step, depending on your computer setup. I am afraid I cannot explain all of these. I suggest you ask ChatGPT for help :)
+
+### Installing uv in Linux and MacOS
+
+Use curl to download the script and execute it with sh:
+
+```bash
+curl -LsSf https://astral.sh/uv/install.sh | sh
+```
+
+or brew
+
+```bash
+brew install uv
+```
+
+### Check uv works by running `uv` in the command line
+
+```console
+$ uv
+An extremely fast Python package manager.
+
+Usage: uv [OPTIONS] <COMMAND>
+
+Commands:
+  run      Run a command or script
+  init     Create a new project
+  add      Add dependencies to the project
+  remove   Remove dependencies from the project
+  version  Read or update the project's version
+  sync     Update the project's environment
+  lock     Update the project's lockfile
+  ... (10 lines hidden)
+```
+
+### Install opendvp with `uv`
+
+1. Create a new directory.
+
+`uv` works by creating directory-specific environments, so you should create a new directory for each different project. This might seem like a lot of separation, but it will keep your projects tidy, and you will only have what you need for each specific project.
+
+2. Open the directory in [VSCode](https://code.visualstudio.com/download)
+
+3. Use `uv` to create your Python (3.12) environment
+
+```console
+$ uv init --python 3.12
+Initialized project `temp`
+```
+
+4. Use `uv` to install opendvp
+
+```console
+$ uv add opendvp
+```
+
+### Check opendvp is installed
+
+```bash
+> uv pip show opendvp
+Name: opendvp
+Version: 0.7.1
+... 
(hidden 3 lines)
+```
+
+This shows you which version is installed and where.
+
+## Use openDVP with jupyter notebooks
+
+- Create a new jupyter notebook, or a new file with suffix `.ipynb`
+- Choose `Select kernel` in VSCode, and pick the `Python environment` that matches your directory name.
+
+Try importing opendvp; it will take some time the first time you do this.
+
+```python
+import opendvp as dvp
+```
+
+Use this to check the version from within Python:
+
+```python
+print(dvp.__version__)
+```
+
+## Troubleshooting
+
+- The Python version cannot yet be >=3.13; this will cause the install to fail. Use Python 3.11 or 3.12.
\ No newline at end of file
diff --git a/docs/Workflows/Computational/index.md b/docs/Workflows/Computational/index.md
new file mode 100644
index 0000000..e144c19
--- /dev/null
+++ b/docs/Workflows/Computational/index.md
@@ -0,0 +1,29 @@
+# Computational
+
+## Introductory tutorials
+
+- [Get started with uv for python management](GettingStartedWithUV)
+
+## Image processing pipelines
+
+### Get started with MCMICRO
+
+- [Get started with using MCMICRO](GettingStartedWithMCMICRO)
+
+### Get started with SOPA
+
+- [Get started with using SOPA](GettingStartedWithSOPA)
+
+### Other image processing approaches
+
+- scPortrait
+- Harpy
+
+```{toctree}
+:maxdepth: 2
+:hidden:
+
+InstallOpenDVP
+GettingStartedWithMCMICRO
+GettingStartedWithSOPA
+```
diff --git a/docs/Workflows/Experimental/Experimental_Imaging.md b/docs/Workflows/Experimental/Experimental_Imaging.md
new file mode 100644
index 0000000..7157a6b
--- /dev/null
+++ b/docs/Workflows/Experimental/Experimental_Imaging.md
@@ -0,0 +1,77 @@
+# Imaging
+
+You can image with whatever technology suits your project best. In general most people use:
+
+## Imaging modalities
+
+- H&E: Quick, simple, cheap, and pathology friendly
+- IHC: Targeted with antibody, does not require fluorescence
+- IF: Targeted with antibodies, a single round can have up to 5 stains. 
+- mIF: IF with manual handling between cycles
+
+### H&E
+
+- Fast, cheap, and standard for pathologists worldwide
+- Many deep learning models exist to predict features
+- Fast to stain tissue, fast to image tissue
+
+Hematoxylin and Eosin (H&E) staining remains the gold standard for histological visualization and is widely used to guide laser microdissection (LMD) of specific regions within tissue sections. The staining provides high-contrast morphological information that enables the identification of cellular and extracellular structures with precision, making it particularly valuable for isolating defined histological compartments or specific cell populations.
+
+There is plenty of information about how to interpret H&E; pathologists have plenty of experience in annotating this kind of imaging. We are also seeing an increase in the number of available Pathology Foundational Models that can predict a variety of features from a simple H&E stain.
+
+Depending on the experimental goal, H&E staining enables:
+
+- Single-cell microdissection: Identification of individual cells based on nuclear morphology, cytoplasmic boundaries, or specific tissue localization.
+- Microdissection of cellular clusters: Selection of small groups of phenotypically similar cells or histological niches, such as tumor margins or immune infiltrates.
+- Acellular region collection: Targeting non-cellular compartments like extracellular matrix, necrotic zones, or stromal regions, which are visually distinct due to the eosinophilic staining pattern.
+
+### Immunohistochemistry
+
+- Antibody-based + enzymatic staining
+- Brightfield imaging, therefore fast and low storage cost.
+
+Immunohistochemistry (IHC) enhances laser microdissection (LMD) by providing molecular specificity in addition to morphological context. 
Through the use of antibodies targeting specific proteins, IHC allows precise visualization and isolation of cells or tissue regions defined by molecular phenotype—such as immune subsets, tumor subpopulations, or stromal compartments. This is particularly advantageous when morphological cues alone are insufficient to distinguish the cells of interest. Chromogenic detection methods, typically using DAB (brown) or AEC (red) substrates, produce permanent, high-contrast labeling compatible with standard light microscopy and microdissection workflows. + +### Immunofluorescence + +- Antibody-based +- Panel design is important to ensure that signal spillover is minimal + +Immunofluorescence (IF) enables multiplexed molecular visualization in tissue sections, allowing up to five distinct markers to be simultaneously detected using spectrally separated fluorophores. This approach combines molecular specificity with spatial context, making it particularly valuable for laser microdissection (LMD) of defined cell types, microenvironments, or rare subpopulations that cannot be reliably distinguished by morphology alone. Fluorescent labeling can target cell identity markers, signaling proteins, or extracellular components, offering rich contextual information while maintaining subcellular resolution. + +Note: Here you should consider creating calibration points by etching the membrane with the LMD. + +### Multiplex Immunofluorescence + +- Antibody-based +- Increased panel design complexity and cost +- Increased lab work complexity OR expensive apparatus +- Enables the phenotyping of many cell types, particularly immune cells. + +Multiplex immunofluorescence (mIF) enables the simultaneous visualization of numerous protein markers within a single tissue section, providing a detailed molecular map of cellular phenotypes and spatial relationships. 
Unlike conventional IF, mIF relies on iterative staining and imaging cycles or advanced multiplexing chemistries (e.g., tyramide signal amplification, DNA barcoding, or spectral unmixing) to expand marker capacity far beyond the limits of standard fluorophore sets. This makes it a powerful tool for defining complex tissue microenvironments and guiding laser microdissection (LMD) toward highly specific cellular niches or interaction zones identified by combinatorial marker expression patterns.
+
+Note: Here you should consider creating calibration points by etching the membrane with the LMD.
+
+## Microscopes and critical settings
+
+Microscopes vary widely, but a few settings are critical.
+
+### Magnification
+
+- 10X: good enough for FlashDVP ROI manual annotation (not single cells).
+- 20X: minimum for image-analysis-driven DVP or single-cell manual annotations.
+- 40X and above: Allows for more granular morphological and more detailed cell-based analysis.
+
+### Binning
+
+- 1x1 binning: highest resolution, largest file sizes
+- 2x2 binning: half the resolution, 1/4 the file size, higher signal-to-noise ratio
+
+### File formats
+
+If you plan to just perform manual annotations, your bottleneck is getting your image into QuPath.
+QuPath can take in a large variety of formats; please consult the [Bioformats Compatibility Table](https://docs.openmicroscopy.org/bio-formats/5.8.2/supported-formats.html).
+
+If you plan to perform image analysis, ensure that your software solution can digest the file format. Perhaps consult your friendly image analyst for clarification.
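To make the binning trade-off concrete, here is a small illustrative NumPy sketch (the image dimensions are hypothetical) that applies 2x2 binning by averaging non-overlapping blocks of pixels; note how the pixel count, and hence the raw file size, drops to a quarter:

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((1024, 2048))  # a hypothetical single-channel frame

# 2x2 binning: average each non-overlapping 2x2 block of pixels.
h, w = image.shape
binned = image.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

assert binned.shape == (512, 1024)     # half the resolution in each axis
assert binned.size == image.size // 4  # a quarter of the pixels (and file size)
```

Averaging (rather than summing) the blocks keeps the intensity scale unchanged while still reducing per-pixel noise; camera hardware binning typically sums electrons before readout, which is where the extra signal-to-noise gain comes from.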
+
+### Protocols are coming (work in progress)
\ No newline at end of file
diff --git a/docs/Workflows/Experimental/Experimental_Tissue_and_Slides.md b/docs/Workflows/Experimental/Experimental_Tissue_and_Slides.md
new file mode 100644
index 0000000..1595974
--- /dev/null
+++ b/docs/Workflows/Experimental/Experimental_Tissue_and_Slides.md
@@ -0,0 +1,66 @@
+# Tissue and Slides
+
+## Tissue
+
+Tissues are quite heterogeneous, and you must know your tissue to best adapt the workflow to it.
+Here are some factors to consider when preparing for DVP.
+
+### Adhesion to membrane
+
+- Better adhesion will allow you to perform more cycles of multiplex immunofluorescence.
+- Fatty tissues, like breast, will have less adhesion.
+
+### Autofluorescence
+
+- Endogenous Fluorophores (collagens, elastins, lipofuscins)
+- Tissue type and composition: more connective tissue usually has more autofluorescence
+- Certain tissues (e.g., liver, lung, brain) are inherently more autofluorescent.
+- Older and fibrotic samples are worse than fresh or young tissue.
+- FFPE usually has higher autofluorescence than frozen sections.
+- Autofluorescence tends to overlap with FITC/Alexa488, less so with far-red dyes (Alexa647, Cy5).
+
+### Cell size
+
+- Large cell sizes provide more input material in single-cell DVP and pooled DVP. For example, hepatocytes are large and full of input material.
+- Smaller cells would require pooling to read out the same number of proteins
+- See [The dawn of single-cell proteomics.](https://www.nature.com/articles/s41592-023-01771-9/figures/1).
+
+### Cell Density
+
+- Denser tissues will have more z-axis noise (cells above and below the cell in focus).
+- Segmentation is more challenging with denser tissues
+- Single cell collection is prone to capturing overlapping cell material.
+### Tissue thickness
+
+- Our default is 5 micrometers thick
+- Thicker cuts provide more input material for LCMS, but have noisier images
+- Thinner cuts provide less input material, but give images with a greater signal-to-noise ratio.
+
+## Slides
+
+Laser microdissection of tissue cannot be done on plain glass slides; the tissue sticks too well.
+Therefore we collect tissue from slides that have a membrane that can be cut.
+The properties of the membrane vary, and you should consider which slide to use.
+
+Leica offers two main formats:
+
+- Frame Slides: a metal frame with a membrane that "floats" in the tissue area.
+- Glass Slides: a glass slide with a membrane that is separated by a thin air pocket.
+
+There are two main materials that make up the membrane:
+
+- PEN (polyethylene naphthalate)
+- PPS (polyphenylene sulfide); less autofluorescent for IF applications.
+
+For a detailed read, go to [Leica: consumables-for-laser-microdissection](https://www.leica-microsystems.com/science-lab/life-science/consumables-for-laser-microdissection/)
+
+### Preparing slides before imaging (work in progress)
+
+#### Increasing attachment with Poly-L-Lysine
+
+(work in progress)
+
+#### Etching calibration points on the slide
+
+(work in progress)
\ No newline at end of file
diff --git a/docs/Workflows/Experimental.md b/docs/Workflows/Experimental/index.md
similarity index 94%
rename from docs/Workflows/Experimental.md
rename to docs/Workflows/Experimental/index.md
index 69283f0..277b11f 100644
--- a/docs/Workflows/Experimental.md
+++ b/docs/Workflows/Experimental/index.md
@@ -38,6 +38,8 @@ You can image with whatever technology suits your project best. 
In general most ### Hematoxylin and Eosin + + ### Immunohistochemistry ### Immunofluorescence @@ -58,4 +60,10 @@ Note: Here you should consider creating calibration points by etching the membra ### Select regions of interest with QuPath to LMD -### +```{toctree} +:maxdepth: 2 +:hidden: + +Experimental_Imaging +Experimental_Tissue_and_Slides +``` diff --git a/docs/Workflows/index.md b/docs/Workflows/index.md index 4728c56..9006320 100644 --- a/docs/Workflows/index.md +++ b/docs/Workflows/index.md @@ -1,4 +1,4 @@ -# openDVP the framework +# openDVP the framework ## Introduction @@ -6,8 +6,8 @@ openDVP is a framework to empower users to perform spatial proteomics as easily We suggest two main workflows: - - flashDVP (optimized for speed) - - DVP (optimized for complexity) +- flashDVP (optimized for speed) +- DVP (optimized for complexity) ## flashDVP @@ -18,7 +18,7 @@ You require: - Images in which you can recognize tissue of interest - Laser Microdissection device, or someone willing to collaborate that has one. - LCMS setup, or someone willing to collaborate that has one ;) . - + Workflow is: 1. Acquire images of tissue of interest @@ -28,7 +28,6 @@ Workflow is: 5. Prepare samples and acquire proteomes via LCMS 6. Perform downstream proteomic data analysis - ## DVP Ready to explore the proteomics of more complex tissues? Or are you planning a large-scale project that needs automation? openDVP can help you. @@ -48,3 +47,11 @@ openDVP highly recommends utilizing open-source image processing pipelines: MCMI Image analysis can vary, but openDVP can help you filter common artefacts, for example by excluding cells based on morphological or intensity features, or by flagging dropped-out cells via the ratio of marker intensity between cycles. It also enables an easy back and forth with user-friendly annotation software like QuPath, making it simple to integrate collaborators' insights into the analysis. 
We use scimap for phenotyping, but we suggest you compare the released approaches and use what fits your problem best; that is the beauty of open source. We will release more details soon :) + +```{toctree} +:maxdepth: 2 +:hidden: + +Experimental/index +Computational/index +``` diff --git a/docs/Workflows/uv_tutorial.md b/docs/Workflows/uv_tutorial.md deleted file mode 100644 index 5e4cfeb..0000000 --- a/docs/Workflows/uv_tutorial.md +++ /dev/null @@ -1,104 +0,0 @@ -# Getting started with uv - -Assuming that most proteomics analysts use R, I have made this small tutorial to get you started with environment creation in python, using the latest `uv` - -`uv` is an extremely fast Python package and project manager, it has many great features and it is a great skill to have if you need python for anything. Check their [documentation](https://docs.astral.sh/uv/). - -## How to install uv - -### Windows - -Use this line to download the latest stable `uv` version - -```powershell -powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" -``` - -Unfortunately, there are many things that can go wrong in this step, depending on your computer setup. I am afraid I cannot explain all of these. I suggest you ask ChatGPT for help :) - -### Linux and MacOS - -Use curl to download the script and execute it with sh: - -```bash -curl -LsSf https://astral.sh/uv/install.sh | sh -``` -or brew -```bash -brew install uv -``` - -## Check uv works by running `uv` in the command line - -```bash -❯ uv -An extremely fast Python package manager. 
- -Usage: uv [OPTIONS] - -Commands: - run Run a command or script - init Create a new project - add Add dependencies to the project - remove Remove dependencies from the project - version Read or update the project's version - sync Update the project's environment - lock Update the project's lockfile - export Export the project's lockfile to an alternate format - tree Display the project's dependency tree - tool Run and install commands provided by Python packages - python Manage Python versions and installations - pip Manage Python packages with a pip-compatible interface - venv Create a virtual environment - build Build Python packages into source distributions and wheels - publish Upload distributions to an index - cache Manage uv's cache - self Manage the uv executable - help Display documentation for a command -``` - -### Install opendvp with `uv` - -1. Create a new directory. - -`uv` works by creating directory specific environments. Therefore you should create a new directory for each different project. This might seems like separating a lot of things, but will keep your proyects tidy, and you should only have what you need for each specific project. - -2. Open directory in [VSCode](https://code.visualstudio.com/download) - -3. Use `uv` to create your python environment - -```python -> uv init -``` - -4. Use `uv` to install opendvp - -```python -> uv add opendvp -``` - -### Check opendvp is installed - -```bash -> uv pip list | grep opendvp -opendvp 0.7.0 -``` - -Showing you what version is installed. - -## Use openDVP with jupyter notebooks - -- Create a new jupyter notebook, or a new file with suffix `.ipynb` -- Choose `Select kernel` in VSCode, and pick the `Python environment` that matches your directory name. - -Try importing opendvp, it will take some time the first time you do this. 
- -```python -import opendvp as dvp -``` - -Use this to check the version from within python - -```python -print(dvp.__version__) -``` \ No newline at end of file diff --git a/docs/conf.py b/docs/conf.py index 57e5293..d21c506 100644 --- a/docs/conf.py +++ b/docs/conf.py @@ -33,6 +33,10 @@ "conf_py_path": "/docs/", } +html_theme_options = { + "navigation_depth": -1, +} + # -- General configuration --------------------------------------------------- extensions = [ diff --git a/docs/index.md b/docs/index.md index fa8dee5..785ce29 100644 --- a/docs/index.md +++ b/docs/index.md @@ -63,7 +63,6 @@ pip install git+https://github.com/CosciaLab/openDVP.git@main - [Tutorial 2: Downstream proteomics analysis](Tutorials/T2_DownstreamProteomics) - [Tutorial 3: Integration of imaging with proteomics](Tutorials/T3_ProteomicsIntegration) - ## Contact For questions about openDVP and the DVP workflow you are very welcome to post a message in the [discussion board](https://github.com/CosciaLab/openDVP/discussions). For issues with the software, please post issues on [Github Issues](https://github.com/CosciaLab/openDVP/issues). @@ -73,10 +72,11 @@ For questions about openDVP and the DVP workflow you are very welcome to post a Not yet available. 
```{toctree} -:maxdepth: 2 +:maxdepth: 3 :hidden: api/index Tutorials/index Workflows/index +ContributionGuide ``` diff --git a/pyproject.toml b/pyproject.toml index b64020f..2ee4b3f 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -9,9 +9,8 @@ authors = [{name = "Jose Nimo", email = "jose.nimo@mdc-berlin.de"}] maintainers = [{ name = "Jose Nimo", email = "jose.nimo@mdc-berlin.de" }] dependencies = [ - "anndata >=0.9.1, <0.11", "spatialdata == 0.4.0", - "napari-spatialdata ~= 0.5.5", + "napari-spatialdata == 0.5.5", "spatialdata-plot ~= 0.2.9 ", "ipykernel ~= 6.25", "scanpy >=1.11, <1.14", diff --git a/src/opendvp/io/DIANN_to_adata.py b/src/opendvp/io/DIANN_to_adata.py index e17c4ff..35f86e2 100644 --- a/src/opendvp/io/DIANN_to_adata.py +++ b/src/opendvp/io/DIANN_to_adata.py @@ -75,6 +75,9 @@ def DIANN_to_adata( else: sample_metadata = pd.DataFrame(index=rawdata.index) + # TODO report number of matching out of all rows + # TODO allow users to pass exhaustive metadata to subset of pg_matrix rows + # check sample_metadata filename_paths are unique, and matches df if set(sample_metadata.index) != set(rawdata.index): logger.warning("uniques from sample metadata and DIANN table do not match") diff --git a/uv.lock b/uv.lock index 180cc3f..f3e12b5 100644 --- a/uv.lock +++ b/uv.lock @@ -1832,7 +1832,7 @@ wheels = [ [[package]] name = "napari-spatialdata" -version = "0.5.6" +version = "0.5.5" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "anndata" }, @@ -1862,9 +1862,9 @@ dependencies = [ { name = "xarray" }, { name = "xarray-datatree" }, ] -sdist = { url = "https://files.pythonhosted.org/packages/6f/35/228c805ee96174ffb8b44214dbf3405af4095c5009b30f77d1601010759c/napari_spatialdata-0.5.6.tar.gz", hash = "sha256:794955161a57748de8ab3e306d3ca45a23229cba45ed794044cac433cf0bfdb1", size = 15526394, upload-time = "2025-04-21T18:49:14.836Z" } +sdist = { url = 
"https://files.pythonhosted.org/packages/63/ac/151215ff814b7f67646729924172952a3614a00063e9a2355ded0e9aa134/napari_spatialdata-0.5.5.tar.gz", hash = "sha256:a02cfe988620735bab2047f65b10b82fa890e34ac83ebc7a1e44581655640749", size = 15525479, upload-time = "2025-01-20T15:13:18.269Z" } wheels = [ - { url = "https://files.pythonhosted.org/packages/96/96/983979e7a81cdf624b3a41540fc78c9508c94eca3611cec95a5e6b7734f3/napari_spatialdata-0.5.6-py3-none-any.whl", hash = "sha256:0c052663d86d3735961709e3f6bd7a04bf5da19dc46b8fb63a8499a6d18ef12f", size = 102616, upload-time = "2025-04-21T18:49:12.396Z" }, + { url = "https://files.pythonhosted.org/packages/d3/53/b03ebf972c0af5a994ba2062cd1ac2397dcf6be11603eb9fb384790bded5/napari_spatialdata-0.5.5-py3-none-any.whl", hash = "sha256:d85214d73905d805c1b3f2912d33cad63c857b66ac62051ba0ff74892c2952e2", size = 102153, upload-time = "2025-01-20T15:13:16.056Z" }, ] [[package]] @@ -2134,7 +2134,6 @@ version = "0.7.1" source = { editable = "." } dependencies = [ { name = "adjusttext" }, - { name = "anndata" }, { name = "esda" }, { name = "gensim" }, { name = "ipykernel" }, @@ -2183,12 +2182,11 @@ docs = [ [package.metadata] requires-dist = [ { name = "adjusttext" }, - { name = "anndata", specifier = ">=0.9.1,<0.11" }, { name = "esda", specifier = ">=2.7,<2.8" }, { name = "gensim", specifier = ">=4.3.2" }, { name = "ipykernel", specifier = "~=6.25" }, { name = "loguru", specifier = "~=0.7.3" }, - { name = "napari-spatialdata", specifier = "~=0.5.5" }, + { name = "napari-spatialdata", specifier = "==0.5.5" }, { name = "perseuspy", specifier = ">=0.3.9,<0.4" }, { name = "pingouin", specifier = ">=0.5.5,<0.6" }, { name = "pyogrio", specifier = "~=0.11.1" },
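
The first `TODO` added to `src/opendvp/io/DIANN_to_adata.py` above (report the number of matching rows) could be prototyped roughly as follows. This is only a sketch, not the library's API: the helper name `report_index_overlap` is hypothetical, and `sample_metadata`/`rawdata` mirror the variable names visible in the diff.

```python
import pandas as pd

def report_index_overlap(sample_metadata: pd.DataFrame, rawdata: pd.DataFrame) -> int:
    """Report how many pg_matrix rows have matching sample metadata (by index)."""
    matching = rawdata.index.intersection(sample_metadata.index)
    print(f"{len(matching)} of {len(rawdata.index)} pg_matrix rows have sample metadata")
    return len(matching)

# Hypothetical example: three pg_matrix rows, metadata available for two of them
meta = pd.DataFrame({"condition": ["ctrl", "treat"]}, index=["run_A", "run_B"])
raw = pd.DataFrame({"P12345": [1.0, 2.0, 3.0]}, index=["run_A", "run_B", "run_C"])
n = report_index_overlap(meta, raw)  # prints "2 of 3 pg_matrix rows have sample metadata"
```

Using `Index.intersection` rather than comparing `set(...)` equality (as the existing check does) also covers the partial-overlap case the second `TODO` hints at.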