From aa9dd3a63fcafda3e7707da7edb278f1e4bc0700 Mon Sep 17 00:00:00 2001 From: FerriolCalvet Date: Thu, 22 Jan 2026 14:18:59 +0100 Subject: [PATCH 1/4] add DeepClone page to pipelines --- docs/pipelines/DeepClone.md | 138 ++++++++++++++++++++++++++++++++++++ 1 file changed, 138 insertions(+) create mode 100644 docs/pipelines/DeepClone.md diff --git a/docs/pipelines/DeepClone.md b/docs/pipelines/DeepClone.md new file mode 100644 index 00000000..a47b80c0 --- /dev/null +++ b/docs/pipelines/DeepClone.md @@ -0,0 +1,138 @@ +# DeepClone pipelines + +## deepUMIcaller + +## Metrics + +## deepCSA + + +[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/bbglab/intogen-plus-dsl2/) + +It's a framework for automatic and comprehensive knowledge extraction based on mutational data from +sequenced tumor samples from patients. + +## Run IntOGen DSL2 + +Great effort was put to migrate IntOGen from nextflow DSL1 to nextflow DSL2. This effort allowed to be able to run the +pipeline within our seqera platform dashboard. + +From the bbglabirb/ALP_pipelines workspace [launchpad](https://cloud.seqera.io/orgs/bbglabirb/workspaces/ALP_pipelines/launchpad), +you can access the pipelines available in our workspace. + +!!! question "I can't see the workspace, what should I do?" + Please refer to Miguel or to Federica to solve this issue + +By clicking on [intOGen-plus-dsl2](https://cloud.seqera.io/orgs/bbglabirb/workspaces/ALP_pipelines/launchpad/217132460501467?sourceWorkspaceId=97012242959019) +you'll be able to launch the pipeline. + +![alt text](../assets/images/intogen-dsl2/intogen_seqera.png) + +Before launching the pipeline, some parameters need to be configured. Here a simple but complete list of +useful parameters is explained. + +!!! warning "We highly recommend to keep the defaults for those parameters not discussed in this page." 
+ +=== "General config section" + + #### **Revision number** + + ![Revision number](../assets/images/intogen-dsl2/revision_number.png){ height="300" style="display: block; margin: 0 auto" } + + By default, the **revision number** is linked to the stable tag of the pipeline. As of now - it's `2024.11-dsl2`. + This can eventually be changed if a run is resumed or relaunched from the run section. + + !!! note "Please be aware that changing this section may affect the `resume` option" + + #### **Config profile** + + ![Config profile](../assets/images/intogen-dsl2/config_profile.png){ height="300" style="display: block; margin: 0 auto" } + + - `test` --> this is using the [CBIOP cohort](https://github.com/bbglab/intogen-plus-dsl2/blob/dev/DSL2/tests/data/pipeline/input/cbioportal_prad_broad/data_mutations_extended.txt) in the repo [optional] + - `test_full` --> this is using the full datasets of intogen [optional]. + - `singularity` --> this is allowing the use of singularity for using the containers + - `irb` --> this is allocating the right resources and queue for the slurm executor in the IRBCluster + + #### **Workflow run name** + + ![Run name](../assets/images/intogen-dsl2/workflow_name.png){ height="300" style="display: block; margin: 0 auto" } + + It's **mandatory** to write a meaningful name. Here follows some examples: + + - If I am running a new combination optimization I would call the run: `optimization_combination` + - If I am running a FULL run with a new final version of intogen I would call it: `v3.0_ALL` + - If I am reproducing the v2024 run I would call it: `v2024_ALL` + - If I am running a specific cohort from an external collaborator I would call it: `v2024_EXT_COLLAB` + + #### **Work directory** + + ![work directory](../assets/images/intogen-dsl2/work_dir.png){ height="300" style="display: block; margin: 0 auto" } + + By default, the work directory is `/data/bbg/nobackup2/work/IntOGenDSL2/v2024/`. 
+ For faster execution you can use the scratch partition in the cluster: `/scratch/bbg/work/IntOGenDSL2/v2025/`. + Replace `` with a meaningful name, such as the `Outdir` value from the next section, to avoid conflicts. + + !!! warning "Delete the work folder once the intogen run finishes successfully." + + +=== "Run parameters section" + + #### **Input** + + This parameter is read as a string, and it should be the absolute paths of the folder that openvariant will iterate + separated by a space. Here it follows an example: + + ```sh + /path/to/datasets/for/intogen/input1 /path/to/datasets/for/intogen/input2 /path/to/datasets/for/intogen/input3 + ``` + + !!! question "How do I prepare the input for IntOGen?" + Great question! Here the documentation where everything is explained: + [intogen-plus.readthedocs](https://intogen-plus.readthedocs.io/en/v2024/usage.html#input) + + #### **Outdir** + + This parameter is where the output of intogen will be stored. By default we store + intermediate runs that might fail here: + + ```sh + /data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024/ + ``` + + !!! note "It's important to add a meaningful name as a final directory output" + by default IntOGen will create a folder with a date where all the results will be stored. This although + requires an higher level of specificity in the top folder. + + e.g. If I am running an external collab for LUNG data, I will add as an `outdir` parameter: + ```sh + /data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024/Lung_external_collab + ``` + + The IntOGen pipeline will by default create a subdirectory with the date of the + launch where it will store all the files: + ```sh + /data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024/Lung_external_collab/20250423/ + ``` + + + Stable runs and releases are officially stored in a safer partition: + ```sh + /data/bbg/datasets/intogen/output/runs + ``` + +Once both those sections are completed we are safe to run the pipeline. 
+ +### FAQs + +!!! question "The pipeline failed. How do I resume?" + In the [run tab](https://cloud.seqera.io/orgs/bbglabirb/workspaces/bbglab/watch) click on the three + dots on the right of your run and click `Resume`. + +- TBC + +## References + +- Federica Brando +- Miguel Grau From 605aeff8e2c5c9cdd82b88db11e08f60cf4af147 Mon Sep 17 00:00:00 2001 From: FerriolCalvet Date: Thu, 22 Jan 2026 14:42:56 +0100 Subject: [PATCH 2/4] update deepclone sections for deep* pipelines --- docs/pipelines/DeepClone.md | 149 ++++++++---------------------------- 1 file changed, 34 insertions(+), 115 deletions(-) diff --git a/docs/pipelines/DeepClone.md b/docs/pipelines/DeepClone.md index a47b80c0..9d654774 100644 --- a/docs/pipelines/DeepClone.md +++ b/docs/pipelines/DeepClone.md @@ -1,138 +1,57 @@ # DeepClone pipelines -## deepUMIcaller - -## Metrics - -## deepCSA - - -[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/bbglab/intogen-plus-dsl2/) - -It's a framework for automatic and comprehensive knowledge extraction based on mutational data from -sequenced tumor samples from patients. - -## Run IntOGen DSL2 - -Great effort was put to migrate IntOGen from nextflow DSL1 to nextflow DSL2. This effort allowed to be able to run the -pipeline within our seqera platform dashboard. - -From the bbglabirb/ALP_pipelines workspace [launchpad](https://cloud.seqera.io/orgs/bbglabirb/workspaces/ALP_pipelines/launchpad), -you can access the pipelines available in our workspace. - -!!! question "I can't see the workspace, what should I do?" - Please refer to Miguel or to Federica to solve this issue - -By clicking on [intOGen-plus-dsl2](https://cloud.seqera.io/orgs/bbglabirb/workspaces/ALP_pipelines/launchpad/217132460501467?sourceWorkspaceId=97012242959019) -you'll be able to launch the pipeline. 
- -![alt text](../assets/images/intogen-dsl2/intogen_seqera.png) - -Before launching the pipeline, some parameters need to be configured. Here a simple but complete list of -useful parameters is explained. - -!!! warning "We highly recommend to keep the defaults for those parameters not discussed in this page." +## Introduction -=== "General config section" +This page summarizes the usage of the pipelines and tools used within the context of DeepClone. +The main steps are: the duplex library preparation protocol, deepUMIcaller, the generation of duplex metrics, and deepCSA. - #### **Revision number** - - ![Revision number](../assets/images/intogen-dsl2/revision_number.png){ height="300" style="display: block; margin: 0 auto" } +Documentation and basic information about DeepClone are available in the protocol paper: +[protocols.io link](https://www.protocols.io/view/deepclone-an-end-to-end-protocol-to-study-somatic-dm6gp1jodgzp/v2) - By default, the **revision number** is linked to the stable tag of the pipeline. As of now - it's `2024.11-dsl2`. - This can eventually be changed if a run is resumed or relaunched from the run section. +The website lists the basic steps and hosts the main version of the manuscript. For a more detailed explanation of +all the steps, check the supplementary document, also available on protocols.io. - !!! note "Please be aware that changing this section may affect the `resume` option" +We keep some internal definitions of how we use the pipelines; access to this information is restricted and +should be requested internally from the PROMINENT team. - #### **Config profile** +## Duplex protocol - ![Config profile](../assets/images/intogen-dsl2/config_profile.png){ height="300" style="display: block; margin: 0 auto" } +The steps are described in the protocol, and an alternative, more useful version is available in the supplementary material.
+We recommend users to use the supplementary material one. - - `test` --> this is using the [CBIOP cohort](https://github.com/bbglab/intogen-plus-dsl2/blob/dev/DSL2/tests/data/pipeline/input/cbioportal_prad_broad/data_mutations_extended.txt) in the repo [optional] - - `test_full` --> this is using the full datasets of intogen [optional]. - - `singularity` --> this is allowing the use of singularity for using the containers - - `irb` --> this is allocating the right resources and queue for the slurm executor in the IRBCluster - - #### **Workflow run name** - - ![Run name](../assets/images/intogen-dsl2/workflow_name.png){ height="300" style="display: block; margin: 0 auto" } - - It's **mandatory** to write a meaningful name. Here follows some examples: - - - If I am running a new combination optimization I would call the run: `optimization_combination` - - If I am running a FULL run with a new final version of intogen I would call it: `v3.0_ALL` - - If I am reproducing the v2024 run I would call it: `v2024_ALL` - - If I am running a specific cohort from an external collaborator I would call it: `v2024_EXT_COLLAB` - - #### **Work directory** - - ![work directory](../assets/images/intogen-dsl2/work_dir.png){ height="300" style="display: block; margin: 0 auto" } - - By default, the work directory is `/data/bbg/nobackup2/work/IntOGenDSL2/v2024/`. - For faster execution you can use the scratch partition in the cluster: `/scratch/bbg/work/IntOGenDSL2/v2025/`. - Replace `` with a meaningful name, such as the `Outdir` value from the next section, to avoid conflicts. - - !!! warning "Delete the work folder once the intogen run finishes successfully." - - -=== "Run parameters section" - - #### **Input** - - This parameter is read as a string, and it should be the absolute paths of the folder that openvariant will iterate - separated by a space. 
Here it follows an example: - - ```sh - /path/to/datasets/for/intogen/input1 /path/to/datasets/for/intogen/input2 /path/to/datasets/for/intogen/input3 - ``` +## deepUMIcaller - !!! question "How do I prepare the input for IntOGen?" - Great question! Here the documentation where everything is explained: - [intogen-plus.readthedocs](https://intogen-plus.readthedocs.io/en/v2024/usage.html#input) +We use the code available in [deepUMIcaller](https://github.com/bbglab/deepUMIcaller.git), we generally use the dev branch +since this contains the most updated version of the code and is generally stable. - #### **Outdir** +We run it via Seqera platform so that we have full record of the runs and coordination of the different projects. - This parameter is where the output of intogen will be stored. By default we store - intermediate runs that might fail here: +We always put the work directory in /scratch and the outputs can either go to the s3 or to nobackup or nobackup2. - ```sh - /data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024/ - ``` +## Metrics - !!! note "It's important to add a meaningful name as a final directory output" - by default IntOGen will create a folder with a date where all the results will be stored. This although - requires an higher level of specificity in the top folder. +## deepCSA - e.g. If I am running an external collab for LUNG data, I will add as an `outdir` parameter: - ```sh - /data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024/Lung_external_collab - ``` - - The IntOGen pipeline will by default create a subdirectory with the date of the - launch where it will store all the files: - ```sh - /data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024/Lung_external_collab/20250423/ - ``` +We use the code available in [deepCSA](https://github.com/bbglab/deepCSA.git), we generally use the dev branch +since this contains the most updated version of the code and is generally stable. 
+We run it via the Seqera platform so that we have a full record of the runs and can coordinate the different projects. - Stable runs and releases are officially stored in a safer partition: - ```sh - /data/bbg/datasets/intogen/output/runs - ``` +Add the irbcluster profile when running the pipeline so that the default structural parameters are automatically set. -Once both those sections are completed we are safe to run the pipeline. +We always put the work directory in /scratch and the outputs usually in nobackup or nobackup2. -### FAQs +## References -!!! question "The pipeline failed. How do I resume?" - In the [run tab](https://cloud.seqera.io/orgs/bbglabirb/workspaces/bbglab/watch) click on the three - dots on the right of your run and click `Resume`. +Duplex library preparation protocol: -- TBC +- Morena Pinheiro +- Erika López-Arribillaga +- Nuría Samper -## References +Computational pipelines: -- Federica Brando -- Miguel Grau +- Ferriol Calvet (main developer) +- Elisabet Figuerola (owns extensive internal documentation) +- Rocío Chamorro (in particular for metrics) +- Miguel Grau (developer) From 3ae809b47727df621396c1a0bb176ee6d0a2d5f4 Mon Sep 17 00:00:00 2001 From: FerriolCalvet Date: Thu, 22 Jan 2026 14:45:30 +0100 Subject: [PATCH 3/4] add mention to s3 --- docs/pipelines/DeepClone.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/docs/pipelines/DeepClone.md b/docs/pipelines/DeepClone.md index 9d654774..d5d92e44 100644 --- a/docs/pipelines/DeepClone.md +++ b/docs/pipelines/DeepClone.md @@ -28,6 +28,9 @@ We run it via Seqera platform so that we have full record of the runs and coordi We always put the work directory in /scratch and the outputs can either go to the s3 or to nobackup or nobackup2. +If you need to access the s3, either to save concats or to store the output of deepUMIcaller there, +check the [S3 entry](https://bbglab.github.io/bbgwiki/Cluster_basics/s3/#terminal) in this wiki.
+ ## Metrics ## deepCSA From 8d883f5ce43257a9940b7f482fb8f86d22216232 Mon Sep 17 00:00:00 2001 From: rochamorro1 Date: Thu, 22 Jan 2026 15:30:39 +0100 Subject: [PATCH 4/4] add metrics section --- docs/pipelines/DeepClone.md | 16 ++++++++++++++++ 1 file changed, 16 insertions(+) diff --git a/docs/pipelines/DeepClone.md b/docs/pipelines/DeepClone.md index d5d92e44..d74f2e3d 100644 --- a/docs/pipelines/DeepClone.md +++ b/docs/pipelines/DeepClone.md @@ -33,6 +33,22 @@ check the [S3 entry](https://bbglab.github.io/bbgwiki/Cluster_basics/s3/#termina ## Metrics +These metrics help us understand two key aspects: whether our duplex libraries have worked properly and, importantly, how much sequencing output should be requested to avoid undersequencing and, above all, oversequencing. + +*When should I run the metrics?* + +Every time you do a new deepUMIcaller run, and before running deepCSA. + +*Why?* + +1. To validate that you have included all the GBs of data available for that library (all lanes and reseqs) +2. To check whether your library has been sequenced optimally or whether an additional reseq needs to be requested +3. To continue compiling these metrics and keep improving our understanding of the duplex protocol + +*How?* + +Instructions on how to run them, and additional documentation on the metrics, are available in our internal duplex documentation. + ## deepCSA We use the code available in [deepCSA](https://github.com/bbglab/deepCSA.git), we generally use the dev branch