From aa9dd3a63fcafda3e7707da7edb278f1e4bc0700 Mon Sep 17 00:00:00 2001
From: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Thu, 22 Jan 2026 14:18:59 +0100
Subject: [PATCH 1/4] add DeepClone page to pipelines

---
 docs/pipelines/DeepClone.md | 138 ++++++++++++++++++++++++++++++++++++
 1 file changed, 138 insertions(+)
 create mode 100644 docs/pipelines/DeepClone.md

diff --git a/docs/pipelines/DeepClone.md b/docs/pipelines/DeepClone.md
new file mode 100644
index 00000000..a47b80c0
--- /dev/null
+++ b/docs/pipelines/DeepClone.md
@@ -0,0 +1,138 @@
+# DeepClone pipelines
+
+## deepUMIcaller
+
+## Metrics
+
+## deepCSA
+
+<!-- 
+TODO: Brief introduction on what is intogen - its website and its purpose, use webs and repo as reference.
+-->
+[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/bbglab/intogen-plus-dsl2/)<!-- markdownlint-disable MD013 -->
+
+It's a framework for automatic and comprehensive knowledge extraction based on mutational data from
+sequenced tumor samples from patients.
+
+## Run IntOGen DSL2
+
+Great effort was put to migrate IntOGen from nextflow DSL1 to nextflow DSL2. This effort allowed to be able to run the
+pipeline within our seqera platform dashboard.
+
+From the bbglabirb/ALP_pipelines workspace [launchpad](https://cloud.seqera.io/orgs/bbglabirb/workspaces/ALP_pipelines/launchpad),
+you can access the pipelines available in our workspace.
+
+!!! question "I can't see the workspace, what should I do?"
+    Please refer to Miguel or to Federica to solve this issue
+
+By clicking on [intOGen-plus-dsl2](https://cloud.seqera.io/orgs/bbglabirb/workspaces/ALP_pipelines/launchpad/217132460501467?sourceWorkspaceId=97012242959019)
+you'll be able to launch the pipeline.
+
+![alt text](../assets/images/intogen-dsl2/intogen_seqera.png)
+
+Before launching the pipeline, some parameters need to be configured. Here a simple but complete list of
+useful parameters is explained.
+
+!!! warning "We highly recommend to keep the defaults for those parameters not discussed in this page."
+
+=== "General config section"
+
+    #### **Revision number**<!-- markdownlint-disable MD046 -->
+    
+    ![Revision number](../assets/images/intogen-dsl2/revision_number.png){ height="300" style="display: block; margin: 0 auto" }
+
+    By default, the **revision number** is linked to the stable tag of the pipeline. As of now - it's `2024.11-dsl2`. 
+    This can eventually be changed if a run is resumed or relaunched from the run section.
+
+    !!! note "Please be aware that changing this section may affect the `resume` option"
+
+    #### **Config profile**
+
+    ![Config profile](../assets/images/intogen-dsl2/config_profile.png){ height="300" style="display: block; margin: 0 auto" }
+
+    - `test` --> this is using the [CBIOP cohort](https://github.com/bbglab/intogen-plus-dsl2/blob/dev/DSL2/tests/data/pipeline/input/cbioportal_prad_broad/data_mutations_extended.txt) in the repo [optional]<!-- markdownlint-disable MD013 -->
+    - `test_full` --> this is using the full datasets of intogen [optional].
+    - `singularity` --> this is allowing the use of singularity for using the containers
+    - `irb` --> this is allocating the right resources and queue for the slurm executor in the IRBCluster
+
+    #### **Workflow run name**
+
+    ![Run name](../assets/images/intogen-dsl2/workflow_name.png){ height="300" style="display: block; margin: 0 auto" }
+
+    It's **mandatory** to write a meaningful name. Here follows some examples:
+
+    - If I am running a new combination optimization I would call the run: `optimization_combination`
+    - If I am running a FULL run with a new final version of intogen I would call it: `v3.0_ALL`
+    - If I am reproducing the v2024 run I would call it: `v2024_ALL`
+    - If I am running a specific cohort from an external collaborator I would call it: `v2024_EXT_COLLAB`
+
+    #### **Work directory**
+
+    ![work directory](../assets/images/intogen-dsl2/work_dir.png){ height="300" style="display: block; margin: 0 auto" }
+    
+    By default, the work directory is `/data/bbg/nobackup2/work/IntOGenDSL2/v2024/`.
+    For faster execution you can use the scratch partition in the cluster: `/scratch/bbg/work/IntOGenDSL2/v2025/<your-subfolder>`.
+    Replace `<your-subfolder>` with a meaningful name, such as the `Outdir` value from the next section, to avoid conflicts.
+    
+    !!! warning "Delete the work folder once the intogen run finishes successfully."
+    
+
+=== "Run parameters section"
+
+    #### **Input**<!-- markdownlint-disable MD046 -->
+
+    This parameter is read as a string, and it should be the absolute paths of the folder that openvariant will iterate
+    separated by a space. Here it follows an example:
+
+    ```sh
+    /path/to/datasets/for/intogen/input1 /path/to/datasets/for/intogen/input2 /path/to/datasets/for/intogen/input3
+    ```
+
+    !!! question "How do I prepare the input for IntOGen?"
+        Great question! Here the documentation where everything is explained: 
+        [intogen-plus.readthedocs](https://intogen-plus.readthedocs.io/en/v2024/usage.html#input)
+
+    #### **Outdir**
+
+    This parameter is where the output of intogen will be stored. By default we store
+    intermediate runs that might fail here:
+
+    ```sh
+    /data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024/<MeaningfulName>
+    ```
+
+    !!! note "It's important to add a meaningful name as a final directory output"
+        by default IntOGen will create a folder with a date where all the results will be stored. This although
+        requires an higher level of specificity in the top folder.
+
+        e.g. If I am running an external collab for LUNG data, I will add as an `outdir` parameter:
+        ```sh
+        /data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024/Lung_external_collab
+        ```
+        
+        The IntOGen pipeline will by default create a subdirectory with the date of the
+        launch where it will store all the files:
+        ```sh
+        /data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024/Lung_external_collab/20250423/
+        ```
+
+
+    Stable runs and releases are officially stored in a safer partition: 
+    ```sh
+    /data/bbg/datasets/intogen/output/runs
+    ```
+
+Once both those sections are completed we are safe to run the pipeline.
+
+### FAQs
+
+!!! question "The pipeline failed. How do I resume?"
+    In the [run tab](https://cloud.seqera.io/orgs/bbglabirb/workspaces/bbglab/watch) click on the three
+    dots on the right of your run and click `Resume`.
+
+- TBC
+
+## References
+
+- Federica Brando
+- Miguel Grau

From 605aeff8e2c5c9cdd82b88db11e08f60cf4af147 Mon Sep 17 00:00:00 2001
From: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Thu, 22 Jan 2026 14:42:56 +0100
Subject: [PATCH 2/4] update deepclone sections for deep* pipelines

---
 docs/pipelines/DeepClone.md | 149 ++++++++----------------------------
 1 file changed, 34 insertions(+), 115 deletions(-)

diff --git a/docs/pipelines/DeepClone.md b/docs/pipelines/DeepClone.md
index a47b80c0..9d654774 100644
--- a/docs/pipelines/DeepClone.md
+++ b/docs/pipelines/DeepClone.md
@@ -1,138 +1,57 @@
 # DeepClone pipelines
 
-## deepUMIcaller
-
-## Metrics
-
-## deepCSA
-
-<!-- 
-TODO: Brief introduction on what is intogen - its website and its purpose, use webs and repo as reference.
--->
-[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/bbglab/intogen-plus-dsl2/)<!-- markdownlint-disable MD013 -->
-
-It's a framework for automatic and comprehensive knowledge extraction based on mutational data from
-sequenced tumor samples from patients.
-
-## Run IntOGen DSL2
-
-Great effort was put to migrate IntOGen from nextflow DSL1 to nextflow DSL2. This effort allowed to be able to run the
-pipeline within our seqera platform dashboard.
-
-From the bbglabirb/ALP_pipelines workspace [launchpad](https://cloud.seqera.io/orgs/bbglabirb/workspaces/ALP_pipelines/launchpad),
-you can access the pipelines available in our workspace.
-
-!!! question "I can't see the workspace, what should I do?"
-    Please refer to Miguel or to Federica to solve this issue
-
-By clicking on [intOGen-plus-dsl2](https://cloud.seqera.io/orgs/bbglabirb/workspaces/ALP_pipelines/launchpad/217132460501467?sourceWorkspaceId=97012242959019)
-you'll be able to launch the pipeline.
-
-![alt text](../assets/images/intogen-dsl2/intogen_seqera.png)
-
-Before launching the pipeline, some parameters need to be configured. Here a simple but complete list of
-useful parameters is explained.
-
-!!! warning "We highly recommend to keep the defaults for those parameters not discussed in this page."
+## Introduction
 
-=== "General config section"
+This page summarizes how we use the pipelines and tools that make up DeepClone.
+The main steps are the duplex library preparation protocol, deepUMIcaller, the generation of duplex metrics, and deepCSA.
 
-    #### **Revision number**<!-- markdownlint-disable MD046 -->
-    
-    ![Revision number](../assets/images/intogen-dsl2/revision_number.png){ height="300" style="display: block; margin: 0 auto" }
+The documentation and basic information about DeepClone are in the protocols paper:
+[protocols.io link](https://www.protocols.io/view/deepclone-an-end-to-end-protocol-to-study-somatic-dm6gp1jodgzp/v2)
 
-    By default, the **revision number** is linked to the stable tag of the pipeline. As of now - it's `2024.11-dsl2`. 
-    This can eventually be changed if a run is resumed or relaunched from the run section.
+The basic list of steps is on the website and in the main version of the manuscript; for a more detailed explanation
+of every step, see the supplementary document, which is also available on protocols.io.
 
-    !!! note "Please be aware that changing this section may affect the `resume` option"
+Some internal definitions of how we use the pipelines also exist, but access to this information is restricted and
+must be requested internally from the PROMINENT team.
 
-    #### **Config profile**
+## Duplex protocol
 
-    ![Config profile](../assets/images/intogen-dsl2/config_profile.png){ height="300" style="display: block; margin: 0 auto" }
+The steps are described in the protocol, and the supplementary material contains an alternative, more useful
+version of them. We recommend following the supplementary material version.
 
-    - `test` --> this is using the [CBIOP cohort](https://github.com/bbglab/intogen-plus-dsl2/blob/dev/DSL2/tests/data/pipeline/input/cbioportal_prad_broad/data_mutations_extended.txt) in the repo [optional]<!-- markdownlint-disable MD013 -->
-    - `test_full` --> this is using the full datasets of intogen [optional].
-    - `singularity` --> this is allowing the use of singularity for using the containers
-    - `irb` --> this is allocating the right resources and queue for the slurm executor in the IRBCluster
-
-    #### **Workflow run name**
-
-    ![Run name](../assets/images/intogen-dsl2/workflow_name.png){ height="300" style="display: block; margin: 0 auto" }
-
-    It's **mandatory** to write a meaningful name. Here follows some examples:
-
-    - If I am running a new combination optimization I would call the run: `optimization_combination`
-    - If I am running a FULL run with a new final version of intogen I would call it: `v3.0_ALL`
-    - If I am reproducing the v2024 run I would call it: `v2024_ALL`
-    - If I am running a specific cohort from an external collaborator I would call it: `v2024_EXT_COLLAB`
-
-    #### **Work directory**
-
-    ![work directory](../assets/images/intogen-dsl2/work_dir.png){ height="300" style="display: block; margin: 0 auto" }
-    
-    By default, the work directory is `/data/bbg/nobackup2/work/IntOGenDSL2/v2024/`.
-    For faster execution you can use the scratch partition in the cluster: `/scratch/bbg/work/IntOGenDSL2/v2025/<your-subfolder>`.
-    Replace `<your-subfolder>` with a meaningful name, such as the `Outdir` value from the next section, to avoid conflicts.
-    
-    !!! warning "Delete the work folder once the intogen run finishes successfully."
-    
-
-=== "Run parameters section"
-
-    #### **Input**<!-- markdownlint-disable MD046 -->
-
-    This parameter is read as a string, and it should be the absolute paths of the folder that openvariant will iterate
-    separated by a space. Here it follows an example:
-
-    ```sh
-    /path/to/datasets/for/intogen/input1 /path/to/datasets/for/intogen/input2 /path/to/datasets/for/intogen/input3
-    ```
+## deepUMIcaller
 
-    !!! question "How do I prepare the input for IntOGen?"
-        Great question! Here the documentation where everything is explained: 
-        [intogen-plus.readthedocs](https://intogen-plus.readthedocs.io/en/v2024/usage.html#input)
+We use the code available in [deepUMIcaller](https://github.com/bbglab/deepUMIcaller.git). We generally use the dev branch
+since it contains the most up-to-date version of the code and is generally stable.
 
-    #### **Outdir**
+We run it via the Seqera Platform so that we have a full record of the runs and can coordinate the different projects.
 
-    This parameter is where the output of intogen will be stored. By default we store
-    intermediate runs that might fail here:
+We always put the work directory in /scratch and the outputs can either go to the s3 or to nobackup or nobackup2.
 
-    ```sh
-    /data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024/<MeaningfulName>
-    ```
+## Metrics
 
-    !!! note "It's important to add a meaningful name as a final directory output"
-        by default IntOGen will create a folder with a date where all the results will be stored. This although
-        requires an higher level of specificity in the top folder.
+## deepCSA
 
-        e.g. If I am running an external collab for LUNG data, I will add as an `outdir` parameter:
-        ```sh
-        /data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024/Lung_external_collab
-        ```
-        
-        The IntOGen pipeline will by default create a subdirectory with the date of the
-        launch where it will store all the files:
-        ```sh
-        /data/bbg/nobackup2/scratch/intogen_dev_tests/dev-DSL2/v2024/Lung_external_collab/20250423/
-        ```
+We use the code available in [deepCSA](https://github.com/bbglab/deepCSA.git), we generally use the dev branch
+since it contains the most up-to-date version of the code and is generally stable.
 
+We run it via the Seqera Platform so that we have a full record of the runs and can coordinate the different projects.
 
-    Stable runs and releases are officially stored in a safer partition: 
-    ```sh
-    /data/bbg/datasets/intogen/output/runs
-    ```
+Add the `irbcluster` profile when running the pipeline so that the default structural parameters are set automatically.
 
-Once both those sections are completed we are safe to run the pipeline.
+We always put the work directory in `/scratch`, and the outputs usually go to nobackup or nobackup2.
 
-### FAQs
+## References
 
-!!! question "The pipeline failed. How do I resume?"
-    In the [run tab](https://cloud.seqera.io/orgs/bbglabirb/workspaces/bbglab/watch) click on the three
-    dots on the right of your run and click `Resume`.
+Duplex library prep. protocol:
 
-- TBC
+- Morena Pinheiro
+- Erika López-Arribillaga
+- Nuría Samper
 
-## References
+Computational pipelines:
 
-- Federica Brando
-- Miguel Grau
+- Ferriol Calvet (main developer)
+- Elisabet Figuerola (owns extensive internal documentation)
+- Rocío Chamorro (in particular for metrics)
+- Miguel Grau (developer)

From 3ae809b47727df621396c1a0bb176ee6d0a2d5f4 Mon Sep 17 00:00:00 2001
From: FerriolCalvet <ferriolcalvet@gmail.com>
Date: Thu, 22 Jan 2026 14:45:30 +0100
Subject: [PATCH 3/4] add mention to s3

---
 docs/pipelines/DeepClone.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/docs/pipelines/DeepClone.md b/docs/pipelines/DeepClone.md
index 9d654774..d5d92e44 100644
--- a/docs/pipelines/DeepClone.md
+++ b/docs/pipelines/DeepClone.md
@@ -28,6 +28,9 @@ We run it via Seqera platform so that we have full record of the runs and coordi
 
 We always put the work directory in /scratch and the outputs can either go to the s3 or to nobackup or nobackup2.
 
+If you need to access S3, either to save concats or to store the output of deepUMIcaller there,
+check the [S3 entry](https://bbglab.github.io/bbgwiki/Cluster_basics/s3/#terminal) in this wiki.
+
 ## Metrics
 
 ## deepCSA

From 8d883f5ce43257a9940b7f482fb8f86d22216232 Mon Sep 17 00:00:00 2001
From: rochamorro1 <rchamorrogonzalez@gmail.com>
Date: Thu, 22 Jan 2026 15:30:39 +0100
Subject: [PATCH 4/4] add metrics section

---
 docs/pipelines/DeepClone.md | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/docs/pipelines/DeepClone.md b/docs/pipelines/DeepClone.md
index d5d92e44..d74f2e3d 100644
--- a/docs/pipelines/DeepClone.md
+++ b/docs/pipelines/DeepClone.md
@@ -33,6 +33,22 @@ check the [S3 entry](https://bbglab.github.io/bbgwiki/Cluster_basics/s3/#termina
 
 ## Metrics
 
+This set of metrics helps us with two key aspects: checking whether our duplex libraries worked properly and, importantly, estimating how much sequencing output to request to avoid undersequencing and, above all, oversequencing.
+
+*When should I run metrics?*
+
+Every time you do a new deepUMIcaller run, and before running deepCSA.
+
+*Why?*
+
+1. To validate that you have included all GBs of data available for that library (all lanes and reseqs)
+2. To check whether your library has been sequenced optimally or whether an additional reseq needs to be requested
+3. To continue the effort of compiling these metrics to keep improving our understanding of the duplex protocol
+
+*How?*
+
+You can find the instructions on how to run them and additional documentation on metrics in our internal duplex documentation.
+
 ## deepCSA
 
 We use the code available in [deepCSA](https://github.com/bbglab/deepCSA.git), we generally use the dev branch