Nextflow pipeline for computing depth of coverage / logR from transmissible cancer samples.
This pipeline requires Nextflow and a container engine such as Docker or Singularity.
There is a Docker container providing all other dependencies:
docker pull kg8422/copynumber_calling_pipeline:latest
or a Singularity container:
singularity pull library://kgori/nextflow/copynumber_calling_pipeline
Provide a folder with BAM/CRAM files as input, e.g. data/.
Provide a reference genome FASTA file that has been indexed with samtools faidx.
Provide a "metadata" CSV/TSV file with the following columns:
- tumour = tumour sample name (to match the SM: field in the BAM header)
- host = host sample name (to match the SM: field in the BAM header)
- hostSex = M or F, indicating the host sex. Required for calculating X/Y chromosome logR.
- excludedFromPanel = TRUE or FALSE. Use to exclude noisy host samples from the normalising step. Expect most to be FALSE.
- tumourContaminated = TRUE or FALSE. Indicates whether the host is contaminated with tumour DNA. Contaminated hosts are ignored in the analysis. Expect most to be FALSE.
The tumour and host on the same line of the metadata file should be a matched pair.
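The column rules above can be checked before launching a run. The following is a minimal sketch, not part of the pipeline: the function name check_metadata and the sample names T1/H1/T2/H2 are hypothetical, and it assumes the file has a header row with exactly the column names listed above.

```python
import csv
import io

REQUIRED_COLUMNS = ["tumour", "host", "hostSex", "excludedFromPanel", "tumourContaminated"]
BOOLEAN_COLUMNS = ["excludedFromPanel", "tumourContaminated"]


def check_metadata(handle, delimiter="\t"):
    """Return a list of problems found in a metadata table (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        return ["missing columns: " + ", ".join(missing)]

    problems = []
    # Line 1 is the header, so data rows start at line 2.
    for line_number, row in enumerate(reader, start=2):
        if row["hostSex"] not in ("M", "F"):
            problems.append(f"line {line_number}: hostSex must be M or F, got {row['hostSex']!r}")
        for column in BOOLEAN_COLUMNS:
            if row[column] not in ("TRUE", "FALSE"):
                problems.append(f"line {line_number}: {column} must be TRUE or FALSE, got {row[column]!r}")
    return problems


# Made-up example rows: the second data row has two invalid values.
example = (
    "tumour\thost\thostSex\texcludedFromPanel\ttumourContaminated\n"
    "T1\tH1\tF\tFALSE\tFALSE\n"
    "T2\tH2\tX\tFALSE\tyes\n"
)
problems = check_metadata(io.StringIO(example))
for problem in problems:
    print(problem)
```

For a real file, pass an open handle instead, e.g. open("METADATA.tsv", newline=""), and use delimiter="," for CSV input.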
Run the pipeline with
nextflow run main.nf \
--metadata METADATA.tsv \
--reference REFERENCE.fa \
--inputDir bams_folder \
--outputDir results_folder \
-resume
An example nextflow.config is provided.
The results are written as an Arrow database. This can be read and queried in R using the arrow package, or in Python using the pyarrow package.