TransmissibleCancerGroup/DepthOfCoverage

copynumber_calling_pipeline

Summary

Nextflow pipeline for computing depth of coverage / logR from transmissible cancer samples.

Dependencies

This pipeline requires Nextflow and a container engine such as Docker or Singularity.

All other dependencies are provided by a Docker container:

docker pull kg8422/copynumber_calling_pipeline:latest

Alternatively, use the Singularity container:

singularity pull library://kgori/nextflow/copynumber_calling_pipeline

Usage

Provide a folder containing the BAM/CRAM files as input, e.g. data/.

Provide a reference genome FASTA file that has been indexed with samtools faidx.

Provide a "metadata" CSV/TSV file with the following columns:

  • tumour = tumour sample name (to match the SM: field in the BAM header)
  • host = host sample name (to match the SM: field in the BAM header)
  • hostSex = M or F, indicating the host sex. Required for calculating X/Y chromosome logR.
  • excludedFromPanel = TRUE or FALSE. Set to TRUE to exclude a noisy host sample from the normalisation step. Expect most to be FALSE.
  • tumourContaminated = TRUE or FALSE. Indicates whether the host is contaminated with tumour DNA; if TRUE, that host is ignored in the analysis. Expect most to be FALSE.

The tumour and host named on the same line of the metadata file should be a matched pair.
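As a rough illustration of the metadata layout described above, the following Python sketch checks a tab-separated table for the five required columns and the allowed values. It is not part of the pipeline. The function name `validate_metadata`, the sample names in `EXAMPLE_TSV`, and the assumption that values are case-sensitive (`TRUE`/`FALSE`, `M`/`F`) are all illustrative choices, not behaviour confirmed by the pipeline itself.

```python
import csv
import io

# Column names as listed in this README.
REQUIRED_COLUMNS = [
    "tumour", "host", "hostSex", "excludedFromPanel", "tumourContaminated",
]


def validate_metadata(handle, delimiter="\t"):
    """Return a list of human-readable problems found in a metadata table."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if not row["tumour"] or not row["host"]:
            problems.append(f"line {line_no}: tumour and host must both be set")
        # Assumption: values are matched exactly, in upper case.
        if row["hostSex"] not in ("M", "F"):
            problems.append(f"line {line_no}: hostSex must be M or F")
        for col in ("excludedFromPanel", "tumourContaminated"):
            if row[col] not in ("TRUE", "FALSE"):
                problems.append(f"line {line_no}: {col} must be TRUE or FALSE")
    return problems


# Hypothetical sample names, for illustration only.
EXAMPLE_TSV = (
    "tumour\thost\thostSex\texcludedFromPanel\ttumourContaminated\n"
    "T001\tH001\tF\tFALSE\tFALSE\n"
    "T002\tH002\tM\tTRUE\tFALSE\n"
)

if __name__ == "__main__":
    for problem in validate_metadata(io.StringIO(EXAMPLE_TSV)):
        print(problem)
```

To check a real file, open it and pass the handle instead, e.g. `validate_metadata(open("METADATA.tsv", newline=""))`; for a CSV file, pass `delimiter=","`.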

Run the pipeline with

nextflow run main.nf \
  --metadata METADATA.tsv \
  --reference REFERENCE.fa \
  --inputDir bams_folder \
  --outputDir results_folder \
  -resume

An example nextflow.config is provided.

Results

The results are written as an Arrow database, which can be read and queried in R using the arrow package or in Python using the pyarrow package.
