DirectHRD

DirectHRD is an ultrasensitive, scar-based classifier that detects homologous recombination deficiency (HRD) in low-tumor-fraction samples, such as liquid biopsies, from whole-genome sequencing (WGS) data. DirectHRD has two components: small Indel calling and HRD calling. This repository contains code only for the HRD calling part. In principle, any Indel caller for WGS data can be used, but we recommend CODECsuite for Indel calling.

Note

Minimum Python version: 3.9.21
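
As a convenience, the version requirement above can be checked before installing. The snippet below is a minimal sketch; the helper name `meets_minimum` is hypothetical and is not part of DirectHRD.

```python
import sys

# Minimum version stated in this README.
MIN_PYTHON = (3, 9, 21)


def meets_minimum(version=None, minimum=MIN_PYTHON):
    """Return True if `version` (major, minor, micro) is at least `minimum`."""
    if version is None:
        # Default to the interpreter running this code.
        version = sys.version_info[:3]
    # Tuples compare element by element, so (3, 10, 0) > (3, 9, 21).
    return tuple(version[:3]) >= tuple(minimum)
```

Calling `meets_minimum()` with no arguments checks the running interpreter; a `False` result suggests upgrading Python before running `pip install .`.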

To install DirectHRD

  1. git clone https://github.com/broadinstitute/DirectHRD.git
  2. cd DirectHRD && pip install .
  3. Install a reference genome for SigProfilerMatrixGenerator (see the SigProfilerMatrixGenerator documentation for details) so that the Indel variants can be parsed into ID83 format. Available reference genomes include GRCh37, GRCh38, mm9 and mm10.
    1. $ python
    2. from SigProfilerMatrixGenerator import install as genInstall
    3. genInstall.install('GRCh38', rsync=False, bash=True)

To use the recommended Indel caller

CODECsuite is available at https://github.com/broadinstitute/CODECsuite. For installation, please refer to that GitHub page. Run it as follows:

    CODECsuite/build/codec call -b ~{tumor_or_ctdna_bam} \
        -r ~{reference_fasta} \
        -L ~{eval_genome_bed} \
        -n ~{germline_bam} \
        -V ~{population_based_vcf} \
        -m 60 \
        -q 30 \
        -d 12 \
        -x 6 \
        -c 3 \
        -5 \
        -g 30 \
        -Q 0.5 \
        -B 0.6 \
        -Y 10 \
        -W 1 \
        -f 30 \
        -E 8 \
        -s \
        -I 1 \
        -R 1 \
        -u \
        -i ~{max_allele_frac} \
        -o ~{sample_id}

The three required input arguments are:

tumor_or_ctdna_bam: the sample whose HRD status is being investigated.

reference_fasta: the reference genome. In the paper, we use the GRCh37 reference genome.

eval_genome_bed: a BED file containing the regions under investigation. We recommend using the GRCh37 high-complexity regions.

The germline_bam is highly recommended for calling tumor-specific mutations, but if it is not available, the user can omit this option.

The population_based_vcf is another nice-to-have input file that can mitigate contamination and low germline depth. We recommend using the ALFA dataset from dbSNP: https://www.ncbi.nlm.nih.gov/snp/docs/gsr/alfa/.
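
Because `-n` and `-V` may or may not be present, it can help to assemble the command programmatically. The following is a minimal sketch, not part of CODECsuite or DirectHRD: the function name `build_codec_call` and its parameters are hypothetical, and the fixed flags are copied verbatim from the command above without interpreting them. The README does not say whether `-i` and `-o` can be omitted, so the sketch assumes they are always supplied.

```python
def build_codec_call(
    tumor_or_ctdna_bam,
    reference_fasta,
    eval_genome_bed,
    max_allele_frac,
    sample_id,
    germline_bam=None,
    population_based_vcf=None,
    codec_binary="CODECsuite/build/codec",
):
    """Build the argument list for `codec call` as shown in this README."""
    args = [
        codec_binary, "call",
        "-b", str(tumor_or_ctdna_bam),
        "-r", str(reference_fasta),
        "-L", str(eval_genome_bed),
    ]
    # Optional inputs: add the flag only when a file was supplied.
    if germline_bam is not None:
        args += ["-n", str(germline_bam)]
    if population_based_vcf is not None:
        args += ["-V", str(population_based_vcf)]
    # Fixed settings copied verbatim from the command in this README.
    args += [
        "-m", "60", "-q", "30", "-d", "12", "-x", "6", "-c", "3", "-5",
        "-g", "30", "-Q", "0.5", "-B", "0.6", "-Y", "10", "-W", "1",
        "-f", "30", "-E", "8", "-s", "-I", "1", "-R", "1", "-u",
    ]
    # Assumption: -i and -o are always passed.
    args += ["-i", str(max_allele_frac), "-o", str(sample_id)]
    return args
```

The returned list can be passed to `subprocess.run(args, check=True)`. Building a list rather than a shell string avoids quoting problems with file paths.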

Note

You can convert the CODEC variant .txt file to VCF format by following the two simple steps.

HRD calling

In practice, a user can use any Indel caller, such as Mutect2 or Strelka2, to call Indels. However, we recommend post-filtering the Indel calls with a low-complexity filter, such as the Genome in a Bottle GRCh37 high-complexity regions or GRCh38 high-complexity regions.
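
The post-filtering step can be sketched in plain Python as keeping only VCF records whose position falls inside a BED file of high-complexity regions. This is an illustrative sketch, not DirectHRD code: the function names are hypothetical, it tests only the record's POS (not the full span of the Indel), and it assumes the chromosome names in the VCF and BED files match (for example, both `1` or both `chr1`). Dedicated tools such as bedtools or bcftools are likely a better choice for large files.

```python
import bisect
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: (sorted starts, matching ends)}."""
    by_chrom = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        by_chrom[chrom].append((int(start), int(end)))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        # Merge overlapping intervals so a single binary search is enough.
        merged = []
        for s, e in intervals:
            if merged and s <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], e)
            else:
                merged.append([s, e])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def in_regions(index, chrom, pos):
    """True if 1-based VCF `pos` lies in a 0-based, half-open BED interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect.bisect_right(starts, pos - 1) - 1
    return i >= 0 and pos - 1 < ends[i]


def filter_vcf(vcf_lines, index):
    """Yield header lines and records whose POS falls inside the regions."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.split("\t")
        if in_regions(index, fields[0], int(fields[1])):
            yield line
```

With files on disk, `load_bed(open(bed_path))` builds the index and `filter_vcf(open(vcf_path), index)` yields the lines to write to the filtered VCF.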

The first step of HRD calling is classifying the Indels into the COSMIC ID83 format. We used COSMIC v3.2 and SigProfiler in the paper. The second step is HRDscore prediction using a Multinomial Mixture Model (MMM).
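
This README does not describe the model in detail, so the following is only a toy illustration of the general idea behind a multinomial mixture and is not DirectHRD's implementation. It assumes one common reading: each Indel is drawn from a weighted mix of two category profiles, and the mixing weight is estimated by maximum likelihood. The function names, the grid search, and the three-category profiles in the usage note are all hypothetical; real ID83 vectors have 83 categories.

```python
import math


def mixture_log_likelihood(counts, profile_a, profile_b, weight):
    """Log-likelihood (up to a constant) of counts under w*A + (1-w)*B."""
    total = 0.0
    for n, pa, pb in zip(counts, profile_a, profile_b):
        if n == 0:
            continue  # empty categories contribute nothing
        p = weight * pa + (1.0 - weight) * pb
        if p <= 0.0:
            return float("-inf")  # observed a category the mixture forbids
        total += n * math.log(p)
    return total


def estimate_weight(counts, profile_a, profile_b, steps=100):
    """Grid-search the mixing weight in [0, 1] that maximises the likelihood."""
    grid = [i / steps for i in range(steps + 1)]
    return max(
        grid,
        key=lambda w: mixture_log_likelihood(counts, profile_a, profile_b, w),
    )
```

For example, with toy profiles `[0.7, 0.2, 0.1]` and `[0.1, 0.2, 0.7]`, counts of `[40, 20, 40]` are matched exactly by an equal mix, so the estimated weight is 0.5. A grid search is used here only for simplicity.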

To use the package, run:

hrd-classifier indel_vcfs_folder -r GRCh38 -o output.tsv
