Re Dev : To Do #20

@snewhouse

New Branch in Git repo

  • make a new branch f1000_dev on the image, in /home/ubuntu/scratch/ngseasy
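
A minimal sketch of the branch step, assuming the repository is already cloned at the path above and the branch does not exist yet. The helper name make_dev_branch is hypothetical and only wraps the standard git command.

```bash
# Sketch: create and switch to a development branch in an existing clone.
# make_dev_branch is a hypothetical helper name, not part of ngseasy.
make_dev_branch() {
    repo_dir="$1"   # path to the clone, e.g. /home/ubuntu/scratch/ngseasy
    branch="$2"     # new branch name, e.g. f1000_dev
    # -C runs git inside repo_dir; checkout -b creates the branch and switches to it
    git -C "$repo_dir" checkout -b "$branch"
}

# Usage (commented out so that sourcing this file has no side effects):
# make_dev_branch /home/ubuntu/scratch/ngseasy f1000_dev
```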

Openstack VM

  • space
  • send key to amos
  • 30+ CPU
  • max RAM
  • Volume : 4TB

Images

  • build images
  • build tool set
  • build one image with all tools

Get Genomes

  • hg19.fasta
  • hs37d5.fasta
  • GRCh38.p7.fasta
  • hs38DH.fasta
  • GATK resource bundles
17.05.2016
ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.22_GRCh38.p7/GCA_000001405.22_GRCh38.p7_genomic.fna.gz
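
The download could be sketched as below. This is an illustration only: the function names and the destination directory are hypothetical, and the URL is simply the one recorded above (whether it still resolves is not checked here).

```bash
# genome_filename: local file name for a reference URL, with the .gz suffix removed.
genome_filename() {
    file="${1##*/}"                  # keep only the part after the last slash
    printf '%s\n' "${file%.gz}"      # drop a trailing .gz if present
}

# fetch_genome: download a gzipped FASTA and decompress it in place.
# Hypothetical helper; dest_dir is whatever genome directory is chosen.
fetch_genome() {
    url="$1"
    dest_dir="$2"
    out="$dest_dir/$(genome_filename "$url")"
    mkdir -p "$dest_dir"
    # Only decompress if the download itself succeeded.
    curl --fail --location --output "$out.gz" "$url" && gunzip -f "$out.gz"
}

# Usage (commented out; this would start a large download):
# fetch_genome "ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA_000001405.22_GRCh38.p7/GCA_000001405.22_GRCh38.p7_genomic.fna.gz" genomes
```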

Get test data

  • small 30-150x data set

Index Genomes

  • bwa
    • hg19.fasta
    • hs37d5.fasta
    • hs38DH.fasta
  • snap
    • hg19.fasta
    • hs37d5.fasta
    • hs38DH.fasta
  • novoalign
    • hg19.fasta
    • hs37d5.fasta
    • hs38DH.fasta
  • bowtie2
    • hg19.fasta
    • hs37d5.fasta
    • hs38DH.fasta

bwa

├── hs37d5.fasta
├── hs37d5.fasta.amb
├── hs37d5.fasta.ann
├── hs37d5.fasta.bwt
├── hs37d5.fasta.pac
├── hs37d5.fasta.sa
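
The aligner-by-reference matrix above can be sketched as a dry run that only prints the indexing commands. The function names and the output prefixes (the _snap directory, the .nix file name, the bowtie2 prefix) are assumptions made for illustration; the command shapes follow each tool's usual index syntax.

```bash
# index_cmd: print the index-building command for one aligner and one FASTA.
# Dry run only; nothing is executed. Output names are illustrative assumptions.
index_cmd() {
    aligner="$1"
    fasta="$2"
    prefix="${fasta%.fasta}"
    case "$aligner" in
        bwa)       printf '%s\n' "bwa index $fasta" ;;
        snap)      printf '%s\n' "snap-aligner index $fasta ${prefix}_snap" ;;
        novoalign) printf '%s\n' "novoindex ${prefix}.nix $fasta" ;;
        bowtie2)   printf '%s\n' "bowtie2-build $fasta $prefix" ;;
        *)         printf '%s\n' "unknown aligner: $aligner" >&2; return 1 ;;
    esac
}

# print_index_plan: every aligner against every reference in the list above.
print_index_plan() {
    for aligner in bwa snap novoalign bowtie2; do
        for fasta in hg19.fasta hs37d5.fasta hs38DH.fasta; do
            index_cmd "$aligner" "$fasta"
        done
    done
}

# Usage: print_index_plan            (review the 12 commands)
#        print_index_plan | sh       (run them, once the plan looks right)
```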

PLAN BY MONDAY 23rd

giab_data_indexes

https://github.com/genome-in-a-bottle/giab_data_indexes

Test Data

  • 30x Exome
  • 150x Exome
  • 1x WGS at 30x min. (source a better WGS data set, as the X10 data is messy)

GATK Gold Standard Run

  • run bwa-realign-bqsr-haplotypecaller on all 3 data sets

This is the "Gold Standard". This will take a week if there are no bugs.
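
A rough sketch of the stage order for one data set, printed as a dry run. It assumes GATK 3.x command syntax (the -T tool selector) and samtools for sorting; the jar path, the file naming and the gold_standard_plan helper are all hypothetical, and read-group tagging and duplicate marking are left out for brevity even though GATK needs read groups in practice.

```bash
# gold_standard_plan: print the bwa -> realign -> BQSR -> HaplotypeCaller commands
# for one sample. Dry run only. Assumes GATK 3.x; file names are placeholders.
gold_standard_plan() {
    ref="$1"      # reference FASTA, e.g. hs37d5.fasta
    s="$2"        # sample name, used as a file-name prefix
    known="$3"    # known-sites VCF from the GATK resource bundle
    gatk="java -jar GenomeAnalysisTK.jar"
    printf '%s\n' "bwa mem $ref ${s}_1.fastq.gz ${s}_2.fastq.gz | samtools sort -o $s.sorted.bam -"
    printf '%s\n' "$gatk -T RealignerTargetCreator -R $ref -I $s.sorted.bam -o $s.intervals"
    printf '%s\n' "$gatk -T IndelRealigner -R $ref -I $s.sorted.bam -targetIntervals $s.intervals -o $s.realigned.bam"
    printf '%s\n' "$gatk -T BaseRecalibrator -R $ref -I $s.realigned.bam -knownSites $known -o $s.recal.table"
    printf '%s\n' "$gatk -T PrintReads -R $ref -I $s.realigned.bam -BQSR $s.recal.table -o $s.recal.bam"
    printf '%s\n' "$gatk -T HaplotypeCaller -R $ref -I $s.recal.bam -o $s.vcf"
}

# Usage (placeholder names for the three test data sets):
# for s in exome_30x exome_150x wgs_30x; do
#     gold_standard_plan hs37d5.fasta "$s" dbsnp.vcf
# done
```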

The Glue

Open :-

  1. BASH done better than before
    • logging
    • read a user-supplied config file (spreadsheet-like)
    • user specifies the pipeline
    • SJN TO ADD CONFIG PARAMETER LIST
    • consider converting to .yaml behind the scenes
    • self checks: does the input exist? If so, move on
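
The logging, config and self-check points could look roughly like this. It is a sketch under assumptions: the config is taken to be a tab-separated file with three hypothetical columns (sample, fastq, pipeline), since the real parameter list is still to be added, and all function names are invented for illustration.

```bash
# log: timestamped message with a level, written to stderr.
log() {
    printf '%s [%s] %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$1" "$2" >&2
}

# input_ok: self check; succeeds only if the file exists and is not empty.
input_ok() {
    [ -s "$1" ]
}

# run_config: read a tab-separated config, one sample per line.
# Assumed columns (hypothetical): sample, fastq, pipeline.
# Prints "sample<TAB>pipeline" for every row whose input passes the self check.
run_config() {
    config="$1"
    while IFS="$(printf '\t')" read -r sample fastq pipeline; do
        case "$sample" in ''|'#'*) continue ;; esac   # skip blank and comment lines
        if input_ok "$fastq"; then
            log INFO "$sample: input found, queueing pipeline $pipeline"
            printf '%s\t%s\n' "$sample" "$pipeline"
        else
            log WARN "$sample: missing input $fastq, skipping"
        fi
    done < "$config"
}

# Usage: run_config samples.tsv
```

Keeping the log on stderr and the queued rows on stdout means the output can be piped into whatever runs the pipeline without the log lines getting in the way.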

RECON BY MONDAY NEXT WEEK
