Commands

Fei Shen edited this page Nov 27, 2021 · 14 revisions


Running bsatos without any parameters prints the list of commands that bsatos supports.

Program: bsatos (Bulked segregant analysis tools for outbreeding species)
Version: 1.0

Usage: bsatos <command> [options]

Command:

prepar      prepare the parents data
prep        prepare the pool data
haplotype   construct haplotype block
afd         calculate and filter allele frequency difference between two extreme pools
polish      polish candidate QTL regions and remove noisy markers based on haplotype information
qtl_pick    judge and pick up QTLs from three types of peaks
igv         generate files for Integrative Genomics Viewer  
all         do prepar, prep, haplotype, afd, polish, qtl_pick and igv in turn
gs          assign genotype effect to each marker and conduct prediction


Usage: bsatos prepar [options]

Options:

--pf1       FILE        paired-end1 fastq file from pollen parent  [FASTQ format]
--pf2       FILE        paired-end2 fastq file from pollen parent  [FASTQ format]
--mf1       FILE        paired-end1 fastq file from maternal parent [FASTQ format]
--mf2       FILE        paired-end2 fastq file from maternal parent [FASTQ format]
--pb        FILE        pre-aligned bam file from pollen parent  [Not required if --pf1 & --pf2 are given]
--mb        FILE        pre-aligned bam file from maternal parent [Not required if --mf1 & --mf2 are given]
--r         FILE        reference genome [FASTA format]
--o         FILE        output dir name prefix  [prepar] 
--gtf/gff   FILE        gene GTF/GFF file [GTF2/GFF3 format]

Alignment Options:

--t  INT        number of threads [1]

SNV Calling Options:

--aq  INT       skip alignments with mapQ smaller than INT [30]

SV Calling Options:

--vq INT        skip alignments with mapQ smaller than INT [30]

SNV filtering Options:

--d  INT        skip SNVs with reads depth smaller than INT [10]   
--sq INT        skip SNVs with phred-scaled quality smaller than INT [30]
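
The two SNV filtering thresholds can be read as a simple keep/skip rule. The sketch below is illustrative only and is not taken from the bsatos source; the function name `keep_snv` and its arguments are hypothetical, and the defaults mirror `--d` and `--sq`.

```python
def keep_snv(depth, qual, min_depth=10, min_qual=30):
    """Return True when an SNV passes both thresholds.

    min_depth mirrors --d and min_qual mirrors --sq (hypothetical helper,
    not part of bsatos). SNVs below either threshold are skipped.
    """
    return depth >= min_depth and qual >= min_qual


# (read depth, phred-scaled quality) for three made-up SNVs
snvs = [(25, 60.0), (8, 99.0), (40, 12.5)]
kept = [s for s in snvs if keep_snv(*s)]
```

Only the first made-up SNV survives: the second fails the depth threshold and the third fails the quality threshold.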

Outputs:

prefix_dir  [DIR]

P_M_G1 [FILE]  markers that are homozygous in the maternal parent but heterozygous in the pollen parent
P_M_G2 [FILE]  markers that are homozygous in the pollen parent but heterozygous in the maternal parent
P_M_G3 [FILE]  markers that are heterozygous in both parents
M_P.snv [FILE] the SNVs VCF file of the two parents
sv.vcf [FILE]  the SVs VCF file of the two parents
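
The three marker classes above can be expressed as a small decision rule on the two parental genotypes. This is a minimal sketch, assuming VCF-style genotype strings such as `0/1`; `classify_marker` is a hypothetical helper, not a bsatos function.

```python
def classify_marker(pollen_gt, maternal_gt):
    """Classify a marker from two VCF-style genotypes such as "0/1".

    Returns "G1", "G2", "G3", or None (hypothetical helper, not bsatos code).
    """
    def is_het(gt):
        # Treat phased ("|") and unphased ("/") separators alike.
        alleles = gt.replace("|", "/").split("/")
        return len(set(alleles)) > 1

    if "." in pollen_gt or "." in maternal_gt:
        return None  # missing genotype: cannot classify
    p_het, m_het = is_het(pollen_gt), is_het(maternal_gt)
    if p_het and not m_het:
        return "G1"  # heterozygous in pollen parent, homozygous in maternal parent
    if m_het and not p_het:
        return "G2"  # heterozygous in maternal parent, homozygous in pollen parent
    if p_het and m_het:
        return "G3"  # heterozygous in both parents
    return None  # homozygous in both parents: none of the three classes
```

For example, `classify_marker("0/1", "0/0")` falls in the G1 class, while a marker that is homozygous in both parents belongs to none of the three files.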

prefix_dir/anno [DIR]

AT_refGene.txt [FILE]        GenePred file
AT_refGeneMrna.fa [FILE]     transcript FASTA file
snv.AT_multianno.txt [FILE]  SNVs multianno file
snv.AT_multianno.vcf [FILE]  SNVs annotated VCF file
snv.avinput [FILE]           SNVs input file
sv.AT_multianno.txt [FILE]   SVs multianno file 
sv.AT_multianno.vcf [FILE]   SVs annotated VCF file
sv.avinput [FILE]            SVs input file

Example:

1) bsatos prepar --pf1 P_1.fastq --pf2 P_2.fastq --mf1 M_1.fastq --mf2 M_2.fastq --r genome.fasta --gtf gene.gtf --o prepar
OR
2) bsatos prepar --pb P.bam --mb M.bam --r genome.fasta --gtf gene.gtf --o prepar

Users can supply clean reads (FASTQ files) as input:

bsatos prepar --pf1 <P fastq1> --pf2 <P fastq2> --mf1 <M fastq1> --mf2 <M fastq2> --r <reference> --gtf <genome gtf file> --o <outputPrefix>

OR

Users can supply read alignments (sorted BAM files) as input.

BAM files can be prepared with the Burrows-Wheeler Aligner (BWA), Bowtie2, Subread, and so on.

BWA: bio-bwa.sourceforge.net/, github.com/lh3/bwa

Bowtie2: bowtie-bio.sourceforge.net/bowtie2/

bsatos prepar --pb <P bam file> --mb <M bam file> --r <reference> --o <outputPrefix> --gtf <genome gtf file>
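
When the prepar step is driven from a script, the FASTQ and BAM input modes can be handled by one helper that builds the argument list. The sketch below is an assumption-laden illustration: `prepar_args` is a hypothetical name, and only the options listed above are used.

```python
def prepar_args(reference, gtf, out_prefix="prepar",
                pollen_fastq=None, maternal_fastq=None,
                pollen_bam=None, maternal_bam=None):
    """Build the argument list for `bsatos prepar` (hypothetical helper).

    FASTQ inputs are (read1, read2) pairs; BAM inputs are single paths.
    """
    args = ["bsatos", "prepar"]
    if pollen_bam and maternal_bam:
        args += ["--pb", pollen_bam, "--mb", maternal_bam]
    elif pollen_fastq and maternal_fastq:
        args += ["--pf1", pollen_fastq[0], "--pf2", pollen_fastq[1],
                 "--mf1", maternal_fastq[0], "--mf2", maternal_fastq[1]]
    else:
        raise ValueError("give both parents as FASTQ pairs or as BAM files")
    args += ["--r", reference, "--gtf", gtf, "--o", out_prefix]
    return args


bam_cmd = prepar_args("genome.fasta", "gene.gtf",
                      pollen_bam="P.bam", maternal_bam="M.bam")
fastq_cmd = prepar_args("genome.fasta", "gene.gtf",
                        pollen_fastq=("P_1.fastq", "P_2.fastq"),
                        maternal_fastq=("M_1.fastq", "M_2.fastq"))
```

Either list can then be passed to `subprocess.run(..., check=True)` on a machine where bsatos is installed.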


Usage: bsatos prep [options]

Options:

--hf1  FILE     paired-end1 fastq file from high extreme pool  [FASTQ format]
--hf2  FILE     paired-end2 fastq file from high extreme pool  [FASTQ format]
--lf1  FILE     paired-end1 fastq file from low extreme pool   [FASTQ format]
--lf2  FILE     paired-end2 fastq file from low extreme pool   [FASTQ format]
--hb   FILE     pre-aligned bam file from high extreme pool    [Not required if --hf1 & --hf2 are given]
--lb   FILE     pre-aligned bam file from low extreme pool     [Not required if --lf1 & --lf2 are given]
--r    FILE     reference genome [FASTA format]
--g1   FILE     G1 file from prepar step  
--g2   FILE     G2 file from prepar step          
--g3   FILE     G3 file from prepar step           
--o    FILE     output dir name prefix [prep]

Alignment Options:

--t2   INT        number of threads [1]

SNV Calling Options:

--aq2  INT        skip alignments with mapQ smaller than INT [30]

SNV filtering Options:

--cov  INT        average sequencing coverage [30] 
--sq2  INT        skip SNVs with phred-scaled quality smaller than INT [30]
--pn   INT        read count of the minor allele should be greater than INT [3]

Outputs:

prefix_dir [DIR] 

    g1.res   [FILE]   read counts with different alleles from H & L pools in G1 type loci 
    g2.res   [FILE]   read counts with different alleles from H & L pools in G2 type loci 
    g3.res   [FILE]   read counts with different alleles from H & L pools in G3 type loci

File format:

Chromosome   Position   H_REF   H_ALT   L_REF   L_ALT
Chr01        11000      10      1       1       10
.            .          .       .       .       .
.            .          .       .       .       .
.            .          .       .       .       .

Chromosome : the chromosome of the marker
Position   : the position of the marker on the chromosome
H_REF      : read count of the REF allele in the H pool
H_ALT      : read count of the ALT allele in the H pool
L_REF      : read count of the REF allele in the L pool
L_ALT      : read count of the ALT allele in the L pool
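
A table in this format can be loaded with a few lines of Python. This is a minimal sketch, assuming whitespace-separated columns and an optional header line; `read_res` is a hypothetical helper, and the example row is the one shown in the file format above.

```python
import io


def read_res(handle):
    """Yield (chrom, pos, h_ref, h_alt, l_ref, l_alt) for each marker line.

    Assumes six whitespace-separated columns; header, blank, or malformed
    lines are skipped (hypothetical helper, not part of bsatos).
    """
    for line in handle:
        fields = line.split()
        if len(fields) != 6 or not fields[1].isdigit():
            continue
        chrom, pos = fields[0], int(fields[1])
        h_ref, h_alt, l_ref, l_alt = map(int, fields[2:])
        yield chrom, pos, h_ref, h_alt, l_ref, l_alt


example = io.StringIO(
    "Chromosome\tPosition\tH_REF\tH_ALT\tL_REF\tL_ALT\n"
    "Chr01\t11000\t10\t1\t1\t10\n"
)
markers = list(read_res(example))
```

With a real file, `open("g1.res")` would replace the `io.StringIO` object.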

Example:

1)

bsatos prep --hf1 H_1.fastq --hf2 H_2.fastq --lf1 L_1.fastq --lf2 L_2.fastq --r genome.fasta  --g1 P_M_G1 --g2 P_M_G2 --g3 P_M_G3 --o prep

OR

2)

bsatos prep --hb H.bam --lb L.bam --r genome.fasta --g1 P_M_G1 --g2 P_M_G2 --g3 P_M_G3 --o prep


Usage: bsatos haplotype [options]

Options:

--pb      FILE   pre-aligned bam file from pollen parent
--mb      FILE   pre-aligned bam file from maternal parent
--hb      FILE   pre-aligned bam file from high extreme pool
--lb      FILE   pre-aligned bam file from low extreme pool
--r       FILE   reference genome [FASTA format]
--phase2  STR    use the samtools algorithm or the HAPCUT2 algorithm to assemble haplotypes [default: T; T: samtools; F: HAPCUT2]
--o       STR    output dir name prefix

SNP genotyping Options:

--dep  INT   skip SNPs with read depth smaller than INT [10]
--aq3  INT   skip alignment with mapQ smaller than INT  [20]
--vq3  INT   skip SNPs with phred-scaled quality smaller than INT [40]

Outputs:

P.bam_block   [FILE]      haplotype blocks of pollen parent
M.bam_block   [FILE]      haplotype blocks of maternal parent 
H.bam_block   [FILE]      haplotype blocks of High extreme pool
L.bam_block   [FILE]      haplotype blocks of Low extreme pool
merged.block  [FILE]      merged, corrected and patched haplotype blocks of pollen parent, maternal parent, High extreme pool and Low extreme pool
phase_P.bed   [FILE]      BED format haplotype information of pollen parent
phase_M.bed   [FILE]      BED format haplotype information of maternal parent
phase_H.bed   [FILE]      BED format haplotype information of High extreme pool
phase_L.bed   [FILE]      BED format haplotype information of Low extreme pool
overlapped.bed [FILE]     BED format haplotype information classified from two parents and two pools

Example:

bsatos haplotype --pb P.bam --mb M.bam --hb H.bam --lb L.bam --r genome.fasta --o haplotype


Usage: bsatos afd [options]

Options:

--g1 FILE        read counts with different alleles from H & L pools in G1 type loci from prep module [required]
--g2 FILE        read counts with different alleles from H & L pools in G2 type loci from prep module [required]
--g3 FILE        read counts with different alleles from H & L pools in G3 type loci from prep module [required]
--h  FILE        merged, corrected and patched haplotype block file of two parents and two pools from haplotype module [required]
--o  STR         output dir name prefix [afd]

Statistics Options:

--sd STR    the statistic method: ED/g/si [g]    
--w  INT    the sliding window size [1000000] 
--fn INT    batches for smoothing; the smaller the faster, but more memory [20]
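
For orientation, the three statistic names can be related to the four read counts of a marker. The sketch below uses textbook definitions that are common in bulked segregant analysis (difference in ALT allele frequency, Euclidean distance between allele frequency vectors, and the G statistic of a 2x2 contingency table). Whether bsatos computes exactly these quantities is an assumption, and `pool_stats` is a hypothetical helper.

```python
import math


def pool_stats(h_ref, h_alt, l_ref, l_alt):
    """Per-marker statistics from H and L pool read counts (illustrative only)."""
    h_total, l_total = h_ref + h_alt, l_ref + l_alt
    if h_total == 0 or l_total == 0:
        raise ValueError("each pool needs at least one read")
    total = h_total + l_total

    # Difference in ALT allele frequency between the two pools.
    delta = h_alt / h_total - l_alt / l_total

    # Euclidean distance between (REF, ALT) frequency vectors; for a
    # biallelic marker this equals sqrt(2) * |delta|.
    ed = math.sqrt(2.0) * abs(delta)

    # G statistic: 2 * sum(observed * ln(observed / expected)) over the
    # 2x2 table, with zero counts contributing nothing.
    ref_total, alt_total = h_ref + l_ref, h_alt + l_alt
    observed = [h_ref, h_alt, l_ref, l_alt]
    expected = [h_total * ref_total / total, h_total * alt_total / total,
                l_total * ref_total / total, l_total * alt_total / total]
    g = 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)

    return {"delta_index": delta, "ED": ed, "G": g}


stats = pool_stats(10, 1, 1, 10)
```

A marker with identical allele frequencies in both pools gives zero for all three values, while strongly diverged pools give large ED and G values.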

Outputs:

*_g1.res_afd   [FILE]  G value (ED/SI) based on G1 type loci and smoothed curve with different window in each chromosome
*_g2.res_afd   [FILE]  G value (ED/SI) based on G2 type loci and smoothed curve with different window in each chromosome
*_g3.res_afd   [FILE]  G value (ED/SI) based on G3 type loci and smoothed curve with different window in each chromosome

g1.res.cal.out [FILE] G value (ED/SI) based on G1 type loci and smoothed curve with different window across genome
g1.res.ad      [FILE] G value (ED/SI) based on G1 type loci and smoothed curve with different window across genome with haplotype information

g2.res.cal.out [FILE] G value (ED/SI) based on G2 type loci and smoothed curve with different window across genome
g2.res.ad      [FILE] G value (ED/SI) based on G2 type loci and smoothed curve with different window across genome with haplotype information

g3.res.cal.out [FILE] G value (ED/SI) based on G3 type loci and smoothed curve with different window across genome
g3.res.ad      [FILE] G value (ED/SI) based on G3 type loci and smoothed curve with different window across genome with haplotype information

Example:

bsatos afd --sd g --g1 g1.res --g2 g2.res --g3 g3.res --h merged.block --o afd --w 1000000
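
The effect of the `--w` window size can be illustrated with a plain sliding-window mean over one chromosome. This is only a sketch of the general idea: the smoother that bsatos actually applies may differ (for example, a weighted kernel), and `window_mean` is a hypothetical helper that expects positions sorted in ascending order.

```python
def window_mean(positions, values, window=1_000_000):
    """Average the values of all markers within window/2 of each marker.

    positions must be sorted ascending; values holds one statistic per
    marker (illustrative uniform-weight smoother, not the bsatos method).
    """
    half = window / 2
    smoothed = []
    lo = hi = 0
    running = 0.0
    for pos in positions:
        # Extend the right edge to include markers up to pos + half.
        while hi < len(positions) and positions[hi] <= pos + half:
            running += values[hi]
            hi += 1
        # Drop markers that fell behind pos - half.
        while positions[lo] < pos - half:
            running -= values[lo]
            lo += 1
        smoothed.append(running / (hi - lo))
    return smoothed


positions = [0, 400_000, 900_000, 3_000_000]
values = [1.0, 2.0, 3.0, 10.0]
curve = window_mean(positions, values)
```

A larger window averages over more markers and gives a flatter curve; a smaller one follows individual markers more closely.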


bsatos polish

Usage: bsatos polish [options]

Options: 
         --o    STR       output dir name prefix [polish]
         --gs1  FILE      smoothed curve based on G1 type loci across genome with haplotype information from afd module
         --gs2  FILE      smoothed curve based on G2 type loci across genome with haplotype information from afd module
         --gs3  FILE      smoothed curve based on G3 type loci across genome with haplotype information from afd module
         --h    FILE      merged haplotype block file from haplotype module 
         --fdr  FLOAT     FDR threshold for the polishing [0.01]

Statistics Options:

--sd  STR   the statistic method: ED/g/si [g]    
--w   INT   the sliding window size [1000000] 
--fn  INT   batches for smoothing; the smaller the faster, but more memory [20]

Outputs:

g1.res.ad.polish [FILE]  read counts with different alleles in G1 type loci after polish 
g2.res.ad.polish [FILE]  read counts with different alleles in G2 type loci after polish
g3.res.ad.polish [FILE]  read counts with different alleles in G3 type loci after polish

*_g1.res.ad_afd  [FILE]  smoothed curve with different window in each chromosome based on polished read counts of G1 type loci
*_g2.res.ad_afd  [FILE]  smoothed curve with different window in each chromosome based on polished read counts of G2 type loci
*_g3.res.ad_afd  [FILE]  smoothed curve with different window in each chromosome based on polished read counts of G3 type loci

g1.res.ad.cal.out [FILE] smoothed curve across genome based on polished read counts of G1 type loci 
g2.res.ad.cal.out [FILE] smoothed curve across genome based on polished read counts of G2 type loci
g3.res.ad.cal.out [FILE] smoothed curve across genome based on polished read counts of G3 type loci

g1.res.ad.ad  [FILE] smoothed curve with haplotype information based on polished read counts of G1 type loci      
g2.res.ad.ad  [FILE] smoothed curve with haplotype information based on polished read counts of G2 type loci        
g3.res.ad.ad  [FILE] smoothed curve with haplotype information based on polished read counts of G3 type loci

Example:

bsatos polish --sd g --gs1 g1.res.ad --gs2 g2.res.ad --gs3 g3.res.ad --h merged.block --o polish



Usage: bsatos qtl_pick [options]

Options: --o        STR        output dir name prefix
         --gp1      FILE       smoothed curve based on G1 type loci across genome with haplotype information from polish module
         --gp2      FILE       smoothed curve based on G2 type loci across genome with haplotype information from polish module
         --gp3      FILE       smoothed curve based on G3 type loci across genome with haplotype information from polish module
         --v        FILE       annotated SNVs file from prepar step
         --sv       FILE       annotated SVs file from prepar step
         --gtf      FILE       gene.gtf file                             
         --h        FILE       haplotype file from haplotype step
         --q        INT        minimum phred-scaled quality score [30]
         --pr       INT        promoter region [2000]
         --fdr      FLOAT      FDR threshold for the polishing [0.01]
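
One common reading of a promoter region size is the stretch of that many bases immediately upstream of a gene, taking strand into account. The sketch below follows that reading with 1-based inclusive coordinates, as used in GTF files. Whether bsatos defines the `--pr` region in exactly this way is an assumption, and `promoter_interval` is a hypothetical helper.

```python
def promoter_interval(start, end, strand, size=2000):
    """Return (first, last) base of the upstream region, 1-based inclusive.

    start/end are the gene coordinates; size mirrors --pr
    (illustrative helper, not part of bsatos).
    """
    if strand == "+":
        # Upstream of a forward-strand gene lies before its start;
        # clamp at the first base of the chromosome.
        return max(1, start - size), start - 1
    if strand == "-":
        # Upstream of a reverse-strand gene lies after its end.
        return end + 1, end + size
    raise ValueError("strand must be '+' or '-'")


forward = promoter_interval(5000, 8000, "+")
reverse = promoter_interval(5000, 8000, "-")
```

Both intervals span 2000 bases with the default size; a gene close to the chromosome start gets a shorter region because of the clamp.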

Outputs:

qtl     [FILE]         detected QTLs list file 
*.pdf   [FILE]         G value/ED/SI profiles across each chromosome (*:chromosome) 
g1_hap  [FILE]         haplotype information in G1 type loci 
g2_hap  [FILE]         haplotype information in G2 type loci
g3_hap  [FILE]         haplotype information in G3 type loci
gene.bed [FILE]        gene bed file 
*.gene   [FILE]        gene list located in the QTL regions (*: QTL accession)
*.hap  [FILE]          haplotype information located in the QTL regions (*:QTL accession)
*.snv [FILE]           screened SNVs based on genetic rules located in the QTL regions (*: QTL accession)  
*.sv [FILE]            screened SVs based on genetic rules located in the QTL regions (*: QTL accession)

Example:

bsatos qtl_pick --o qtl_pick --gp1 g1.res.ad.ad --gp2 g2.res.ad.ad --gp3 g3.res.ad.ad --v snv.AT_multianno.txt --sv sv.AT_multianno.txt --gtf gene.gtf --h merged.block


Usage: bsatos all [options]

Options: 
       --oprefix STR       the prefix of the result folder [all] 
       --r       FILE      the genome fasta file [fasta]
       --gtf     FILE      the GTF/GFF file of the genes
       --pf1     FILE      the paired-end1 fastq file of the pollen parent    | |    | --pb FILE  BAM file of the pollen parent
       --pf2     FILE      the paired-end2 fastq file of the pollen parent    | |    |
       --mf1     FILE      the paired-end1 fastq file of the maternal parent  | |    | --mb FILE  BAM file of the maternal parent 
       --mf2     FILE      the paired-end2 fastq file of the maternal parent  | | OR |     
       --hf1     FILE      the paired-end1 fastq file of the H pool reads     | |    | --hb FILE  BAM file of the H pool 
       --hf2     FILE      the paired-end2 fastq file of the H pool reads     | |    |
       --lf1     FILE      the paired-end1 fastq file of the L pool reads     | |    | --lb FILE  BAM file of the L pool
       --lf2     FILE      the paired-end2 fastq file of the L pool reads     | |    |
       --log1    FILE      prepar module log file [prepar.log] 
       --log2    FILE      prep module log file [prep.log]
       --log3    FILE      haplotype log file [haplotype.log]
       --log4    FILE      afd module log [afd.log]
       --log5    FILE      polish module log file [polish.log]
       --log6    FILE      qtl_pick module log file [qtl_pick.log]
       --log     FILE      script module log [script.log]

1) prepar step options:

Alignment Options:

--t  INT        number of threads [1]

SNV Calling Options:

--aq  INT       skip alignments with mapQ smaller than INT [30]

SV Calling Options:

--vq INT        skip alignments with mapQ smaller than INT [30]

SNV filtering Options:

--d  INT        skip SNVs with reads depth smaller than INT [10]   
--sq INT        skip SNVs with phred-scaled quality smaller than INT [30]

2) prep step options:

Alignment Options:

--t2   INT        number of threads [1]

SNV Calling Options:

--aq2  INT        skip alignments with mapQ smaller than INT [30]

SNV filtering Options:

--cov  INT        average sequencing coverage [30] 
--sq2  INT        skip SNVs with phred-scaled quality smaller than INT [30]
--pn   INT        read count of the minor allele should be greater than INT [3]

3) haplotype step options:

          --phase2  STR  use the samtools algorithm or the HAPCUT2 algorithm to assemble haplotypes [default: T; T: samtools; F: HAPCUT2]

SNP genotyping Options:

          --dep  INT   skip SNPs with read depth smaller than INT [10]
          --aq3  INT   skip alignment with mapQ smaller than INT  [20]
          --vq3  INT   skip SNPs with phred-scaled quality smaller than INT [40]

4) afd step options:

Statistics Options:

--sd  STR    the statistic method: ED/g/si [g]    
--w   INT    the sliding window size [1000000] 
--fn  INT    batches for smoothing; the smaller the faster but more memory[20]

5) polish step options:

Statistics Options:

--sd  STR    the statistic method: ED/g/si [g]    
--w   INT    the sliding window size [1000000] 
--fn  INT    batches for smoothing; the smaller the faster but more memory[20]

6) qtl_pick options:

--q   INT        minimum phred-scaled quality score [30]
--pr  INT        promoter region [2000]

Example:

1) Use read files to run bsatos

bsatos all --o result --r genome.fasta --gtf gene.gtf --pf1 P_1.fastq.gz --pf2 P_2.fastq.gz --mf1 M_1.fastq.gz --mf2 M_2.fastq.gz --hf1 H_1.fastq.gz --hf2 H_2.fastq.gz --lf1 L_1.fastq.gz --lf2 L_2.fastq.gz

2) Use pre-aligned BAM files to run bsatos

bsatos all --o result --r genome.fasta --gtf gene.gtf --pb P.bam --mb M.bam --hb H.bam --lb L.bam
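
For comparison with the single `all` command, the individual steps can also be listed in the order they run. The sketch below simply collects the per-step example commands from this page (BAM input variant) into an ordered list. The igv step is left out because no example command is given for it, and the bare file names are the ones used in the examples, so real runs would need the paths inside each step's output directory.

```python
import shlex

# Example commands from this page, in pipeline order (illustrative only).
STEPS = [
    "bsatos prepar --pb P.bam --mb M.bam --r genome.fasta --gtf gene.gtf --o prepar",
    "bsatos prep --hb H.bam --lb L.bam --r genome.fasta "
    "--g1 P_M_G1 --g2 P_M_G2 --g3 P_M_G3 --o prep",
    "bsatos haplotype --pb P.bam --mb M.bam --hb H.bam --lb L.bam "
    "--r genome.fasta --o haplotype",
    "bsatos afd --sd g --g1 g1.res --g2 g2.res --g3 g3.res "
    "--h merged.block --o afd --w 1000000",
    "bsatos polish --sd g --gs1 g1.res.ad --gs2 g2.res.ad --gs3 g3.res.ad "
    "--h merged.block --o polish",
    "bsatos qtl_pick --o qtl_pick --gp1 g1.res.ad.ad --gp2 g2.res.ad.ad "
    "--gp3 g3.res.ad.ad --v snv.AT_multianno.txt --sv sv.AT_multianno.txt "
    "--gtf gene.gtf --h merged.block",
]

# Split each command into an argument list suitable for subprocess.run.
commands = [shlex.split(step) for step in STEPS]
step_names = [cmd[1] for cmd in commands]
```

Each entry of `commands` can be passed to `subprocess.run(cmd, check=True)` in turn, so that a failing step stops the sequence.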


Usage: bsatos gs [options]

Options:

--gen  FILE     Genotype file
--phe  FILE     Phenotype file
--rou  INT      number of calculation rounds [1000]
--pre  FLOAT    proportion of the whole population used as the training set [0.6]
--o    STR      outputPrefix [gs]
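
Read together, `--rou` and `--pre` suggest repeated random splits of the population into a training set and a validation set. The sketch below shows that idea with the standard library only. It is an interpretation of the option descriptions, not the bsatos implementation, and `split_rounds` and its `seed` argument are hypothetical.

```python
import random


def split_rounds(samples, train_fraction=0.6, rounds=1000, seed=0):
    """Yield (training, validation) lists for each round.

    train_fraction mirrors --pre and rounds mirrors --rou; seed is an
    added assumption to make the splits reproducible.
    """
    rng = random.Random(seed)
    n_train = round(len(samples) * train_fraction)
    for _ in range(rounds):
        shuffled = list(samples)
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]


individuals = [f"ind{i}" for i in range(10)]
splits = list(split_rounds(individuals, rounds=3))
```

With ten individuals and the default fraction, every round trains on six individuals and predicts the remaining four.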

Outputs:

gs_dir [DIR]