Commands
Running bsatos without any parameters prints the list of commands that bsatos supports.
Program: bsatos (Bulked segregant analysis tools for outbreeding species)
Version: 1.0
Usage: bsatos <command> [options]
Command:
prepar     prepare the parents data
prep       prepare the pool data
haplotype  construct haplotype block
afd        calculate and filter allele frequency difference between two extreme pools
polish     polish candidate QTL regions and remove noisy markers based on haplotype information
qtl_pick   judge and pick up QTLs from three types of peaks
igv        generate files for Integrative Genomics Viewer
all        do prepar, prep, haplotype, afd, polish, qtl_pick and igv in turn
gs         assign genotype effect to each marker and conduct prediction
bsatos prepar
Prepare the parent data, including SNV and SV calling, annotation, filtering, and separation of markers by segregation pattern.
Usage: bsatos prepar [options]
Options:
--pf1 FILE paired-end1 fastq file from pollen parent [FASTQ format]
--pf2 FILE paired-end2 fastq file from pollen parent [FASTQ format]
--mf1 FILE paired-end1 fastq file from maternal parent [FASTQ format]
--mf2 FILE paired-end2 fastq file from maternal parent [FASTQ format]
--pb FILE pre-aligned bam file from pollen parent [not required if --pf1 & --pf2 are given]
--mb FILE pre-aligned bam file from maternal parent [not required if --mf1 & --mf2 are given]
--r FILE reference genome [FASTA format]
--o FILE output dir name prefix [prepar]
--gtf/gff FILE gene GTF/GFF file [GTF2/GFF3 format]
Alignment Options:
--t INT number of threads [1]
SNV Calling Options:
--aq INT skip alignments with mapQ smaller than INT [30]
SV Calling Options:
--vq INT skip alignments with mapQ smaller than INT [30]
SNV filtering Options:
--d INT skip SNVs with read depth smaller than INT [10]
--sq INT skip SNVs with phred-scaled quality smaller than INT [30]
Outputs:
prefix_dir [DIR]
P_M_G1 [FILE] markers that are homozygous in the maternal parent but heterozygous in the pollen parent
P_M_G2 [FILE] markers that are homozygous in the pollen parent but heterozygous in the maternal parent
P_M_G3 [FILE] markers that are heterozygous in both parents
M_P.snv [FILE] the SNVs vcf file of two parents
sv.vcf [FILE] the SVs vcf file of two parents
prefix_dir/anno [DIR]
AT_refGene.txt [FILE] GenePred file
AT_refGeneMrna.fa [FILE] transcript FASTA file
snv.AT_multianno.txt [FILE] SNVs multianno file
snv.AT_multianno.vcf [FILE] SNVs annotated VCF file
snv.avinput [FILE] SNVs input file
sv.AT_multianno.txt [FILE] SVs multianno file
sv.AT_multianno.vcf [FILE] SVs annotated VCF file
sv.avinput [FILE] SVs input file
Example:
1)
bsatos prepar --pf1 P_1.fastq --pf2 P_2.fastq --mf1 M_1.fastq --mf2 M_2.fastq --r genome.fasta --gtf gene.gtf --o prepar
OR
2)
bsatos prepar --pb P.bam --mb M.bam --r genome.fasta --gtf gene.gtf --o prepar
Users can use clean reads (FASTQ files) as input:
bsatos prepar --pf1 <P fastq1> --pf2 <P fastq2> --mf1 <M fastq1> --mf2 <M fastq2> --r <reference> --gtf <genome gtf file> --o <outputPrefix>
OR
Users can use read alignments (sorted BAM files) as input.
BAM files can be prepared with the Burrows-Wheeler Aligner (BWA), Bowtie2, Subread, and so on.
BWA: bio-bwa.sourceforge.net/ github.com/lh3/bwa
Bowtie2: bowtie-bio.sourceforge.net/bowtie2/
bsatos prepar --pb <P bam file> --mb <M bam file> --r <reference> --o <outputPrefix> --gtf <genome gtf file>
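The three segregation patterns written by prepar (P_M_G1, P_M_G2, P_M_G3) can be illustrated with a small Python sketch. This is not the bsatos implementation; the function name classify_marker and the use of VCF-style genotype strings are assumptions made for illustration only.

```python
def _alleles(genotype):
    """Split a VCF-style genotype such as '0/1' or '1|1' into its alleles."""
    return genotype.replace("|", "/").split("/")


def _is_called_diploid(genotype):
    alleles = _alleles(genotype)
    return len(alleles) == 2 and "." not in alleles


def is_het(genotype):
    alleles = _alleles(genotype)
    return _is_called_diploid(genotype) and alleles[0] != alleles[1]


def is_hom(genotype):
    alleles = _alleles(genotype)
    return _is_called_diploid(genotype) and alleles[0] == alleles[1]


def classify_marker(pollen_gt, maternal_gt):
    """Return 'G1', 'G2', 'G3', or None (hypothetical helper).

    G1: homozygous in the maternal parent, heterozygous in the pollen parent
    G2: homozygous in the pollen parent, heterozygous in the maternal parent
    G3: heterozygous in both parents
    """
    if is_het(pollen_gt) and is_hom(maternal_gt):
        return "G1"
    if is_hom(pollen_gt) and is_het(maternal_gt):
        return "G2"
    if is_het(pollen_gt) and is_het(maternal_gt):
        return "G3"
    return None  # both homozygous, or a missing call
```

Markers that are homozygous in both parents do not segregate in the progeny, which is why the sketch returns None for them.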
bsatos prep
Extract read count data from the two extreme pools for each of the three segregation patterns.
Usage: bsatos prep [options]
Options:
--hf1 FILE paired-end1 fastq file from high extreme pool [FASTQ format]
--hf2 FILE paired-end2 fastq file from high extreme pool [FASTQ format]
--lf1 FILE paired-end1 fastq file from low extreme pool [FASTQ format]
--lf2 FILE paired-end2 fastq file from low extreme pool [FASTQ format]
--hb FILE pre-aligned bam file from high extreme pool [not required if --hf1 & --hf2 are given]
--lb FILE pre-aligned bam file from low extreme pool [not required if --lf1 & --lf2 are given]
--r FILE reference genome [FASTA format]
--g1 FILE G1 file from prepar step
--g2 FILE G2 file from prepar step
--g3 FILE G3 file from prepar step
--o FILE output dir name prefix [prep]
Alignment Options:
--t2 INT number of threads [1]
SNV Calling Options:
--aq2 INT skip alignments with mapQ smaller than INT [30]
SNV filtering Options:
--cov INT average sequencing coverage [30]
--sq2 INT skip SNVs with phred-scaled quality smaller than INT [30]
--pn INT read counts of the minor allele should be greater than INT [3]
Outputs:
prefix_dir [DIR]
g1.res [FILE] read counts with different alleles from H & L pools in G1 type loci
g2.res [FILE] read counts with different alleles from H & L pools in G2 type loci
g3.res [FILE] read counts with different alleles from H & L pools in G3 type loci
File format:
Chromosome Position H_REF H_ALT L_REF L_ALT
Chr01 11000 10 1 1 10
. . . . . .
. . . . . .
. . . . . .
Chromosome: the chromosome of the marker
Position: the position of the marker on the chromosome
H_REF: read counts with REF alleles in H pool
H_ALT: read counts with ALT alleles in H pool
L_REF: read counts with REF alleles in L pool
L_ALT: read counts with ALT alleles in L pool
Example:
1)
bsatos prep --hf1 H_1.fastq --hf2 H_2.fastq --lf1 L_1.fastq --lf2 L_2.fastq --r genome.fasta --g1 P_M_G1 --g2 P_M_G2 --g3 P_M_G3 --o prep
OR
2)
bsatos prep --hb H.bam --lb L.bam --r genome.fasta --g1 P_M_G1 --g2 P_M_G2 --g3 P_M_G3 --o prep
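The six-column .res layout described in this section can be read with a few lines of Python. The sketch below assumes the file is whitespace-delimited and starts with the header line shown under "File format"; the names PoolCounts and parse_res_lines are hypothetical and not part of bsatos.

```python
from typing import Iterable, Iterator, NamedTuple


class PoolCounts(NamedTuple):
    """One marker row: allele read counts in the H and L pools."""
    chromosome: str
    position: int
    h_ref: int
    h_alt: int
    l_ref: int
    l_alt: int


def parse_res_lines(lines: Iterable[str]) -> Iterator[PoolCounts]:
    """Yield one PoolCounts per data row, skipping the header and blank rows."""
    for line in lines:
        fields = line.split()
        # Data rows have six fields and a numeric position in column two.
        if len(fields) != 6 or not fields[1].isdigit():
            continue
        chrom, pos, *counts = fields
        yield PoolCounts(chrom, int(pos), *map(int, counts))
```

For a file on disk, something like `with open("g1.res") as fh: rows = list(parse_res_lines(fh))` would apply, where the path is a placeholder.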
bsatos haplotype
Usage: bsatos haplotype
Options:
--pb FILE pre-aligned bam file from pollen parent
--mb FILE pre-aligned bam file from maternal parent
--hb FILE pre-aligned bam file from high extreme pool
--lb FILE pre-aligned bam file from low extreme pool
--r FILE reference genome [FASTA format]
--phase2 STR use the samtools or HAPCUT2 algorithm to assemble haplotypes [default: T; T: samtools; F: HAPCUT2]
--o STR output dir name prefix
SNP genotyping Options:
--dep INT skip SNPs with read depth smaller than INT [10]
--aq3 INT skip alignment with mapQ smaller than INT [20]
--vq3 INT skip SNPs with phred-scaled quality smaller than INT [40]
Outputs:
P.bam_block [FILE] haplotype blocks of pollen parent
M.bam_block [FILE] haplotype blocks of maternal parent
H.bam_block [FILE] haplotype blocks of high extreme pool
L.bam_block [FILE] haplotype blocks of low extreme pool
merged.block [FILE] merged, corrected and patched haplotype blocks of pollen parent, maternal parent, high extreme pool and low extreme pool
phase_P.bed [FILE] BED format haplotype information of pollen parent
phase_M.bed [FILE] BED format haplotype information of maternal parent
phase_H.bed [FILE] BED format haplotype information of high extreme pool
phase_L.bed [FILE] BED format haplotype information of low extreme pool
overlapped.bed [FILE] BED format haplotype information classified from two parents and two pools
Example:
bsatos haplotype --pb P.bam --mb M.bam --hb H.bam --lb L.bam --r genome.fasta --o haplotype
bsatos afd
Calculate and filter the allele frequency difference between the two extreme pools [ED/G value/SNP index] and smooth the AFD/ED/SI/G values.
Usage: bsatos afd [options]
Options:
--g1 FILE read counts with different alleles from H & L pools in G1 type loci from prep module [required]
--g2 FILE read counts with different alleles from H & L pools in G2 type loci from prep module [required]
--g3 FILE read counts with different alleles from H & L pools in G3 type loci from prep module [required]
--h FILE merged, corrected and patched haplotype block file of two parents and two pools from haplotype module [required]
--o STR output dir name prefix [afd]
Statistics Options:
--sd STR the statistic method: ED/g/si [g]
--w INT the sliding window size [1000000]
--fn INT batches for smoothing; the smaller the faster, but more memory [20]
Outputs:
*_g1.res_afd [FILE] G value (ED/SI) based on G1 type loci and smoothed curve with different window in each chromosome
*_g2.res_afd [FILE] G value (ED/SI) based on G2 type loci and smoothed curve with different window in each chromosome
*_g3.res_afd [FILE] G value (ED/SI) based on G3 type loci and smoothed curve with different window in each chromosome
g1.res.cal.out [FILE] G value (ED/SI) based on G1 type loci and smoothed curve with different window across genome
g1.res.ad [FILE] G value (ED/SI) based on G1 type loci and smoothed curve with different window across genome with haplotype information
g2.res.cal.out [FILE] G value (ED/SI) based on G2 type loci and smoothed curve with different window across genome
g2.res.ad [FILE] G value (ED/SI) based on G2 type loci and smoothed curve with different window across genome with haplotype information
g3.res.cal.out [FILE] G value (ED/SI) based on G3 type loci and smoothed curve with different window across genome
g3.res.ad [FILE] G value (ED/SI) based on G3 type loci and smoothed curve with different window across genome with haplotype information
Example:
bsatos afd --sd g --g1 g1.res --g2 g2.res --g3 g3.res --h merged.block --o afd --w 1000000
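To make the three --sd choices more concrete, here is a Python sketch of the per-marker statistics as they are commonly defined in the bulked segregant analysis literature: the SNP-index difference, the Euclidean distance (ED), and the G statistic of a 2x2 table of read counts. Whether bsatos computes them in exactly this way is not confirmed by this document, and the function names are hypothetical. Inputs are the four read counts from one row of a .res file; each pool is assumed to have non-zero depth.

```python
import math


def snp_index_difference(h_ref, h_alt, l_ref, l_alt):
    """ALT allele frequency in the H pool minus that in the L pool."""
    return h_alt / (h_ref + h_alt) - l_alt / (l_ref + l_alt)


def euclidean_distance(h_ref, h_alt, l_ref, l_alt):
    """Euclidean distance between the allele frequency vectors of the pools."""
    h_total = h_ref + h_alt
    l_total = l_ref + l_alt
    d_ref = h_ref / h_total - l_ref / l_total
    d_alt = h_alt / h_total - l_alt / l_total
    return math.sqrt(d_ref ** 2 + d_alt ** 2)


def g_statistic(h_ref, h_alt, l_ref, l_alt):
    """G = 2 * sum(observed * ln(observed / expected)) over the 2x2 table."""
    observed = [[h_ref, h_alt], [l_ref, l_alt]]
    row_sums = [sum(row) for row in observed]
    col_sums = [h_ref + l_ref, h_alt + l_alt]
    total = sum(row_sums)
    g = 0.0
    for i in range(2):
        for j in range(2):
            obs = observed[i][j]
            if obs > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g += obs * math.log(obs / expected)
    return 2.0 * g
```

For the example row in the prep section (10, 1, 1, 10), the two pools carry nearly opposite alleles, so all three statistics are far from zero; for identical pools such as (5, 5, 5, 5) they are all zero.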
bsatos polish
Usage: bsatos polish
Options:
--o STR output dir name prefix [polish]
--gs1 FILE smoothed curve based on G1 type loci across genome with haplotype information from afd module
--gs2 FILE smoothed curve based on G2 type loci across genome with haplotype information from afd module
--gs3 FILE smoothed curve based on G3 type loci across genome with haplotype information from afd module
--h FILE merged haplotype block file from haplotype module
--fdr INT FDR threshold for the polishing [0.01]
Statistics Options:
--sd STR the statistic method: ED/g/si [g]
--w INT the sliding window size [1000000]
--fn INT batches for smoothing; the smaller the faster, but more memory [20]
Outputs:
g1.res.ad.polish [FILE] read counts with different alleles in G1 type loci after polish
g2.res.ad.polish [FILE] read counts with different alleles in G2 type loci after polish
g3.res.ad.polish [FILE] read counts with different alleles in G3 type loci after polish
*_g1.res.ad_afd [FILE] smoothed curve with different window in each chromosome based on polished read counts of G1 type loci
*_g2.res.ad_afd [FILE] smoothed curve with different window in each chromosome based on polished read counts of G2 type loci
*_g3.res.ad_afd [FILE] smoothed curve with different window in each chromosome based on polished read counts of G3 type loci
g1.res.ad.cal.out [FILE] smoothed curve across genome based on polished read counts of G1 type loci
g2.res.ad.cal.out [FILE] smoothed curve across genome based on polished read counts of G2 type loci
g3.res.ad.cal.out [FILE] smoothed curve across genome based on polished read counts of G3 type loci
g1.res.ad.ad [FILE] smoothed curve with haplotype information based on polished read counts of G1 type loci
g2.res.ad.ad [FILE] smoothed curve with haplotype information based on polished read counts of G2 type loci
g3.res.ad.ad [FILE] smoothed curve with haplotype information based on polished read counts of G3 type loci
Example:
bsatos polish --sd g --gs1 g1.res.ad --gs2 g2.res.ad --gs3 g3.res.ad --h merged.block --o polish
bsatos qtl_pick
Usage: bsatos qtl_pick [options]
Options:
--o STR output dir name prefix
--gp1 FILE smoothed curve based on G1 type loci across genome with haplotype information from polish module
--gp2 FILE smoothed curve based on G2 type loci across genome with haplotype information from polish module
--gp3 FILE smoothed curve based on G3 type loci across genome with haplotype information from polish module
--v FILE annotated SNVs file from prepar step
--sv FILE annotated SVs file from prepar step
--gtf FILE gene.gtf file
--h FILE haplotype file from haplotype step
--q INT minimum phred-scaled quality score [30]
--pr INT promoter region [2000]
--fdr INT FDR threshold for the polishing [0.01]
Outputs:
qtl [FILE] detected QTLs list file
*.pdf [FILE] G value/ED/SI profiles across each chromosome (*: chromosome)
g1_hap [FILE] haplotype information in G1 type loci
g2_hap [FILE] haplotype information in G2 type loci
g3_hap [FILE] haplotype information in G3 type loci
gene.bed [FILE] gene bed file
*.gene [FILE] gene list located in the QTL regions (*: QTL accession)
*.hap [FILE] haplotype information located in the QTL regions (*: QTL accession)
*.snv [FILE] screened SNVs based on genetic rules located in the QTL regions (*: QTL accession)
*.sv [FILE] screened SVs based on genetic rules located in the QTL regions (*: QTL accession)
Example:
bsatos qtl_pick --o qtl_pick --gp1 g1.res.ad.ad --gp2 g2.res.ad.ad --gp3 g3.res.ad.ad --v snv.AT_multianno.txt --sv sv.AT_multianno.txt --gtf gene.gtf --h merged.block
bsatos all
Usage: bsatos all [options]
Options:
--oprefix STR the prefix of the result folder [all]
--r FILE the genome fasta file [fasta]
--gtf FILE the GTF/GFF file of the genes
Input as FASTQ reads:
--pf1 FILE the paired-end1 fastq file of the pollen parent
--pf2 FILE the paired-end2 fastq file of the pollen parent
--mf1 FILE the paired-end1 fastq file of the maternal parent
--mf2 FILE the paired-end2 fastq file of the maternal parent
--hf1 FILE the paired-end1 fastq file of the H pool reads
--hf2 FILE the paired-end2 fastq file of the H pool reads
--lf1 FILE the paired-end1 fastq file of the L pool reads
--lf2 FILE the paired-end2 fastq file of the L pool reads
OR input as pre-aligned BAM files:
--pb FILE BAM file of the pollen parent
--mb FILE BAM file of the maternal parent
--hb FILE BAM file of the H pool
--lb FILE BAM file of the L pool
--log1 FILE prepar module log file [prepar.log]
--log2 FILE prep module log file [prep.log]
--log3 FILE haplotype log file [haplotype.log]
--log4 FILE afd module log [afd.log]
--log5 FILE polish module log file [polish.log]
--log6 FILE qtl_pick module log file [qtl_pick.log]
--log FILE script module log [script.log]
1) prepar step options:
Alignment Options:
--t INT number of threads [1]
SNV Calling Options:
--aq INT skip alignments with mapQ smaller than INT [30]
SV Calling Options:
--vq INT skip alignments with mapQ smaller than INT [30]
SNV filtering Options:
--d INT skip SNVs with read depth smaller than INT [10]
--sq INT skip SNVs with phred-scaled quality smaller than INT [30]
2) prep step options:
Alignment Options:
--t2 INT number of threads [1]
SNV Calling Options:
--aq2 INT skip alignments with mapQ smaller than INT [30]
SNV filtering Options:
--cov INT average sequencing coverage [30]
--sq2 INT skip SNVs with phred-scaled quality smaller than INT [30]
--pn INT read counts of the minor allele should be greater than INT [3]
3) haplotype step options:
--phase2 STR use the samtools or HAPCUT2 algorithm to assemble haplotypes [default: T; T: samtools; F: HAPCUT2]
SNP genotyping Options:
--dep INT skip SNPs with read depth smaller than INT [10]
--aq3 INT skip alignment with mapQ smaller than INT [20]
--vq3 INT skip SNPs with phred-scaled quality smaller than INT [40]
4) afd step options:
Statistics Options:
--sd STR the statistic method: ED/g/si [g]
--w INT the sliding window size [1000000]
--fn INT batches for smoothing; the smaller the faster, but more memory [20]
5) polish step options:
Statistics Options:
--sd STR the statistic method: ED/g/si [g]
--w INT the sliding window size [1000000]
--fn INT batches for smoothing; the smaller the faster, but more memory [20]
6) qtl_pick options:
--q INT minimum phred-scaled quality score [30]
--pr INT promoter region [2000]
Example:
1) Use read files to run bsatos
bsatos all --o result --r genome.fasta --gtf gene.gtf --pf1 P_1.fastq.gz --pf2 P_2.fastq.gz --mf1 M_1.fastq.gz --mf2 M_2.fastq.gz --hf1 H_1.fastq.gz --hf2 H_2.fastq.gz --lf1 L_1.fastq.gz --lf2 L_2.fastq.gz
2) Use pre-aligned BAM files to run bsatos
bsatos all --o result --r genome.fasta --gtf gene.gtf --pb P.bam --mb M.bam --hb H.bam --lb L.bam
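When bsatos all is driven from a script, it can help to assemble the argument list programmatically and check that exactly one input mode (FASTQ or BAM) is supplied. The Python sketch below only builds the argument list and does not run anything. The helper build_all_command is hypothetical, it follows the two examples above in using --o for the output prefix, and it assumes the two input modes are not mixed, which this document does not state explicitly.

```python
FASTQ_KEYS = ("pf1", "pf2", "mf1", "mf2", "hf1", "hf2", "lf1", "lf2")
BAM_KEYS = ("pb", "mb", "hb", "lb")


def build_all_command(reference, gtf, out, fastq=None, bam=None):
    """Return the argv list for `bsatos all` (hypothetical helper).

    fastq: dict mapping each name in FASTQ_KEYS to a file path, or
    bam:   dict mapping each name in BAM_KEYS to a file path.
    Exactly one of the two must be given.
    """
    if (fastq is None) == (bam is None):
        raise ValueError("give either fastq inputs or bam inputs, not both")
    inputs, keys = (fastq, FASTQ_KEYS) if fastq is not None else (bam, BAM_KEYS)
    missing = [key for key in keys if key not in inputs]
    if missing:
        raise ValueError("missing inputs: " + ", ".join(missing))
    argv = ["bsatos", "all", "--o", out, "--r", reference, "--gtf", gtf]
    for key in keys:
        argv += ["--" + key, inputs[key]]
    return argv
```

The resulting list can be passed to `subprocess.run(argv, check=True)` once bsatos is installed and on the PATH.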
bsatos gs
Usage: bsatos gs [options]
Options:
--gen FILE genotype file
--phe FILE phenotype file
--rou INT number of calculation rounds [1000]
--pre FLOAT proportion of the population used as the training set [0.6]
--o STR output dir name prefix [gs]
Outputs:
gs_dir [DIR]
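One plausible reading of --rou and --pre is that the prediction is evaluated over repeated random splits of the population into training and test sets. The Python sketch below illustrates only that splitting idea; it is an assumption about how the two options interact, not a description of the gs implementation, and the function name repeated_splits and the seed parameter are hypothetical.

```python
import random


def repeated_splits(sample_ids, train_fraction=0.6, rounds=1000, seed=0):
    """Yield (train, test) lists for each round of random splitting."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    ids = list(sample_ids)
    n_train = round(len(ids) * train_fraction)
    for _ in range(rounds):
        shuffled = ids[:]
        rng.shuffle(shuffled)
        yield shuffled[:n_train], shuffled[n_train:]
```

With the defaults shown in the options above, each round would train on 60% of the individuals and predict the remaining 40%.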