From d2776bc27af7bcf093114a411ec7930422084ace Mon Sep 17 00:00:00 2001
From: Christian Fung
Date: Wed, 23 Oct 2019 11:28:40 -0400
Subject: [PATCH 1/3] created the VarTable options list

---
 OPTIONS.md | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)
 create mode 100644 OPTIONS.md

diff --git a/OPTIONS.md b/OPTIONS.md
new file mode 100644
index 0000000..55e9002
--- /dev/null
+++ b/OPTIONS.md
@@ -0,0 +1,46 @@
+# VarTable Options
+
+## (Required)
+**--input-vcf** input VCF file
+
+* Do we need to differentiate between NGS_Mapper VCF (every position) versus lofreq VCF (only variants)?*
+
+**--input-bam** input BAM file
+
+* Do we assume BAM index file is in the same directory as BAM or should we include it as an option?
+
+*--input-bai input BAM index file*
+
+**--input-ref** input reference FASTA file
+
+* Assuming we are using a GenBank ref, are we planning to parse annotated data?*
+
+*--accession input reference fASTA accession number to parse GenBank information*
+
+**--minpercent** input variant call percentage (at what frequency do we call a nt a variant)
+
+* Required b/c we cannot assume the level 20%, 5%, 1%... it also prevents a mistaken run by user*
+
+**--minbq** input minimum base nt quality
+
+* Required b/c we cannot assume the quality 25%, 30%... it depends on --minpercent*
+
+
+## (Optional)
+**--input-prm** input primer FASTA file
+
+**--mindepth** input minimum depth of coverage for each nt base
+
+* Should we set this automatically to a default of 10? For our usage, 10 does not change*
+
+**--output-name** input the desired output filename (default format: tsv)
+
+* Optional b/c I am assuming a generic output name can be given if none is given*
+
+**--output-dir** input the desired output directory
+
+* Optional b/c I am assuming VarTable is able to output into current working directory if none is given*
+
+*--output-err input name of output error file for user troubleshooting or submit the error file for help*
+
+* Instead of the user gathering all pertinenet run information, error file should contain it for quick help*

From 3444b752abd01018a77bcf677eef2cb8e654b7f5 Mon Sep 17 00:00:00 2001
From: Christian Fung
Date: Wed, 23 Oct 2019 14:29:12 -0400
Subject: [PATCH 2/3] revision

---
 OPTIONS.md | 46 +++++++++++++++++-----------------------------
 1 file changed, 17 insertions(+), 29 deletions(-)

diff --git a/OPTIONS.md b/OPTIONS.md
index 55e9002..907e1e7 100644
--- a/OPTIONS.md
+++ b/OPTIONS.md
@@ -1,46 +1,34 @@
-# VarTable Options
+## VarTable Options
 
-## (Required)
-**--input-vcf** input VCF file
 
-* Do we need to differentiate between NGS_Mapper VCF (every position) versus lofreq VCF (only variants)?*
-
-**--input-bam** input BAM file
 
-* Do we assume BAM index file is in the same directory as BAM or should we include it as an option?
+Usage
+---------
 
-*--input-bai input BAM index file*
+    $ vartable_report --input-vcf [../filename.vcf] --type [basecaller] \
+      --input-bam [../filename.bam] --input-ref [../filename.fasta] \
+      --minpercent [20] --minbq [25] --mindepth [10] \
+      --output-err [filename.err] --input-prm [../filename.fasta] \
+      --output-name [filename.tsv]
 
-**--input-ref** input reference FASTA file
+## (Required)
+**--input-vcf** input VCF file
 
-* Assuming we are using a GenBank ref, are we planning to parse annotated data?*
+**--type** input flag (base_caller | lofreq), this differentiates b/t a VCF with all positions and VCF with only variant positions
 
-*--accession input reference fASTA accession number to parse GenBank information*
+**--input-bam** input BAM file
 
-**--minpercent** input variant call percentage (at what frequency do we call a nt a variant)
+**--input-ref** input reference FASTA file or GenBank, gb extension file (this will parse for annotated information)
 
-* Required b/c we cannot assume the level 20%, 5%, 1%... it also prevents a mistaken run by user*
+**--minpercent** input variant call percentage, for example 20%, 5% or 1%
 
-**--minbq** input minimum base nt quality
+**--minbq** input minimum base nt quality, for example 25% or 30%
 
-* Required b/c we cannot assume the quality 25%, 30%... it depends on --minpercent*
+**--mindepth** input minimum depth of coverage for each nt base, for example 10
 
+**--output-err** input name of output file for user troubleshooting or submit the error file for help
 
 ## (Optional)
 **--input-prm** input primer FASTA file
 
-**--mindepth** input minimum depth of coverage for each nt base
-
-* Should we set this automatically to a default of 10? For our usage, 10 does not change*
-
 **--output-name** input the desired output filename (default format: tsv)
-
-* Optional b/c I am assuming a generic output name can be given if none is given*
-
-**--output-dir** input the desired output directory
-
-* Optional b/c I am assuming VarTable is able to output into current working directory if none is given*
-
-*--output-err input name of output error file for user troubleshooting or submit the error file for help*
-
-* Instead of the user gathering all pertinenet run information, error file should contain it for quick help*

From 87d13002bf40a1c25823512277e919aeeb6f0f9c Mon Sep 17 00:00:00 2001
From: Christian Fung
Date: Fri, 25 Oct 2019 14:21:04 -0400
Subject: [PATCH 3/3] added 2 more possible options

---
 OPTIONS.md | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/OPTIONS.md b/OPTIONS.md
index 907e1e7..a36e8d4 100644
--- a/OPTIONS.md
+++ b/OPTIONS.md
@@ -9,7 +9,8 @@ Usage
       --input-bam [../filename.bam] --input-ref [../filename.fasta] \
       --minpercent [20] --minbq [25] --mindepth [10] \
       --output-err [filename.err] --input-prm [../filename.fasta] \
-      --output-name [filename.tsv]
+      --output-name [filename.tsv] --input-region [10 200] \
+      --stats
 
 ## (Required)
 **--input-vcf** input VCF file
@@ -32,3 +33,7 @@ Usage
 **--input-prm** input primer FASTA file
 
 **--output-name** input the desired output filename (default format: tsv)
+
+**--input-region** input start and end positions; variant table will isolate variants within that region
+
+**--stats** outputs a file containing genome statistics
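
Note on the option list as it stands after PATCH 3/3: the required/optional split could be expressed with `argparse` roughly as below. This is only a sketch under stated assumptions, not the project's implementation. The function name `build_parser`, the value types (`float` for `--minpercent`, `int` for `--minbq` and `--mindepth`), and the `--type` choices taken from the option description (`base_caller`, `lofreq`) rather than from the usage line (`basecaller`) are all assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags listed in OPTIONS.md."""
    parser = argparse.ArgumentParser(prog="vartable_report")

    # Flags listed under "(Required)" in OPTIONS.md.
    required = parser.add_argument_group("required")
    required.add_argument("--input-vcf", required=True, help="input VCF file")
    required.add_argument(
        "--type",
        required=True,
        choices=["base_caller", "lofreq"],  # assumed spelling, see lead-in
        help="VCF with all positions (base_caller) or only variants (lofreq)",
    )
    required.add_argument("--input-bam", required=True, help="input BAM file")
    required.add_argument("--input-ref", required=True,
                          help="reference FASTA or GenBank file")
    # Value types below are assumptions; OPTIONS.md only gives example values.
    required.add_argument("--minpercent", required=True, type=float,
                          help="variant call percentage, e.g. 20, 5 or 1")
    required.add_argument("--minbq", required=True, type=int,
                          help="minimum base quality, e.g. 25 or 30")
    required.add_argument("--mindepth", required=True, type=int,
                          help="minimum depth of coverage, e.g. 10")
    required.add_argument("--output-err", required=True,
                          help="file collecting run information for troubleshooting")

    # Flags listed under "(Optional)" in OPTIONS.md.
    optional = parser.add_argument_group("optional")
    optional.add_argument("--input-prm", help="input primer FASTA file")
    optional.add_argument("--output-name", help="output filename (tsv)")
    optional.add_argument("--input-region", nargs=2, type=int,
                          metavar=("START", "END"),
                          help="restrict the variant table to this region")
    optional.add_argument("--stats", action="store_true",
                          help="also write a genome statistics file")
    return parser
```

Calling `build_parser().parse_args()` with one of the required flags missing exits with a usage error, which matches the stated reason for making `--minpercent` and `--minbq` required: no threshold is assumed on the user's behalf.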