Hello,
I've been testing RAiSD on subregions of my VCF dataset, because the combination of sample size and genome size makes each chromosome VCF quite large. For some of these subregions I've run into a strange error message. Here is an example, including the command I'm running:
Command line : RAiSD-AI -n Chr24_25000001_50000000_large -I vcfs/24.25000001.50000000.vcf -S ./poplists/large.txt -y 2 -M 0 -w 50 -f -R -D -A 0.01 -op RSD-DEF
Operation mode : mu-statistic scan
Window width : 50
Sample size : 404 [Total: 1193, Not found: 0, Requested 404]
Dataset format : vcf
var-exp : 1.0
sfs-exp : 1.0
ld-exp : 1.0
Rscript version :
A pattern structure of 131072 patterns (max. capacity) and approx. 16 MB memory footprint has been created.
The pattern structure has been resized to 74898 patterns (max. capacity) and approx. 16 MB memory footprint.
ERROR: A VCF entry is found at position 49914576, whereas the region size is set to 49914531 via -B.
(-B is not required with VCF files)
What's confusing me is that the error refers to the -B flag, which I don't set myself; I assume it is set internally after parsing the VCF. Here's the output from a run that did finish successfully, using the same parameters, the same sample set, and a different region of the same physical size (25 Mb):
Command line : RAiSD-AI -n Chr6_50000001_75000000_large -I vcfs/6.50000001.75000000.vcf -S /private/groups/shapirolab/brock/cows/poplists/large.txt -y 2 -M 0 -w 50 -f -R -D -A 0.01 -o>
Operation mode : mu-statistic scan
Window width : 50
Sample size : 404 [Total: 1193, Not found: 0, Requested 404]
Dataset format : vcf
var-exp : 1.0
sfs-exp : 1.0
ld-exp : 1.0
Rscript version :
A pattern structure of 131072 patterns (max. capacity) and approx. 16 MB memory footprint has been created.
The pattern structure has been resized to 74898 patterns (max. capacity) and approx. 16 MB memory footprint.
0: Set 6 | Sites 909899 | SNPs 94777 | Region 74999978 - muVar 51217712 1.343e+00 | muSFS 53015643 6.835e+00 | muLD 57112756 5.482e+00 | mu 59933220 1.048e+01
Sets (total) : 1
Sets (processed) : 1
Sets (not processed) : 0
Total execution time 2285.44852 seconds
Total memory footprint 92733 kbytes
All VCFs consist entirely of biallelic SNPs with MAF greater than 0. Also, here's a snapshot of the variants surrounding the region stop site for the first example (INFO and genotypes omitted for space). Nothing looks unusual, and in the full VCF there are thousands of variants following it:
Any thoughts as to what might be driving this? I'm going to keep narrowing down the problem VCF to see if I can identify what's causing the error. Thanks in advance for your help!
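While narrowing down the problem VCF, a quick sanity check on each subregion file may help rule out input issues. The sketch below is a hypothetical helper (`check_vcf_lines` and the script itself are my own names, not part of RAiSD). It assumes, without confirming, that things like unsorted positions, duplicate positions, non-SNP records, or a second chromosome in the same file could confuse how the region size is inferred. It does not reproduce RAiSD's internal logic; it only reports these conditions and the largest position seen, which can be compared against the value in the error message.

```python
import gzip
import sys
from typing import Iterable, List, Tuple

BASES = {"A", "C", "G", "T"}


def check_vcf_lines(lines: Iterable[str]) -> Tuple[List[str], int]:
    """Scan VCF text lines; return (warnings, maximum POS seen)."""
    warnings: List[str] = []
    chroms_seen: List[str] = []
    prev_pos = None
    max_pos = 0
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, the header line, and blanks
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            warnings.append(f"line {lineno}: fewer than 5 columns")
            continue
        chrom, pos_str, _vid, ref, alt = fields[:5]
        pos = int(pos_str)
        # Track chromosome changes; restart the order check on each new one.
        if not chroms_seen or chroms_seen[-1] != chrom:
            if chrom in chroms_seen:
                warnings.append(f"line {lineno}: chromosome {chrom} reappears")
            chroms_seen.append(chrom)
            prev_pos = None
        if prev_pos is not None:
            if pos < prev_pos:
                warnings.append(f"line {lineno}: position {pos} after {prev_pos} (unsorted)")
            elif pos == prev_pos:
                warnings.append(f"line {lineno}: duplicate position {pos}")
        # A biallelic SNP has a single-base REF and a single-base ALT.
        if ref.upper() not in BASES or alt.upper() not in BASES:
            warnings.append(f"line {lineno}: not a biallelic SNP (REF={ref}, ALT={alt})")
        prev_pos = pos
        max_pos = max(max_pos, pos)
    if len(chroms_seen) > 1:
        warnings.append(f"multiple chromosomes: {', '.join(chroms_seen)}")
    return warnings, max_pos


def main(path: str) -> None:
    # Plain or bgzip/gzip-compressed VCF, read as text.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        warnings, max_pos = check_vcf_lines(handle)
    for w in warnings:
        print(w)
    print(f"max position: {max_pos}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage would be something like `python check_vcf.py vcfs/24.25000001.50000000.vcf`. If it prints no warnings and the maximum position matches the last record, the input format is probably not the cause and the issue is more likely in how the region size is derived.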
Best,
Brock