Python scripts that build dosage files directly from a vcf.gz, excluding ambiguous SNPs and INDELs. Both scripts work essentially the same way: each writes a sample .txt file (required to run PrediXcan) and one dosage .txt.gz file per chromosome, including non-autosomes if they are present in the input. The only difference is the input: TOPMed_vcf2dosage_a.py takes a .vcf.gz containing a single chromosome, whereas TOPMed_vcf2dosage_b.py takes a .vcf.gz containing multiple chromosomes.
The scripts can be customized if needed. As provided, they rename all SNPs to the chr#:pos:ref:alt format and convert chrX, chrXY, chrY, and chrM to their numeric equivalents.
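The filtering and renaming described above can be sketched roughly as follows. This is a minimal illustration, not the code in the scripts themselves: the function names are hypothetical, "ambiguous" is assumed to mean strand-ambiguous A/T and C/G pairs, the numeric chromosome codes are assumed to follow the PLINK convention (X=23, Y=24, XY=25, M=26), and the ID is assumed to keep a literal "chr" prefix. Check the scripts before relying on any of these details.

```python
# Illustrative sketch only; names and numeric codes are assumptions,
# not taken from TOPMed_vcf2dosage_a.py or TOPMed_vcf2dosage_b.py.

# Strand-ambiguous allele pairs: the alleles are complements of each other.
AMBIGUOUS_PAIRS = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}

# Assumed PLINK-style numeric codes for non-autosomes.
NON_AUTOSOME_CODES = {"X": "23", "Y": "24", "XY": "25", "M": "26", "MT": "26"}


def is_indel(ref, alt):
    """True if either allele is not a single character.

    Multi-allelic ALT fields such as "A,T" are also longer than one
    character, so this simple check rejects them as well.
    """
    return len(ref) != 1 or len(alt) != 1


def is_ambiguous(ref, alt):
    """True for strand-ambiguous SNPs (A/T or C/G)."""
    return (ref.upper(), alt.upper()) in AMBIGUOUS_PAIRS


def keep_variant(ref, alt):
    """Keep only single-nucleotide, non-ambiguous variants."""
    return not is_indel(ref, alt) and not is_ambiguous(ref, alt)


def numeric_chrom(chrom):
    """Strip an optional 'chr' prefix and map X/Y/XY/M to numeric codes."""
    label = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return NON_AUTOSOME_CODES.get(label.upper(), label)


def variant_id(chrom, pos, ref, alt):
    """Build an ID in the chr#:pos:ref:alt format."""
    return f"chr{numeric_chrom(chrom)}:{pos}:{ref}:{alt}"
```

With these helpers, a record would be written to the dosage file only if keep_variant(ref, alt) is true, using variant_id(chrom, pos, ref, alt) as its SNP name.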
Imported libraries:
- argparse
- gzip
- os
- sys
NOTE: this is a modified version of the script found here.