Description
Summary:
While preparing inputs for Tractor, I accidentally passed the path to an empty VCF file. To my surprise, Tractor ran without any error or warning (see the output below) and even created output files. This behavior could confuse users, so it may be clearer for the extract_tracts.py script to fail explicitly. This is a very minor code change that I think would improve the user experience, and I am more than happy to attempt it and open a pull request if that is wanted.
INFO (__main__ 97): # Running script version : 1.2.0
INFO (__main__ 98): # VCF File : *output_filepath*
INFO (__main__ 99): # Prefix of output file names : test_input
INFO (__main__ 100): # VCF File is compressed? : False
INFO (__main__ 101): # Number of Ancestries in VCF : 6
INFO (__main__ 102): # Output Directory : test/
INFO (__main__ 110): Creating output files for 6 ancestries
INFO (__main__ 125): Iterating through VCF file
INFO (__main__ 233): Finished extracting tracts per 6 ancestries
Explanation of what happens
This issue occurs at line 126 of the extract_tracts.py script. At this point the code has created a TextIOWrapper around the VCF file handle, and when it tries to read the VCF line by line it finds that the file is empty. Execution skips straight from line 126 to line 233 and reports "Finished extracting tracts per %d ancestries". While this is technically true, it is not clear to the user that no tracts were extracted.
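To illustrate the mechanism, here is a minimal standalone sketch (not the actual Tractor code) of why a line-by-line loop over an empty file exits silently. The function name `count_vcf_lines` is hypothetical and exists only for this demonstration.

```python
import os
import tempfile


def count_vcf_lines(vcf_path):
    """Iterate over a file line by line and return how many lines were seen.

    Hypothetical helper for illustration; it is not part of extract_tracts.py.
    """
    lines_seen = 0
    with open(vcf_path, "r") as vcf:
        for _line in vcf:  # for an empty file, this body never executes
            lines_seen += 1
    return lines_seen


def make_empty_file(suffix=".vcf"):
    """Create a zero-byte temporary file and return its path."""
    with tempfile.NamedTemporaryFile(suffix=suffix, delete=False) as tmp:
        return tmp.name


if __name__ == "__main__":
    empty_vcf = make_empty_file()
    try:
        # No exception is raised; the loop simply runs zero times.
        print("lines seen:", count_vcf_lines(empty_vcf))
    finally:
        os.remove(empty_vcf)
```

Because no exception is raised, any code after the loop (such as the "Finished extracting tracts" log message) runs as if the file had been processed normally.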
Possible Solutions
The easiest solution would probably be to call os.path.getsize(vcf_path) around line 92 in the extract_tracts function. Right before this line there are several checks on whether the file exists, so once we know the file exists we can check its size. If the file is empty, the script could log the incident, raise a ValueError (or something similar), and exit.
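A rough sketch of what that check might look like is below. The helper name `check_vcf_not_empty`, the logger setup, and the error message wording are my own assumptions, not existing Tractor code; in the real script the check would presumably sit inline after the existing file-existence checks.

```python
import logging
import os

logger = logging.getLogger(__name__)


def check_vcf_not_empty(vcf_path):
    """Raise ValueError if the file at vcf_path is zero bytes long.

    Hypothetical helper; assumes the caller has already verified
    that vcf_path exists.
    """
    if os.path.getsize(vcf_path) == 0:
        logger.error("VCF file %s is empty; no tracts can be extracted", vcf_path)
        raise ValueError(f"VCF file is empty: {vcf_path}")
```

One caveat: this only catches files that are exactly zero bytes. A gzip-compressed file with no content still contains a gzip header, so its size is not zero and it would pass this check. Covering that case would probably require counting the lines actually read in the main loop and failing if the count is zero.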
How to Reproduce
I'm running this on an Ubuntu 24 LTS server with Python 3.11. To reproduce, create an empty VCF file and pass it as the VCF input along with any existing MSP file.