Issue: There is no warning or failure when the input VCF to extract_tracts.py is empty #52

@jtb324

Summary:

When preparing inputs for Tractor, I accidentally passed the path of an empty VCF file. Much to my surprise, Tractor ran without errors (see the output below) and even created output files without any warning. This behavior could confuse users, and it may be clearer to have the extract_tracts.py script fail explicitly. This is a very minor code change that I think could improve the user experience, so I am more than happy to attempt it and open a pull request (if it is something that's wanted).

INFO (__main__ 97): # Running script version      : 1.2.0
INFO (__main__ 98): # VCF File                    : *output_filepath*
INFO (__main__ 99): # Prefix of output file names : test_input
INFO (__main__ 100): # VCF File is compressed?     : False
INFO (__main__ 101): # Number of Ancestries in VCF : 6
INFO (__main__ 102): # Output Directory            : test/
INFO (__main__ 110): Creating output files for 6 ancestries
INFO (__main__ 125): Iterating through VCF file
INFO (__main__ 233): Finished extracting tracts per 6 ancestries

Explanation of what happens

This issue occurs at line 126 of the extract_tracts.py script. At this point the code has created a TextIOWrapper around the VCF file handle, and when it tries to read the VCF file line by line, it finds that the file is empty. Execution skips straight from line 126 to line 233 and reports "Finished extracting tracts per %d ancestries". While this is technically true, it is not clear to the user that no tracts were extracted.
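To illustrate the behavior described above, here is a small standalone sketch (not the actual code in extract_tracts.py). The function name `count_vcf_lines` is hypothetical. It shows that a `for` loop over an empty file handle runs zero times without raising anything, and that counting lines is one way to notice it:

```python
import io


def count_vcf_lines(handle):
    """Count header lines (starting with '#') and record lines in a file-like object.

    Hypothetical helper for illustration only; not part of Tractor.
    """
    n_header = 0
    n_records = 0
    for line in handle:  # body never executes when the handle is empty
        if line.startswith("#"):
            n_header += 1
        elif line.strip():  # ignore blank lines
            n_records += 1
    return n_header, n_records


# An empty "file": the loop is skipped silently and both counts stay at zero.
print(count_vcf_lines(io.StringIO("")))
```

A check on the returned counts after the loop would also catch a VCF that contains header lines but no variant records.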

Possible Solutions

The easiest solution would probably be to call os.path.getsize(vcf_path) around line 92 in the extract_tracts function. Right before this line there are several checks on whether the file exists, so once we know the file exists we can check its size. If the file is empty, the script can log the incident, raise a ValueError (or something similar), and exit.
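A minimal sketch of the proposed check, assuming the path is available as `vcf_path` and that a module-level logger is used. The function name `check_vcf_not_empty` and the error message are hypothetical and would need to be adapted to the existing code in extract_tracts.py:

```python
import logging
import os

logger = logging.getLogger(__name__)


def check_vcf_not_empty(vcf_path):
    """Raise ValueError if the file at vcf_path contains zero bytes.

    Hypothetical helper; assumes the caller has already verified
    that vcf_path exists.
    """
    if os.path.getsize(vcf_path) == 0:
        message = f"Input VCF file is empty: {vcf_path}"
        logger.error(message)
        raise ValueError(message)
```

One caveat with a size-based check: a gzip-compressed file with no content still contains gzip header bytes, so its size is not zero, and a VCF with only header lines would also pass. Counting the records actually read during iteration may be a more robust complement.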

How to Reproduce

I'm running this on an Ubuntu 24 LTS server with Python 3.11. For inputs, you can just create an empty VCF file and use any existing msp file.
