
Fix heistMerge execution and more...#16

Open
lhugolach wants to merge 6 commits into mhibbins:master from lhugolach:master


@lhugolach lhugolach commented Apr 22, 2024

Fix heistMerge execution

Hello,
at the moment, if the heistMerge command is used, execution stops with the following error:

> $ heistMerge outputs/ -d
Traceback (most recent call last):
   File "usr/local/bin/heistMerge", line 33, in <module>
      sys.exit(load_entry_point('heist-hemiplasy==0.3.1', 'console_scripts', 'heistMerge')())
   File "usr/local/lib/python3.10/dist-packages/heist_hemiplasy-0.3.1-py3.10.egg/heist/__main__.py", line 143, in heistMerge
KeyError: '#9'

The error is raised in the elif branch that adds a float value to the key "#9":

elif i == 8:
    data["#9"] += float(line)

However, this key is not present in the data dictionary:

data = {"#1": 0, "#2": 0, "#3": 0, "#4": 0, "#5": 0,
        "#6": 0, "#7": 0, "#8": 0}

Solution I propose...

My fix adds this key, initialised to the float value 0.0:

data = {"#1": 0, "#2": 0, "#3": 0, "#4": 0, "#5": 0,
        "#6": 0, "#7": 0, "#8": 0, "#9": 0.0}

Please let me know if I am proposing something wrong
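
As a rough illustration outside of HeIST, the sketch below shows why initialising every key up front avoids the KeyError. The function name accumulate_fields and the one-value-per-line layout are assumptions made for this example; the actual loop in heistMerge is not reproduced here.

```python
def accumulate_fields(lines, n_fields=9):
    """Sum float values into keys "#1".."#n_fields", cycling through the keys."""
    # Every key the loop can touch is created up front as a float,
    # so `+=` never hits a missing key.
    data = {f"#{k}": 0.0 for k in range(1, n_fields + 1)}
    for i, line in enumerate(lines):
        data[f"#{i % n_fields + 1}"] += float(line)
    return data


if __name__ == "__main__":
    # Two hypothetical records of nine values each.
    values = [str(v) for v in range(1, 10)] * 2
    print(accumulate_fields(values)["#9"])
```

Building the dictionary from the field count keeps the keys and the loop in sync, which is one way to prevent this kind of mismatch if another field is added later.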

More: Fix deprecated python file opening mode

The readSeqs function in seqtools.py opens sequence files with the "rU" mode, which the Python documentation has marked as deprecated since version 3.4 and which was removed in Python 3.11.
I removed this mode after some analyses were stopped by Python with the error:

ValueError: invalid mode: 'rU'

I now use the "r" (open for reading) mode, since the files being read are plain text files.
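
For reference, here is a minimal sketch of reading a plain text file with mode "r". The helper name read_text_lines is hypothetical and is not the readSeqs implementation; it only shows that text mode in Python 3 already normalises line endings, which is what the "U" flag used to request.

```python
def read_text_lines(path):
    """Return the lines of a plain text file without their line endings."""
    # Mode "r" opens the file in text mode. With the default newline=None,
    # Python 3 translates "\r\n" and "\r" to "\n" while reading.
    with open(path, "r") as handle:
        return [line.rstrip("\n") for line in handle]
```

A file written with Windows-style line endings is therefore read the same way as one written with Unix-style line endings.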

More: Improved deletion of tmp files

I wanted to improve how tmp files are deleted.
In short, deletion still happens at the end of the analysis, but the cleanup is now registered in the main module, so that the output path can be passed as a parameter.

# __main__.py
prefix = args.outputdir
atexit.register(hemiplasytool.cleanup_earlyexit, prefix)

# hemiplasytool.py
def cleanup_earlyexit(filename):
    """Remove gene trees and sequences files. For use between batches."""
    ...

This allows tmp files to be deleted at any path specified in the --outputdir argument of HeIST.
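
The pattern can be sketched in isolation as follows. This is not the body of cleanup_earlyexit: the names remove_tmp_files and register_cleanup, and the ".tmp" suffix, are assumptions chosen for the example. It relies only on the fact that atexit.register forwards extra positional arguments to the registered callback.

```python
import atexit
import glob
import os


def remove_tmp_files(prefix, suffixes=(".tmp",)):
    """Delete files whose path starts with `prefix` and ends with one of `suffixes`."""
    removed = []
    for suffix in suffixes:
        # glob.escape keeps special characters in the prefix from acting as wildcards.
        for path in glob.glob(glob.escape(prefix) + "*" + suffix):
            os.remove(path)
            removed.append(path)
    return removed


def register_cleanup(prefix):
    """Schedule remove_tmp_files(prefix) to run when the interpreter exits normally."""
    # Extra positional arguments to atexit.register are passed to the callback.
    atexit.register(remove_tmp_files, prefix)
```

Because the prefix is supplied at registration time, the cleanup follows whatever output path the user passed in rather than a hard-coded location.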

More: Check if output folder exists function

I revised the logic of output file management.
If the --outputdir parameter specifies a destination path with a sub-folder (e.g. ./folder/file) and that folder does not exist, a function now creates it.
Specifically, the code strips the last element of the path, assuming it is a file name, and creates the remaining folders.
If only a file name is specified, the output files are generated in the directory HeIST is run from, exactly as before.
If a chain of non-existent sub-folders is specified (e.g. ./folder1/folder2/folder3/file), the entire path is created.
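
A minimal sketch of this behaviour, assuming a hypothetical helper named ensure_parent_dir (the function in this PR may be named and structured differently):

```python
import os


def ensure_parent_dir(output_path):
    """Create the directory part of `output_path` if it is missing and return it."""
    # The last path element is treated as a file name, so only the part
    # before it is created.
    parent = os.path.dirname(output_path)
    if parent:
        # exist_ok=True makes this a no-op when the folders already exist.
        os.makedirs(parent, exist_ok=True)
    return parent
```

With a bare file name the directory part is empty, so nothing is created and the output lands in the current working directory.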

Last tip

I have changed all version references of HeIST to 0.4.1, defining this change as a hotfix.
I therefore suggest creating a new tag.

Finally, could you please update the package on PyPI.org? The latest version available there is currently 0.3.1, and updating it would let anyone use the stable version of HeIST (see #15).

Thank you


P.S. In the commit message "Fix KeyError: #9 in heistMerge execution", GitHub interpreted #9 as a reference to an earlier pull request.
This was completely unintentional; I hope it didn't cause any confusion.

@lhugolach lhugolach changed the title Fix heistMerge execution Fix heistMerge execution and more... May 3, 2024