Decouple annotation from library data + other fixes from USC PANTHER #8
Open
dustine32 wants to merge 9 commits into ebi-pf-team:main from
Unpin biopython from exact version
Fix MSF longer than expected error
Output GO and PC terms in results
Decouple PAINT annotation data from PANTHER library data
Hello!
I've been working over the past week with your version of TreeGrafter and implemented a major change to how we (the USC PANTHER team) use PAINT GO annotation data vs. PANTHER library data with TreeGrafter. Basically, `treegrafter` can now take an annotation file as input instead of hard-coding the `PAINT_Annotatations_TOTAL.txt` file that is/was included in our PANTHER TreeGrafter-specific library data tarballs.

The rationale (explained in this ticket) essentially boils down to this: the library data (trees, FASTA, HMMs) change less often than the annotation data, so we would like to be able to update annotation data in TreeGrafter more frequently than the library. This PR accommodates that and, as a bonus, fixes some other issues:
- … `epa-ng` step would then err out (`ERR Failed to find: AN34`) and the `node_id` column in the result output would be blank (`-`). This was an upstream PANTHER data issue with the tree files, so there is no code change in this PR; the change is in the new tarball.
- … `prepare` code, since this was basically fixing the upstream data problem. I implemented the fix at PANTHER and pushed out the new data. Since this normalization was the only operation on library data in the original `prepare` command, the command now only expects an annotation file as input, for the remaining annotation-data `prepare` operation (splitting it into family-specific JSON files).
- … `--print-go` option to display annotation GO terms and protein classes in the result output.
- … `Bio/` folder in the repo, replaced by a `requirements.txt` that specifies a minimum BioPython version (`>=1.86`) to be installed via `pip install`. I may have overstepped my authority by removing it, so I'm totally open to putting the `Bio/` folder back if needed.
- … `.plans/` folder files were generated by Claude Code to implement most of the above changes.

To be clear about what data you should currently use for PANTHER19.0 and the current PAINT annotation data:
Note
`PANTHER19.0_data_trees_hmms_only.tar.gz` is a one-off filename. For future library data tarballs (e.g., PANTHER20.0), the filename will return to the `PANTHER##.#_data.tar.gz` convention, and the tarballs will no longer contain any annotation file. I'll …

Let me know what you think about these changes. I'm pretty flexible and am willing to commit to working exclusively with the `ebi-pf-team/treegrafter` repo from now on, to maintain my own fork for developing TreeGrafter further, or both. Also, a belated thank you for converting TreeGrafter to Python!
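For reviewers, the decoupled `prepare` operation described in this PR (splitting an annotation file into family-specific JSON files that the `--print-go` output can draw on) can be illustrated with a minimal Python sketch. This is not the code in this PR. The column layout of the annotation file (a node ID such as `PTHR10000:AN5`, then semicolon-separated GO terms, then an optional protein class), the `<family>.json` file naming, and the helper names `split_annotations` and `lookup` are all hypothetical assumptions made for illustration.

```python
"""Hypothetical sketch: split a PAINT annotation file into per-family JSON files.

ASSUMPTIONS (not taken from the TreeGrafter code): the annotation file is
tab-separated; column 1 is a node ID such as "PTHR10000:AN5", column 2 holds
semicolon-separated GO terms, and an optional column 3 holds a protein class.
"""
import json
from collections import defaultdict
from pathlib import Path


def split_annotations(annotation_file, out_dir):
    """Group annotation rows by PANTHER family and write one JSON file per family.

    Returns the sorted list of family IDs that were written.
    """
    families = defaultdict(dict)  # family ID -> {node ID -> annotation dict}
    with open(annotation_file, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            fields = line.split("\t")
            # "PTHR10000:AN5" -> family "PTHR10000", node "AN5" (assumed format)
            family, _, node = fields[0].partition(":")
            go_terms = [t for t in fields[1].split(";") if t] if len(fields) > 1 else []
            pc = fields[2] if len(fields) > 2 and fields[2] else None
            families[family][node] = {"go": go_terms, "pc": pc}

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for family, nodes in families.items():
        # One file per family, e.g. PTHR10000.json (hypothetical naming).
        with open(out / f"{family}.json", "w", encoding="utf-8") as fh:
            json.dump(nodes, fh, indent=2, sort_keys=True)
    return sorted(families)


def lookup(out_dir, family, node):
    """Return the annotation dict for one grafted node, or None if unannotated."""
    path = Path(out_dir) / f"{family}.json"
    if not path.exists():
        return None  # no annotations at all for this family
    with open(path, encoding="utf-8") as fh:
        return json.load(fh).get(node)
```

In this sketch, keeping one JSON file per family means a run only loads annotations for the families it actually grafts into, and the annotation directory can be regenerated from a newer PAINT release without touching the trees, FASTA, or HMMs.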