Hi,
Thanks for creating this program!
I'm trying to use TreeShrink on 388 trees built for loci in my dataset, in order to detect and potentially remove taxa on long branches that may indicate assembly errors.
I built the program in a Singularity container which has Ubuntu 20.04 with python 3.8.10, R 3.6.3 and BMS 0.3.4 like so:
git clone https://github.com/uym2/TreeShrink.git
cd TreeShrink
python3 setup.py install
cd .. && rm -r TreeShrink
I ran TreeShrink on my trees file (from IQ-TREE) like so:
singularity exec -H "$(pwd)" phylo.sif run_treeshrink.py -t loci.treefile -m per-species -q 0.05 -O output_ts -o treeshrink
(Note that the program issues warnings about BMS being built under R 4.0.3, though the BMS in the container was not; maybe I've messed something up here?)
I noticed strange behaviour for the first locus when I checked -- it dropped some long branches, but also some short branches, and it left some long branches intact.
The lines in the output files are:
Gene Species Taxon Signature
0 376716 376716 0.16188346887963004
0 376724Llin 376724Llin 0.16188346887963004
0 376683 376683 0.16188346887963004
0 376657 376657 0.16188346887963004
0 376745 376745 0.16188346887963004
0 376735 376735 0.07840338555281828
0 376770 376770 0.09844712647947677
0 376757 376757 0.05006538515754765
0 376774 376774 0.057018397312663276
0 376749 376749 0.04242609694816515
0 376778 376778 0.09844712647947677
0 376680 376680 0.09844712647947677
0 376729 376729 0.09844712647947677
0 376780 376780 0.09844712647947677
0 376750 376750 0.09844712647947677
0 376761 376761 0.09844712647947677
0 376677 376677 0.04242609694816515
0 376776Tcog 376776Tcog 0.04242609694816515
0 376718Llin 376718Llin 0.04242609694816515
0 376772 376772 0.04242609694816515
0 79740Tcog 79740Tcog 0.04242609694816515
0 376747 376747 0.04242609694816515
0 376725Llin 376725Llin 0.04242609694816515
0 376722Lmic 376722Lmic 0.027536231822437963
And for the actual deletions:
376724Llin 376745 376735 376778 376780 376761
I've created an image of the tree before (left) and after (right) TreeShrink, showing the dropped taxa in cyan and the "missed" taxa in red. Note that some short branches were dropped. I don't understand the summary file, which doesn't seem to show any difference between a dropped taxon (e.g. 376724Llin) and one that was not dropped (e.g. 376716): both have the same signature, 0.16188346887963004.
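To make that comparison concrete, here is a minimal Python sketch (not part of TreeShrink) that groups the summary rows by signature and marks which taxa appear in the deletion list. It assumes the summary is whitespace-separated with the four columns shown above and that the deletion list is a single whitespace-separated line; `group_by_signature` is a hypothetical helper name, and the data is simply pasted from this post.

```python
from collections import defaultdict

# Summary rows for gene 0, pasted from the post above.
SUMMARY = """\
Gene Species Taxon Signature
0 376716 376716 0.16188346887963004
0 376724Llin 376724Llin 0.16188346887963004
0 376683 376683 0.16188346887963004
0 376657 376657 0.16188346887963004
0 376745 376745 0.16188346887963004
0 376735 376735 0.07840338555281828
0 376770 376770 0.09844712647947677
0 376757 376757 0.05006538515754765
0 376774 376774 0.057018397312663276
0 376749 376749 0.04242609694816515
0 376778 376778 0.09844712647947677
0 376680 376680 0.09844712647947677
0 376729 376729 0.09844712647947677
0 376780 376780 0.09844712647947677
0 376750 376750 0.09844712647947677
0 376761 376761 0.09844712647947677
0 376677 376677 0.04242609694816515
0 376776Tcog 376776Tcog 0.04242609694816515
0 376718Llin 376718Llin 0.04242609694816515
0 376772 376772 0.04242609694816515
0 79740Tcog 79740Tcog 0.04242609694816515
0 376747 376747 0.04242609694816515
0 376725Llin 376725Llin 0.04242609694816515
0 376722Lmic 376722Lmic 0.027536231822437963
"""

# Deleted taxa for the same gene, pasted from the post above.
REMOVED = "376724Llin 376745 376735 376778 376780 376761"


def group_by_signature(summary_text, removed_text):
    """Map each signature string to the taxa removed and kept under it.

    Assumes four whitespace-separated columns (Gene, Species, Taxon,
    Signature) with one header line, as in the pasted output.
    """
    removed = set(removed_text.split())
    groups = defaultdict(lambda: {"removed": [], "kept": []})
    for line in summary_text.splitlines()[1:]:  # skip the header line
        fields = line.split()
        if len(fields) != 4:
            continue  # ignore blank or malformed lines
        _gene, _species, taxon, signature = fields
        status = "removed" if taxon in removed else "kept"
        groups[signature][status].append(taxon)
    return dict(groups)


groups = group_by_signature(SUMMARY, REMOVED)
# Print from the largest signature value to the smallest.
for signature in sorted(groups, key=float, reverse=True):
    print(signature, groups[signature])
```

On the rows above, the signature 0.16188346887963004 contains both removed taxa (376724Llin, 376745) and kept taxa (376716, 376683, 376657), and the same mix appears under 0.09844712647947677, so the signature value alone does not separate dropped from retained taxa in this file.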
