removes taxa on short branches #39

@bmichanderson

Hi,
Thanks for creating this program!
I'm trying to use TreeShrink on 388 trees built for loci in my dataset, in order to detect and potentially remove taxa on long branches that may indicate assembly errors.

I built the program in a Singularity container (Ubuntu 20.04 with Python 3.8.10, R 3.6.3 and BMS 0.3.4) like so:

git clone https://github.com/uym2/TreeShrink.git
cd TreeShrink
python3 setup.py install
cd .. && rm -r TreeShrink

I ran TreeShrink on my trees file (from IQ-TREE) like so:

singularity exec -H "$(pwd)" phylo.sif run_treeshrink.py -t loci.treefile -m per-species -q 0.05 -O output_ts -o treeshrink

(Note that the program issues warnings about BMS having been built under R 4.0.3, even though the BMS in the container was not; maybe I've messed something up here?)

I noticed strange behaviour for the first locus when I checked -- it dropped some long branches, but also some short branches, and it left some long branches intact.
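To put numbers on "short" and "long" rather than judging by eye, terminal branch lengths can be pulled straight out of the Newick string. The following is only a minimal sketch: it uses a regular expression instead of a proper tree library, assumes unquoted taxon names and that every leaf has a branch length, and the toy tree and removed set are made up for illustration (they are not from my data). `terminal_branch_lengths` is a hypothetical helper name, not part of TreeShrink. Terminal branch length is also only a rough proxy for how far a taxon sits from the rest of the tree.

```python
import re

# Matches "name:length" only when it directly follows "(" or ",", i.e. a leaf.
# Internal nodes (support values) follow ")" and are therefore skipped.
LEAF_PATTERN = re.compile(r"[(,]\s*([^\s():,;]+):([0-9.eE+-]+)")


def terminal_branch_lengths(newick):
    """Return {taxon: terminal branch length} for one Newick string."""
    return {name: float(length) for name, length in LEAF_PATTERN.findall(newick)}


# Made-up toy tree; support values sit after ")" as in an IQ-TREE treefile.
toy_tree = "((A:0.01,B:0.02)95:0.05,(C:0.30,D:0.02)80:0.01,E:0.004);"
toy_removed = {"C", "E"}  # hypothetical set of removed taxa

lengths = terminal_branch_lengths(toy_tree)

# List taxa from longest to shortest terminal branch, flagging the removed ones.
for taxon, length in sorted(lengths.items(), key=lambda item: item[1], reverse=True):
    flag = "removed" if taxon in toy_removed else "kept"
    print(f"{taxon}\t{length:g}\t{flag}")
```

Running this on the first line of the trees file together with the taxa from the deletions line would show where each removed taxon ranks among the terminal branch lengths of that locus.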

The lines in the output files are:

Gene Species Taxon Signature
0 376716 376716 0.16188346887963004
0 376724Llin 376724Llin 0.16188346887963004
0 376683 376683 0.16188346887963004
0 376657 376657 0.16188346887963004
0 376745 376745 0.16188346887963004
0 376735 376735 0.07840338555281828
0 376770 376770 0.09844712647947677
0 376757 376757 0.05006538515754765
0 376774 376774 0.057018397312663276
0 376749 376749 0.04242609694816515
0 376778 376778 0.09844712647947677
0 376680 376680 0.09844712647947677
0 376729 376729 0.09844712647947677
0 376780 376780 0.09844712647947677
0 376750 376750 0.09844712647947677
0 376761 376761 0.09844712647947677
0 376677 376677 0.04242609694816515
0 376776Tcog 376776Tcog 0.04242609694816515
0 376718Llin 376718Llin 0.04242609694816515
0 376772 376772 0.04242609694816515
0 79740Tcog 79740Tcog 0.04242609694816515
0 376747 376747 0.04242609694816515
0 376725Llin 376725Llin 0.04242609694816515
0 376722Lmic 376722Lmic 0.027536231822437963

And for the actual deletions:

376724Llin      376745  376735  376778  376780  376761

I've created an image of the tree before (left) and after (right) TreeShrink, showing the dropped taxa in cyan and the "missed" taxa in red. Note the short branches that were dropped. I don't understand the summary file, which doesn't seem to show any difference between a dropped taxon (e.g. 376724Llin) and one which was not dropped (e.g. 376716).
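To make that mismatch concrete, here is a small sketch that groups the summary rows by their Signature value and marks which taxa appear in the deletions line. The data is pasted from the output above; `group_by_signature` is my own hypothetical helper, not something TreeShrink provides.

```python
from collections import defaultdict

# Summary rows for gene 0, copied from the output above.
SUMMARY = """\
Gene Species Taxon Signature
0 376716 376716 0.16188346887963004
0 376724Llin 376724Llin 0.16188346887963004
0 376683 376683 0.16188346887963004
0 376657 376657 0.16188346887963004
0 376745 376745 0.16188346887963004
0 376735 376735 0.07840338555281828
0 376770 376770 0.09844712647947677
0 376757 376757 0.05006538515754765
0 376774 376774 0.057018397312663276
0 376749 376749 0.04242609694816515
0 376778 376778 0.09844712647947677
0 376680 376680 0.09844712647947677
0 376729 376729 0.09844712647947677
0 376780 376780 0.09844712647947677
0 376750 376750 0.09844712647947677
0 376761 376761 0.09844712647947677
0 376677 376677 0.04242609694816515
0 376776Tcog 376776Tcog 0.04242609694816515
0 376718Llin 376718Llin 0.04242609694816515
0 376772 376772 0.04242609694816515
0 79740Tcog 79740Tcog 0.04242609694816515
0 376747 376747 0.04242609694816515
0 376725Llin 376725Llin 0.04242609694816515
0 376722Lmic 376722Lmic 0.027536231822437963
"""

# Taxa removed from gene 0, copied from the deletions line above.
REMOVED = "376724Llin 376745 376735 376778 376780 376761"


def group_by_signature(summary_text, removed_line):
    """Map each signature string to a list of (taxon, was_removed) pairs."""
    removed = set(removed_line.split())
    groups = defaultdict(list)
    for row in summary_text.strip().splitlines()[1:]:  # skip the header row
        _gene, _species, taxon, signature = row.split()
        groups[signature].append((taxon, taxon in removed))
    return dict(groups)


groups = group_by_signature(SUMMARY, REMOVED)

# Print groups from largest to smallest signature with removed/total counts.
for signature in sorted(groups, key=float, reverse=True):
    members = groups[signature]
    n_removed = sum(was_removed for _taxon, was_removed in members)
    print(f"{signature}: {n_removed} of {len(members)} removed")
```

For this locus, the group with signature 0.16188346887963004 contains five taxa of which only two were removed, so the signature value alone does not separate dropped from retained taxa.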

Any idea what is causing this behaviour?
[Image: locus1]