How to handle long branches (fresh and degraded samples) and low UCE capture #367

Hi Dr. Faircloth,

Sorry for the constant posting on here! I've gotten through _phyluce_ with a dataset that includes both fresh and degraded (formalin-preserved) specimens of several reptile genera. I ran the _phyluce_ pipeline twice: once with only fresh samples, and once with both fresh and formalinized samples.

I have three questions that are fairly separate from one another, so maybe it is easiest if I just list them below.

1) I have had little success incorporating UCEs from degraded samples into my alignments. The stats for the formalin samples still say I have 3000-4000 loci, but I presume many of these are very short due to the degraded nature of the samples. When I checked the stats after exploding the monolithic fasta file, I saw that some of the formalin samples had very few contigs, while others had thousands:
```
Formalin-Sample1.unaligned.fasta,68,17561,258.25,6.224240154111116,194,432,242.5,0
Formalin-Sample2.unaligned.fasta,57,14505,254.47368421052633,5.915185062640635,207,444,240.0,0
Formalin-Sample3.unaligned.fasta,1818,683865,376.16336633663366,3.1463025755214895,93,925,343.0,0
Formalin-Sample4.unaligned.fasta,4059,2284070,562.7174180832717,2.8498804036826355,142,5698,553.0,30
Fresh-Sample1.unaligned.fasta,4552,4857281,1067.0652460456943,3.1642806338417713,207,4040,1081.0,3264
Fresh-Sample2.unaligned.fasta,4502,4985168,1107.3229675699688,3.269171662403826,56,3290,1134.0,3532
Fresh-Sample3.unaligned.fasta,4619,4827268,1045.0894132929207,2.932026164913372,210,3048,1055.0,3072
Fresh-Sample4.unaligned.fasta,4492,4892444,1089.1460373998218,3.314282112917549,218,3334,1115.0,3419
```
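For anyone wanting to screen samples from output like this, here is a minimal Python sketch. It assumes the columns are sample, contig count, total bp, mean length, 95% CI of length, min length, max length, median length, and contigs >1 kb (the order I understand _phyluce_ to write for FASTA length summaries; please check against your version). The function name `flag_samples` and both thresholds are hypothetical and chosen only for illustration.

```python
import csv
import io

# Assumed column order for the summary lines; verify against your phyluce version.
COLUMNS = ["sample", "contigs", "total_bp", "mean_len", "ci95_len",
           "min_len", "max_len", "median_len", "contigs_over_1kb"]

# The per-sample stats pasted above.
STATS = """\
Formalin-Sample1.unaligned.fasta,68,17561,258.25,6.224240154111116,194,432,242.5,0
Formalin-Sample2.unaligned.fasta,57,14505,254.47368421052633,5.915185062640635,207,444,240.0,0
Formalin-Sample3.unaligned.fasta,1818,683865,376.16336633663366,3.1463025755214895,93,925,343.0,0
Formalin-Sample4.unaligned.fasta,4059,2284070,562.7174180832717,2.8498804036826355,142,5698,553.0,30
Fresh-Sample1.unaligned.fasta,4552,4857281,1067.0652460456943,3.1642806338417713,207,4040,1081.0,3264
Fresh-Sample2.unaligned.fasta,4502,4985168,1107.3229675699688,3.269171662403826,56,3290,1134.0,3532
Fresh-Sample3.unaligned.fasta,4619,4827268,1045.0894132929207,2.932026164913372,210,3048,1055.0,3072
Fresh-Sample4.unaligned.fasta,4492,4892444,1089.1460373998218,3.314282112917549,218,3334,1115.0,3419
"""


def flag_samples(text, min_contigs=1000, min_mean_len=400.0):
    """Return sample names falling below either (hypothetical) threshold."""
    flagged = []
    for row in csv.DictReader(io.StringIO(text), fieldnames=COLUMNS):
        too_few = int(row["contigs"]) < min_contigs
        too_short = float(row["mean_len"]) < min_mean_len
        if too_few or too_short:
            flagged.append(row["sample"])
    return flagged


print(flag_samples(STATS))
```

With these arbitrary cutoffs, Formalin-Sample1 through Formalin-Sample3 are flagged and Formalin-Sample4 passes, which matches the impression that the formalin samples vary a lot in quality. The cutoffs are not a recommendation; they only make the comparison explicit.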

When I add these to a tree (the above is just a subset of the formalin and fresh samples), the formalin samples either group with the outgroup, which looks like long-branch attraction, or sit on extremely long branches in the ingroup. Do you have any suggestions on how to deal with samples with degraded DNA? I have read publications (including the 2016 bird MER paper) but am still unsure how others handle these scenarios, filter the data, or trim the alignments. I tried internal alignment trimming in the hope of removing gappy regions caused by missing data, but the result was basically the same. The tree below shows these long branches (I removed the samples that just lumped with the outgroup):

![Image](https://github.com/user-attachments/assets/4d2b6de7-2953-4b5d-894e-953284d19f1f)

2) My second question may be a bit simpler to answer. Below is a tree of only fresh samples. In terms of support and a topology that makes biogeographic sense, it is looking great. The one thing that stood out to me, and that I have not seen in other UCE studies I have done, is that one genus in particular (shown in darker blue; different genera are different colors) seems to have long branches relative to the other genera:

![Image](https://github.com/user-attachments/assets/f47b09c5-171b-4d4a-9a85-ea4f2339b3b9)

I was wondering if, for UCEs, this might be expected? I have reconstructed phylogenies of entire families before but did not get such long branches. Would looking at stats help me determine if this is due to the data itself or if there is a biological reason that they have long branches?

3) A basic statistics question. Should we expect to capture fewer UCEs as the taxa in a dataset become more distantly related? This project is based on three sequencing efforts over several years, two of which used the UCE 5k probe set and one of which used the Singhal et al. SqCL probe set, which contains UCEs. I have attached a file with my edge alignment stats, and you will see that the 75% matrix has 2685 loci (this happens whether I include the degraded specimens or use the fresh-only dataset). Is it likely that I am only capturing about half the loci because of sampling error from multiple, independent sequencing runs, distant taxa, and different probe sets?

File: [edge-stats.txt](https://github.com/user-attachments/files/18871068/edge-stats.txt)
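To be clear about what I mean by the 75% matrix, here is a small Python sketch of the occupancy filter as I understand it: keep a locus only if it is present in at least 75% of the taxa. This is an illustration of the idea, not _phyluce_'s own code. The function name `loci_in_matrix`, the toy data, and rounding the minimum count up are all assumptions on my part.

```python
import math


def loci_in_matrix(locus_taxa, n_taxa, min_fraction=0.75):
    """Return the loci present in at least min_fraction of n_taxa taxa.

    locus_taxa maps a locus name to the set of taxa that have it.
    Rounding the minimum count up is an assumption made for this sketch.
    """
    min_count = math.ceil(min_fraction * n_taxa)
    return sorted(
        locus for locus, taxa in locus_taxa.items() if len(taxa) >= min_count
    )


# Made-up toy data: 4 taxa and 3 loci.
toy = {
    "uce-1": {"A", "B", "C", "D"},  # 4 of 4 taxa
    "uce-2": {"A", "B", "C"},       # 3 of 4 taxa
    "uce-3": {"A", "B"},            # 2 of 4 taxa
}

print(loci_in_matrix(toy, n_taxa=4))
```

In the toy case only `uce-1` and `uce-2` pass. The point is that a locus missing from even a modest share of taxa, for example because one probe set does not target it, drops out of the matrix regardless of how well it was captured elsewhere.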

Thanks again for all of your help with all of these questions.

Best,
Justin



 
