ARU-life-sciences/gfa_recomb

Detect potential recombination points in plant mitochondrial genomes

Using only information from the incoming and outgoing nodes of each segment, plus a few constraints, we can detect all bidirectionally bifurcating segments in a graph. We can then use reads aligned to this graph to estimate how recombinationally active a genome is.

# Simple CLI
gfa_recomb <GFA>
# add GAF
gfa_recomb --gaf <GAF> <GFA>

GraphAligner output

With the --gaf <GAF> option, gfa_recomb iterates over the GAF (specifically one produced by the GraphAligner program) to find alignments that span a focal node. Only paths of length 3 are considered at the moment.
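As a rough illustration of what "spanning a focal node" means, the sketch below splits a GAF path string (column 6 of a GAF record, written with `>`/`<` orientation prefixes) into steps and keeps every 3-step window whose middle node is the focal node. The function names and the extra node `u70` are hypothetical, and this is not taken from the gfa_recomb source; it only shows the idea.

```python
import re

# One oriented step in a GAF path: '>' or '<' followed by a node name.
STEP = re.compile(r"([<>])([^<>]+)")

def parse_path(path):
    """Split a GAF path like '>u64>u66<u67' into (orientation, node) steps."""
    return STEP.findall(path)

def spanning_triples(path, focal):
    """Return every 3-step window of `path` whose middle node is `focal`."""
    steps = parse_path(path)
    triples = []
    for i in range(1, len(steps) - 1):
        if steps[i][1] == focal:
            window = steps[i - 1:i + 2]
            triples.append("".join(o + n for o, n in window))
    return triples

# Hypothetical alignment path that passes through focal node u66.
print(spanning_triples("<u64>u66>u67>u70", "u66"))  # ['<u64>u66>u67']
```

Counting how often each such triple occurs across all alignments would give per-path coverage of the kind reported in the output table.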

An example using real data from the data directory:

gfa_recomb --gaf ./data/Arabidopsis_thaliana.gaf ./data/Arabidopsis_thaliana.mito.gfa

This should give the following output:

path_1        cov_1  path_2        cov_2  recomb_score
<u64>u66>u67  171    <u67<u66>u64  202    0.917
<u65>u66>u67  180    <u67<u66>u65  192    0.968
<u64>u66>u68  162    <u68<u66>u64  168    0.982
<u65>u66>u68  171    <u68<u66>u65  208    0.902
>u64<u69<u67  160    >u67>u69<u64  126    0.881
>u65<u69<u67  175    >u67>u69<u65  171    0.988
>u65<u69<u68  159    >u68>u69<u65  147    0.961
>u64<u69<u68  123    >u68>u69<u64  152    0.895

Recombination potential: 0.937
RCI: 2.810

repeat_node	path_count	entropy
u66	8	2.995
u69	8	2.990

Mean entropy: 2.992
Total entropy: 5.984

Top table

  • path_1, path_2: 3-node paths through a focal node (e.g. "<u64>u66>u67"); path_2 is the reverse complement of path_1.
  • cov_1, cov_2: Coverage counts (number of alignments supporting that path).
  • recomb_score: A normalized measure of how balanced the coverage is:

recomb_score = 2 * min( cov_1 / (cov_1 + cov_2), cov_2 / (cov_1 + cov_2) )

The score ranges from 0 (unbalanced) to 1 (completely balanced coverage).
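The formula can be checked against the first row of the example output with a few lines of Python. This is a minimal sketch; the function name is illustrative, and returning 0 when neither path has coverage is an assumption, not documented behaviour.

```python
def recomb_score(cov_1, cov_2):
    """Balance of coverage between a path and its reverse complement."""
    total = cov_1 + cov_2
    if total == 0:
        return 0.0  # assumption: no coverage at all is scored as 0
    # 2 * min(a/(a+b), b/(a+b)) simplifies to 2 * min(a, b) / (a + b)
    return 2 * min(cov_1, cov_2) / total

# First row of the example table: cov_1 = 171, cov_2 = 202
print(round(recomb_score(171, 202), 3))  # 0.917
```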

Recombination potential and RCI

Recombination potential is the mean recombination score across all reverse-complement path pairs.
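To make the pairing concrete, the sketch below reverse-complements a path (reverse the node order and flip each orientation) and then averages the recombination scores over the eight coverage pairs from the example output. The helper names are illustrative rather than taken from the tool.

```python
import re

def reverse_complement(path):
    """Reverse the node order of a path and flip every orientation."""
    flip = {">": "<", "<": ">"}
    steps = re.findall(r"([<>])([^<>]+)", path)
    return "".join(flip[o] + n for o, n in reversed(steps))

def recomb_score(cov_1, cov_2):
    total = cov_1 + cov_2
    return 2 * min(cov_1, cov_2) / total if total else 0.0

# (cov_1, cov_2) for each row of the example table
pairs = [(171, 202), (180, 192), (162, 168), (171, 208),
         (160, 126), (175, 171), (159, 147), (123, 152)]

potential = sum(recomb_score(a, b) for a, b in pairs) / len(pairs)

print(reverse_complement("<u64>u66>u67"))  # <u67<u66>u64
print(round(potential, 3))                 # 0.937
```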

The RCI (recombination complexity index) is:

RCI = (1 / |R|) * sum over r [ S_r * log2(P_r) ]

Where:

  • |R| is the number of repeat nodes with ≥2 distinct reverse-complement path pairs.
  • P_r is the number of distinct paths through repeat node r (the path_count column in the example output).
  • S_r is the average recombination score at repeat node r.

This index combines how many distinct paths pass through each repeat node (the log2(P_r) term) with how evenly recombination is supported (the S_r term). A higher RCI implies greater structural diversity and more balanced recombination.
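As a worked check, the sketch below applies the RCI formula to the example output, taking P_r to be the reported path_count (8 for both u66 and u69). That reading of P_r is inferred from the example numbers, not from the source code, and the data layout is hypothetical.

```python
import math

def recomb_score(cov_1, cov_2):
    total = cov_1 + cov_2
    return 2 * min(cov_1, cov_2) / total if total else 0.0

def rci(nodes):
    """nodes maps repeat node -> (path_count, [(cov_1, cov_2), ...])."""
    terms = []
    for path_count, pairs in nodes.values():
        if len(pairs) < 2:
            continue  # |R| only counts nodes with >= 2 path pairs
        s_r = sum(recomb_score(a, b) for a, b in pairs) / len(pairs)
        terms.append(s_r * math.log2(path_count))
    return sum(terms) / len(terms) if terms else 0.0

# Coverage pairs from the example table, grouped by focal node.
example = {
    "u66": (8, [(171, 202), (180, 192), (162, 168), (171, 208)]),
    "u69": (8, [(160, 126), (175, 171), (159, 147), (123, 152)]),
}
print(f"{rci(example):.3f}")  # 2.810
```

Because log2(8) = 3, each node contributes three times its mean score, so the RCI here is bounded above by 3.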

Entropies

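The Shannon entropy of path usage through each focal repeat node is reported. Entropy reflects the diversity of recombination routes: it is highest when all paths through a node are used equally. Total entropy is the sum of the entropies across all focal nodes, and mean entropy is their average.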
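A minimal sketch of this calculation follows, assuming path usage is the share of coverage each path receives among all paths through the node and that entropy is measured in bits (log base 2). Both are assumptions, although with the eight coverages at u66 from the example the result comes out close to the reported 2.995.

```python
import math

def shannon_entropy(coverages):
    """Entropy (bits) of the coverage distribution across paths."""
    total = sum(coverages)
    if total == 0:
        return 0.0  # assumption: no coverage means no diversity
    probs = [c / total for c in coverages if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# cov_1 and cov_2 for the four path pairs through u66
u66 = [171, 202, 180, 192, 162, 168, 171, 208]
print(f"{shannon_entropy(u66):.3f}")

# Eight equally used paths give the maximum of log2(8) = 3 bits.
print(shannon_entropy([1] * 8))  # 3.0
```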

About

Detect bidirectionally bifurcated repeats in mitochondrial GFA genomes.
