Using only information from the incoming and outgoing nodes of each segment, plus some constraints, we can detect all bidirectionally bifurcating segments in a graph. Reads aligned to this graph can then be used to estimate how recombinationally active a genome may be.
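As a rough illustration of the idea (not the tool's actual implementation), the sketch below reads GFA link (`L`) lines and reports segments that have at least two distinct neighbours on each end. The function name `bifurcating_segments` and the "at least two on both sides" rule are assumptions made for this example; the additional constraints used by gfa_recomb are not described here and are not modelled.

```python
from collections import defaultdict

def bifurcating_segments(gfa_lines):
    """Return segments with >= 2 distinct neighbours on both ends (assumed rule)."""
    left = defaultdict(set)   # neighbours attached to the left end of a segment
    right = defaultdict(set)  # neighbours attached to the right end of a segment
    for line in gfa_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5 or fields[0] != "L":
            continue  # only link lines carry connectivity
        src, src_orient, dst, dst_orient = fields[1:5]
        # Leaving src in '+' orientation exits its right end; in '-' its left end.
        (right if src_orient == "+" else left)[src].add(dst)
        # Entering dst in '+' orientation arrives at its left end; in '-' its right end.
        (left if dst_orient == "+" else right)[dst].add(src)
    names = set(left) | set(right)
    return sorted(s for s in names
                  if len(left.get(s, ())) >= 2 and len(right.get(s, ())) >= 2)

# Hypothetical toy graph, not the contents of the real data file.
toy_gfa = [
    "L\tu64\t-\tu66\t+\t0M",
    "L\tu65\t-\tu66\t+\t0M",
    "L\tu66\t+\tu67\t+\t0M",
    "L\tu66\t+\tu68\t+\t0M",
]
print(bifurcating_segments(toy_gfa))
```

In the toy graph only u66 has two neighbours on each end, so it is the only segment reported.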
# Simple CLI
gfa_recomb <GFA>
# add GAF
gfa_recomb --gaf <GAF> <GFA>

Including the --gaf <GAF> option iterates over the GAF (specifically, one produced by the GraphAligner program) to find alignments that span a focal node (only paths of length 3 are considered at the moment). Example output is below.
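A minimal sketch of what such a GAF scan could look like, assuming the path is read from the sixth GAF column and that every 3-node window centred on the focal node is counted. Whether gfa_recomb counts windows inside longer alignments or only alignments of exactly three nodes is not stated here, so that choice, and the name `count_focal_paths`, are assumptions.

```python
import re
from collections import Counter

# One oriented step in a GAF path, e.g. ">u66" or "<u64".
STEP = re.compile(r"[<>][^<>]+")

def count_focal_paths(gaf_lines, focal):
    """Count 3-node oriented paths whose middle node is `focal`."""
    counts = Counter()
    for line in gaf_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:
            continue  # not a complete GAF record
        steps = STEP.findall(fields[5])  # column 6 holds the oriented path
        for i in range(1, len(steps) - 1):
            if steps[i][1:] == focal:
                counts["".join(steps[i - 1:i + 2])] += 1
    return counts

# Hypothetical records for illustration only.
toy_gaf = [
    "read1\t100\t0\t100\t+\t<u64>u66>u67\t300\t0\t100\t100\t100\t60",
    "read2\t150\t0\t150\t+\t>u1<u64>u66>u67\t400\t0\t150\t150\t150\t60",
    "read3\t100\t0\t100\t+\t<u67<u66>u64\t300\t0\t100\t100\t100\t60",
]
print(count_focal_paths(toy_gaf, "u66"))
```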
An example (real data in the data dir):
gfa_recomb --gaf ./data/Arabidopsis_thaliana.gaf ./data/Arabidopsis_thaliana.mito.gfa
This should give the following output:
path_1 cov_1 path_2 cov_2 recomb_score
<u64>u66>u67 171 <u67<u66>u64 202 0.917
<u65>u66>u67 180 <u67<u66>u65 192 0.968
<u64>u66>u68 162 <u68<u66>u64 168 0.982
<u65>u66>u68 171 <u68<u66>u65 208 0.902
>u64<u69<u67 160 >u67>u69<u64 126 0.881
>u65<u69<u67 175 >u67>u69<u65 171 0.988
>u65<u69<u68 159 >u68>u69<u65 147 0.961
>u64<u69<u68 123 >u68>u69<u64 152 0.895
Recombination potential: 0.937
RCI: 2.810
repeat_node path_count entropy
u66 8 2.995
u69 8 2.990
Mean entropy: 2.992
Total entropy: 5.984
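- path_1, path_2: 3-node paths in GAF notation (e.g. "<u64>u66>u67"), where > and < mark forward and reverse node orientation; path_2 is the reverse complement of path_1.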
- cov_1, cov_2: Coverage counts (number of alignments supporting that path).
- recomb_score: A normalized measure of how balanced the coverage is:
recomb_score = 2 * min(cov_1 / (cov_1 + cov_2), cov_2 / (cov_1 + cov_2))
The score ranges from 0 (unbalanced) to 1 (completely balanced coverage).
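The formula above simplifies to 2 * min(cov_1, cov_2) / (cov_1 + cov_2). The short sketch below applies it to the first two rows of the example output; returning 0 when both coverages are zero is an assumption, since that case is not described.

```python
def recomb_score(cov_1, cov_2):
    """Balance of coverage between a path and its reverse complement, in [0, 1]."""
    total = cov_1 + cov_2
    if total == 0:
        return 0.0  # assumption: no supporting alignments means no evidence
    return 2 * min(cov_1, cov_2) / total

print(round(recomb_score(171, 202), 3))  # 0.917, first row of the example
print(round(recomb_score(180, 192), 3))  # 0.968, second row of the example
```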
Recombination potential is the mean recombination score across all reverse-complement path pairs.
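To make the pairing concrete, the sketch below reverse-complements a GAF-style path (reverse the step order and flip each orientation) and then averages the scores over the eight pairs from the example output. The helper names are hypothetical; the coverage values are copied from the table above.

```python
import re

def reverse_complement(path):
    """Reverse the step order and flip each orientation of a GAF-style path."""
    steps = re.findall(r"[<>][^<>]+", path)
    flip = {">": "<", "<": ">"}
    return "".join(flip[s[0]] + s[1:] for s in reversed(steps))

# (path_1, cov_1, cov_2) for each row of the example output.
rows = [
    ("<u64>u66>u67", 171, 202), ("<u65>u66>u67", 180, 192),
    ("<u64>u66>u68", 162, 168), ("<u65>u66>u68", 171, 208),
    (">u64<u69<u67", 160, 126), (">u65<u69<u67", 175, 171),
    (">u65<u69<u68", 159, 147), (">u64<u69<u68", 123, 152),
]

scores = [2 * min(a, b) / (a + b) for _, a, b in rows]
potential = sum(scores) / len(scores)

print(reverse_complement("<u64>u66>u67"))  # <u67<u66>u64
print(round(potential, 3))                 # 0.937
```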
The RCI (recombination complexity index) is:
RCI = (1 / |R|) * sum over r in R [ S_r * log2(P_r) ]
Where:
- R is the set of repeat nodes with ≥2 distinct reverse-complement path pairs, and |R| is its size.
- P_r is the number of distinct paths through repeat node r.
- S_r is the average recombination score at repeat node r.
This index weights how evenly recombination is supported at each repeat (S_r) by how many routes pass through it (log2(P_r)). A higher RCI implies greater structural diversity and more balanced recombination.
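As a worked check of the RCI definition, the sketch below recomputes it from the example output. The input layout (a dict mapping each repeat node to its pair coverages and path count) and the function name `rci` are assumptions for illustration, not the tool's internal data structures.

```python
import math

def rci(repeats):
    """repeats maps node -> (list of (cov_1, cov_2) pairs, number of distinct paths)."""
    terms = []
    for pairs, n_paths in repeats.values():
        if len(pairs) < 2:
            continue  # only repeat nodes with >= 2 path pairs contribute
        s_r = sum(2 * min(a, b) / (a + b) for a, b in pairs) / len(pairs)
        terms.append(s_r * math.log2(n_paths))
    return sum(terms) / len(terms) if terms else 0.0

# Coverage pairs and path counts taken from the example output.
example = {
    "u66": ([(171, 202), (180, 192), (162, 168), (171, 208)], 8),
    "u69": ([(160, 126), (175, 171), (159, 147), (123, 152)], 8),
}
print(f"{rci(example):.3f}")  # 2.810
```

Both repeat nodes have 8 paths, so log2(P_r) = 3 and the RCI is three times the mean of the two per-node average scores.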
The Shannon entropy of path usage through each focal repeat node is reported. Entropy reflects the diversity of recombination routes: it is highest when all paths through a node are used equally. Total entropy is the sum of the entropies over all focal nodes, and mean entropy is their average.
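A minimal sketch of the entropy calculation, assuming base-2 logarithms and path probabilities proportional to coverage. Neither choice is stated explicitly, but both closely reproduce the values in the example output. With 8 paths the maximum possible entropy is log2(8) = 3.

```python
import math

def shannon_entropy(coverages):
    """Shannon entropy (bits) of path usage, with p_i = cov_i / total coverage."""
    total = sum(coverages)
    if total == 0:
        return 0.0  # assumption: no coverage means zero entropy
    return -sum((c / total) * math.log2(c / total) for c in coverages if c > 0)

# Coverage of all eight directed paths through each repeat node (example output).
u66 = [171, 202, 180, 192, 162, 168, 171, 208]
u69 = [160, 126, 175, 171, 159, 147, 123, 152]

h66, h69 = shannon_entropy(u66), shannon_entropy(u69)
print(h66, h69)             # close to the reported 2.995 and 2.990
print(h66 + h69)            # total entropy
print((h66 + h69) / 2)      # mean entropy
```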