Confused about + and - stranded nodes in GFA #25
Let's use this very simple FASTA:
>seq1
ATATGTCGCTGATCGACTGAAATAGCATCGACTAGCTATCGAT
>seq2
ATATGTCGCTGATCGACTGAATAGTGAAATAGCATCGACTAGC
>seq3
ATATGTCGCTGATCGACTTTTTTTTGAAATAGCATCGACTAGC
Then we construct the graph with ./twopaco -k 15 -f 16 test.fa -o graph and convert it to GFA with graphdump -k 15 -f gfa2 -s test.fa graph > graph.gfa, which gives:
H VN:Z:2.0
S 36 18 ATATGTCGCTGATCGACT
F 36 seq1+ 0 18$ 0 18 15M
S 24 18 TTCAGTCGATCAGCGACA
F 24 seq1- 0 18$ 3 21 15M
E 36+ 24- 3 18$ 3 18$ 15M
S 14 26 GTCGATGCTATTTCAGTCGATCAGCG
F 14 seq1- 0 26$ 6 32 15M
E 24- 14- 0 15 11 26$ 15M
S 11 19 TGAAATAGCATCGACTAGC
F 11 seq1+ 0 19$ 17 36 15M
E 14- 11+ 0 15 0 15 15M
S 19 22 ATAGCATCGACTAGCTATCGAT
F 19 seq1+ 0 22$ 21 43$ 15M
E 11+ 19+ 4 19$ 0 15 15M
O seq1p 36+ 24- 14- 11+ 19+
F 36 seq2+ 0 18$ 0 18 15M
F 24 seq2- 0 18$ 3 21 15M
E 36+ 24- 3 18$ 3 18$ 15M
S 13 33 GTCGATGCTATTTCACTATTCAGTCGATCAGCG
F 13 seq2- 0 33$ 6 39 15M
E 24- 13- 0 15 18 33$ 15M
F 11 seq2+ 0 19$ 24 43$ 15M
E 13- 11+ 0 15 0 15 15M
O seq2p 36+ 24- 13- 11+
F 36 seq3+ 0 18$ 0 18 15M
S 12 36 GTCGATGCTATTTCAAAAAAAAGTCGATCAGCGACA
F 12 seq3- 0 36$ 3 39 15M
E 36+ 12- 3 18$ 21 36$ 15M
F 11 seq3+ 0 19$ 24 43$ 15M
E 12- 11+ 0 15 0 15 15M
O seq3p 36+ 12- 11+
When we look at the paths we have:
seq1p 36+ 24- 14- 11+ 19+
seq2p 36+ 24- 13- 11+
seq3p 36+ 12- 11+
We can only reconstruct the sequences from the GFA by taking the reverse complement of the - nodes. Yet across the paths, each node always appears on the same strand (always - or always +); for example, node 24 is - in every path where it occurs. So why weren't these nodes simply recorded as +?
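For reference, here is a minimal Python sketch of how we reconstruct the sequences from the paths. The helper names (reverse_complement, oriented, reconstruct) are our own and hypothetical, not part of TwoPaCo or graphdump. The sketch assumes that consecutive segments in a path overlap by exactly k = 15 bases, as the 15M alignments in the output suggest, and it raises an error if that assumption does not hold.

```python
# Segment sequences copied from the S lines of graph.gfa above.
SEGMENTS = {
    "36": "ATATGTCGCTGATCGACT",
    "24": "TTCAGTCGATCAGCGACA",
    "14": "GTCGATGCTATTTCAGTCGATCAGCG",
    "13": "GTCGATGCTATTTCACTATTCAGTCGATCAGCG",
    "12": "GTCGATGCTATTTCAAAAAAAAGTCGATCAGCGACA",
    "11": "TGAAATAGCATCGACTAGC",
    "19": "ATAGCATCGACTAGCTATCGAT",
}

# Paths copied from the O lines.
PATHS = {
    "seq1p": "36+ 24- 14- 11+ 19+",
    "seq2p": "36+ 24- 13- 11+",
    "seq3p": "36+ 12- 11+",
}

# Input sequences from test.fa, used only to check the result.
FASTA = {
    "seq1": "ATATGTCGCTGATCGACTGAAATAGCATCGACTAGCTATCGAT",
    "seq2": "ATATGTCGCTGATCGACTGAATAGTGAAATAGCATCGACTAGC",
    "seq3": "ATATGTCGCTGATCGACTTTTTTTTGAAATAGCATCGACTAGC",
}

K = 15  # assumed overlap between consecutive segments (the -k value)

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string over A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def oriented(step):
    """Return the sequence of a path step such as '24-' on the stated strand."""
    node, strand = step[:-1], step[-1]
    seq = SEGMENTS[node]
    return seq if strand == "+" else reverse_complement(seq)


def reconstruct(path, k=K):
    """Spell out a path, merging consecutive segments on their k-base overlap."""
    steps = path.split()
    result = oriented(steps[0])
    for step in steps[1:]:
        seq = oriented(step)
        # Guard the assumption that neighbours share exactly k bases.
        if result[-k:] != seq[:k]:
            raise ValueError(f"segment {step} does not overlap by {k} bases")
        result += seq[k:]
    return result


for name, path in PATHS.items():
    # 'seq1p' -> 'seq1': drop the trailing 'p' to find the FASTA record.
    print(name, reconstruct(path) == FASTA[name[:-1]])
```

With the - nodes reverse-complemented, all three paths spell out their input sequences. If the - nodes are used as written in the S lines instead, the overlap check fails at the first - step.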