
Confused about + and - stranded nodes in GFA #25

@rickbeeloo

Let's use this very simple FASTA:

>seq1
ATATGTCGCTGATCGACTGAAATAGCATCGACTAGCTATCGAT
>seq2
ATATGTCGCTGATCGACTGAATAGTGAAATAGCATCGACTAGC
>seq3
ATATGTCGCTGATCGACTTTTTTTTGAAATAGCATCGACTAGC

Then we construct the graph:

./twopaco -k 15 -f 16 test.fa -o graph

and convert it to GFA2:

graphdump -k 15 -f gfa2 -s test.fa graph > graph.gfa

This gives:

H       VN:Z:2.0
S       36      18      ATATGTCGCTGATCGACT
F       36      seq1+   0       18$     0       18      15M
S       24      18      TTCAGTCGATCAGCGACA
F       24      seq1-   0       18$     3       21      15M
E       36+     24-     3       18$     3       18$     15M
S       14      26      GTCGATGCTATTTCAGTCGATCAGCG
F       14      seq1-   0       26$     6       32      15M
E       24-     14-     0       15      11      26$     15M
S       11      19      TGAAATAGCATCGACTAGC
F       11      seq1+   0       19$     17      36      15M
E       14-     11+     0       15      0       15      15M
S       19      22      ATAGCATCGACTAGCTATCGAT
F       19      seq1+   0       22$     21      43$     15M
E       11+     19+     4       19$     0       15      15M
O       seq1p   36+ 24- 14- 11+ 19+
F       36      seq2+   0       18$     0       18      15M
F       24      seq2-   0       18$     3       21      15M
E       36+     24-     3       18$     3       18$     15M
S       13      33      GTCGATGCTATTTCACTATTCAGTCGATCAGCG
F       13      seq2-   0       33$     6       39      15M
E       24-     13-     0       15      18      33$     15M
F       11      seq2+   0       19$     24      43$     15M
E       13-     11+     0       15      0       15      15M
O       seq2p   36+ 24- 13- 11+
F       36      seq3+   0       18$     0       18      15M
S       12      36      GTCGATGCTATTTCAAAAAAAAGTCGATCAGCGACA
F       12      seq3-   0       36$     3       39      15M
E       36+     12-     3       18$     21      36$     15M
F       11      seq3+   0       19$     24      43$     15M
E       12-     11+     0       15      0       15      15M
O       seq3p   36+ 12- 11+

When we look at the paths we have:


seq1p   36+ 24- 14- 11+ 19+
seq2p   36+ 24- 13- 11+
seq3p   36+ 12- 11+
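
To see which strands each node is used on, the three paths can be tallied per node. This is only a small sketch: the path strings are copied from the O lines above, and the helper name `strands_per_node` is made up for illustration, not part of TwoPaCo.

```python
from collections import defaultdict

# Paths copied verbatim from the O lines of graph.gfa.
paths = {
    "seq1p": "36+ 24- 14- 11+ 19+",
    "seq2p": "36+ 24- 13- 11+",
    "seq3p": "36+ 12- 11+",
}

def strands_per_node(paths):
    """Map each node id to the set of strands ('+'/'-') it is traversed on."""
    seen = defaultdict(set)
    for steps in paths.values():
        for step in steps.split():
            node, strand = step[:-1], step[-1]  # e.g. "24-" -> ("24", "-")
            seen[node].add(strand)
    return dict(seen)

strands = strands_per_node(paths)
for node in sorted(strands, key=int):
    print(node, "".join(sorted(strands[node])))
```

For these three paths every node ends up with exactly one strand, which is the observation the question below is about.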

We can only reconstruct the sequences from the GFA by taking the reverse complement of the - nodes. Looking at the paths, each node always appears on the same strand (either always + or always -); for example, node 24 is - in every path that contains it. So why weren't all of these nodes simply recorded as +?
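
For reference, this is roughly how I reconstruct seq1 from its path. It is a minimal sketch, not TwoPaCo code: the segment sequences are copied from the S lines above, `spell_path` and `reverse_complement` are names I made up, and I am assuming that consecutive segments always overlap by exactly k = 15 bases, as the 15M alignments on the E lines suggest.

```python
K = 15  # assumed overlap between adjacent segments (the -k value, matching "15M")

# Segment sequences copied from the S lines used by path seq1p.
segments = {
    "36": "ATATGTCGCTGATCGACT",
    "24": "TTCAGTCGATCAGCGACA",
    "14": "GTCGATGCTATTTCAGTCGATCAGCG",
    "11": "TGAAATAGCATCGACTAGC",
    "19": "ATAGCATCGACTAGCTATCGAT",
}

_COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]

def spell_path(path, segments, k=K):
    """Concatenate oriented segments, dropping the k-base overlap after the first."""
    out = ""
    for i, step in enumerate(path.split()):
        node, strand = step[:-1], step[-1]
        seq = segments[node]
        if strand == "-":
            seq = reverse_complement(seq)
        out += seq if i == 0 else seq[k:]
    return out

seq1 = "ATATGTCGCTGATCGACTGAAATAGCATCGACTAGCTATCGAT"
rebuilt = spell_path("36+ 24- 14- 11+ 19+", segments)
assert rebuilt == seq1
```

Without the reverse complement on 24- and 14- the overlaps no longer line up and the result does not match seq1.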
