Make constructing records less horribly inefficient #80

@jakobnissen

In v2, the way records are constructed (directly, not parsed from a file) is not very nice, which hints at an underlying problem. Luckily this is internal and can be solved in a non-breaking feature release.

The idea is that we only want one source of truth for what a valid FASTX file is, so when a user constructs a record from e.g. a string, we also use the Automa parser. There are two issues with this:

First, the Automa parser only works on a TranscodingStream due to the way Automa generates the code. This means that to parse a byte vector such as a string, we need to needlessly wrap it in a NoopStream(IOBuffer(data)), which creates way more overhead than needed.
Constructing records from raw parts is even more roundabout, since we first print the parts to an IOBuffer, then convert that to a byte vector, then wrap it in an IOBuffer again.
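
The copy chain for raw parts can be pictured with Base types alone. This is a minimal sketch, not FASTX code: `roundabout_bytes` and `direct_bytes` are hypothetical names, and the real code path additionally wraps the buffer in a NoopStream before handing it to the Automa machine.

```julia
# Hypothetical illustration using only Base; it does not call FASTX or Automa.

# Mimics the copy chain described above.
function roundabout_bytes(id::String, seq::String)
    out = IOBuffer()
    print(out, '>', id, '\n', seq)   # 1. print the parts to an IOBuffer
    bytes = take!(out)               # 2. convert to a byte vector
    input = IOBuffer(bytes)          # 3. wrap the bytes in an IOBuffer again
    return read(input)               # a stream-based parser would consume `input`
end

# Direct alternative: write the parts into one preallocated byte vector.
function direct_bytes(id::String, seq::String)
    bytes = Vector{UInt8}(undef, 2 + ncodeunits(id) + ncodeunits(seq))
    bytes[1] = UInt8('>')
    copyto!(bytes, 2, codeunits(id), 1, ncodeunits(id))
    bytes[2 + ncodeunits(id)] = UInt8('\n')
    copyto!(bytes, 3 + ncodeunits(id), codeunits(seq), 1, ncodeunits(seq))
    return bytes
end
```

Both functions produce the same bytes; the second one just skips the intermediate buffers, which is the kind of saving a buffer-based parser would allow.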

The second issue is what happens if the user calls parse(FASTARecord, ">A\nG\n>A\nG"). Clearly this should error, as the input contains two records. But the Automa machine simply returns that it found a record. What the code does now is to try to load another record, then throw an error if that succeeds. This is simply bad design. [EDIT: I've changed that to instead use NoopStream's internals. It's better design, but still not great]

A better solution would be to somehow make parsing of a record from a byte buffer the single fundamental operation, then have the IO-based code operate on top of that. This is pretty tricky and requires a rework of Automa (but it would also make the Automa code way easier to reason about, so it should probably happen).
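
To make the proposed layering concrete, here is a rough sketch under heavy assumptions. The hand-written `parse_record` is a toy stand-in for an Automa-generated machine, it only handles plain `\n`-terminated FASTA, and all function names are hypothetical rather than part of FASTX or Automa. The point is only the shape: the buffer parser reports where it stopped, so the single-record case can reject trailing data directly, and the IO-based reader is built on top of it.

```julia
# Toy buffer parser (hypothetical; not how Automa generates code).
# Parses one FASTA record starting at index `from` and returns
# (identifier, sequence, index of the first unconsumed byte).
function parse_record(data::AbstractVector{UInt8}, from::Int=1)
    n = length(data)
    (from <= n && data[from] == UInt8('>')) || error("expected '>' at byte $from")
    # The header runs until the first newline, or to the end of the data.
    nl = findnext(==(UInt8('\n')), data, from)
    header_end = nl === nothing ? n : nl - 1
    id = String(data[from+1:header_end])
    # Sequence lines run until a line starting with '>' or the end of the data.
    seq = UInt8[]
    pos = header_end + 2
    while pos <= n && data[pos] != UInt8('>')
        nl = findnext(==(UInt8('\n')), data, pos)
        line_end = nl === nothing ? n : nl - 1
        append!(seq, view(data, pos:line_end))
        pos = line_end + 2
    end
    return id, String(seq), min(pos, n + 1)
end

# Strict single-record parse: error if anything but whitespace follows the record.
function parse_single(data::AbstractVector{UInt8})
    id, seq, next = parse_record(data)
    rest = view(data, next:length(data))
    all(b -> b in (UInt8('\n'), UInt8('\r'), UInt8(' ')), rest) ||
        error("trailing data after first record at byte $next")
    return id, seq
end
parse_single(s::AbstractString) = parse_single(codeunits(s))

# IO-based reading layered on the buffer parser. For simplicity this reads the
# whole stream up front; a real reader would refill a buffer incrementally.
function read_records(io::IO)
    data = read(io)
    records = Tuple{String,String}[]
    pos = 1
    while pos <= length(data)
        id, seq, pos = parse_record(data, pos)
        push!(records, (id, seq))
    end
    return records
end
```

With this shape, `parse_single(">A\nG\n>A\nG")` throws because the parser stopped before the end of the input, without ever attempting to load a second record.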
