
Refactor Alignment and Data Processing #34

@brucejwittmann

Description


The way we perform alignments could be much more efficient. We toss all reads with an insertion or deletion, so we are assuming a priori that the returned read aligns to the reference. As a result, there should be no need to perform a global alignment with Biopython -- we can just compare the reads to the reference, aligning the tail ends of the reads to the appropriate ends of the reference. Reads exceeding a given number of mismatches can then be discarded.
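A minimal sketch of the idea, assuming indel-free reads so a read can be anchored at one end of the reference and compared elementwise (function and variable names here are hypothetical, not from the codebase):

```python
import numpy as np

# Ordinal encoding for DNA characters; the "N" entry is an assumption.
BASE_TO_INT = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}

def encode(seq):
    """Ordinally encode a DNA string as a uint8 array."""
    return np.array([BASE_TO_INT[b] for b in seq], dtype=np.uint8)

def count_mismatches(read, reference, from_end=False):
    """Compare a read against one end of the reference without alignment.

    Because reads with indels are already discarded, a simple elementwise
    comparison suffices. `from_end=True` anchors the read at the 3' end of
    the reference instead of the 5' end.
    """
    ref_slice = reference[-len(read):] if from_end else reference[:len(read)]
    return int(np.count_nonzero(read != ref_slice))

ref = encode("ACGTACGTAC")
fwd = encode("ACGTA")   # matches the 5' end exactly -> 0 mismatches
rev = encode("CGAAG")   # vs. the 3' end "CGTAC" -> 2 mismatches
```

This replaces the O(n^2) dynamic-programming alignment with an O(n) comparison per read.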

Doing this would allow us to (1) avoid the O(n^2) memory requirement for aligning to a reference of length n, (2) ordinally encode characters from the beginning, thus saving on memory, and (3) take advantage of vectorization with numpy to perform alignment QC and counting.
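Point (3) could look something like the following sketch: once equal-length, ordinally encoded reads are stacked into a 2D array, both the mismatch QC and the per-position counting collapse into vectorized numpy operations (the threshold and names are illustrative assumptions):

```python
import numpy as np

MAX_MISMATCHES = 1  # illustrative threshold, not specified in the issue

def filter_and_count(reads, ref_window):
    """Vectorized mismatch QC over a (n_reads, read_len) uint8 array.

    `reads` holds ordinally encoded reads; `ref_window` is the matching
    slice of the encoded reference. Returns the reads that pass QC and
    per-position counts of each base among the passing reads.
    """
    # One broadcasted comparison gives per-read mismatch totals.
    mismatches = np.count_nonzero(reads != ref_window, axis=1)
    passing = reads[mismatches <= MAX_MISMATCHES]
    # Count occurrences of each base (0-3) at each position in one shot.
    counts = np.stack([(passing == b).sum(axis=0) for b in range(4)])
    return passing, counts
```

Operating on uint8 arrays also keeps the memory footprint at one byte per base, in line with point (2).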

We may also want to play around with when exactly new processes are spawned for data analysis. Ideally, we want to send as little data as possible to the spawned processes, then return only what we need to comprehensively analyze all wells. Reorganizing code to maximize this transfer/memory efficiency should also reduce memory bloat.
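One possible shape for that reorganization, sketched with `multiprocessing` (the well/summary structure here is an assumption, not the current code): each worker receives only its own well's encoded reads plus the shared reference window, and returns a small summary rather than the raw reads.

```python
import numpy as np
from multiprocessing import Pool

def process_well(args):
    """Worker: run QC in the child process, return only a compact summary."""
    well_id, reads, ref_window = args
    mismatches = np.count_nonzero(reads != ref_window, axis=1)
    n_pass = int(np.count_nonzero(mismatches <= 1))  # illustrative threshold
    return well_id, n_pass  # small payload back to the parent, not the reads

def analyze_wells(wells, ref_window, n_procs=4):
    """Fan out per-well work; collect only the summaries in the parent."""
    jobs = [(wid, reads, ref_window) for wid, reads in wells.items()]
    with Pool(n_procs) as pool:
        return dict(pool.map(process_well, jobs))
```

Keeping the serialized arguments and return values this small is what bounds the per-process memory cost.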

Labels: enhancement (New feature or request)
