Extract could operate on either type of BAM ordering #27

@cerebis

Description

bin3C requires that input BAMs be sorted by query name. This makes pair matching trivial and keeps memory use low. However, when invoking bin3C extract -f bam ..., a coordinate-sorted and indexed BAM would be much faster to process.

Fix

We should inspect the BAM for its sort order and adapt the parsing logic accordingly: instead of iterating over the entire input BAM (i.e. fetch(until_eof=True)), iterate over only the references involved and fetch their alignments through the index.

For example:

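# bam is assumed to be an open pysam.AlignmentFile that is coordinate sorted and indexed
for ref_name in cluster:
    for aln in bam.fetch(ref_name):
        pass  # do something with aln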
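
The idea above could be sketched roughly as follows. This is only an illustration, not bin3C's actual code: the function names `sort_order` and `iter_cluster_alignments` are hypothetical, the header is assumed to be a plain dict of the kind pysam returns from `AlignmentFile.header.to_dict()`, and the filter on `aln.reference_name` in the fallback path is an assumption about what extract would need to do during a full scan.

```python
def sort_order(header):
    """Return the SO field of the @HD header line, or 'unknown' if absent."""
    return header.get("HD", {}).get("SO", "unknown")


def iter_cluster_alignments(bam, header, cluster, has_index):
    """Yield alignments that map to any reference named in `cluster`.

    bam       -- object with a pysam-style fetch() method (assumption)
    header    -- header as a dict, e.g. {"HD": {"SO": "coordinate"}}
    cluster   -- iterable of reference names belonging to one cluster
    has_index -- whether an index is available for random access
    """
    cluster = list(cluster)
    if sort_order(header) == "coordinate" and has_index:
        # Fast path: use the index to visit only the involved references.
        for ref_name in cluster:
            yield from bam.fetch(ref_name)
    else:
        # Fallback: scan the whole file and keep only the wanted references.
        wanted = set(cluster)
        for aln in bam.fetch(until_eof=True):
            if aln.reference_name in wanted:
                yield aln
```

With pysam, a call might look like `iter_cluster_alignments(bam, bam.header.to_dict(), cluster, bam.has_index())`. Note that the SO header field only records what the producing tool declared, so a stricter check may be warranted before trusting it.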

Labels: enhancement (New feature or request)
