Misc code review #18

@cmdcolin

TODO: stipulate that bam files are split by chromosome (for efficiency)?

Probably unneeded. BAM files can be indexed, which means you can fetch a specific genomic region efficiently without breaking up the file. It is the same idea as faidx: FASTA files can be indexed for random access by genomic range, and coordinate-sorted BAM files can be indexed for genomic range access as well (xref #13).
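As a rough illustration of region access on a single indexed BAM, here is a minimal sketch using pysam's `AlignmentFile.fetch`. The function names `parse_region` and `count_reads_in_region` are made up for this example and are not from the code under review; it assumes the BAM is coordinate-sorted and has a `.bai`/`.csi` index next to it.

```python
def parse_region(region):
    """Convert a samtools-style region such as 'chr1:1001-2000'
    (1-based, inclusive) into (contig, start, end) using the
    0-based, half-open coordinates that pysam's fetch() expects."""
    contig, _, span = region.rpartition(":")
    start, end = (int(x.replace(",", "")) for x in span.split("-"))
    return contig, start - 1, end


def count_reads_in_region(bam_path, region):
    """Hypothetical helper: count reads overlapping one region."""
    import pysam  # imported lazily so parse_region works without pysam

    contig, start, end = parse_region(region)
    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # fetch() uses the index to jump to the region, so the whole
        # file is not scanned and no per-chromosome split is needed.
        return sum(1 for _ in bam.fetch(contig=contig, start=start, stop=end))
```

Calling `count_reads_in_region("sample.bam", "chr1:1001-2000")` (the path is a placeholder) would only read the blocks covering that range.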

TODO: use pysam pileup instead?

This is an interesting idea. The pileup format and algorithms do a lot of the legwork for you, and since we are just counting the number of reads that have a certain attribute at each position, it could help. But if the CIGAR processing code that is in place works, it could just be left as is. It is perhaps still worth checking out pileup to understand it, just for interest.
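For reference, a per-position count with pysam's pileup might look roughly like the sketch below. The helper names and the `predicate` argument are hypothetical stand-ins for whatever read attribute is being counted; this is not the existing implementation. Note that pileup applies its own read and base-quality filtering by default, so the counts should be checked against the current CIGAR-based code before swapping anything.

```python
def count_per_position(columns, predicate):
    """Count, for each pileup column, the reads for which predicate(read)
    is true. Reads with a deletion or reference skip (e.g. intron) at
    that position are ignored."""
    counts = {}
    for column in columns:
        n = 0
        for pileup_read in column.pileups:
            if pileup_read.is_del or pileup_read.is_refskip:
                continue
            if predicate(pileup_read.alignment):
                n += 1
        counts[column.reference_pos] = n
    return counts


def count_reads_with_attribute(bam_path, contig, start, end, predicate):
    """Hypothetical wrapper: run the count over one region of an indexed BAM."""
    import pysam  # imported lazily so count_per_position is usable without pysam

    with pysam.AlignmentFile(bam_path, "rb") as bam:
        # truncate=True restricts columns to [start, end) instead of
        # every position covered by reads overlapping the region.
        columns = bam.pileup(contig, start, end, truncate=True)
        return count_per_position(columns, predicate)
```

For example, `count_reads_with_attribute(path, "chr1", 1000, 2000, lambda r: r.is_reverse)` would give the number of reverse-strand reads at each covered position as a dict keyed by 0-based reference position.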
