Misc code review #18
Description
TODO: stipulate that bam files are split by chromosome (for efficiency)?
Probably unneeded. BAM files are indexed, so you can fetch a specific genomic region efficiently without splitting the file. It is the same idea as faidx: FASTA files can be indexed for random access by genomic range, and BAM files can be indexed the same way (xref #13).
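For illustration, a minimal sketch of region access on a single indexed BAM, assuming pysam is installed and a `.bai` index sits next to the file. The helper names `open_indexed_bam` and `count_reads_in_region` are made up for this example and are not from the repo; `fetch` takes 0-based, half-open coordinates.

```python
def open_indexed_bam(path):
    """Open a BAM file for reading; fetch() needs its index to be present."""
    import pysam  # third-party; imported lazily so the helper below has no hard dependency

    return pysam.AlignmentFile(path, "rb")


def count_reads_in_region(bam, contig, start, stop):
    """Count alignments overlapping [start, stop) on contig.

    `bam` is expected to behave like pysam.AlignmentFile: fetch() uses the
    index to jump to the region instead of scanning the whole file.
    """
    return sum(1 for _ in bam.fetch(contig, start, stop))
```

Usage would look something like `count_reads_in_region(open_indexed_bam("sample.bam"), "chr1", 10_000, 20_000)`, where the file name and coordinates are placeholders.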
TODO: use pysam pileup instead?
This is an interesting idea. The pileup format and algorithms do a lot of the legwork for you, and since we are just counting the number of reads with a certain attribute at each position, it could help. But if the CIGAR processing code already in place works, we could leave it as is. It may still be worth looking at pileup, if only to understand it.
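A rough sketch of what the pileup route might look like, not a drop-in replacement for the existing CIGAR code. The function name `count_reads_per_position` and the `has_attribute` predicate are hypothetical; `bam` is assumed to behave like `pysam.AlignmentFile`. Note that pysam's pileup applies its own default read and base-quality filters, so the counts may not match a hand-rolled CIGAR walk exactly.

```python
from collections import Counter


def count_reads_per_position(bam, contig, start, stop, has_attribute):
    """Per-position count of reads for which has_attribute(read) is true.

    has_attribute is a hypothetical callable that takes an aligned read
    (pysam.AlignedSegment) and returns True or False.
    """
    counts = Counter()
    # truncate=True restricts columns to [start, stop) instead of every
    # position covered by reads that overlap the region.
    for column in bam.pileup(contig, start, stop, truncate=True):
        for pileup_read in column.pileups:
            # Skip reads that have a deletion or reference skip here:
            # they contribute no base at this position.
            if pileup_read.is_del or pileup_read.is_refskip:
                continue
            if has_attribute(pileup_read.alignment):
                counts[column.reference_pos] += 1
    return counts
```

For example, `count_reads_per_position(bam, "chr1", 0, 1000, lambda read: read.is_reverse)` would count reverse-strand reads at each covered position. Comparing its output against the current CIGAR-based counts on a small region could be a cheap way to see where the two approaches differ.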