Fix indexing error for reverse strand in calc_orf_gc #101
Open
althonos wants to merge 1 commit into hyattpd:GoogleImport
Hi again!
Overview
I noticed while addressing #100 that the GC% computed for genes on the reverse strand was often wrong by a small margin, despite the correct gene sequence being extracted, which pointed to an indexing error. After checking for out-of-bound reads, I noticed that in `calc_orf_gc` the loop would read past the end of the sequence when handling the reverse strand.
Indeed, on the reverse strand, `last[fr]` is set to the index of the STOP codon; because for a reverse-strand codon this is always the index of the last nucleotide, not the first, the iteration should start 1 nucleotide later, not 3.
Fix
Start the iteration at the right coordinates 😄
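To make the off-by-two concrete, here is a minimal sketch of the reverse-strand accumulation. It is a simplified model and not the actual `calc_orf_gc` code: the function `gc_count_reverse` and its parameters `stop_last`, `start_last` and `first_offset` are hypothetical names, and it assumes each reverse-strand codon is identified by the forward index of its last nucleotide, as described above.

```c
/* Return 1 if seq[i] is G or C (either case), 0 otherwise. */
static int is_gc(const char *seq, int i) {
    char c = seq[i];
    return c == 'G' || c == 'C' || c == 'g' || c == 'c';
}

/*
 * Count G/C bases of a reverse-strand gene (hypothetical helper).
 *
 * stop_last    forward index of the LAST nucleotide of the STOP codon
 * start_last   forward index of the LAST nucleotide of the START codon
 * first_offset where codon iteration begins, relative to stop_last:
 *              1 is the corrected value, 3 reproduces the reported bug
 *
 * The caller must make sure seq has at least 2 bases after start_last,
 * because first_offset = 3 reads two positions beyond the gene.
 */
static int gc_count_reverse(const char *seq, int stop_last, int start_last,
                            int first_offset) {
    /* Codons remaining after the STOP codon. */
    int n_codons = (start_last - stop_last) / 3;

    /* The STOP codon occupies stop_last - 2 .. stop_last. */
    int gc = is_gc(seq, stop_last - 2) + is_gc(seq, stop_last - 1)
           + is_gc(seq, stop_last);

    for (int k = 0; k < n_codons; k++) {
        int j = stop_last + first_offset + 3 * k;
        gc += is_gc(seq, j) + is_gc(seq, j + 1) + is_gc(seq, j + 2);
    }
    return gc;
}
```

On the toy sequence `AATTAGGCCATAA`, with a reverse-strand gene at forward indices 2 to 10 (so `stop_last = 4` and `start_last = 10`), an offset of 1 counts the 4 G/C bases of the gene, while an offset of 3 counts only 2 and reads indices 11 and 12, which lie outside the gene.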
Example
Taking the same contig `CAKWEX010000332.1` as in #100, I ran Prodigal on both the contig and its reverse complement. The genes predicted in both cases matched in sequence, but the GC% didn't: the GC content was wrong when the genes were on the reverse strand (I changed the `gc_cont` precision so that the difference is easier to see).
After applying the fix, the GC content is consistent regardless of whether the gene is on the direct or the reverse strand.
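The check relies on the fact that a sequence and its reverse complement contain the same number of G/C bases, since G pairs with C. The sketch below illustrates that invariant on a standalone string; the helpers `reverse_complement` and `gc_count` are hypothetical and are not part of Prodigal.

```c
#include <stddef.h>

/* Complement of a single upper-case nucleotide; other characters map to 'N'. */
static char complement(char base) {
    switch (base) {
        case 'A': return 'T';
        case 'T': return 'A';
        case 'G': return 'C';
        case 'C': return 'G';
        default:  return 'N';
    }
}

/* Write the reverse complement of seq (length n) into out,
   which must have room for n + 1 characters. */
static void reverse_complement(const char *seq, size_t n, char *out) {
    for (size_t i = 0; i < n; i++) {
        out[i] = complement(seq[n - 1 - i]);
    }
    out[n] = '\0';
}

/* Number of G or C bases among the first n characters of seq. */
static size_t gc_count(const char *seq, size_t n) {
    size_t gc = 0;
    for (size_t i = 0; i < n; i++) {
        if (seq[i] == 'G' || seq[i] == 'C') {
            gc++;
        }
    }
    return gc;
}
```

Because the count is identical for both orientations, any per-gene GC% that differs between a contig and its reverse complement indicates an indexing problem rather than a property of the sequence.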