Fix buffer initialization in SD motif detection code#100
Merged
hyattpd merged 2 commits intohyattpd:GoogleImportfrom Jan 27, 2023
Merged
Fix buffer initialization in SD motif detection code#100hyattpd merged 2 commits intohyattpd:GoogleImportfrom
hyattpd merged 2 commits intohyattpd:GoogleImportfrom
Conversation
This was referenced Jan 28, 2023
Contributor
Author
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Hi again !
A very detailed report by @pchaumeil in althonos/pyrodigal#28 helped me discover a new bug in the Prodigal source. This is affecting the detection of Shine-Dalgarno motifs for start codons on the reverse strand, where the searched region may go out-of-bound.
Overview
In
shine_dalgarno_mmandshine_dalgarno_exact, thematchbuffer stores a score for each position of the motif being searched. In the event the region overlaps with the sequence edge, only the sequence characters should be matched. However, an issue with indexing causes the initialization not to happen for ahead-of-start nucleotides, and thematchbuffer may contain random bytes. This can lead to the wrong RBS motif being identified, and occasionally some genes having wrong coordinates.Fix
Fix initialization of the
matchbuffer so that all positions are set to a default mismatch value (-10). In the code that compares the 6-base region toAGGAGG, positions before0are skipped, and somatch[i]may no be initialized for all values ofi:Initializing the array to -10 makes the undefined behaviour disappear. In fact, since
matchis stack-allocated, if may contain remnants of a previous call, like2.0or3.0, which made the error hard to debug in the first place.Example
In the assembly GCA_934838455.1, there is a contig (
CAKWEX010000332.1) which ends with the following nucleotide sequence:It contains a start codon (1) and a RBS motif (2) on the reverse strand. However, after running Prodigal on this contig, the predicted gene is:
As you can see, the RBS motif is
AGGA; however, there is noAGGAin the sequence upstream of the start codon, onlyGGA. The missingAis actually an artifact of thematchbuffer not being initialized. After applying the fix, the gene is predicted with aGGAmotif, as expected: