update extension strategy to match BLAST #129
Labels
enhancement
New feature or request
thatbudakguy added a commit that referenced this issue Nov 27, 2020
- Refactor string distance extenders
- Extend backwards as well as forwards
- Update main pipeline to filter matches prior to extending

See #129
thatbudakguy added a commit that referenced this issue Dec 11, 2020
- Refactor string distance extenders
- Extend backwards as well as forwards

See #129
#151 partially implements this (the extend-in-both-directions part), but there's a lot more we could do. In particular, using BLAST's smart method of preselecting high-scoring seeds would probably help us a lot.
thatbudakguy added a commit that referenced this issue Jan 26, 2021
- Refactor string distance extenders
- Extend backwards as well as forwards
- Update main pipeline to filter matches prior to extending

See #129
thatbudakguy added a commit that referenced this issue Jan 26, 2021
- Refactor string distance extenders
- Extend backwards as well as forwards

See #129
The actual implementation of BLAST extends seeds in both directions. Doing the same would help us, because it would let us discard matches immediately after the seeding step, instead of having to extend and align each match first just to check whether it meets the criteria for being discarded.
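To make the idea concrete, here is a rough sketch of ungapped two-way extension with an X-drop cutoff. It is not the project's extender and not BLAST's actual code: the function name `extend_seed`, the scoring values, and the `x_drop` parameter are all assumptions chosen for illustration.

```python
def extend_seed(a, b, i, j, k, match=1, mismatch=-1, x_drop=3):
    """Extend an exact seed a[i:i+k] == b[j:j+k] in both directions.

    Hypothetical helper: returns (a_start, a_end, b_start, b_end, score),
    with end indices exclusive. Scoring values are illustrative only.
    """

    def walk(step, pa, pb):
        # Walk away from the seed one character at a time, remembering the
        # length at which the running score peaked. Stop once the score has
        # fallen x_drop below that peak, or when either sequence runs out.
        score = best = 0
        best_len = n = 0
        while 0 <= pa < len(a) and 0 <= pb < len(b):
            score += match if a[pa] == b[pb] else mismatch
            n += 1
            if score > best:
                best, best_len = score, n
            elif best - score >= x_drop:
                break
            pa += step
            pb += step
        return best_len, best

    right_len, right_score = walk(+1, i + k, j + k)  # forwards from seed end
    left_len, left_score = walk(-1, i - 1, j - 1)    # backwards from seed start
    score = k * match + left_score + right_score
    return (i - left_len, i + k + right_len,
            j - left_len, j + k + right_len, score)


a = "xxABCDEFGyy"
b = "zzzABCXEFGww"
result = extend_seed(a, b, 2, 3, 3)  # seed "ABC"
print(result, a[result[0]:result[1]], b[result[2]:result[3]])
```

Because the walk runs both ways, the same extended region comes out whether the seed is "ABC" or "EFG", which is what makes it plausible to score and filter seeds before any alignment step.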
BLAST also uses a Poisson method to do what we called "condensing" high-similarity regions; BLAST2 used the simpler sum-of-scores method. Either way, it may be worth revisiting this step to see whether it yields performance gains.
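As a starting point for discussion, a very simplified take on sum-of-scores condensing might look like the sketch below. It only chains nearby, consistently ordered matches and adds up their scores; it does not compute the significance statistics BLAST uses. The name `condense`, the tuple layout, and the `max_gap` parameter are assumptions, not existing code.

```python
def condense(matches, max_gap=10):
    """Chain nearby matches and sum their scores (illustrative only).

    Each match is a hypothetical tuple
    (a_start, a_end, b_start, b_end, score) with exclusive end indices.
    """
    condensed = []
    for m in sorted(matches):
        a0, a1, b0, b1, score = m
        if condensed:
            pa0, pa1, pb0, pb1, pscore = condensed[-1]
            # Chain only if this match starts at or after the end of the
            # previous one in both sequences, and within max_gap of it.
            if 0 <= a0 - pa1 <= max_gap and 0 <= b0 - pb1 <= max_gap:
                condensed[-1] = (pa0, a1, pb0, b1, pscore + score)
                continue
        condensed.append(m)
    return condensed


matches = [(0, 5, 10, 15, 5), (7, 12, 18, 23, 5), (40, 45, 60, 65, 5)]
print(condense(matches))
```

The first two matches are close together in both sequences, so they collapse into one region with a combined score; the third is too far away and stays separate.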
See Vesanto 2019 for a broad overview of using BLAST for text reuse.