update extension strategy to match BLAST #129

Open
thatbudakguy opened this issue Nov 10, 2020 · 1 comment
Labels
enhancement New feature or request
Milestone

Comments

@thatbudakguy
Member

thatbudakguy commented Nov 10, 2020

the actual implementation of BLAST extends seeds in both directions. adopting this would help us, because we could discard matches immediately after the seeding step, instead of having to extend and align each match first just to find out whether it meets the criteria for being discarded.
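to make the idea concrete, here is a minimal sketch of ungapped extension in both directions with an X-drop cutoff. it is not the project's extender and not BLAST's actual code: the function name `extend`, the scoring values, and the `x_drop` parameter are all assumptions chosen for illustration.

```python
def extend(a, b, i, j, k, match=1, mismatch=-1, x_drop=3):
    """Extend an exact seed a[i:i+k] == b[j:j+k] left and right, without gaps.

    Hypothetical helper: scoring values and x_drop are illustrative only.
    Returns ((a_start, a_end), (b_start, b_end), score).
    """
    def walk(pairs):
        # Track the best running score; stop once we fall x_drop below it.
        best = score = best_len = 0
        for n, (x, y) in enumerate(pairs, start=1):
            score += match if x == y else mismatch
            if score > best:
                best, best_len = score, n
            elif best - score > x_drop:
                break
        return best, best_len

    # Rightward from the end of the seed, leftward from its start.
    right_score, right_len = walk(zip(a[i + k:], b[j + k:]))
    left_score, left_len = walk(zip(reversed(a[:i]), reversed(b[:j])))

    total = k * match + left_score + right_score
    a_span = (i - left_len, i + k + right_len)
    b_span = (j - left_len, j + k + right_len)
    return a_span, b_span, total


a_span, b_span, score = extend("xxABCDEFGyy", "zzzABCDQFGww", 3, 4, 3)
print(a_span, b_span, score)
```

each direction is trimmed back to its best-scoring prefix, so the returned spans never include a trailing run of mismatches.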

BLAST also uses a Poisson method to do what we called "condensing" high-similarity regions; BLAST2 used the simpler sum-of-scores method. in either case, it may be worth revisiting this step to see whether it yields performance gains.

see Vesanto 2019 for a broad overview of using BLAST to detect text reuse.

@thatbudakguy thatbudakguy added the enhancement New feature or request label Nov 10, 2020
@thatbudakguy thatbudakguy changed the title from "updating seeding strategy to match BLAST" to "updating strategy to match BLAST" Nov 14, 2020
@thatbudakguy thatbudakguy changed the title from "updating strategy to match BLAST" to "update extension strategy to match BLAST" Nov 18, 2020
@thatbudakguy thatbudakguy added this to the v2.0 milestone Nov 18, 2020
thatbudakguy added a commit that referenced this issue Nov 27, 2020
- Refactor string distance extenders
- Extend backwards as well as forwards
- Update main pipeline to filter matches prior to extending

See #129
thatbudakguy added a commit that referenced this issue Dec 11, 2020
- Refactor string distance extenders
- Extend backwards as well as forwards

See #129
@thatbudakguy
Member Author

#151 partially implements this (the extend-in-both-directions part), but there's a lot more we could do. in particular, BLAST's method of preselecting only high-scoring seeds would probably help us a lot.
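a rough sketch of what threshold-based seed preselection could look like, loosely modeled on BLAST's word-score threshold. everything here is hypothetical: `kgrams`, `select_seeds`, the `sim` scoring function, and the threshold value are placeholders, and the brute-force pairwise comparison is only for illustration (a real version would use an index).

```python
def kgrams(text, k):
    """All (offset, k-gram) pairs in text."""
    return [(i, text[i:i + k]) for i in range(len(text) - k + 1)]


def select_seeds(a, b, k, sim, threshold):
    """Keep only k-gram pairs whose summed character similarity >= threshold.

    `sim` is an assumed scoring function taking two characters and
    returning a number; it stands in for whatever similarity we use.
    """
    seeds = []
    for i, w1 in kgrams(a, k):
        for j, w2 in kgrams(b, k):
            score = sum(sim(x, y) for x, y in zip(w1, w2))
            if score >= threshold:
                seeds.append((i, j, score))
    return seeds


def sim(x, y):
    # Toy scoring: reward identical characters, penalize everything else.
    return 2 if x == y else -1


print(select_seeds("abcd", "xbcd", 3, sim, 3))
```

raising the threshold keeps only near-exact seeds; lowering it admits seeds with a mismatch, at the cost of more candidates to extend.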

thatbudakguy added a commit that referenced this issue Jan 26, 2021
- Refactor string distance extenders
- Extend backwards as well as forwards
- Update main pipeline to filter matches prior to extending

See #129
thatbudakguy added a commit that referenced this issue Jan 26, 2021
- Refactor string distance extenders
- Extend backwards as well as forwards

See #129
@thatbudakguy thatbudakguy removed this from the v2.0 milestone Jan 27, 2021
@thatbudakguy thatbudakguy added this to the v3.0 milestone Feb 19, 2021