Performance Optimizations to fulltext search by niravmehta · Pull Request #163 · pelias/placeholder

niravmehta · 2019-11-08T04:25:57Z

The current hasSubject / _eachSynonym methods work on a single token at a time. For a long address string this produces a long list of tokens to check (many permutations), and therefore many queries.

I noticed 30, 60 or even 250 fulltext search queries in some tests.

This pull request contains a new function that takes all the tokens and performs full text search in a single query. It returns matching tokens in the same format as expected, so other things don't need to change.
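To illustrate the idea of checking all tokens in one query, here is a minimal sketch that combines every candidate token into a single FTS5 MATCH expression. It is not the code from this pull request: `quoteToken` and `buildMatchExpression` are hypothetical names, and the sketch only builds the expression string rather than running it against a database.

```javascript
'use strict';

// Hypothetical helper: quote a token as an FTS5 string literal.
// FTS5 strings are wrapped in double quotes; embedded quotes are doubled.
function quoteToken(token) {
  return '"' + String(token).replace(/"/g, '""') + '"';
}

// Hypothetical helper: build one MATCH expression covering every candidate
// token, so a single query can stand in for one query per token.
function buildMatchExpression(tokens) {
  // Trim, drop empty tokens, and de-duplicate while keeping the original order.
  const unique = Array.from(new Set(
    tokens.map((t) => String(t).trim()).filter((t) => t.length > 0)
  ));
  return unique.map(quoteToken).join(' OR ');
}

const candidates = ['new york', 'york', 'new', '', 'new york'];
console.log(buildMatchExpression(candidates));
// prints: "new york" OR "york" OR "new"
```

The resulting string would be bound as a single parameter to a `MATCH ?` clause. The caller would still need to compare the returned rows against the candidate list to decide which tokens actually matched.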

I saw a consistent 30% speed improvement with these changes, and even bigger gains with longer searches.

Caveats:

I had to remove the test case for the commit to go through. I'm not familiar with the testing framework, and could not write up a proper test case.
Auto complete searches don't perform partial matching on the last token, so results may vary when using auto complete. (But I feel it's still a good compromise.)

Do review and provide feedback.

…ce. This significantly reduces number of queries to the DB (down to just one query - instead of potential dozens/hundreds). Works for all cases except "auto complete" mode. (can easily change code to fallback to old method for partial matches on last token) Plus a fix for renderQuery - replaces all arguments correctly now, instead of just the first one. Had to remove the associated test case - can write it up later. (once I learn the testing framework!)
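For context on the renderQuery fix mentioned above: one common cause of "only the first one is replaced" bugs in JavaScript is that `String.prototype.replace` with a string pattern substitutes only the first occurrence. The sketch below is an assumption about what such a debugging helper might look like, not the project's actual `renderQuery`; the `$name` placeholder style and the function signature are hypothetical.

```javascript
'use strict';

// Hypothetical debugging helper: substitute named parameters such as $subject
// into a SQL string for logging only (never for building executable SQL).
function renderQuery(sql, params) {
  let rendered = sql;
  for (const [name, value] of Object.entries(params)) {
    // Escape regex metacharacters in the placeholder (notably the leading "$").
    const escaped = ('$' + name).replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    // The "g" flag replaces every occurrence; the negative lookahead keeps
    // "$x" from matching inside a longer placeholder such as "$xy".
    const pattern = new RegExp(escaped + '(?![A-Za-z0-9_])', 'g');
    // A replacer function avoids special "$" sequences in the replacement text.
    const literal = "'" + String(value).replace(/'/g, "''") + "'";
    rendered = rendered.replace(pattern, () => literal);
  }
  return rendered;
}

console.log(renderQuery('SELECT * FROM t WHERE a = $x OR b = $x', { x: 'foo' }));
// prints: SELECT * FROM t WHERE a = 'foo' OR b = 'foo'
```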


niravmehta · 2019-11-11T06:15:42Z

I revised the matchingSubjects function to support auto-complete, which also fixes fts5 errors with special characters in input. That code is here: niravmehta#4 (and not merged into this pull request #163)
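As a rough illustration of how quoting can avoid fts5 syntax errors while still allowing a partial match on the last token, consider the sketch below. It is an assumption, not the code in niravmehta#4: `toFts5String` and `buildAutocompleteExpression` are hypothetical names. It relies on two documented FTS5 features: double-quoted strings (with embedded quotes doubled) and a trailing `*` after a phrase to request a prefix match.

```javascript
'use strict';

// Hypothetical helper: wrap a token in double quotes so punctuation in user
// input is treated as text rather than as FTS5 query syntax.
function toFts5String(token) {
  return '"' + String(token).replace(/"/g, '""') + '"';
}

// Hypothetical helper: build a MATCH expression where, in autocomplete mode,
// only the final token is turned into a prefix query.
function buildAutocompleteExpression(tokens, partialLast) {
  const cleaned = tokens.map((t) => String(t).trim()).filter((t) => t.length > 0);
  return cleaned.map((token, i) => {
    const quoted = toFts5String(token);
    const isLast = i === cleaned.length - 1;
    // In FTS5 a phrase followed by "*" matches any term starting with that text.
    return (partialLast && isLast) ? quoted + ' *' : quoted;
  }).join(' OR ');
}

console.log(buildAutocompleteExpression(["st. john's", 'say "hi"', 'lond'], true));
// prints: "st. john's" OR "say ""hi""" OR "lond" *
```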

Not sure why checks are failing on this pull request.

missinglink · 2019-11-11T13:12:09Z

Hi @niravmehta, you can view the output of the CI by clicking on the Travis-CI check itself, which opens a log such as https://travis-ci.org/pelias/placeholder/jobs/609069753. If you scroll down that log you'll see the failures listed.

I'm a bit concerned about this PR. It's not good practice to delete tests because they start failing; tests are there for a reason, and it's important to understand why they changed and what effect this will have on other users before we can merge this to master.

niravmehta · 2019-11-11T17:22:03Z

I agree removing test cases is not a good idea. But I do not know the testing framework, and couldn't figure out how to write a proper test case.

Will try again later when I have some free time.

Nirav Mehta added 5 commits November 7, 2019 22:00

Remove empty subject tokens

eeae87e

Workaround for input with autocomplete character at the end, and empty strings. Results will still be different between single subject partial matches, but this at least works!

c1a01c4

Linebreak!

526212b

Removed commented out old function call

ade3361

niravmehta wants to merge 5 commits into pelias:master from niravmehta:optimizations
