feat: optionally split FIND_ITEMS' query in 2 by pjenvey · Pull Request #119 · mozilla-services/server-syncstorage

pjenvey · 2019-05-15T02:59:59Z

via _bsolm_index_separate = true

Closes #118

bbangert · 2019-05-15T04:15:03Z

syncstorage/storage/spanner.py

+        if self._bsolm_index_separate and fields is None:
+            # Split the get_items query (TODO: get_item_ids could
+            # reference solely the index) query in 2:
+            # #1: query BsoLastModified directly for bso ids


What's the intended goal of using this 2 query version? I assume it's an optimization to avoid the multiple executions that occur otherwise?

Yea, see my description in the issue #118

and if this solution looks alright under load testing I'd like to relay those results back to Google, in case they have an alternative or a better explanation for the existing single query's perf (I still hope we won't have to do this..)
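For readers following along, the two-query idea can be sketched roughly as below. This is only an illustration under stated assumptions, not the code in this PR: it uses the standard-library sqlite3 module in place of Spanner, and the table layout, the index name, and the names make_demo_db and find_items_two_step are all hypothetical.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory table standing in for the bso table (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bso (collection TEXT, id TEXT, modified INTEGER, "
        "payload TEXT, PRIMARY KEY (collection, id))"
    )
    # Stand-in for the BsoLastModified index mentioned in the diff.
    conn.execute("CREATE INDEX bso_last_modified ON bso (collection, modified)")
    conn.executemany(
        "INSERT INTO bso VALUES (?, ?, ?, ?)",
        [
            ("bookmarks", "a", 10, "payload-a"),
            ("bookmarks", "b", 20, "payload-b"),
            ("bookmarks", "c", 30, "payload-c"),
            ("history", "d", 40, "payload-d"),
        ],
    )
    return conn


def find_items_two_step(conn, collection, newer, limit):
    """Hypothetical two-step lookup: ids first, then full rows by id."""
    # Query 1: only the ids, filtered and ordered by the modified column.
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT id FROM bso WHERE collection = ? AND modified > ? "
            "ORDER BY modified DESC LIMIT ?",
            (collection, newer, limit),
        )
    ]
    if not ids:
        return []
    # Query 2: full rows for exactly those ids. The modified range and the
    # limit were already applied in query 1, so they are not repeated here.
    placeholders = ", ".join("?" for _ in ids)
    return conn.execute(
        f"SELECT id, modified, payload FROM bso "
        f"WHERE collection = ? AND id IN ({placeholders}) "
        f"ORDER BY modified DESC",
        (collection, *ids),
    ).fetchall()
```

Whether this split actually helps on Spanner is exactly what the load test is meant to show; the sketch only demonstrates the shape of the two queries.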

jrconlin · 2019-05-15T16:35:21Z

syncstorage/storage/spanner.py

 EPOCH = datetime.datetime.utcfromtimestamp(0)

+# a bogus FIND_ITEMS bind param
+BOGUS_ID = object()


Is this used (or planned to be used) anywhere outside of SpannerStorage.__init__()?

it's only used in _find_items as basically a placeholder

rfk

I don't have my head around this 100% yet, but some initial comments...

rfk · 2019-05-15T20:46:02Z

syncstorage/storage/spanner.py

+                del bind_types["offset"]
+            # simiarly modified ranges/ttl aren't needed in #2
+            for param in ("newer", "newer_eq", "older", "older_eq", "ttl"):
+                params.pop(param, None)


There's quite a big chunk of code in this if block, I wonder if it could be factored out into a helper method of some sort.
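As a rough illustration of what such a helper might look like, the parameter clean-up could be isolated in a small function. This is a sketch under assumptions: the name strip_range_params and the constant RANGE_PARAMS are hypothetical, only the five parameter names come from the diff above, and the handling of "offset" is deliberately left out because the diff excerpt does not show it in full.

```python
# The five range/ttl parameter names are taken from the diff above; the
# constant and function names are hypothetical.
RANGE_PARAMS = ("newer", "newer_eq", "older", "older_eq", "ttl")


def strip_range_params(params, drop=RANGE_PARAMS):
    """Return a copy of params without the range/ttl filters.

    The input dict is left untouched, so the caller still has the
    original parameters available for anything that needs them later.
    """
    return {key: value for key, value in params.items() if key not in drop}
```

Returning a copy rather than popping keys in place is a design choice of this sketch, not something the PR does.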

rfk · 2019-05-15T20:55:56Z

syncstorage/storage/spanner.py

+        fields = params.get("fields")
+        offset_params = params
+
+        if self._bsolm_index_separate and fields is None:


It's not obvious why and fields is None here, is it to ensure we only apply this logic in get_items rather than get_item_ids? If so, I wonder if it's worth refactoring _find_items into a couple of helper methods and pushing that logic up into get_items, like:

    def get_items(...):
        if self._bsolm_index_separate:
            (params, ids) = self._do_bsolm_separate_query_to_get_the_ids()
        return self._find_items(**params)

    def get_item_ids(...):
        return self._find_items(...)

Depends how much effort you want to invest in polishing this though, as opposed to just getting it out in stage to loadtest.

That's right, to only apply it to get_items. I like the refactor suggestion but it's a little further complicated by encode_next_offset's need for the original params (offset_params).

I will keep it in mind if this is revisited. This is definitely a "get it out to stage and see" patch

rfk · 2019-05-15T20:59:09Z

syncstorage/storage/spanner.py

+
+            # Setup a 'id IN (:id_1)' bind param for #2 (replacing it
+            # below)
+            params["ids"] = [BOGUS_ID]


What stops us from just putting the returned ids in params["id"], and basically just pretending that the caller had requested those ids specifically?

This is specifically to utilize spanner's UNNEST (what it's replaced with below) for query 2.

I do have a TODO for the FIND_ITEMS/sqlalchemy generation piece to do this for us. params["ids"] from the client and/or this case should both utilize UNNEST. Unfortunately coercing the sqlalchemy level to generate it for us is a bit involved, thus the current str replace.

Another quick solution for this is having FIND_ITEMS handle UNNEST by returning a str with the appropriate replacements (also not ideal, further straying from the original FIND_ITEMS code..) 🤷‍♂
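The string replacement described in this exchange can be sketched as below. It is a minimal illustration, assuming the generated SQL contains the literal fragment 'id IN (:id_1)' mentioned in the diff comment; the function name use_unnest, the replacement fragment, and the ':ids' parameter name are assumptions, and the real bind parameter syntax in the PR may differ.

```python
def use_unnest(sql, placeholder="id IN (:id_1)", replacement="id IN UNNEST(:ids)"):
    """Swap a single-value IN clause for an UNNEST over one array parameter.

    Raises ValueError unless the placeholder occurs exactly once, so an
    unexpected query shape fails loudly instead of being silently mangled.
    """
    occurrences = sql.count(placeholder)
    if occurrences != 1:
        raise ValueError(
            f"expected exactly one {placeholder!r} in query, found {occurrences}"
        )
    return sql.replace(placeholder, replacement)
```

For example, use_unnest("SELECT id FROM bso WHERE id IN (:id_1)") yields a query ending in "id IN UNNEST(:ids)", after which the whole id list can be bound as one array value.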

rfk · 2019-05-16T01:12:51Z

@pjenvey do we need to get this merged before loadtesting, or can we do an initial test from this branch before deciding whether to push ahead with it or not?

pjenvey · 2019-05-16T21:55:52Z

Sorry I missed the last comment until just now -- I can certainly tag this from the branch for an initial load test if more is needed to merge this PR (I'm totally fine proving out the load test before a full merge adding to the existing pile of hacks here)

rfk · 2019-05-16T22:00:23Z

> I can certainly tag this from the branch for an initial load test if more is needed to merge this PR

Whatever seems simplest for you. I'm happy to r+ with followup bugs for refactors, but if we're not sure yet whether we'll even keep this code, deploying from a branch feels like it might be the simpler path forward overall.

feat: optionally split FIND_ITEMS' query in 2

ad6b8ab

via _bsolm_index_separate = true Closes #118

pjenvey added spanner 🔃 Durable Sync labels May 15, 2019

pjenvey requested review from bbangert and rfk May 15, 2019 02:59

bbangert reviewed May 15, 2019

jrconlin reviewed May 15, 2019

rfk reviewed May 15, 2019

5 participants
