Conversation
via _bsolm_index_separate = true Closes #118
| if self._bsolm_index_separate and fields is None: | ||
| # Split the get_items query (TODO: get_item_ids could | ||
| # reference solely the index) query in 2: | ||
| # #1: query BsoLastModified directly for bso ids |
There was a problem hiding this comment.
What's the intended goal of using this 2 query version? I assume its an optimization to avoid the multiple executions that occur otherwise?
There was a problem hiding this comment.
and if this solution looks alright under load testing I'd like to relay those results back to Google, in case they have an alternative or a better explanation for the existing single query's perf (I still hope we won't have to do this..)
| EPOCH = datetime.datetime.utcfromtimestamp(0) | ||
|
|
||
| # a bogus FIND_ITEMS bind param | ||
| BOGUS_ID = object() |
There was a problem hiding this comment.
Is this used (or planned to be used) anywhere outside of the SpannerStorage.init()?
There was a problem hiding this comment.
it's only used in _find_items as basically a placeholder
rfk
left a comment
There was a problem hiding this comment.
I don't have my head around this 100% yet, but some initial comments...
| del bind_types["offset"] | ||
| # simiarly modified ranges/ttl aren't needed in #2 | ||
| for param in ("newer", "newer_eq", "older", "older_eq", "ttl"): | ||
| params.pop(param, None) |
There was a problem hiding this comment.
There's quite a big chunk of code in this if block, I wonder if it could be factored out into a helper method of some sort.
| fields = params.get("fields") | ||
| offset_params = params | ||
|
|
||
| if self._bsolm_index_separate and fields is None: |
There was a problem hiding this comment.
It's not obvious why and fields is None here, is it to ensure we only apply this logic in get_items rather than get_item_ids? If so, I wonder if it's worth refactoring _find_items into a couple of helper methods and pushing that logic up into get_items, like:
def get_items(...):
if self._bsolm_index_separate:
(params, ids) = self._do_bsolm_separate_query_to_get_the_ids()
return self._find_items(**params)
def get_item_ids(...):
return self._find_items(...)
Depends how much effort you want to invest in polishing this though, as opposed to just getting it out in stage to loadtest.
There was a problem hiding this comment.
That's right, to only apply it to get_items. I like the refactor suggestion but it's a little further complicated by encode_next_offset's need for the original params (offset_params).
I will keep it in mind if this is revisited. This is definitely a "get it out to stage and see" patch
|
|
||
| # Setup a 'id IN (:id_1)' bind param for #2 (replacing it | ||
| # below) | ||
| params["ids"] = [BOGUS_ID] |
There was a problem hiding this comment.
What stops us from just putting the returned ids in params["id"], and basically just pretending that the caller had requested those ids specifically?
There was a problem hiding this comment.
This is specifically to utilize spanner's UNNEST (what it's replaced with below) for query 2.
I do have a TODO for the FIND_ITEMS/sqlalchemy generation piece to do this for us. params["ids"] from the client and/or this case should both utilize UNNEST. Unfortunately coercing the sqlalchemy level to generate it for us is a bit involved, thus the current str replace.
Another quick solution for this is having FIND_ITEMS handle UNNEST by returning a str with the appropriate replacements (also not ideal, further straying from the original FIND_ITEMS code..) 🤷♂
|
@pjenvey do we need to get this merged before loadtesting, or can we do an initial test from this branch before deciding whether to push ahead with it or not? |
|
Sorry I missed the last comment until just now -- I can certainly tag this from the branch for an initial load test if more is needed to merge this PR (I'm totally fine proving out the load test before a full merge adding to the existing pile of hacks here) |
Whatever seems simplest for you. I'm happy to r+ with followup bugs for refactors, but if we're not sure yet whether we'll even keep this code, deploying from a branch feels like it might be the simpler path forward overall. |
via _bsolm_index_separate = true
Closes #118