Description
As a JanusGraph user, I would like to be able to tell JanusGraph to prioritize certain indices over others, because I know things about the distribution of the keys in those indices that make one index particularly slow.
Full background:
We've run into an issue with index prioritization in JanusGraph (inherited from Titan; see GraphCentricQueryBuilder#constructQueryWithoutProfile). We have 2 indices: a mixed index in Elasticsearch, and an exact-match composite index on two properties. The mixed index is a pretty normal name index. The composite index, however, is extremely dense. We switched to it because we kept ending up with supernodes in the graph that made certain vertices impractical to work with, so the composite index may contain millions of entries under only a single-digit number of keys.
Each of these indices works fine on its own, but a query that both indices apply to gets problematic. In short: we'd like to always hit the mixed index first, because it narrows down the number of results far more quickly than the composite index. But the index scoring in query construction automatically gives composite indices a weight of 2000 just for being composite indices. This means we hit the super dense index first, which barely narrows down the results, and the query is much slower (whereas if we remove the clauses that match the composite index, the mixed index alone performs fine).
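To make the scoring problem concrete, here is a minimal, self-contained sketch. It is not the actual JanusGraph code: the `Candidate` class, the per-condition weight of 1, and the index names are made up for illustration; only the flat 2000 bonus comes from the behaviour described above.

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Simplified model of score-based index selection (NOT JanusGraph's real classes).
final class IndexScoringSketch {

    // Hypothetical stand-in for an index that could serve part of a query.
    static final class Candidate {
        final String name;
        final boolean composite;
        final int matchedConditions;

        Candidate(String name, boolean composite, int matchedConditions) {
            this.name = name;
            this.composite = composite;
            this.matchedConditions = matchedConditions;
        }
    }

    // Flat bonus for composite indices, as described in this issue.
    static final double COMPOSITE_BONUS = 2000;
    // Assumed weight per matched query condition, for illustration only.
    static final double PER_CONDITION = 1;

    // The score ignores how selective the index actually is.
    static double score(Candidate c) {
        double s = c.matchedConditions * PER_CONDITION;
        if (c.composite) {
            s += COMPOSITE_BONUS;
        }
        return s;
    }

    // The highest-scoring candidate is queried first.
    static Candidate pickFirst(List<Candidate> candidates) {
        return Collections.max(candidates, Comparator.comparingDouble(IndexScoringSketch::score));
    }
}
```

With a mixed index matching one condition and a composite index matching two, `pickFirst` returns the composite index no matter how dense its keys are, because nothing in the score reflects key distribution.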
So far the best idea we have to fix this is to only query the mixed index and do the composite index filtering in memory. This is just a) a hassle and b) very error-prone, because when all your queries have a limit, doing any filtering outside of the graph traversal means that you don't know what limit to give to the traversal, and traversals without limits can have problems with the Cassandra backend (i.e. exceeding max frame size).
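For reference, the workaround looks roughly like the sketch below. It is only an illustration: `fetchPage` is a hypothetical stand-in for a limited query that uses just the mixed index, and `compositeFilter` stands in for the clauses the composite index would otherwise have handled. No JanusGraph API is used.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.Predicate;

// Sketch of "query the mixed index only, filter the rest in memory".
final class PostFilterSketch {

    // fetchPage.apply(offset, size) is assumed to return up to `size` results
    // starting at `offset`, and fewer than `size` only when the source is exhausted.
    static <T> List<T> firstMatching(BiFunction<Integer, Integer, List<T>> fetchPage,
                                     Predicate<T> compositeFilter,
                                     int limit,
                                     int pageSize) {
        List<T> out = new ArrayList<>();
        int offset = 0;
        while (out.size() < limit) {
            List<T> page = fetchPage.apply(offset, pageSize);
            for (T item : page) {
                // Filtering that the composite index clauses would have done.
                if (compositeFilter.test(item)) {
                    out.add(item);
                    if (out.size() == limit) {
                        return out;
                    }
                }
            }
            if (page.size() < pageSize) {
                break; // nothing left to fetch
            }
            offset += pageSize;
        }
        return out;
    }
}
```

The number of pages needed depends on how many results the in-memory filter rejects, which is exactly why there is no single correct limit to hand to the traversal up front.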
In my ideal world, I would have some way of telling JanusGraph "hey, when you are considering these two indices, always go to this mixed index first!"