perf(retrieval): prefetch ChunkBasedSearch start-node VSS call concurrently with _init#221
Draft
voidwisp wants to merge 1 commit intoawslabs:mainfrom
Draft
perf(retrieval): prefetch ChunkBasedSearch start-node VSS call concurrently with _init#221voidwisp wants to merge 1 commit intoawslabs:mainfrom
voidwisp wants to merge 1 commit intoawslabs:mainfrom
Conversation
…rently with _init
In CompositeTraversalBasedRetriever._retrieve, the entity-context phase
(self._init, ~2s on Neptune Serverless + AOSS) runs strictly before each
sub-retriever's get_start_node_ids. For ChunkBasedSearch.get_start_node_ids
specifically, the call reads only query_bundle / vector_store / args.vss_*
— it does not touch self.entity_contexts (which is what _init builds), so
it has no data dependency on _init and can run concurrently with it.
Override _retrieve in CompositeTraversalBasedRetriever to kick off a
single-worker ThreadPoolExecutor that computes the chunk-VSS top-k via
get_diverse_vss_elements before super()._retrieve(query_bundle) runs.
Attach the resulting future onto each ChunkBasedSearch instance in
_get_search_results_for_query. ChunkBasedSearch.get_start_node_ids pops
the attribute and consumes the future via .result() if present, otherwise
falls back to the existing inline VSS call.
Guards:
- Skip prefetch when args.derive_subqueries is True (subqueries carry
different query_bundles, the prefetch was built from the original).
- Skip when no ChunkBasedSearch is in the configured retriever list.
- Consume-and-clear via __dict__.pop so a reused instance can't pick up
a stale future on the next call.
- add_done_callback logs at debug level if the prefetch raises AND
_init raises first (would otherwise be swallowed).
Validated against production Neptune Serverless + AOSS (toolkit v3.18.3,
pool_maxsize=32), 12 representative queries, 2 warmup + 10 timed samples
interleaved OLD vs NEW:
- Correctness: start_node_ids set-equal across all 12 queries.
- Perf: paired median delta -85 ms, paired mean -95 ms, 10/12 queries
improved. Worst case +37 ms (within query-intrinsic variance).
Contributor
Author
|
Heads up — this is a work-in-progress draft, not ready for merge. Posting for early directional feedback on the override- |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
In
CompositeTraversalBasedRetriever._retrieve, the entity-context phase (self._init, ~2 s on Neptune Serverless + AOSS) runs strictly before each sub-retriever'sget_start_node_ids. ForChunkBasedSearch.get_start_node_idsthe call reads onlyquery_bundle/vector_store/args.vss_*— noentity_contextsdependency — so it can run concurrently with_initand be hidden behind it.This PR kicks off a single-worker
ThreadPoolExecutorprefetch ofget_diverse_vss_elements('chunk', …)beforesuper()._retrieve(query_bundle)runs, attaches the future onto eachChunkBasedSearchinstance in_get_search_results_for_query, and consumes it inChunkBasedSearch.get_start_node_ids.EntityNetworkSearch.get_start_node_idsdoes depend onentity_contexts, so it is not prefetched.The change
Two files, one method each in substance:
composite_traversal_based_retriever.py— override_retrieve; inject the future onto anyChunkBasedSearchin_get_search_results_for_query.chunk_based_search.py—pop('_prefetched_chunks')inget_start_node_idsand consume if set; fall through to the existingget_diverse_vss_elementscall otherwise.Guards: prefetch is skipped when
args.derive_subqueriesis True (subqueries carry differentquery_bundles, so the prefetch wouldn't apply), or when noChunkBasedSearchis in the retriever list.popon the consumer side ensures a reused instance never picks up a stale future.Correctness
get_diverse_vss_elementsis pure — same inputs, same output — regardless of whether it runs in-thread or on the prefetch worker. Verified experimentally:start_node_idsset-equal across 12 representative queries on prod Neptune + AOSS.Exception behavior preserved:
future.result()re-raises identically to the serial call. If the prefetch raises and_initraises first,add_done_callbacklogs the prefetch exception at debug level (otherwise it'd be swallowed).Measured impact
Validated on production Neptune Serverless + AOSS (toolkit v3.18.3,
pool_maxsize=32), 12 representative queries, 2 warmup + 10 timed samples, interleaved OLD/NEW:start_node_ids)Pre-measurement predicted up to a 168 ms ceiling (median chunk-VSS time). Actual savings land at ~50–60% of ceiling, consistent with thread-pool overhead, some GIL contention during the tfidf rerank in
_init, and possible contention between the prefetch andKeywordVSSProvider's topic VSS on the shared OpenSearch pool. Small, consistent, low-risk optimization — not a game-changer.Backwards compatibility
ChunkBasedSearchgains a private consume-once attribute (_prefetched_chunks) that is absent unless the composite sets it. DirectChunkBasedSearchusage outside the composite is unaffected.GraphStore/VectorStore— pure client-side concurrency, no storage-engine features used.Test plan
GraphStore(Neo4j, Memgraph, etc.) — change is pure-Python concurrency so it should port cleanlyDraft
Marked draft — posting for early feedback on the override-
_retrieve+ duck-typed attribute approach. Open questions: (1) is the attribute-injection pattern acceptable, or would a constructor kwarg be preferred despite the pre-constructed-instance edge case? (2) the savings are smaller than the initial phase-1 projection; worth digging into the thread-overhead shortfall before merging?Note: this PR was drafted with Claude Code