
perf(retrieval): prefetch ChunkBasedSearch start-node VSS call concurrently with _init#221

Draft
voidwisp wants to merge 1 commit into awslabs:main from voidwisp:perf/prefetch-chunk-start-node-ids

Conversation

@voidwisp
Contributor

Summary

In CompositeTraversalBasedRetriever._retrieve, the entity-context phase (self._init, ~2 s on Neptune Serverless + AOSS) runs strictly before each sub-retriever's get_start_node_ids. For ChunkBasedSearch.get_start_node_ids the call reads only query_bundle / vector_store / args.vss_* — no entity_contexts dependency — so it can run concurrently with _init and be hidden behind it.

This PR kicks off a single-worker ThreadPoolExecutor prefetch of get_diverse_vss_elements('chunk', …) before super()._retrieve(query_bundle) runs, attaches the future onto each ChunkBasedSearch instance in _get_search_results_for_query, and consumes it in ChunkBasedSearch.get_start_node_ids.

EntityNetworkSearch.get_start_node_ids does depend on entity_contexts, so it is not prefetched.
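The mechanism described above can be sketched with simplified stand-in classes. All names below are hypothetical simplifications; the real code lives in CompositeTraversalBasedRetriever._retrieve and ChunkBasedSearch.get_start_node_ids and calls get_diverse_vss_elements:

```python
# Minimal model of the prefetch pattern (hypothetical simplified names).
from concurrent.futures import ThreadPoolExecutor

def diverse_vss_elements(index, query):
    # Stand-in for the chunk VSS top-k call; pure (same inputs -> same output).
    return [f"{index}:{query}:{i}" for i in range(3)]

class ChunkSearch:
    def get_start_node_ids(self, query):
        # Consume-once: pop so a reused instance never sees a stale future.
        future = self.__dict__.pop("_prefetched_chunks", None)
        if future is not None:
            return future.result()  # re-raises exactly like the serial call
        return diverse_vss_elements("chunk", query)  # serial fallback

class CompositeRetriever:
    def __init__(self, retrievers, derive_subqueries=False):
        self.retrievers = retrievers
        self.derive_subqueries = derive_subqueries
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _init(self, query):
        """Entity-context phase (~2 s in the PR's measurements)."""

    def retrieve(self, query):
        future = None
        # Guards: skip when subqueries rewrite the query, or when no
        # chunk-based sub-retriever is configured.
        if not self.derive_subqueries and any(
            isinstance(r, ChunkSearch) for r in self.retrievers
        ):
            future = self._pool.submit(diverse_vss_elements, "chunk", query)
        self._init(query)  # prefetch runs concurrently with this phase
        node_ids = []
        for r in self.retrievers:
            if future is not None and isinstance(r, ChunkSearch):
                r._prefetched_chunks = future  # duck-typed attribute injection
            node_ids.extend(r.get_start_node_ids(query))
        return node_ids
```

The future is injected as a plain attribute rather than a constructor argument, so pre-constructed sub-retriever instances need no changes.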

The change

Two files, each substantively a one-method change:

  • composite_traversal_based_retriever.py — override _retrieve; inject the future onto any ChunkBasedSearch in _get_search_results_for_query.
  • chunk_based_search.py — pop('_prefetched_chunks') in get_start_node_ids and consume the future if present; fall back to the existing get_diverse_vss_elements call otherwise.

Guards: prefetch is skipped when args.derive_subqueries is True (subqueries carry different query_bundles, so the prefetch wouldn't apply), or when no ChunkBasedSearch is in the retriever list. pop on the consumer side ensures a reused instance never picks up a stale future.
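The consume-and-clear guard can be illustrated in isolation: popping the attribute means a second call on the same instance falls back to the inline path rather than reusing a stale future. A minimal sketch with hypothetical names:

```python
from concurrent.futures import Future

class Search:
    """Hypothetical stand-in for ChunkBasedSearch's consumer side."""
    def get_start_node_ids(self):
        # consume-and-clear: the attribute is removed on first use
        future = self.__dict__.pop("_prefetched_chunks", None)
        if future is not None:
            return future.result()
        return ["inline"]  # stand-in for the inline get_diverse_vss_elements call

s = Search()
pre = Future()
pre.set_result(["prefetched"])
s._prefetched_chunks = pre       # what the composite's injection does
first = s.get_start_node_ids()   # consumes the injected future
second = s.get_start_node_ids()  # attribute gone: inline fallback, no stale reuse
```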

Correctness

get_diverse_vss_elements is pure — same inputs, same output — regardless of whether it runs in-thread or on the prefetch worker. Verified experimentally: start_node_ids set-equal across 12 representative queries on production Neptune + AOSS.

Exception behavior preserved: future.result() re-raises identically to the serial call. If the prefetch raises and _init raises first, add_done_callback logs the prefetch exception at debug level (otherwise it'd be swallowed).
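Both exception paths can be exercised with a toy worker. The failing call and logger names below are illustrative, not the PR's actual code:

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("prefetch")  # hypothetical module-level logger

def failing_vss_call():
    raise TimeoutError("OpenSearch timed out")

def log_if_failed(future):
    exc = future.exception()
    if exc is not None:
        # Visible even if _init raises first and .result() is never reached.
        logger.debug("chunk prefetch failed: %s", exc)

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(failing_vss_call)
    future.add_done_callback(log_if_failed)
    try:
        future.result()  # re-raises identically to the serial call
    except TimeoutError as e:
        caught = str(e)
```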

Measured impact

Validated on production Neptune Serverless + AOSS (toolkit v3.18.3, pool_maxsize=32), 12 representative queries, 2 warmup + 10 timed samples, interleaved OLD/NEW:

Metric                           Value
Correctness                      12/12 PASS (set-equal start_node_ids)
Improved queries                 10/12
Paired median Δ                  -85 ms (-4%)
Paired mean Δ                    -95 ms (-5%)
Worst regression (within noise)  +37 ms

Pre-measurement predicted up to a 168 ms ceiling (median chunk-VSS time). Actual savings land at ~50–60% of ceiling, consistent with thread-pool overhead, some GIL contention during the tfidf rerank in _init, and possible contention between the prefetch and KeywordVSSProvider's topic VSS on the shared OpenSearch pool. Small, consistent, low-risk optimization — not a game-changer.
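The paired, interleaved summary statistics can be sketched as follows, with synthetic millisecond timings standing in for the real Neptune + AOSS measurements:

```python
from statistics import median

def paired_deltas(old_samples, new_samples):
    # One per-query delta: median of NEW timed samples minus median of OLD,
    # pairing each query's OLD and NEW runs (interleaved in the benchmark).
    return [median(new) - median(old) for old, new in zip(old_samples, new_samples)]

# Two hypothetical queries, 3 timed samples each (ms).
old = [[2100, 2080, 2120], [1900, 1910, 1890]]
new = [[2010, 2000, 2030], [1830, 1820, 1840]]
deltas = paired_deltas(old, new)
```

Pairing per query before summarizing keeps query-intrinsic latency differences from swamping the small per-query effect.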

Backwards compatibility

  • Public method signatures unchanged.
  • ChunkBasedSearch gains a private consume-once attribute (_prefetched_chunks) that is absent unless the composite sets it. Direct ChunkBasedSearch usage outside the composite is unaffected.
  • Works for any GraphStore / VectorStore — pure client-side concurrency, no storage-engine features used.

Test plan

  • Existing suite runs green
  • Maintainer spot-check against preferred GraphStore (Neo4j, Memgraph, etc.) — change is pure-Python concurrency so it should port cleanly

Draft

Marked draft — posting for early feedback on the override-_retrieve + duck-typed attribute approach. Open questions: (1) is the attribute-injection pattern acceptable, or would a constructor kwarg be preferred despite the pre-constructed-instance edge case? (2) the savings are smaller than the initial phase-1 projection; worth digging into the thread-overhead shortfall before merging?


Note: this PR was drafted with Claude Code

perf(retrieval): prefetch ChunkBasedSearch start-node VSS call concurrently with _init

In CompositeTraversalBasedRetriever._retrieve, the entity-context phase
(self._init, ~2s on Neptune Serverless + AOSS) runs strictly before each
sub-retriever's get_start_node_ids. For ChunkBasedSearch.get_start_node_ids
specifically, the call reads only query_bundle / vector_store / args.vss_*
— it does not touch self.entity_contexts (which is what _init builds), so
it has no data dependency on _init and can run concurrently with it.

Override _retrieve in CompositeTraversalBasedRetriever to kick off a
single-worker ThreadPoolExecutor that computes the chunk-VSS top-k via
get_diverse_vss_elements before super()._retrieve(query_bundle) runs.
Attach the resulting future onto each ChunkBasedSearch instance in
_get_search_results_for_query. ChunkBasedSearch.get_start_node_ids pops
the attribute and consumes the future via .result() if present, otherwise
falls back to the existing inline VSS call.

Guards:
  - Skip prefetch when args.derive_subqueries is True (subqueries carry
    different query_bundles, the prefetch was built from the original).
  - Skip when no ChunkBasedSearch is in the configured retriever list.
  - Consume-and-clear via __dict__.pop so a reused instance can't pick up
    a stale future on the next call.
  - add_done_callback logs at debug level if the prefetch raises AND
    _init raises first (would otherwise be swallowed).

Validated against production Neptune Serverless + AOSS (toolkit v3.18.3,
pool_maxsize=32), 12 representative queries, 2 warmup + 10 timed samples
interleaved OLD vs NEW:
  - Correctness: start_node_ids set-equal across all 12 queries.
  - Perf: paired median delta -85 ms, paired mean -95 ms, 10/12 queries
    improved. Worst case +37 ms (within query-intrinsic variance).
@voidwisp
Contributor Author

Heads up — this is a work-in-progress draft, not ready for merge. Posting for early directional feedback on the override-_retrieve + duck-typed attribute approach. Still need to dig into why the measured savings (-85 ms median) land at ~50% of the phase-1 ceiling (~168 ms) before calling it done.
