
perf(retrieval): prefetch ChunkBasedSearch start-node VSS call concurrently with _init#221

Draft
voidwisp wants to merge 1 commit into awslabs:main from voidwisp:perf/prefetch-chunk-start-node-ids

Conversation

@voidwisp
Contributor

Summary

In CompositeTraversalBasedRetriever._retrieve, the entity-context phase (self._init, ~2 s on Neptune Serverless + AOSS) runs strictly before each sub-retriever's get_start_node_ids. For ChunkBasedSearch.get_start_node_ids the call reads only query_bundle / vector_store / args.vss_* — no entity_contexts dependency — so it can run concurrently with _init and be hidden behind it.

This PR kicks off a single-worker ThreadPoolExecutor prefetch of get_diverse_vss_elements('chunk', …) before super()._retrieve(query_bundle) runs, attaches the future onto each ChunkBasedSearch instance in _get_search_results_for_query, and consumes it in ChunkBasedSearch.get_start_node_ids.

EntityNetworkSearch.get_start_node_ids does depend on entity_contexts, so it is not prefetched.
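The mechanism described above can be sketched with simplified stand-in classes. All names below are hypothetical simplifications; the real code lives in CompositeTraversalBasedRetriever._retrieve and ChunkBasedSearch.get_start_node_ids and calls get_diverse_vss_elements:

```python
# Minimal model of the prefetch pattern (hypothetical simplified names).
from concurrent.futures import ThreadPoolExecutor

def diverse_vss_elements(index, query):
    # Stand-in for the chunk VSS top-k call; pure (same inputs -> same output).
    return [f"{index}:{query}:{i}" for i in range(3)]

class ChunkSearch:
    def get_start_node_ids(self, query):
        # Consume-once: pop so a reused instance never sees a stale future.
        future = self.__dict__.pop("_prefetched_chunks", None)
        if future is not None:
            return future.result()  # re-raises exactly like the serial call
        return diverse_vss_elements("chunk", query)  # serial fallback

class CompositeRetriever:
    def __init__(self, retrievers, derive_subqueries=False):
        self.retrievers = retrievers
        self.derive_subqueries = derive_subqueries
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _init(self, query):
        """Entity-context phase (~2 s in the PR's measurements)."""

    def retrieve(self, query):
        future = None
        # Guards: skip when subqueries rewrite the query, or when no
        # chunk-based sub-retriever is configured.
        if not self.derive_subqueries and any(
            isinstance(r, ChunkSearch) for r in self.retrievers
        ):
            future = self._pool.submit(diverse_vss_elements, "chunk", query)
        self._init(query)  # prefetch runs concurrently with this phase
        node_ids = []
        for r in self.retrievers:
            if future is not None and isinstance(r, ChunkSearch):
                r._prefetched_chunks = future  # duck-typed attribute injection
            node_ids.extend(r.get_start_node_ids(query))
        return node_ids
```

The future is injected as a plain attribute rather than a constructor argument, so pre-constructed sub-retriever instances need no changes.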

The change

Two files, each substantively a one-method change:

  • composite_traversal_based_retriever.py — override _retrieve; inject the future onto any ChunkBasedSearch in _get_search_results_for_query.
  • chunk_based_search.py — pop('_prefetched_chunks') in get_start_node_ids and consume the future if present; fall back to the existing get_diverse_vss_elements call otherwise.

Guards: prefetch is skipped when args.derive_subqueries is True (subqueries carry different query_bundles, so the prefetch wouldn't apply), or when no ChunkBasedSearch is in the retriever list. pop on the consumer side ensures a reused instance never picks up a stale future.
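The consume-and-clear guard can be illustrated in isolation: popping the attribute means a second call on the same instance falls back to the inline path rather than reusing a stale future. A minimal sketch with hypothetical names:

```python
from concurrent.futures import Future

class Search:
    """Hypothetical stand-in for ChunkBasedSearch's consumer side."""
    def get_start_node_ids(self):
        # consume-and-clear: the attribute is removed on first use
        future = self.__dict__.pop("_prefetched_chunks", None)
        if future is not None:
            return future.result()
        return ["inline"]  # stand-in for the inline get_diverse_vss_elements call

s = Search()
pre = Future()
pre.set_result(["prefetched"])
s._prefetched_chunks = pre       # what the composite's injection does
first = s.get_start_node_ids()   # consumes the injected future
second = s.get_start_node_ids()  # attribute gone: inline fallback, no stale reuse
```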

Correctness

get_diverse_vss_elements is pure — same inputs, same output — regardless of whether it runs in-thread or on the prefetch worker. Verified experimentally: start_node_ids set-equal across 12 representative queries on production Neptune + AOSS.

Exception behavior preserved: future.result() re-raises identically to the serial call. If the prefetch raises and _init raises first, add_done_callback logs the prefetch exception at debug level (otherwise it'd be swallowed).
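Both exception paths can be exercised with a toy worker. The failing call and logger names below are illustrative, not the PR's actual code:

```python
import logging
from concurrent.futures import ThreadPoolExecutor

logger = logging.getLogger("prefetch")  # hypothetical module-level logger

def failing_vss_call():
    raise TimeoutError("OpenSearch timed out")

def log_if_failed(future):
    exc = future.exception()
    if exc is not None:
        # Visible even if _init raises first and .result() is never reached.
        logger.debug("chunk prefetch failed: %s", exc)

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(failing_vss_call)
    future.add_done_callback(log_if_failed)
    try:
        future.result()  # re-raises identically to the serial call
    except TimeoutError as e:
        caught = str(e)
```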

Measured impact

Validated on production Neptune Serverless + AOSS (toolkit v3.18.3, pool_maxsize=32), 12 representative queries, 2 warmup + 10 timed samples, interleaved OLD/NEW:

Metric                           Value
Correctness                      12/12 PASS (set-equal start_node_ids)
Improved queries                 10/12
Paired median Δ                  -85 ms (-4%)
Paired mean Δ                    -95 ms (-5%)
Worst regression (within noise)  +37 ms

Pre-measurement predicted up to a 168 ms ceiling (median chunk-VSS time). Actual savings land at ~50–60% of ceiling, consistent with thread-pool overhead, some GIL contention during the tfidf rerank in _init, and possible contention between the prefetch and KeywordVSSProvider's topic VSS on the shared OpenSearch pool. Small, consistent, low-risk optimization — not a game-changer.
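The paired, interleaved summary statistics can be sketched as follows, with synthetic millisecond timings standing in for the real Neptune + AOSS measurements:

```python
from statistics import median

def paired_deltas(old_samples, new_samples):
    # One per-query delta: median of NEW timed samples minus median of OLD,
    # pairing each query's OLD and NEW runs (interleaved in the benchmark).
    return [median(new) - median(old) for old, new in zip(old_samples, new_samples)]

# Two hypothetical queries, 3 timed samples each (ms).
old = [[2100, 2080, 2120], [1900, 1910, 1890]]
new = [[2010, 2000, 2030], [1830, 1820, 1840]]
deltas = paired_deltas(old, new)
```

Pairing per query before summarizing keeps query-intrinsic latency differences from swamping the small per-query effect.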

Backwards compatibility

  • Public method signatures unchanged.
  • ChunkBasedSearch gains a private consume-once attribute (_prefetched_chunks) that is absent unless the composite sets it. Direct ChunkBasedSearch usage outside the composite is unaffected.
  • Works for any GraphStore / VectorStore — pure client-side concurrency, no storage-engine features used.

Test plan

  • Existing suite runs green
  • Maintainer spot-check against preferred GraphStore (Neo4j, Memgraph, etc.) — change is pure-Python concurrency so it should port cleanly

Draft

Marked draft — posting for early feedback on the override-_retrieve + duck-typed attribute approach. Open questions: (1) is the attribute-injection pattern acceptable, or would a constructor kwarg be preferred despite the pre-constructed-instance edge case? (2) the savings are smaller than the initial phase-1 projection; worth digging into the thread-overhead shortfall before merging?


Note: this PR was drafted with Claude Code

perf(retrieval): prefetch ChunkBasedSearch start-node VSS call concurrently with _init

In CompositeTraversalBasedRetriever._retrieve, the entity-context phase
(self._init, ~2s on Neptune Serverless + AOSS) runs strictly before each
sub-retriever's get_start_node_ids. For ChunkBasedSearch.get_start_node_ids
specifically, the call reads only query_bundle / vector_store / args.vss_*
— it does not touch self.entity_contexts (which is what _init builds), so
it has no data dependency on _init and can run concurrently with it.

Override _retrieve in CompositeTraversalBasedRetriever to kick off a
single-worker ThreadPoolExecutor that computes the chunk-VSS top-k via
get_diverse_vss_elements before super()._retrieve(query_bundle) runs.
Attach the resulting future onto each ChunkBasedSearch instance in
_get_search_results_for_query. ChunkBasedSearch.get_start_node_ids pops
the attribute and consumes the future via .result() if present, otherwise
falls back to the existing inline VSS call.

Guards:
  - Skip prefetch when args.derive_subqueries is True (subqueries carry
    different query_bundles, the prefetch was built from the original).
  - Skip when no ChunkBasedSearch is in the configured retriever list.
  - Consume-and-clear via __dict__.pop so a reused instance can't pick up
    a stale future on the next call.
  - add_done_callback logs at debug level if the prefetch raises AND
    _init raises first (would otherwise be swallowed).

Validated against production Neptune Serverless + AOSS (toolkit v3.18.3,
pool_maxsize=32), 12 representative queries, 2 warmup + 10 timed samples
interleaved OLD vs NEW:
  - Correctness: start_node_ids set-equal across all 12 queries.
  - Perf: paired median delta -85 ms, paired mean -95 ms, 10/12 queries
    improved. Worst case +37 ms (within query-intrinsic variance).
@voidwisp
Contributor Author

Heads up — this is a work-in-progress draft, not ready for merge. Posting for early directional feedback on the override-_retrieve + duck-typed attribute approach. Still need to dig into why the measured savings (-85 ms median) land at ~50% of the phase-1 ceiling (~168 ms) before calling it done.
