ScaNN: Fix AVQ prefetch by rmaschal · Pull Request #1899 · rapidsai/cuvs

rmaschal · 2026-03-09T20:18:57Z

Switching cuVS to modern RAFT (#1837) introduced a bug: the prefetched gather for AVQ was performed on the stream associated with raft::device_resources rather than on the stream provided for copying. This led to two issues:

Loss of the benefit of prefetching, since copies were scheduled on the same stream as other GPU work.
Possible recall loss. Synchronization was still performed against the copy stream, potentially allowing the host to proceed before the prefetch copy had completed.
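The second issue comes down to stream ordering: synchronizing one stream does not wait for work enqueued on a different stream. The sketch below is a deterministic toy model of that rule, not CUDA or RAFT code. `ToyStream` and `copy_visible_after_sync` are hypothetical names, and a toy "stream" simply runs its queued work when synchronized. On a real GPU the mismatch produces a race (the copy may or may not have landed yet), whereas the toy model always shows the stale value.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical toy model: a "stream" is an ordered queue of pending work,
// and sync() drains only this stream's queue (never another stream's).
struct ToyStream {
  std::vector<std::function<void()>> pending;

  void enqueue(std::function<void()> work) { pending.push_back(std::move(work)); }

  void sync() {
    for (auto& work : pending) { work(); }
    pending.clear();
  }
};

// Enqueue a "copy" on `enqueue_on`, synchronize `sync_on`, and report whether
// the destination already holds the copied value at that point.
bool copy_visible_after_sync(ToyStream& enqueue_on, ToyStream& sync_on) {
  int src = 42;
  int dst = 0;
  enqueue_on.enqueue([&src, &dst] { dst = src; });
  sync_on.sync();
  const bool visible = (dst == src);
  enqueue_on.sync();  // drain leftover work so no lambda outlives src/dst
  return visible;
}
```

Calling `copy_visible_after_sync(copy_stream, copy_stream)` models the intended behaviour (copy and sync on the same stream, result `true`), while `copy_visible_after_sync(main_stream, copy_stream)` models the bug (copy on the handle's stream, sync on the copy stream, result `false`).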

This PR sets the stream associated with the resource to the copy stream before prefetching and resets it to the original stream when done.
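A manual set-then-reset is skipped if anything between the two calls throws (a point raised later in the review). One way to make the reset unconditional is a scope guard. The sketch below uses a hypothetical `ToyResources` struct that only models "the current stream"; `ScopedStream` and `prefetch` are illustrative names, not RAFT APIs. In real code the save and restore would presumably go through `raft::resource::get_cuda_stream` / `raft::resource::set_cuda_stream`, which is an assumption here; the merged version of this PR instead copies the handle, so no reset is needed at all.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for a resources handle; only the current stream is modelled.
struct ToyResources {
  int stream = 0;  // id of the stream currently associated with the handle
};

// RAII guard: switches the handle to `copy_stream` on construction and
// restores the previous stream on destruction, including during stack unwinding.
class ScopedStream {
 public:
  ScopedStream(ToyResources& res, int copy_stream) : res_(res), saved_(res.stream) {
    res_.stream = copy_stream;
  }
  ~ScopedStream() { res_.stream = saved_; }
  ScopedStream(const ScopedStream&) = delete;
  ScopedStream& operator=(const ScopedStream&) = delete;

 private:
  ToyResources& res_;
  int saved_;
};

// Returns the stream seen inside the prefetch scope; `fail` simulates an
// exception thrown while the copy is being scheduled.
int prefetch(ToyResources& res, int copy_stream, bool fail) {
  ScopedStream guard(res, copy_stream);
  const int seen = res.stream;
  if (fail) { throw std::runtime_error("copy failed"); }
  return seen;
}
```

Whether `prefetch` returns normally or throws, the handle's stream is back to its original value once the call exits.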

aamijar

Good catch @rmaschal, and LGTM!

divyegala · 2026-03-09T22:52:58Z

cpp/src/neighbors/scann/detail/scann_avq.cuh

@@ -532,6 +537,8 @@ class cluster_loader {
      raft::copy(res, cluster_vectors, raft::make_const_mdspan(pinned_cluster));
      raft::resource::sync_stream(res, stream_);

+      // reset stream back to previous value
+      raft::resource::set_cuda_stream(res, stream);


You can create a new resources object for no cost:

Suggested change

// For prefetching to overlap with other gpu work
// we need to schedule copies on the provided copy stream stream_
auto copy_res = raft::resources(stream_);
// htod
auto h_cluster_ids =
    raft::make_pinned_vector_view<LabelT, int64_t>(cluster_ids_buf_.data_handle(), size);

@@ -532,6 +537,8 @@ class cluster_loader {
raft::copy(copy_res, cluster_vectors, raft::make_const_mdspan(pinned_cluster));
raft::resource::sync_stream(res, stream_);

I vote for this too. Although not very relevant here, another benefit of creating a temporary resources handle is that its changes are discarded automatically if an exception is thrown. However, I'd like to warn that there are caveats in doing this:

I'd suggest using a copy of the resources handle rather than creating a new one, so that it inherits all other already-initialized resources from the current handle.

auto copy_res = raft::resources(res);
raft::resource::set_cuda_stream(copy_res, stream_);

Even with a copy, if you use any resource through copy_res that was not yet initialized in the original handle, that resource will be created anew on every invocation of prefetch_cluster, which would incur significant overhead.
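The caveat above can be illustrated with a toy handle whose slots are created lazily and held by `shared_ptr`, so that copying the handle shares slots that already exist but cannot share slots that do not exist yet. This mirrors the copy semantics described in the review comment; `ToyHandle`, `Workspace`, `workspace_creations` and `prefetch` are hypothetical names for illustration only, and the real `raft::resources` type manages many more resource kinds.

```cpp
#include <cassert>
#include <memory>

// Counts how many times the (notionally expensive) workspace is constructed.
int workspace_creations = 0;

struct Workspace {
  int id;
};

// Hypothetical handle: the workspace slot stays empty until first requested.
// Copies share the slot if it already exists, but a slot created on a copy
// is never written back to the original handle.
struct ToyHandle {
  int stream = 0;
  std::shared_ptr<Workspace> workspace;

  Workspace& get_workspace() {
    if (!workspace) {
      ++workspace_creations;
      workspace = std::make_shared<Workspace>(Workspace{workspace_creations});
    }
    return *workspace;
  }
};

// Mimics the reviewed pattern: copy the caller's handle, retarget the copy's
// stream, and use the copy for the prefetch work.
void prefetch(const ToyHandle& res, int copy_stream) {
  ToyHandle copy_res = res;      // shares any already-created slots
  copy_res.stream = copy_stream; // the caller's handle keeps its own stream
  copy_res.get_workspace();      // created here only if `res` lacked it
}
```

With a "cold" handle, three calls to `prefetch` construct the workspace three times; once the caller's handle has initialized it, further calls reuse the shared instance and construct nothing.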

Thank you for the comments and info. I now copy the passed resources, set the stream on the copy, and use it for scheduling copies.

aamijar · 2026-03-18T22:06:56Z

Hi @rmaschal, is this targeting release/26.04? If so, please retarget from main to release/26.04

rmaschal · 2026-03-23T18:32:19Z

> Hi @rmaschal, is this targeting release/26.04? If so, please retarget from main to release/26.04

Hi @aamijar, I just saw the comment. I assume it's too late for 26.04? In that case main is fine, targeting 26.06.

aamijar · 2026-04-01T21:18:04Z

Hi @rmaschal, it looks like we are good to merge this, right? Or are you seeing CI failures related to this PR? There's no need to keep merging in main as long as the Recently Updated check is passing. You can also rerun failed jobs if they are flaky.

rmaschal · 2026-04-01T21:22:30Z

@aamijar Yes, it should be good. There are some Python CI failures, but they should be unrelated.

aamijar · 2026-04-01T21:24:21Z

Great! I will try to rebase to release/26.04. It would be nice to have this in the release since it is a bug fix / perf regression fix.

copy-pr-bot · 2026-04-01T22:34:50Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

aamijar · 2026-04-01T22:35:36Z

/ok to test d5d3316

aamijar · 2026-04-02T01:47:50Z

/merge

rmaschal requested a review from a team as a code owner March 9, 2026 20:18

github-project-automation bot added this to Unstructured Data Processing Mar 9, 2026

rmaschal added the labels improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) Mar 9, 2026

rmaschal self-assigned this Mar 9, 2026

aamijar approved these changes Mar 9, 2026


aamijar moved this to Todo in Unstructured Data Processing Mar 9, 2026

aamijar moved this from Todo to Done in Unstructured Data Processing Mar 9, 2026

aamijar moved this from Done to In Progress in Unstructured Data Processing Mar 9, 2026

divyegala requested changes Mar 9, 2026


rmaschal force-pushed the fix-avq-prefetch branch from 1c03c6f to 22f74e7 March 10, 2026 17:26

divyegala approved these changes Mar 10, 2026


rmaschal added 2 commits April 1, 2026 22:33

ScaNN: Fix AVQ prefetch

d2a030c

Copy resources instead of setting stream

d5d3316

aamijar force-pushed the fix-avq-prefetch branch from d82cfe8 to d5d3316 April 1, 2026 22:34

aamijar changed the base branch from main to release/26.04 April 1, 2026 22:35

rapids-bot bot merged commit 44006ee into rapidsai:release/26.04 Apr 2, 2026
80 checks passed

github-project-automation bot moved this from In Progress to Done in Unstructured Data Processing Apr 2, 2026

rapids-bot[bot] merged 2 commits into rapidsai:release/26.04 from rmaschal:fix-avq-prefetch

rmaschal commented Mar 9, 2026
aamijar left a comment
divyegala Mar 9, 2026
achirkin Mar 10, 2026
rmaschal Mar 10, 2026
aamijar commented Mar 18, 2026
rmaschal commented Mar 23, 2026
aamijar commented Apr 1, 2026
rmaschal commented Apr 1, 2026
aamijar commented Apr 1, 2026
copy-pr-bot bot commented Apr 1, 2026
aamijar commented Apr 1, 2026
aamijar commented Apr 2, 2026

4 participants
