Skip to content

ScaNN: Fix AVQ prefetch#1899

Merged
rapids-bot[bot] merged 2 commits intorapidsai:release/26.04from
rmaschal:fix-avq-prefetch
Apr 2, 2026
Merged

ScaNN: Fix AVQ prefetch#1899
rapids-bot[bot] merged 2 commits intorapidsai:release/26.04from
rmaschal:fix-avq-prefetch

Conversation

@rmaschal
Copy link
Copy Markdown
Contributor

@rmaschal rmaschal commented Mar 9, 2026

Switching to usage of modern RAFT in cuVS (#1837) introduced a bug where the prefetched gather for AVQ is performed using the stream associated with raft::device_resources rather than the provided stream for copying. This led to two issues:

  1. Elimination of the benefit for prefetching, as copies where scheduled on the same stream as other gpu work
  2. Possible recall loss. Synchronization was still performed against the copy stream, potentially allowing host to proceed before the prefetch copy is complete.

This PR sets the stream associated with the resource to the copy stream before prefetching, and back when done.

@rmaschal rmaschal requested a review from a team as a code owner March 9, 2026 20:18
@rmaschal rmaschal added improvement Improves an existing functionality non-breaking Introduces a non-breaking change labels Mar 9, 2026
@rmaschal rmaschal self-assigned this Mar 9, 2026
Copy link
Copy Markdown
Member

@aamijar aamijar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch @rmaschal, and LGTM!

@aamijar aamijar moved this from Todo to Done in Unstructured Data Processing Mar 9, 2026
@aamijar aamijar moved this from Done to In Progress in Unstructured Data Processing Mar 9, 2026
Comment on lines +514 to +541
@@ -532,6 +537,8 @@ class cluster_loader {
raft::copy(res, cluster_vectors, raft::make_const_mdspan(pinned_cluster));
raft::resource::sync_stream(res, stream_);

// reset stream back to previous value
raft::resource::set_cuda_stream(res, stream);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can create a new resources object for no cost:

Suggested change
// For prefetching to overlap with other gpu work
// we need to schedule copies on the provided copy stream stream_
auto copy_res = raft::resources(stream_);
// htod
auto h_cluster_ids =
raft::make_pinned_vector_view<LabelT, int64_t>(cluster_ids_buf_.data_handle(), size);
@@ -532,6 +537,8 @@ class cluster_loader {
raft::copy(copy_res, cluster_vectors, raft::make_const_mdspan(pinned_cluster));
raft::resource::sync_stream(res, stream_);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I vote on this too. Although not very relevant here, another benefit of creating a temporary resources handle is the automatic reset of the changes if an exception is raised. However, I'd like to warn that there are caveats in doing this:

  1. I'd suggest to use a copy the resources handle rather than creating a new resources handle to inherit all other initialized resources from the current handle.
auto copy_res = raft::resources(res);
raft::resource::set_cuda_stream(res, stream);
  1. Even when you do a copy: if you've used any other not-yet-initialized resources from the copy_res handle, it would create those resources on every invocation of prefetch_cluster, which would incur significant overhead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the comments and info. I now copy the passed resources, set the stream, and use that for scheduling copies

@aamijar
Copy link
Copy Markdown
Member

aamijar commented Mar 18, 2026

Hi @rmaschal, is this targeting release/26.04? If so, please retarget from main to release/26.04

@rmaschal
Copy link
Copy Markdown
Contributor Author

Hi @rmaschal, is this targeting release/26.04? If so, please retarget from main to release/26.04

Hi @aamijar, just saw the comment, I assume it's too late for 26.04? In that case main is fine, then targeting 26.06

@aamijar
Copy link
Copy Markdown
Member

aamijar commented Apr 1, 2026

Hi @rmaschal, looks like we are good to merge this right, or are you seeing CI failures related this PR? No need to keep merging in main, as long as the Recently Updated check is passing. You can also rerun failed jobs if they are flaky.

@rmaschal
Copy link
Copy Markdown
Contributor Author

rmaschal commented Apr 1, 2026

@aamijar yes should be good, some python CI failures but they should be unrelated.

@aamijar
Copy link
Copy Markdown
Member

aamijar commented Apr 1, 2026

Great! I will try to rebase to release/26.04. Would be nice to have in the release since it a bug fix/ perf regression fix.

@aamijar aamijar force-pushed the fix-avq-prefetch branch from d82cfe8 to d5d3316 Compare April 1, 2026 22:34
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot bot commented Apr 1, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@aamijar aamijar changed the base branch from main to release/26.04 April 1, 2026 22:35
@aamijar
Copy link
Copy Markdown
Member

aamijar commented Apr 1, 2026

/ok to test d5d3316

@aamijar
Copy link
Copy Markdown
Member

aamijar commented Apr 2, 2026

/merge

@rapids-bot rapids-bot bot merged commit 44006ee into rapidsai:release/26.04 Apr 2, 2026
80 checks passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in Unstructured Data Processing Apr 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

improvement Improves an existing functionality non-breaking Introduces a non-breaking change

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants