Always build with JIT+LTO by KyleFromNVIDIA · Pull Request #1923 · rapidsai/cuvs

KyleFromNVIDIA · 2026-03-16T20:16:23Z

Since #1909, we no longer rely on `cudaLibraryEnumerateKernels()`, so we can use older versions of the CUDA driver. Since #1918, we've been using static cudart, which bundles the runtime library API with cuvs and lets us run on platforms with versions of CUDA older than 12.8 installed. Always build with JIT+LTO so that we get the full compile-time and binary-size benefits in CUDA 12 too.

cpp/CMakeLists.txt

KyleFromNVIDIA · 2026-03-16T21:50:11Z

It seems that even after #1918, we're still not using cudart_static, since rmm forces us to use the shared version:

./../../..//bin/gtests/libcuvs/STATS_TEST: symbol lookup error: /opt/conda/envs/test/bin/gtests/libcuvs/../../../lib/libcuvs.so: undefined symbol: cudaLibraryGetKernel, version libcudart.so.12

We have to get rmm on cudart_static.

KyleFromNVIDIA · 2026-03-16T23:08:24Z

We've decided to switch to the driver API instead, since rmm is blocked on rapidsai/cudf#20814, which in turn is also blocked.
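For anyone reproducing the symbol lookup error above, a minimal sketch of a probe that checks whether a shared library exports a given symbol, without linking against it. `probe_symbol` and `ProbeResult` are hypothetical names for illustration, not part of cuVS; only `dlopen()`, `dlsym()`, and `dlclose()` are real POSIX calls. It assumes the driver-API counterpart of interest is `cuLibraryGetKernel` in `libcuda.so.1`, which this thread does not confirm.

```cpp
#include <dlfcn.h>

#include <cassert>
#include <string>

// Outcome of probing a shared library for an exported symbol.
enum class ProbeResult { LibraryNotFound, SymbolMissing, SymbolFound };

// Hypothetical helper (not part of cuVS): opens `library` with dlopen() and
// looks up `symbol` with dlsym(), so nothing is resolved at link time.
inline ProbeResult probe_symbol(const std::string& library, const std::string& symbol)
{
  void* handle = dlopen(library.c_str(), RTLD_LAZY | RTLD_LOCAL);
  if (handle == nullptr) { return ProbeResult::LibraryNotFound; }

  void* address = dlsym(handle, symbol.c_str());
  dlclose(handle);
  return address != nullptr ? ProbeResult::SymbolFound : ProbeResult::SymbolMissing;
}

// A library that cannot be opened is reported as LibraryNotFound.
inline void demo_probe()
{
  assert(probe_symbol("libdoes-not-exist-for-this-demo.so.0", "anything") ==
         ProbeResult::LibraryNotFound);
}
```

On a test machine, `probe_symbol("libcudart.so.12", "cudaLibraryGetKernel")` should show whether the installed shared runtime exports the symbol named in the error, and `probe_symbol("libcuda.so.1", "cuLibraryGetKernel")` does the same for the assumed driver entry point. On older glibc versions this may need `-ldl` at link time.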

cpp/src/detail/jit_lto/AlgorithmPlanner.cpp

dependencies.yaml

cpp/CMakeLists.txt

c/tests/CMakeLists.txt

KyleFromNVIDIA

Remove the debug statement when finished

cpp/src/detail/jit_lto/AlgorithmPlanner.cpp

conda/recipes/libcuvs/recipe.yaml

KyleFromNVIDIA · 2026-03-24T20:43:58Z

Blocked on #1936

gforsyth · 2026-03-24T21:27:53Z

Just merged #1936 in

KyleFromNVIDIA · 2026-03-24T22:24:03Z

/merge

Since rapidsai#1909, we've been able to use older versions of the CUDA driver, since we no longer rely on `cudaLibraryEnumerateKernels()`. Since rapidsai#1918, we've been using static cudart, which allows us to run on platforms with versions of CUDA older than 12.8 installed, since the runtime library API is now bundled with cuvs. Always build with JIT+LTO so that we can get the full compile time and binary size benefits in CUDA 12 too.

Authors:
- Kyle Edwards (https://github.com/KyleFromNVIDIA)
- Bradley Dice (https://github.com/bdice)

Approvers:
- Divye Gala (https://github.com/divyegala)
- Ben Frederickson (https://github.com/benfred)
- Bradley Dice (https://github.com/bdice)

URL: rapidsai#1923

rapidsai/cuvs#1923 switched cuVS to always enabling JIT+LTO, and therefore requiring nvJitLink at build and runtime. Previously, it had only been required for CUDA 13.

cuGraph pulls cuVS via CPM and builds it from source, but wasn't updated to match, resulting in CUDA 12 builds here failing like this:

```text
│ │ CMake Error at build/_deps/cuvs-src/cpp/CMakeLists.txt:754 (target_link_libraries):
│ │   Target "cuvs" links to:
│ │     CUDA::nvJitLink
│ │   but the target was not found.  Possible reasons include:
│ │     * There is a typo in the target name.
│ │     * A find_package call is missing for an IMPORTED target.
│ │     * An ALIAS target is missing.
│ │ CMake Error at build/_deps/cuvs-src/cpp/CMakeLists.txt:812 (target_link_libraries):
│ │   Target "cuvs_static" links to:
│ │     CUDA::nvJitLink
│ │   but the target was not found.  Possible reasons include:
│ │     * There is a typo in the target name.
│ │     * A find_package call is missing for an IMPORTED target.
│ │     * An ALIAS target is missing.
```

([example build link](https://github.com/rapidsai/cugraph/actions/runs/23767677532/job/69251605108?pr=5469#step:11:1108))

This resolves that, by removing "if CUDA 13" types of guards around this project's libnvJitLink dependency.

## Notes for Reviewers

### Some builds may still fail here

Like this:

```text
/__w/cugraph/cugraph/python/libcugraph/build/py3-none-linux_aarch64/_deps/cccl-src/lib/cmake/cub/../../../cub/cub/device/dispatch/kernels/kernel_scan_warpspeed.cuh(455): error: more than one instance of function template "cub::detail::scan::warpReduce" matches the argument list:
            function template "T raft::warpReduce(T, ReduceLambda)" (declared at line 49 of /pyenv/versions/3.14.3/lib/python3.14/site-packages/libraft/include/raft/util/reduction.cuh)
            function template "Tp cub::detail::scan::warpReduce(Tp, ScanOpT &)" (declared at line 247)
            argument types are: (unsigned int, raft::add_op)
      regWarpSum = warpReduce(regThreadSum, scan_op);
```

Until rapidsai/cuvs#1954 is resolved, which @viclafargue is working on in rapidsai/cuvs#1963.

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Bradley Dice (https://github.com/bdice)
- Chuck Hastings (https://github.com/ChuckHastings)

URL: #5479
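The `warpReduce` ambiguity in the error above can be reproduced without CUDA. The following is a minimal host-only sketch, assuming the cause is argument-dependent lookup (ADL): an unqualified call with a `raft::add_op` argument also pulls in `raft::warpReduce`, and neither template is a better match. The namespaces `raft_like` and `cub_like` and the function `scan_step` are hypothetical stand-ins. This is not necessarily the fix adopted in rapidsai/cuvs#1963.

```cpp
#include <cassert>

// Stand-in for raft: a functor type plus a free function template that
// happens to share its name with a helper in another library.
namespace raft_like {
struct add_op {
  template <typename T>
  T operator()(T a, T b) const { return a + b; }
};

template <typename T, typename ReduceLambda>
T warpReduce(T value, ReduceLambda op) { return op(value, value); }
}  // namespace raft_like

// Stand-in for cub::detail::scan.
namespace cub_like {
template <typename T, typename ScanOpT>
T warpReduce(T value, ScanOpT& op) { return op(value, T{1}); }

template <typename T, typename ScanOpT>
T scan_step(T value, ScanOpT op)
{
  // An unqualified `warpReduce(value, op)` would be ambiguous when ScanOpT is
  // raft_like::add_op: ADL adds raft_like::warpReduce to the overload set,
  // and by-value vs. by-reference parameters do not break the tie.
  return cub_like::warpReduce(value, op);  // qualified name: ADL is not performed
}
}  // namespace cub_like

// The qualified call resolves to cub_like::warpReduce, which adds 1 here.
inline void demo_adl()
{
  assert(cub_like::scan_step(3u, raft_like::add_op{}) == 4u);
}
```

Replacing the qualified call with plain `warpReduce(value, op)` makes the sketch fail to compile with an ambiguity error of the same shape as the one quoted above.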

KyleFromNVIDIA requested review from a team as code owners March 16, 2026 20:16

KyleFromNVIDIA requested a review from msarahan March 16, 2026 20:16

KyleFromNVIDIA added the breaking (Introduces a breaking change) and improvement (Improves an existing functionality) labels Mar 16, 2026

github-project-automation bot added this to Unstructured Data Processing Mar 16, 2026

KyleFromNVIDIA changed the base branch from main to release/26.04 March 16, 2026 20:17

KyleFromNVIDIA added the DO NOT MERGE label Mar 16, 2026

KyleFromNVIDIA commented Mar 16, 2026

cpp/CMakeLists.txt

divyegala approved these changes Mar 16, 2026

Use the driver API instead

6c91f9d

KyleFromNVIDIA requested a review from a team as a code owner March 16, 2026 23:26

divyegala reviewed Mar 16, 2026

cpp/src/detail/jit_lto/AlgorithmPlanner.cpp

divyegala reviewed Mar 16, 2026

dependencies.yaml

KyleFromNVIDIA added 2 commits March 16, 2026 23:33

Conda recipe

e858407

deps

1972a74

divyegala reviewed Mar 16, 2026

cpp/CMakeLists.txt

KyleFromNVIDIA added 2 commits March 16, 2026 23:43

PRIVATE

4503307

auditwheel

a42ede0

KyleFromNVIDIA requested a review from a team as a code owner March 17, 2026 00:05

KyleFromNVIDIA added 7 commits March 17, 2026 02:43

Conda recipe

e26519f

Merge branch 'release/26.04' into jit-lto-cuda-12

697d1d0

Revert "Conda recipe"

3269055

This reverts commit e26519f.

COMPILE_ONLY

07c50e6

PUBLIC

788fd34

Revert "Use the driver API instead"

e16b88f

This reverts commit 6c91f9d.

Remove driver dep

96e9162

KyleFromNVIDIA added 4 commits March 19, 2026 11:26

Merge branch 'main' into jit-lto-cuda-12

38c9e9d

Push

17c5cd7

Merge branch 'main' into cudart-static

af0a04e

Merge branch 'cudart-static' into jit-lto-cuda-12

8c771a5

KyleFromNVIDIA requested review from a team as code owners March 23, 2026 13:54

KyleFromNVIDIA commented Mar 23, 2026

c/tests/CMakeLists.txt

benfred approved these changes Mar 23, 2026

Debugging

b6560be

KyleFromNVIDIA commented Mar 23, 2026

cpp/src/detail/jit_lto/AlgorithmPlanner.cpp

KyleFromNVIDIA added 4 commits March 24, 2026 13:37

Downgrade to compute 7.0 for CUDA 12

84ddcf9

Merge branch 'main' into jit-lto-cuda-12

b08a35d

Remove JIT_LTO_COMPILATION variable

a8493a3

Remove CUVS_ENABLE_JIT_LTO preprocessor definition

997ab66

bdice reviewed Mar 24, 2026

conda/recipes/libcuvs/recipe.yaml

Use libnvjitlink run exports

fe67525

KyleFromNVIDIA removed the DO NOT MERGE label Mar 24, 2026

bdice approved these changes Mar 24, 2026

KyleFromNVIDIA added the DO NOT MERGE label Mar 24, 2026

Merge branch 'main' into jit-lto-cuda-12

cb61d86

KyleFromNVIDIA removed the DO NOT MERGE label Mar 24, 2026

rapids-bot bot merged commit 4574fe3 into rapidsai:main Mar 25, 2026
223 of 228 checks passed

github-project-automation bot moved this from In Progress to Done in Unstructured Data Processing Mar 25, 2026

jameslamb mentioned this pull request Mar 30, 2026

depend on libnvjitlink-dev at build time unconditionally rapidsai/cugraph#5479

Merged

rapids-bot[bot] merged 28 commits into rapidsai:main from KyleFromNVIDIA:jit-lto-cuda-12
