enforce a floor on libnvjitlink, build wheels with CUDA 13.0.x, test wheels against mix of CTK versions (#21671)
Conversation
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
/ok to test |
conda/recipes/libcudf/recipe.yaml
Outdated
    - ${{ pin_compatible("cuda-version", upper_bound="x", lower_bound="x") }}
    - cuda-nvrtc
    - libnvjitlink >=${{ cuda_version }}
    - ${{ pin_compatible("libnvjitlink", lower_bound="x.x.x", upper_bound="x") }}
I think pin_compatible only works correctly if you have that exact package in host. We could verify what rattler says in the build outputs, but I think you might need to list libnvjitlink in host (next to the existing -dev package). The dependency libnvjitlink-dev -> libnvjitlink isn't enough iirc.
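As a concrete illustration of how those bounds behave, here's a hypothetical helper (not rattler-build's actual implementation) that mimics the `lower_bound`/`upper_bound` arithmetic:

```python
# Hypothetical sketch, NOT rattler-build code: mimics how
# pin_compatible(lower_bound="x.x.x", upper_bound="x") turns the
# concrete version found in host into a run constraint.
def compatible_pin(version: str, lower_bound: str = "x.x.x", upper_bound: str = "x") -> str:
    parts = version.split(".")
    # keep as many segments as each bound template has "x" placeholders
    lower = ".".join(parts[: len(lower_bound.split("."))])
    upper_parts = [int(p) for p in parts[: len(upper_bound.split("."))]]
    upper_parts[-1] += 1  # bump the last kept segment for the exclusive cap
    upper = ".".join(map(str, upper_parts)) + ".0a0"
    return f">={lower},<{upper}"

print(compatible_pin("13.2.51"))  # >=13.2.51,<14.0a0
```

With a `libnvjitlink 13.2.51` in host, that produces the same `>=13.2.51,<14.0a0` shape of constraint discussed in this thread.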
Alternatively, wouldn't this be equivalent to us removing the ignore_run_exports on libnvjitlink? The libnvjitlink-dev recipe has:
run_exports:
  - {{ pin_subpackage("libnvjitlink", max_pin="x") }}
That might be the more correct fix to use here.
we could verify what rattler says
This is resolving correctly to what we want:
│ │ Finalized run dependencies (libcudf-26.04.00a472-cuda13_260318_9e773c7c):
...
│ │ │ cuda-version ┆ >=13.1,<14 (RE of [build: cuda-nvcc_linux-64]) │
...
│ │ │ libnvjitlink ┆ >=13.2.51,<14.0a0 (RE of [host: libnvjitlink-dev]) │
...
I think pin_compatible() just cares about that package making it into the host environment, not whether or not it's explicitly listed in your recipe.
wouldn't this be equivalent to us removing the ignore_run_exports on libnvjitlink?
I think you're right; I was wondering about that. But I saw that the libnvjitlink-dev run exports were explicitly ignored here:
cudf/conda/recipes/libcudf/recipe.yaml
Line 157 in 03dc96a
And assumed there was some reason for that (like those run exports being wrong for older versions or something).
I'd be happy to be able to remove these and just trust the libnvjitlink run_exports. Let's try that.
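For reference, a minimal sketch of what "trusting the run_exports" looks like in the recipe, assuming rattler-build's recipe schema and that `libnvjitlink-dev` was the relevant entry in the ignore list (the real recipe.yaml has more context around this):

```yaml
requirements:
  host:
    - libnvjitlink-dev
  # before: libnvjitlink-dev's run_exports were suppressed
  # ignore_run_exports:
  #   from_package:
  #     - libnvjitlink-dev
  # after: nothing ignored, so libnvjitlink-dev's own run_exports
  # ({{ pin_subpackage("libnvjitlink", max_pin="x") }}) add the
  # libnvjitlink run dependency automatically
```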
Pushed 5fc6e47 doing that, I'll check the rattler-build output before this is merged.
Looks like the run_exports are doing what we want.
CUDA 12.9:
│ │ Finalized run dependencies (libcudf-26.04.00a474-cuda12_260318_0602540d):
│ │ ╭────────────────────┬────────────────────────────────────────────────────╮
│ │ │ Name ┆ Spec │
│ │ ╞════════════════════╪════════════════════════════════════════════════════╡
│ │ │ Run dependencies ┆ │
│ │ │ __glibc ┆ >=2.28,<3.0.a0 (RE of [build: sysroot_linux-64]) │
│ │ │ cuda-nvrtc ┆ >=12.9.86,<13.0a0 (RE of [host: cuda-nvrtc-dev]) │
│ │ │ cuda-nvtx ┆ >=12.9.79,<13.0a0 (RE of [host: cuda-nvtx-dev]) │
│ │ │ cuda-version ┆ >=12.9,<13 (RE of [build: cuda-nvcc_linux-64]) │
│ │ │ flatbuffers ┆ >=24.3.25,<24.3.26.0a0 (RE of [host: flatbuffers]) │
│ │ │ libcufile ┆ >=1.14.1.1,<2.0a0 (RE of [host: libcufile-dev]) │
│ │ │ libcurand ┆ >=10.3.10.19,<11.0a0 (RE of [host: libcurand-dev]) │
│ │ │ libgcc ┆ >=14 (RE of [build: gcc_linux-64]) │
│ │ │ ┆ >=14 (RE of [build: gxx_linux-64]) │
│ │ │ libnvcomp ┆ >=5.1.0.21,<6.0a0 (RE of [host: libnvcomp-dev]) │
│ │ │ libnvjitlink ┆ >=12.9.86,<13.0a0 (RE of [host: libnvjitlink-dev]) │
│ │ │ librdkafka ┆ >=2.13.2,<2.14.0a0 (RE of [host: librdkafka]) │
│ │ │ librmm ┆ >=26.4.0a62,<26.5.0a0 (RE of [host: librmm]) │
│ │ │ libstdcxx ┆ >=14 (RE of [build: gxx_linux-64]) │
│ │ │ libzlib ┆ >=1.3.1,<2.0a0 (RE of [host: zlib]) │
│ │ │ ┆ │
│ │ │ Run exports (Weak) ┆ │
│ │ │ libcudf ┆ >=26.4.0a474,<26.5.0a0 │
│ │ ╰────────────────────┴────────────────────────────────────────────────────╯
CUDA 13.1:
│ │ Finalized run dependencies (libcudf-26.04.00a474-cuda13_260318_0602540d):
│ │ ╭────────────────────┬────────────────────────────────────────────────────╮
│ │ │ Name ┆ Spec │
│ │ ╞════════════════════╪════════════════════════════════════════════════════╡
│ │ │ Run dependencies ┆ │
│ │ │ __glibc ┆ >=2.28,<3.0.a0 (RE of [build: sysroot_linux-64]) │
│ │ │ cuda-nvrtc ┆ >=13.1.115,<14.0a0 (RE of [host: cuda-nvrtc-dev]) │
│ │ │ cuda-nvtx ┆ >=13.1.115,<14.0a0 (RE of [host: cuda-nvtx-dev]) │
│ │ │ cuda-version ┆ >=13.1,<14 (RE of [build: cuda-nvcc_linux-64]) │
│ │ │ flatbuffers ┆ >=24.3.25,<24.3.26.0a0 (RE of [host: flatbuffers]) │
│ │ │ libcufile ┆ >=1.16.1.26,<2.0a0 (RE of [host: libcufile-dev]) │
│ │ │ libcurand ┆ >=10.4.1.81,<11.0a0 (RE of [host: libcurand-dev]) │
│ │ │ libgcc ┆ >=14 (RE of [build: gcc_linux-64]) │
│ │ │ ┆ >=14 (RE of [build: gxx_linux-64]) │
│ │ │ libnvcomp ┆ >=5.1.0.21,<6.0a0 (RE of [host: libnvcomp-dev]) │
│ │ │ libnvjitlink ┆ >=13.2.51,<14.0a0 (RE of [host: libnvjitlink-dev]) │
│ │ │ librdkafka ┆ >=2.13.2,<2.14.0a0 (RE of [host: librdkafka]) │
│ │ │ librmm ┆ >=26.4.0a62,<26.5.0a0 (RE of [host: librmm]) │
│ │ │ libstdcxx ┆ >=14 (RE of [build: gxx_linux-64]) │
│ │ │ libzlib ┆ >=1.3.1,<2.0a0 (RE of [host: zlib]) │
│ │ │ ┆ │
│ │ │ Run exports (Weak) ┆ │
│ │ │ libcudf ┆ >=26.4.0a474,<26.5.0a0 │
│ │ ╰────────────────────┴────────────────────────────────────────────────────╯
dependencies.yaml
Outdated
    - matrix:
      packages:
        - nvidia-nvjitlink>=13.0
        - nvidia-nvjitlink>=12,<14
I find this a little weird. It's a fake pinning -- none of our packages actually use this -- we just have it to indicate roughly what the package needs. Last I remember, we settled on using CUDA 13 packages here. rapidsai/build-planning#68
Sure, let's use CUDA 13 packages. I'll do that here and in the other PRs in this series.
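Something like the following, sketched from the diff above (the exact matrix keys used in dependencies.yaml may differ):

```yaml
- matrix:
    cuda: "13.*"
  packages:
    - nvidia-nvjitlink>=13.0
# fallback when no CUDA version matches: point at the CUDA 13 package too
- matrix:
  packages:
    - nvidia-nvjitlink>=13.0
```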
    table: project.optional-dependencies
    key: test
    includes:
      - cuda_version
Just picking a place to have a threaded conversation (I don't think this particular line is to blame)... I'm seeing cuML-related conda tests segfaulting:
Fatal Python error: Segmentation fault
...
=========================== short test summary info ============================
FAILED python/cudf/cudf_pandas_tests/third_party_integration_tests/tests/test_cuml.py::test_clustering - worker 'gw4' crashed while running 'test_cuml.py::test_clustering'
FAILED python/cudf/cudf_pandas_tests/third_party_integration_tests/tests/test_cuml.py::test_pipeline - worker 'gw3' crashed while running 'test_cuml.py::test_pipeline'
FAILED python/cudf/cudf_pandas_tests/third_party_integration_tests/tests/test_cuml.py::test_linear_regression - worker 'gw0' crashed while running 'test_cuml.py::test_linear_regression'
FAILED python/cudf/cudf_pandas_tests/third_party_integration_tests/tests/test_cuml.py::test_random_forest - worker 'gw2' crashed while running 'test_cuml.py::test_random_forest'
FAILED python/cudf/cudf_pandas_tests/third_party_integration_tests/tests/test_cuml.py::test_logistic_regression - worker 'gw1' crashed while running 'test_cuml.py::test_logistic_regression'
========================= 5 failed, 3 passed in 7.02s ==========================
I see conda Python tests segfaulting in recent cuML CI too
e.g. on rapidsai/cuml#7907
https://github.com/rapidsai/cuml/actions/runs/23258577641/job/67624250836
and on rapidsai/cuml#7908
https://github.com/rapidsai/cuml/actions/runs/23249337850/job/67589124096
And in nvforest (which has some code that's also currently in cuML) as well, e.g. on rapidsai/nvforest#88
https://github.com/rapidsai/nvforest/actions/runs/23229511499/job/67622239496
Merging despite the failing (non-required) jobs.
/merge |
Merged a9064fe into rapidsai:release/26.04
…wheels against mix of CTK versions (#5457)

Fixes #5443

Contributes to rapidsai/build-planning#257

* builds CUDA 13 wheels with the 13.0 CTK
* ensures wheels ship with a runtime dependency of `nvidia-nvjitlink>={whatever-minor-version-they-were-built-against}`

Contributes to rapidsai/build-planning#256

* updates wheel tests to cover a range of CTK versions (we previously, accidentally, were only testing the latest 12.x and 13.x)

Other changes

* ensures conda packages also take on floors of `libnvjitlink>={whatever-version-they-were-built-against}`

## Notes for Reviewers

### How I tested this

This uses wheels from similar PRs from RAPIDS dependencies, at build and test time:

* rapidsai/cudf#21671
* rapidsai/kvikio#942
* rapidsai/raft#2971
* rapidsai/rmm#2270
* rapidsai/ucxx#604

Authors:
- James Lamb (https://github.com/jameslamb)

Approvers:
- Gil Forsyth (https://github.com/gforsyth)

URL: #5457
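The `nvidia-nvjitlink>={whatever-minor-version-they-were-built-against}` floor can be illustrated with a small hypothetical helper (not code from this PR; the exact upper bound the PR uses may differ from the major-version cap shown here):

```python
# Hypothetical sketch, NOT code from this PR: derive the runtime
# nvidia-nvjitlink constraint from the CTK version used at build time.
def nvjitlink_pin(build_ctk_version: str) -> str:
    major, minor = build_ctk_version.split(".")[:2]
    # floor at the build-time minor, stay below the next CUDA major
    return f"nvidia-nvjitlink>={major}.{minor},<{int(major) + 1}"

print(nvjitlink_pin("13.0"))  # nvidia-nvjitlink>=13.0,<14
```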