Add a `rapids doctor` check to verify CUDA toolkit libraries are findable and version-consistent #139
Description
RAPIDS libraries often install successfully via pip or conda but then fail at runtime because the underlying CUDA toolkit is not set up properly. Some scenarios I have personally run into:
- Shared libraries (like `libcudart.so` or `libnvrtc.so`) are not findable at runtime. Pip wheels previously did not provide these (although with CuPy 14, things look much better), and any setup with a preinstalled CUDA toolkit could be misconfigured.
- The CUDA toolkit version (whether from pip/conda installations or from `CUDA_HOME` and `/usr/local/cuda` symlink resolution) does not match the GPU driver's CUDA version.
- The scenario above is made worse by libraries hardcoding `/usr/local/cuda` as a fallback search path, so a stale symlink causes the wrong libraries to be loaded.
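To make the scenarios above concrete, here is a minimal sketch of the raw information such a check could gather. It is an illustration under assumptions, not rapids-cli code: it uses the standard library's `ctypes.util.find_library` as a stand-in for `cuda-pathfinder` (so it will not see libraries shipped inside pip wheels), and the helper names `find_cuda_libs`, `toolkit_locations`, and `version_from_path` are hypothetical. Extracting a version from a path also assumes the conventional `/usr/local/cuda-X.Y` directory naming; conda and pip layouts would need a different version source.

```python
import ctypes.util
import os
import re
from typing import Dict, Mapping, Optional, Tuple

# Hypothetical set of libraries to probe; a real check would cover more.
CUDA_LIBS = ("cudart", "nvrtc")

# Matches the conventional versioned install directory, e.g. /usr/local/cuda-12.4
_VERSION_RE = re.compile(r"cuda-(\d+)\.(\d+)")


def find_cuda_libs(names=CUDA_LIBS) -> Dict[str, Optional[str]]:
    """Map each library name to what the system loader search finds, or None.

    ctypes.util.find_library stands in for cuda-pathfinder in this sketch.
    """
    return {name: ctypes.util.find_library(name) for name in names}


def version_from_path(path: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a path such as /usr/local/cuda-12.4, if present."""
    match = _VERSION_RE.search(path)
    return (int(match.group(1)), int(match.group(2))) if match else None


def toolkit_locations(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Resolve /usr/local/cuda and CUDA_HOME/CUDA_PATH to their real paths."""
    env = os.environ if env is None else env
    locations: Dict[str, str] = {}
    # lexists (not exists) so a dangling, stale symlink is still reported.
    if os.path.lexists("/usr/local/cuda"):
        locations["/usr/local/cuda"] = os.path.realpath("/usr/local/cuda")
    for var in ("CUDA_HOME", "CUDA_PATH"):
        if env.get(var):
            locations[var] = os.path.realpath(env[var])
    return locations


if __name__ == "__main__":
    for source, path in toolkit_locations().items():
        print(source, "->", path, version_from_path(path))
    print(find_cuda_libs())
```

Using `os.path.lexists` rather than `os.path.exists` is deliberate: a stale `/usr/local/cuda` symlink pointing at a removed toolkit is exactly the case worth reporting, and `exists` would silently skip it.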
A check could be added to rapids-cli that verifies:
- discoverability of the shared libraries via `cuda-pathfinder`;
- version consistency between the libraries found by `cuda-pathfinder`, the GPU driver, the `/usr/local/cuda` symlink, and the `CUDA_HOME`/`CUDA_PATH` environment variables (if present).

A major-version mismatch is automatically an error, but I am curious what the recommended approach is for a minor-version mismatch: should it be a warning or an error?
I think this check would fill an important gap in the existing `rapids doctor` checks, and it builds on information that `rapids debug` can already gather.