The NVML does not return the correct struct for the latest CUDA. #48
Open
fostiropoulos wants to merge 1 commit into gpuopenanalytics:master from
Author
Having tried the same code on a different machine with different GPUs, the error seems to be related to the GPU model. The bug occurs on an RTX 2080 with NVLink, while the same code does not produce an error on a V100.
Thanks. This is due to an inadvertent ABI break in the 535 driver, which will be fixed in the next patch release.
Author
Thanks for clarifying. Should there be a check somewhere that raises an error for this particular version, or should it at least be documented?
@wence- Patch release for what? NVIDIA? CUDA? Or can the pynvml package be updated to fix this?
This may be the same problem as my issue here: #50
Author
@erikhuck I solved it by uninstalling driver version 535 and installing the preceding release until a newer version is released.
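A check of the kind asked about earlier in this thread could look roughly like the sketch below. It is not part of this PR or of pynvml. The helper names `is_suspect_driver` and `query_driver_version` are hypothetical, and treating every 535.x release as suspect is an assumption, since the thread does not say which patch release carries the fix. The only pynvml calls used are `nvmlInit`, `nvmlSystemGetDriverVersion`, and `nvmlShutdown`.

```python
# Assumption: any driver whose major version is 535 may be affected.
SUSPECT_MAJOR = 535


def is_suspect_driver(version) -> bool:
    """Return True if the driver version string has the suspect major version."""
    # Some pynvml releases return bytes rather than str.
    if isinstance(version, bytes):
        version = version.decode("utf-8")
    major = int(version.split(".")[0])
    return major == SUSPECT_MAJOR


def query_driver_version():
    """Ask NVML for the installed driver version (requires pynvml and a GPU)."""
    import pynvml  # imported lazily so the pure check above works without it

    pynvml.nvmlInit()
    try:
        return pynvml.nvmlSystemGetDriverVersion()
    finally:
        pynvml.nvmlShutdown()


# Example with literal version strings (illustrative values only).
for example in ("535.54.03", "530.30.02"):
    print(example, "suspect" if is_suspect_driver(example) else "ok")
```

On a machine with a GPU, `is_suspect_driver(query_driver_version())` could be used to warn before process information is read.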
For my current system
The processes running on CUDA are read incorrectly. The struct returned by NVML contains an additional, undocumented size_t element; if it is ignored when reading the struct, the response is malformed.
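As a rough illustration of why one extra element corrupts the result, the sketch below uses `ctypes` to write an array of records in a larger layout and then read it back with a smaller one. It is not the code from this PR. The struct names, field names, and the position of the extra `size_t` at the end of the record are assumptions made only for the demonstration.

```python
import ctypes

# Hypothetical field layout the Python bindings expect.
_COMMON_FIELDS = [
    ("pid", ctypes.c_uint),
    ("used_gpu_memory", ctypes.c_ulonglong),
    ("gpu_instance_id", ctypes.c_uint),
    ("compute_instance_id", ctypes.c_uint),
]


class ProcessInfoExpected(ctypes.Structure):
    _fields_ = list(_COMMON_FIELDS)


class ProcessInfoActual(ctypes.Structure):
    # Assumption: the driver appends one undocumented size_t per record.
    _fields_ = list(_COMMON_FIELDS) + [("undocumented_extra", ctypes.c_size_t)]


def read_with_expected_layout(records):
    """Serialize records in the actual layout, then reinterpret the bytes
    using the smaller expected layout, as a mismatched binding would."""
    count = len(records)
    actual_array = (ProcessInfoActual * count)(*records)
    raw = bytes(actual_array)
    expected_array = (ProcessInfoExpected * count).from_buffer_copy(raw)
    return list(expected_array)


records = [
    ProcessInfoActual(pid=1111, used_gpu_memory=100, undocumented_extra=0),
    ProcessInfoActual(pid=2222, used_gpu_memory=200, undocumented_extra=0),
]
misread_records = read_with_expected_layout(records)

print("expected record size:", ctypes.sizeof(ProcessInfoExpected))
print("actual record size:  ", ctypes.sizeof(ProcessInfoActual))
for written, read_back in zip(records, misread_records):
    print("written pid", written.pid, "-> read pid", read_back.pid)
```

The first record lines up because both layouts start at offset zero. Every later record is read from the wrong offset, so its fields are taken from the middle of a neighbouring record.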
The following code illustrates the error:
Output:
Expected: