Fix bug when using just-in-time CDI spec generation#762
Conversation
| } | ||
| }() | ||
|
|
||
| if l.nvsandboxutilslib != nil { |
There was a problem hiding this comment.
For the time being, I have duplicated the code for GetAllDeviceSpecs -- which is exercised more frequently -- and will clean the up more in a post-release follow-up.
There was a problem hiding this comment.
Would it be worth to create a func now that we have the same code in 2 funcs?
There was a problem hiding this comment.
I don't know whether a functon would work since we need to defer the calls to Shutdown(). It is something we could look into, but I don't want to make that change this close to the release.
There was a problem hiding this comment.
Agree, something to workshop for the next release. The patch does as described.
This change fixes a bug when using just-in-time CDI spec generation for the NVIDIA Container Runtime for specific devices (i.e. not 'all'). Instead of unconditionally using the default nvsandboxutils library -- leading to errors due to undefined symbols -- we check whether the library can be properly initialised before continuing. Signed-off-by: Evan Lezar <elezar@nvidia.com>
e6059d2 to
75376d3
Compare
|
|
||
| if l.nvsandboxutilslib != nil { | ||
| if r := l.nvsandboxutilslib.Init(l.driverRoot); r != nvsandboxutils.SUCCESS { | ||
| l.logger.Warningf("Failed to init nvsandboxutils: %v; ignoring", r) |
There was a problem hiding this comment.
Warning log on line 121 is lowercase and here is upper cased, is there a reason for the diff?
There was a problem hiding this comment.
No specific reason. I will update these for consistency once we refactor.
ArangoGutierrez
left a comment
There was a problem hiding this comment.
My questions are non blockers, looks good
|
Tested $ docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=runtime.nvidia.com/gpu=0 ubuntu nvidia-smi -L
GPU 0: NVIDIA GH200 96GB HBM3 (UUID: GPU-0abb8bdb-b6c4-73fc-2b1d-a52db1c353e2)patch restores the previous (1.16.2) behavior |
| } | ||
| }() | ||
|
|
||
| if l.nvsandboxutilslib != nil { |
There was a problem hiding this comment.
Agree, something to workshop for the next release. The patch does as described.
This change fixes a bug when using just-in-time CDI spec generation for the NVIDIA Container Runtime for specific devices (i.e. not 'all').
Instead of unconditionally using the default nvsandboxutils library -- leading
to errors due to undefined symbols -- we check whether the library can be
properly initialised before continuing.
The functionality was added in #629
Before this change:
With this change: