-
Notifications
You must be signed in to change notification settings - Fork 24
Resolve a tegra platform for a single iGPU #78
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
This is required for the change to NVIDIA/nvidia-container-toolkit#1461 |
|
The change itself seems straightforward. There is a change in semantics though -- now it checks if any device is an iGPU rather than ensuring that all devices are iGPUs. So long as that it the intent, then LGTM. |
Yes. This is the intent. There is a risk that we may end up detecting the wrong platform on older Tegra-based systems where dGPUs are present, but I think this is relatively low. As a follow-up thought / quesiton: What are your thoughts on having some kind of "override" for this detection logic that is implmented at this level so as to be useful to all upstream consumers? Something like an |
|
Before I merge this, one thought that I had is to check whether the What are your thoughts on this? |
This seems reasonable to me. If someone knows what they are doing, they can make sure their platform drops a file in here to inform the toolkit which path to take. What values would you envision this honoring to start with? |
I don't know enough about the difference between how the driver shows up on one system vs. the other to have a strong opinion. On the surface, this "feels" very brittle though. |
Yes, it is brittle. It's also based on heuristics and not on a stable API. I think having an "override" functionality as discussed in the other comment would give us more flexibility without making the logic here more complex. |
My initial thought would be that we have a file: We would ignore lines starting with I can start a follow-up PR where we can discuss this in more detail. |
Works for me. |
pkg/nvlib/info/property-extractor.go
Outdated
| } | ||
|
|
||
| // HasOnlyIntegratedGPUs checks whether all GPUs are iGPUs that use NVML. | ||
| // HasAnIntegratedGPU checks whether any GPU is an iGPUs that use NVML. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does "use NVML" mean here?
I ask because as per the name, I'd expect "checks whether any GPU is an integrated GPU".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment as written is a little ambiguous. I have reworked it to (hopefully) be clearer.
The point is not that the GPUs "use NVML", but that NVML is present on the system and reports the integrated GPUs (e.g. in the nvidia-smi -L output). This wasn't always the case and changed with the release of Orin-based systems.
pkg/nvlib/info/property-extractor.go
Outdated
| if !isIntegratedGPUName(name) { | ||
| return false, fmt.Sprintf("device %q does not use nvgpu module", name) | ||
| if !IsIntegratedGPUName(name) { | ||
| continue |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to 🤷 :
if IsIntegratedGPUName(name) {
return ...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. I was still in the "I need to check all devices" mindset. The quick return makes the code simpler to reason about.
02567ae to
f2bc059
Compare
Instead of requiring that all GPUs on an Tegra platform with NVML be iGPUs, we resolve it as a tegra platform if at least one device is an iGPU. Signed-off-by: Evan Lezar <elezar@nvidia.com>
f2bc059 to
eda6327
Compare
ArangoGutierrez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Created #79 to continue the discussion on this. Went ahead and merged this PR. |
Instead of requiring that all GPUs on an Tegra platform with NVML be iGPUs, we resolve it as a tegra platform if at least one device is an iGPU.
Note that this changes the semantics of platform resolution and may cause issues on Tegra-based systems with dGPUs installed IF the iGPU is also enumeratable by NVML.