-
Notifications
You must be signed in to change notification settings - Fork 24
Add support for a platform override file #79
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
This also includes the commits from #78 |
6ff48ea to
d7bc09a
Compare
|
Why do we need an override? Whats the use case? We have enough tooling to determine the platform automatically don't we? |
This change adds support for reading the detected platform (if set to `auto`) from a platform override file. This allows system administrators to explicitly select a detected platform for tooling such as the nvidia-container-toolkit, the k8s-device-plugin, and k8s-dra-driver-gpu. Signed-off-by: Evan Lezar <elezar@nvidia.com>
d7bc09a to
db3c350
Compare
We don't actually have enough information in certain cases. The primary example here is Tegra-based systems. Here we rely on heuristics that may become out of date and in those cases we want to be able to allow sys-admins to override this value so as to avoid requiring a release just to address this. |
| PlatformResolver: &platformResolver{ | ||
| logger: o.logger, | ||
| platform: o.platform, | ||
| platform: o.normalizePlatform(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
:)
ArangoGutierrez
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This Draft - LGTM
| // | ||
| // This function can be overridden for testing purposes. | ||
| var getPlaformOverride = func() (string, string) { | ||
| platformOverrideFile, err := os.Open("/etc/nvidia-container-toolkit/platform-override") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be pulled out to a constant.
Independently, it feels a bit weird to bake in a path referencing the nvidia-container-toolkit inside nvlib.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we instead just have New() take the override as a string and leave the logic for including this file / how to parse it to the toolkit (possibly including helpers in nvlib for how to do the parsing, but not actually doing any parsing by default)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or instead of a raw string -- it takes the path to the override file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree, the path to file should come from the entity importing go-nvlib
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The point of having this defined directly here (or at least the default) is that we want consistency across all our components that use this platform detection logic. These are currently:
- The NVIDIA Container Toolkit (for the
nvidia-container-runtimeandnvidia-ctk cdi generate) - The GPU Device Plugin
- GPU Feature Discovery
In the longer term, I could also see the DRA driver for GPUs also pulling this in if we wanted to support Tegra-based systems.
This change adds support for reading the detected
platform (if set to
auto) from a platform overridefile. This allows system administrators to explicitly
select a detected platform for tooling such as the
nvidia-container-toolkit, the k8s-device-plugin, and
k8s-dra-driver-gpu.
For example, by creating the following file:
A
tegraplatform will ALWAYS be detected.