Allow standard drivers to function for dev type-PF #6

Merged
csibbitt merged 1 commit into rhos-vaf:main from bogdando:flavor
Oct 17, 2025

@bogdando
Contributor

If the guest VM's driver detects it is running on a passthrough Physical Function of an SR-IOV capable card, it will require a licensed NVIDIA GRID driver to function, whereas you want to use the standard, non-licensed drivers.

This is a well-known behavior with NVIDIA GPUs. Passing the device through as a type-PF correctly describes the hardware to OpenStack, but it doesn't change the fact that the NVIDIA driver inside the guest VM can detect the underlying hypervisor and the nature of the device, and then enforce licensing restrictions.

We have confirmed in testing that we cannot hide the SR-IOV capability from the host OS. This means libvirt and Nova will always see it as a type-PF. The problem, therefore, is not in the OpenStack configuration but in how to prevent the guest driver from detecting that it's running on a virtualized PF.

This is a common challenge, and the solution usually involves masking the hypervisor's presence from the guest VM. In libvirt, which Nova uses, this can often be accomplished by configuring the VM's CPU model and hiding the KVM hypervisor signature.

Add flavor properties to make the virtual machine look as "bare-metal" as possible to the guest operating system.

This should prevent the NVIDIA driver from detecting the virtualized environment and enforcing the GRID license requirement, allowing your standard drivers to work.
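
The description above does not list the flavor properties it adds, so the following is only a sketch of how one such property could be applied. `hw:hide_hypervisor_id` is a Nova flavor extra spec that hides the hypervisor signature from the guest; whether it is the property used in this change is an assumption. The flavor name `gpu-passthrough` and the helper `flavor_set_command` are hypothetical.

```python
import shlex


def flavor_set_command(flavor, properties):
    """Build an `openstack flavor set` argv list from a dict of extra specs."""
    argv = ["openstack", "flavor", "set"]
    # Sort keys so the generated command is deterministic.
    for key, value in sorted(properties.items()):
        argv += ["--property", f"{key}={value}"]
    argv.append(flavor)
    return argv


# Assumption: hw:hide_hypervisor_id is one of the properties meant here.
# The flavor name "gpu-passthrough" is hypothetical.
cmd = flavor_set_command("gpu-passthrough", {"hw:hide_hypervisor_id": "true"})
print(shlex.join(cmd))
# prints: openstack flavor set --property hw:hide_hypervisor_id=true gpu-passthrough
```

Building the command as an argv list rather than a single string keeps quoting explicit, so the same list can be passed to `subprocess.run` unchanged.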

Signed-off-by: Bohdan Dobrelia <bdobreli@redhat.com>
@bogdando bogdando requested a review from csibbitt October 17, 2025 12:25
@csibbitt csibbitt merged commit ce2f84b into rhos-vaf:main Oct 17, 2025
1 check passed
@csibbitt
Contributor

The PR comment confuses me, to be honest. This definitely isn't an NVIDIA licensing problem because even the nouveau driver fails to load.

[    8.851548] nouveau 0000:05:00.0: NVIDIA AD104 (194000a1)
[    8.853098] nouveau 0000:05:00.0: gsp ctor failed: -2
[    8.853112] nouveau: probe of 0000:05:00.0 failed with error -2
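
For readers decoding the log above: Linux drivers report failures as negative errno values, so the `-2` can be mapped to its symbolic name. The sketch below does this for the probe line; the helper name `decode_probe_error` is hypothetical, and the result does not by itself say which file or resource was missing.

```python
import errno
import os
import re

# Probe failure line copied from the dmesg output above.
LINE = "[    8.853112] nouveau: probe of 0000:05:00.0 failed with error -2"


def decode_probe_error(line):
    """Return (errno name, description) for a 'failed with error N' line, or None."""
    match = re.search(r"failed with error (-?\d+)", line)
    if match is None:
        return None
    code = abs(int(match.group(1)))  # kernel returns -errno, so drop the sign
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


name, description = decode_probe_error(LINE)
print(name, "-", description)  # the name for error 2 is ENOENT
```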

I don't know why this would have an effect on the nouveau driver, but I could buy that maybe it would for the nvidia driver? I merged this because you said it was tested and works; but some written evidence of that fact on the PR would be appreciated in the future.

@bogdando
Contributor Author

bogdando commented Oct 21, 2025

nouveau

I took this explanation from your post:

Unfortunately, it fails to load the nvidia driver, something I've never seen before once the passthrough is set up. I'm highly suspecting the type-PF vs type-PCI since that's the only difference I know vs. my working setup. I think that type-PF involves the SRIOV layer and is possibly not compatible with the non-grid version of the nvidia drivers.
Even the nouveau driver failed to load inside the VM, suggesting something is definitely wrong with the device.

Indeed, the commit message does not mention nouveau. Let's first retest with this change, even though, according to input from Sean, these properties look incorrect to use. We can then revert it once we confirm that the original problem persists: nouveau still cannot load in the VM instance, and the standard nvidia driver does not function either. @csibbitt

@bogdando bogdando deleted the flavor branch November 4, 2025 10:37