
Support GPU Passthrough to VMs #106

@MalteJ

Summary

This work item enables FeOS to attach one or more physical GPUs directly to a FeOS-managed Virtual Machine (VM) using PCIe passthrough.

This functionality is critical for supporting GPU-accelerated workloads such as Artificial Intelligence (AI), Machine Learning (ML), scientific computing, and high-performance graphics within VMs. The implementation will extend the VM API to allow specifying GPUs by their host PCIe address.
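A host PCIe address is usually written in the Linux domain:bus:device.function (BDF) form, such as `0000:65:00.0`. The sketch below shows one way the API could parse and de-duplicate the GPU list of a VM spec. `PciAddress`, `parse_pci_address`, and `parse_gpu_list` are hypothetical names, not part of the FeOS API, and accepting an address without a domain (defaulting to `0000`) is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class PciAddress:
    """Hypothetical value type for a host PCIe address (not a FeOS type)."""
    domain: int    # PCI segment, 16 bits
    bus: int       # 8 bits
    device: int    # slot number, 0x00-0x1f
    function: int  # 0-7

    def __str__(self) -> str:
        # Canonical sysfs spelling, e.g. "0000:65:00.0"
        return f"{self.domain:04x}:{self.bus:02x}:{self.device:02x}.{self.function:x}"


# Assumption: the domain prefix is optional, so "65:00.0" means "0000:65:00.0".
_BDF = re.compile(
    r"(?:(?P<domain>[0-9a-fA-F]{4}):)?"
    r"(?P<bus>[0-9a-fA-F]{2}):(?P<device>[0-9a-fA-F]{2})\.(?P<function>[0-7])"
)


def parse_pci_address(text: str) -> PciAddress:
    """Parse one BDF string, raising ValueError on malformed input."""
    m = _BDF.fullmatch(text.strip())
    if m is None:
        raise ValueError(f"invalid PCIe address: {text!r}")
    device = int(m["device"], 16)
    if device > 0x1F:  # the device field is only 5 bits wide
        raise ValueError(f"device number out of range: {text!r}")
    return PciAddress(int(m["domain"] or "0", 16), int(m["bus"], 16), device, int(m["function"]))


def parse_gpu_list(addresses: list[str]) -> list[PciAddress]:
    """Parse the GPU list of a VM spec and reject duplicate entries."""
    parsed = [parse_pci_address(a) for a in addresses]
    if len(set(parsed)) != len(parsed):
        raise ValueError("a PCIe address is listed more than once")
    return parsed
```

Normalizing to a value type means `65:00.0` and `0000:65:00.0` are detected as the same device.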


Scope

✅ In Scope

  • Extend the FeOS VM API to allow specifying one or more GPUs via their host PCIe address for attachment to a VM.
  • Implement the backend logic for PCIe passthrough of a complete physical GPU (e.g., using the IOMMU and the vfio-pci driver).
  • Ensure the guest VM can recognize the attached GPU and that appropriate vendor drivers (e.g., NVIDIA, AMD) can be installed and utilized.
  • Support passing through multiple GPUs to a single VM.
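On Linux, binding a GPU to vfio-pci is typically done through sysfs: write `vfio-pci` to the device's `driver_override`, unbind the current host driver via `driver/unbind`, then ask the kernel to re-probe the device by writing its address to `/sys/bus/pci/drivers_probe`. A minimal sketch, assuming the IOMMU is enabled and the `vfio-pci` module is loaded; `vfio_bind_plan` and `apply_plan` are hypothetical helpers, not FeOS code. The plan is computed separately from the writes so it can be logged or tested without root.

```python
from pathlib import PurePosixPath
from typing import Optional

SYSFS_PCI = PurePosixPath("/sys/bus/pci")


def vfio_bind_plan(bdf: str, current_driver: Optional[str]) -> list[tuple[str, str]]:
    """Return the ordered (sysfs file, value) writes that hand `bdf` to vfio-pci.

    `current_driver` is the basename of /sys/bus/pci/devices/<bdf>/driver,
    or None if no driver is bound.
    """
    dev = SYSFS_PCI / "devices" / bdf
    # Pin the device to vfio-pci so no host driver can claim it on the next probe.
    writes = [(str(dev / "driver_override"), "vfio-pci")]
    if current_driver == "vfio-pci":
        return writes  # already bound; nothing else to do
    if current_driver is not None:
        # Detach the host driver (for example nvidia, amdgpu or nouveau).
        writes.append((str(dev / "driver" / "unbind"), bdf))
    # Re-probe; with driver_override set, only vfio-pci may bind.
    writes.append((str(SYSFS_PCI / "drivers_probe"), bdf))
    return writes


def apply_plan(writes: list[tuple[str, str]]) -> None:
    """Perform the writes in order; requires root and a real sysfs."""
    for path, value in writes:
        with open(path, "w") as f:
            f.write(value)
```

Because `driver_override` stays set, the same mechanism also keeps host drivers from reclaiming the GPU while it is assigned to a VM.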

❌ Out of Scope

  • GPU virtualization technologies like NVIDIA vGPU or AMD MxGPU (SR-IOV). This issue focuses exclusively on full device passthrough.
  • Live migration of VMs with attached GPUs.
  • Dynamic hot-plugging of GPUs. GPUs must be attached when the VM is created or started.
  • Host-side GPU driver installation and configuration. This issue assumes the host is correctly prepared for passthrough.

Responsible Areas

  • FeOS VM Management
  • FeOS API

Contributors


Acceptance Criteria

  • API

    • The VM API is extended to accept a list of PCIe addresses for GPUs in the VM specification.
    • The API performs validation to ensure the specified PCIe devices exist and are available for passthrough.
  • VM Runtime & Guest OS

    • A VM can be successfully launched with one or more GPUs passed through to it.
    • The guest operating system correctly identifies the hardware of the passed-through GPU(s) (e.g., visible in lspci).
    • Vendor-specific drivers (e.g., NVIDIA driver) can be installed successfully inside the guest OS.
    • A GPU-accelerated application or utility (e.g., nvidia-smi, a CUDA/OpenCL sample) runs successfully within the VM and can access the GPU's capabilities.
    • The FeOS host correctly isolates the device, preventing host-level drivers from claiming it while it is assigned to a VM.
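The validation criterion could be approached by reading each device's sysfs entries. A sketch under several assumptions: "GPU" means PCI base class `0x03` (display controller, which covers VGA and 3D controllers), a device without an `iommu_group` link cannot be passed through, and FeOS tracks addresses already attached to other VMs in a set. `validate_gpu` and its `assigned` parameter are hypothetical.

```python
from pathlib import Path

DISPLAY_CONTROLLER = 0x03  # PCI base class: VGA, 3D and other display controllers


def validate_gpu(bdf: str, assigned: set[str], sysfs_root: Path = Path("/sys")) -> list[str]:
    """Return human-readable problems; an empty list means the device looks usable.

    `assigned` is a hypothetical set of addresses already attached to other VMs.
    """
    dev = sysfs_root / "bus" / "pci" / "devices" / bdf
    if not dev.exists():
        return [f"{bdf}: no such PCI device"]
    problems = []
    # sysfs exposes the 24-bit class code as text, e.g. "0x030000".
    pci_class = int((dev / "class").read_text().strip(), 16)
    if pci_class >> 16 != DISPLAY_CONTROLLER:
        problems.append(f"{bdf}: class {pci_class:#08x} is not a display controller")
    if not (dev / "iommu_group").exists():
        problems.append(f"{bdf}: no IOMMU group (is the IOMMU enabled?)")
    if bdf in assigned:
        problems.append(f"{bdf}: already assigned to another VM")
    return problems
```

Returning every problem at once, rather than failing on the first, gives API callers a complete error message for a rejected VM spec.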

Action Items

  • Design the API extension in the VM model for specifying GPU devices.
  • Implement the backend logic to configure the hypervisor for GPU passthrough (e.g., managing IOMMU groups, binding to vfio-pci).
  • Ensure that all functions of a GPU (e.g., the graphics and audio functions on the same PCIe card) are passed through together, along with any other devices that share the GPU's IOMMU group.
  • Add robust validation and error handling for cases where a GPU is unavailable or passthrough fails.
  • Create integration tests that:
    • Launch a VM with a single GPU and verify its functionality in the guest.
    • Launch a VM with multiple GPUs and verify their functionality in the guest.
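VFIO generally treats an IOMMU group as usable only when none of its devices is claimed by an ordinary host driver, so the set of devices to pass through can be larger than the GPUs the user listed. A sketch of that expansion, reading the groups from `/sys/kernel/iommu_groups/<N>/devices/` and also pulling in every function of the same slot; `read_iommu_groups` and `devices_to_pass_through` are hypothetical names. Real deployments may need special handling for PCIe bridges that share a group with a GPU; this sketch does not attempt that.

```python
from pathlib import Path


def read_iommu_groups(sysfs_root: Path = Path("/sys")) -> dict[str, int]:
    """Map every PCI address to its IOMMU group number.

    Layout: <sysfs_root>/kernel/iommu_groups/<N>/devices/<BDF>
    """
    groups = {}
    for group_dir in (sysfs_root / "kernel" / "iommu_groups").iterdir():
        for dev in (group_dir / "devices").iterdir():
            groups[dev.name] = int(group_dir.name)
    return groups


def _slot(bdf: str) -> str:
    # "0000:65:00.1" -> "0000:65:00" (drop the function number)
    return bdf.rsplit(".", 1)[0]


def devices_to_pass_through(requested: list[str], groups: dict[str, int]) -> list[str]:
    """Grow the requested set until it is closed under 'same slot' and
    'same IOMMU group', then return it sorted."""
    selected = set(requested)
    while True:
        slots = {_slot(b) for b in selected}
        group_ids = {groups[b] for b in selected if b in groups}
        expanded = selected | {
            b for b, g in groups.items() if g in group_ids or _slot(b) in slots
        }
        if expanded == selected:  # fixed point reached
            return sorted(selected)
        selected = expanded
```

The loop repeats because pulling in a sibling function can introduce a new IOMMU group, which can in turn introduce further devices.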

Metadata

Projects

Status

Todo
