Incorrect NUMA Node and CPU Pinning During VM Migration #6772

@feldsam

Description

The current implementation of huge pages support, introduced by the enhancement "Support use of huge pages without CPU pinning" (#6185), selects a NUMA node based on free resources and effectively balances load across NUMA nodes. During VM migration, however, the VM's NUMA pinning is not updated, so it can diverge from the node the scheduler selects on the target host.
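
For context, the NUMA node chosen at deployment time is baked into the libvirt domain XML, so it travels with the VM unless the XML is regenerated for the target host. A minimal, illustrative fragment (the nodeset/cpuset values are hypothetical):

```xml
<!-- Illustrative fragment of a deployed VM's domain XML;
     nodeset/cpuset values below are hypothetical. -->
<domain type='kvm'>
  <memoryBacking>
    <hugepages/>
  </memoryBacking>
  <numatune>
    <!-- Guest memory bound to the NUMA node picked at deployment -->
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <cputune>
    <!-- vCPU pinned to CPUs belonging to the same node -->
    <vcpupin vcpu='0' cpuset='0-7'/>
  </cputune>
</domain>
```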

To Reproduce

  1. Configure a VM to use huge pages and deploy it on a host.
  2. Initiate a migration using either the save/restore (cold) or the live migration method.
  3. Observe that the VM keeps using the old NUMA node on the target host, even if the scheduler selects a different NUMA node based on the target host’s free resources (see the inspection sketch after this list).
  4. If the old NUMA node on the target host has insufficient free memory, the migration may fail.
  5. Deploy new VMs and note the inconsistencies caused by the incorrectly pinned VMs.
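
A quick way to observe the stale pinning on the target host after migration, assuming a domain named one-42 (the name and commands below are illustrative, not part of the OpenNebula drivers):

```sh
# Dump the live domain XML and check which NUMA node the memory
# and vCPUs are still bound to.
virsh --connect qemu:///system dumpxml one-42 | grep -E 'nodeset|cpuset'

# Cross-check the actual per-node memory usage of the qemu process
# ("guest=one-42" appears on libvirt-launched qemu command lines).
numastat -p "$(pgrep -f 'guest=one-42')"
```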

Expected behavior

  • When a VM is migrated using the save/restore or live migration method, the NUMA node assignment should be updated to match the scheduler's decision for the target host.
  • The migration should update the VM’s domain configuration with the correct NUMA assignments, avoiding failures and maintaining scheduling consistency.

Details

  • Affected Component: Scheduler, Virtual Machine Manager (VMM)
  • Hypervisor: KVM
  • Version: All

Additional context

  • During save and live migration operations, we can use virsh's --xml option to provide a replacement domain XML with the updated NUMA topology and CPU pinning information. This ensures that the VM's NUMA node and CPU assignments are correctly updated on the target host (see the sketch below).
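
A minimal sketch of how this could look in the driver scripts, assuming the front-end has already rendered an updated domain XML for the target host (the paths and the one-42 domain name are hypothetical):

```sh
# Cold migration (save/restore): restore with a replacement XML that
# carries the NUMA binding chosen for the target host.
virsh --connect qemu:///system save one-42 /var/tmp/one-42.save
# ... transfer the save file and the updated XML to the target host ...
virsh --connect qemu:///system restore /var/tmp/one-42.save \
    --xml /var/tmp/one-42-target.xml

# Live migration: hand the updated XML for the destination to virsh.
virsh --connect qemu:///system migrate --live one-42 \
    qemu+ssh://target-host/system --xml /var/tmp/one-42-target.xml
```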
