Skip to content

Conversation

@pruthvitd
Copy link
Member

@pruthvitd pruthvitd commented Dec 15, 2025

What caused the problem

VM Prime PVCs were being incorrectly included in the protected PVC list because the default Ramen VM label selector was still present on these temporary PVCs.
The Ramen VRG reconciler uses label selectors to identify resources (VMs and PVCs) for protection. After a VM is enrolled for DR:

Upstream: This label selector is added on the VM and its resources by the user.
Downstream: The default label is added to the VirtualMachine CR by the ODF UI console to mark it as protected.

This mechanism allows Ramen to pseudo-own resources and manage subsets of VMs in a namespace.
However, during restore or relocation workflows, temporary PVCs (Prime PVCs) created for VolumeImportSource/DataVolume inherit this label. If PV adoption succeeds but the Prime PVC retains Ramen finalizers and the VM label selector, it is not garbage-collected and gets treated as a protected resource.
This leads to:

  • PVC duplication and VolumeReplication sprawl during multiple DR actions (relocate/failover).
  • Unnecessary storage consumption and bandwidth usage during sync.

How it’s fixed

  • Added logic to filter out VM Prime PVCs from the protected PVC list.
  • When a temporary importer PVC is detected, unset the VM label selector to ensure it is automatically dropped and not considered for protection.

Impact

  • Prevents accidental protection of temporary PVCs.
  • Reduces risk of restore failures and dangling resources.
  • Optimizes DR workflows by avoiding unnecessary storage and replication overhead.

Fixes: 2353
Updated RBAC permissions on VirtualMachine CRD to use the 'update' verb instead of 'patch' for ownerReference modifications.

@pruthvitd pruthvitd force-pushed the prime-pvc-fix branch 2 times, most recently from a8b6a5a to b74e4ea Compare December 16, 2025 15:09
- list
- watch
- patch
- update
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As mentioned here: #2368 (comment), maybe better to keep patch and add update?

- get
- list
- patch
- update
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as previous

return out, nil
}

func UpdateVMRecipePvcLabel(ctx context.Context, c client.Client,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually use upper case for PVC in function names

}

// If a temporary importer prime PVC related to VM is found, it attempts to update its label and skips it from the result.
func filterPVCs(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the function name I'd expect this function to only filter PVCs, but actually it is doing more, than just filtering - it is also mutates some of PVCs (or at least tries to do it). Could we extract the mutation logic to a separate function and invoke it after the ListPVCsByPVCSelector returns the list?

Signed-off-by: pruthvitd <prd@redhat.com>
Comment on lines +415 to +416
// - The PVC was created via CDI VolumeImportSource annotations, not DataSourceRef or DataSource.
// - The PVC// - The PVC includes CDI import annotations and is owned by another PVC, indicating it is part of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like this part could use cleanup

Comment on lines +505 to +508
delete(pvc.Labels, core.VMLabelSelector)
delete(pvc.Labels, LabelOwnerNamespaceName)
delete(pvc.Labels, LabelOwnerName)
delete(pvc.Labels, ConsistencyGroupLabel)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assumes all these labels are always present together and should be deleted together. But what if only one of those exists? Couldn't this inadvertently remove legitimate labels from non-temporary PVCs for some cases?

log.Error(err, "Failed to unset VM recipe PVC label selector",
"recipe", recipeName, "labelKey", core.VMLabelSelector)

return nil, fmt.Errorf(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some PVCs might already be modified (labels deleted) before the function returns the error. This leaves the cluster in an inconsistent state - partially cleaned up PVCs. Possible risk IMO.

Comment on lines +422 to +423
strings.HasPrefix(pvcName, "prime") &&
!strings.Contains(pvcName, "scratch")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Relying on PVC name patterns here, but what if a user manually creates a PVC named "prime-something"? There's no validation that this is actually a CDI created PVC beyond the annotations.

Comment on lines +509 to +510
log.Info("Unsetting vm label selector from pvc", pvc.Name,
core.VMLabelSelector,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.Info("Unsetting vm label selector from pvc", pvc.Name,
core.VMLabelSelector,
log.Info("Unsetting vm label selector from pvc", "pvc", pvc.Name, "labelKey", core.VMLabelSelector)

// +kubebuilder:rbac:groups="apiextensions.k8s.io",resources=customresourcedefinitions,verbs=get;list;watch
// +kubebuilder:rbac:groups=core,resources=pods/exec,verbs=create
// +kubebuilder:rbac:groups="kubevirt.io",resources=virtualmachines,verbs=get;list;watch;patch;delete
// +kubebuilder:rbac:groups="kubevirt.io",resources=virtualmachines,verbs=get;list;watch;update;delete
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's best to keep both verbs. We wouldn't want to break existing code and there's zero downside to having them both included.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Protecting temporary PVCs that should not be protected

3 participants