-
Notifications
You must be signed in to change notification settings - Fork 67
Filter out VM Prime PVCs from protected PVC list by unsetting VM label selector on prime PVC #2362
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
a8b6a5a to
b74e4ea
Compare
| - list | ||
| - watch | ||
| - patch | ||
| - update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned here: #2368 (comment), maybe better to keep patch and add update?
config/rbac/role.yaml
Outdated
| - get | ||
| - list | ||
| - patch | ||
| - update |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as previous
| return out, nil | ||
| } | ||
|
|
||
| func UpdateVMRecipePvcLabel(ctx context.Context, c client.Client, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually use upper case for PVC in function names
| } | ||
|
|
||
| // If a temporary importer prime PVC related to VM is found, it attempts to update its label and skips it from the result. | ||
| func filterPVCs( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From the function name I'd expect this function to only filter PVCs, but actually it is doing more, than just filtering - it is also mutates some of PVCs (or at least tries to do it). Could we extract the mutation logic to a separate function and invoke it after the ListPVCsByPVCSelector returns the list?
Signed-off-by: pruthvitd <prd@redhat.com>
b74e4ea to
a90307c
Compare
| // - The PVC was created via CDI VolumeImportSource annotations, not DataSourceRef or DataSource. | ||
| // - The PVC// - The PVC includes CDI import annotations and is owned by another PVC, indicating it is part of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this part could use cleanup
| delete(pvc.Labels, core.VMLabelSelector) | ||
| delete(pvc.Labels, LabelOwnerNamespaceName) | ||
| delete(pvc.Labels, LabelOwnerName) | ||
| delete(pvc.Labels, ConsistencyGroupLabel) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This assumes all these labels are always present together and should be deleted together. But what if only one of those exists? Couldn't this inadvertently remove legitimate labels from non-temporary PVCs for some cases?
| log.Error(err, "Failed to unset VM recipe PVC label selector", | ||
| "recipe", recipeName, "labelKey", core.VMLabelSelector) | ||
|
|
||
| return nil, fmt.Errorf( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some PVCs might already be modified (labels deleted) before the function returns the error. This leaves the cluster in an inconsistent state - partially cleaned up PVCs. Possible risk IMO.
| strings.HasPrefix(pvcName, "prime") && | ||
| !strings.Contains(pvcName, "scratch") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Relying on PVC name patterns here, but what if a user manually creates a PVC named "prime-something"? There's no validation that this is actually a CDI created PVC beyond the annotations.
| log.Info("Unsetting vm label selector from pvc", pvc.Name, | ||
| core.VMLabelSelector, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| log.Info("Unsetting vm label selector from pvc", pvc.Name, | |
| core.VMLabelSelector, | |
| log.Info("Unsetting vm label selector from pvc", "pvc", pvc.Name, "labelKey", core.VMLabelSelector) |
| // +kubebuilder:rbac:groups="apiextensions.k8s.io",resources=customresourcedefinitions,verbs=get;list;watch | ||
| // +kubebuilder:rbac:groups=core,resources=pods/exec,verbs=create | ||
| // +kubebuilder:rbac:groups="kubevirt.io",resources=virtualmachines,verbs=get;list;watch;patch;delete | ||
| // +kubebuilder:rbac:groups="kubevirt.io",resources=virtualmachines,verbs=get;list;watch;update;delete |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's best to keep both verbs. We wouldn't want to break existing code and there's zero downside to having them both included.
What caused the problem
VM Prime PVCs were being incorrectly included in the protected PVC list because the default Ramen VM label selector was still present on these temporary PVCs.
The Ramen VRG reconciler uses label selectors to identify resources (VMs and PVCs) for protection. After a VM is enrolled for DR:
Upstream: This label selector is added on the VM and its resources by the user.
Downstream: The default label is added to the VirtualMachine CR by the ODF UI console to mark it as protected.
This mechanism allows Ramen to pseudo-own resources and manage subsets of VMs in a namespace.
However, during restore or relocation workflows, temporary PVCs (Prime PVCs) created for VolumeImportSource/DataVolume inherit this label. If PV adoption succeeds but the Prime PVC retains Ramen finalizers and the VM label selector, it is not garbage-collected and gets treated as a protected resource.
This leads to:
How it’s fixed
Impact
Fixes: 2353
Updated RBAC permissions on VirtualMachine CRD to use the 'update' verb instead of 'patch' for ownerReference modifications.