Support Zone-Aware Topologies in the KubeVirt CSI Driver #124
moadqassem wants to merge 11 commits into kubevirt:main
Conversation
|
Hi @moadqassem. Thanks for your PR. PRs from untrusted users cannot be marked as trusted with "I understand the commands that are listed here."

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
moadqassem force-pushed from 1de4f1d to c37d1ae (compare)
|
/test all |
moadqassem force-pushed from c37d1ae to 8812d3f (compare)
|
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files. Approvers can indicate their approval by writing |
|
@awels I have fixed some test files that were failing and updated the PR to resolve a few conflicts. However, there were some other issues which I believe are not relevant to my PR. I also took a quick look into the testing process: in order to add an e2e k8s test to make sure that the zone and region are respected, I must change a few things in the kubevirtci cluster provider (or add a new provider; I haven't looked deep, so I'm not sure). This is where it needs to be adjusted to add the labels that point to the allowed topology: https://github.com/kubevirt/kubevirtci/blob/a291de27bb596074c79729ea6f88533555f523fd/cluster-up/cluster/ephemeral-provider-common.sh#L90 Let me know what you think. |
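To illustrate the kind of change being discussed: the cluster provider would need to put standard topology labels on the nodes so that `allowedTopologies` can match them. This fragment is only a sketch; the node name and zone/region values are hypothetical, not taken from kubevirtci.

```yaml
# Hypothetical node labels a kubevirtci provider would need to set;
# az-1 / eu-central are example values, not real kubevirtci defaults.
apiVersion: v1
kind: Node
metadata:
  name: node01
  labels:
    topology.kubernetes.io/zone: az-1
    topology.kubernetes.io/region: eu-central
```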
|
/test all |
|
Sorry, I have been very busy with other stuff; I will try to take a look at this soon. |
|
If you look at the testing, we do actually exclude a few tests from the k8s test suite. In particular the RWX filesystem one, since we don't support that. |
I see. Alright, let me check these filtering criteria. |
|
https://github.com/kubevirt/csi-driver/blob/main/hack/run-k8s-e2e.sh#L117-L128 is what we use, I believe the last skip is what skips the RWX filesystem test. |
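As a rough sketch of the mechanism (not the actual contents of run-k8s-e2e.sh), scripts like this typically collect the individual skip patterns and join them into one regex that is passed to the e2e binary via `--ginkgo.skip`:

```shell
# Illustrative only: the pattern names below are placeholders, not the
# real skips used by the KubeVirt CSI driver test script.
patterns=(
  "pattern-one"
  "pattern-two"
)
# Join the patterns with "|" to build a single ginkgo skip regex.
skip_regex=$(IFS='|'; echo "${patterns[*]}")
echo "--ginkgo.skip=${skip_regex}"
```

The last entry in such a list would then be the one that excludes the RWX filesystem tests.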
|
Okay, so I think I know what happened here: we made a change in how we identify which VM the volume is hotplugged into. Before, we used the VMI, but that proved to be problematic; now we look at the VM directly. I am not sure if that affects how you look up VMs, but it likely does. |
…t requirements Signed-off-by: moadqassem <moad.qassem@gmail.com>
moadqassem force-pushed from 1701709 to 989e1ea (compare)
|
PR looks good. One thing I forgot to ask about: did you enable the topology tests in the k8s test suite? You can modify https://github.com/kubevirt/csi-driver/blob/main/hack/test-driver.yaml and set |
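For illustration, the kubernetes-csi external e2e suite gates its topology tests on a capability flag in the test driver manifest. The fragment below is an assumed sketch of such a change, not the actual contents of hack/test-driver.yaml:

```yaml
# Hypothetical test-driver.yaml fragment; the k8s storage e2e suite
# reads DriverInfo.Capabilities to decide which tests to run.
StorageClass:
  FromName: true
DriverInfo:
  Name: csi.kubevirt.io
  Capabilities:
    topology: true
```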
|
/test all |
|
/test all |
|
Looks like we are having an issue with the tests when we enable topology here. I still have not had a chance to look at why. |
|
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale |
|
/remove-lifecycle stale |
|
Issues go stale after 90d of inactivity. If this issue is safe to close now please do so with /lifecycle stale |
|
/remove-lifecycle stale |
|
/test all |
|
My sincere apologies that it took me this long to really take a look at this. I was able to easily reproduce the test failures locally. It appears to be two issues. First, the storage class needs to look like this:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: kubevirt
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: csi.kubevirt.io
allowedTopologies:
- matchLabelExpressions:
  - key: topology.kubernetes.io/zone
    values:
    - az-1
  - key: topology.kubernetes.io/region
    values:
    - eu-central
allowVolumeExpansion: true
volumeBindingMode: WaitForFirstConsumer
parameters:
  infraStorageClassName: rook-ceph-block-wffc
  bus: scsi
```

Basically the two changes are that the infra storage class points to a WFFC one, and volumeBindingMode is set to WaitForFirstConsumer. Second, the node topology labels ended up empty:

```yaml
topology.kubernetes.io/region: ""
topology.kubernetes.io/zone: ""
```

This of course caused the nodes to not be considered during scheduling, and it failed. Once I manually corrected the labels and the storage class, I was able to properly create topology-aware volumes in the tenant cluster without issues. So I don't think there is any issue with the code itself (as you concluded as well); it is just a matter of properly configuring the test cluster during test setup. |
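As a usage sketch of what "topology-aware volumes in the tenant cluster" means in practice: with `volumeBindingMode: WaitForFirstConsumer`, a PVC stays Pending until a pod using it is scheduled, and the volume is then provisioned in that node's zone/region. The object names and image below are illustrative only:

```yaml
# Hypothetical PVC + pod pair; with WFFC binding, provisioning is
# deferred until the pod is scheduled to a labeled node.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: topo-pvc
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 1Gi
  storageClassName: kubevirt
---
apiVersion: v1
kind: Pod
metadata:
  name: topo-pod
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: topo-pvc
```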
|
@moadqassem: The following tests failed, say

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
I created a branch here https://github.com/awels/kubevirt-csi/tree/topology_support with the changes I think are needed to complete this PR. I will also update one of the lanes to use the WFFC binding mode environment variable. If you could take the last commit on that branch and move it into your PR, I would appreciate it. If you can't, I will make a PR out of that branch and give you credit. |
|
PR needs rebase.

Details: Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
|
Okay, I guess you no longer have time to look at this. I created another PR based on this one with the fixes to the build and tests so that it all passes. |
Oh, really sorry, I hadn't seen your comment. No worries, we can merge your PR as well 😉. |
|
No worries, it is mostly my fault; I have been really busy and never got around to figuring out what was causing the CI to fail. I just want to make sure you get the credit for the PR, that is all. |
I see. Totally appreciated. Btw, I have been looking into disk expansion and snapshotting lately. Are those features actually working? |
|
Yes, they should be working if the infra cluster storage class supports it. |
Hmm, OK. I did try it out, though, and the controller didn't acknowledge that an object was created. Anyway, it might be an issue in my configs then. |
|
Yeah, we have the tests enabled in the kubernetes csi test suite. Also, my company's test suite says it is working fine. So I suspect a configuration issue on your end. |
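For context on the snapshot side, a tenant cluster would at minimum need a VolumeSnapshotClass that references the driver (plus the external snapshot CRDs and controller installed). This is a generic sketch; the class name is hypothetical:

```yaml
# Hypothetical VolumeSnapshotClass for the KubeVirt CSI driver;
# requires the snapshot.storage.k8s.io CRDs and snapshot controller.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: kubevirt-csi-snapclass
driver: csi.kubevirt.io
deletionPolicy: Delete
```

A missing or misconfigured snapshot class is a common reason the controller appears to ignore snapshot requests, which would fit the configuration-issue theory above.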
What this PR does / why we need it:
This pull request introduces support for zone-aware topology in the KubeVirt CSI driver, allowing the driver to handle storage provisioning and attachment in multi-zone Kubernetes clusters. It integrates with Kubernetes topology-aware volume provisioning to ensure volumes are created in the same zone/region as the requesting node by:
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):

Fixes #
Special notes for your reviewer:
Release note: