This issue was originally reported in datastax/cass-operator #412 by srteam2020
Description
We find that in an HA Kubernetes cluster, cass-operator can mistakenly delete the PVCs of the Cassandra pods after cass-operator experiences a restart. After diagnosis and inspection, we find it is caused by potential staleness in some of the apiservers.
More concretely, if cass-operator receives a stale update event from an apiserver saying "CassandraDatacenter has a non-nil deletion timestamp", the controller will delete the PVCs returned by `listPVCs`, and `listPVCs` lists PVCs by name (and namespace) instead of by UID. Note that the stale event actually comes from the deletion of a previous CassandraDatacenter sharing the same name (but with a different UID) as the currently running one, so the PVCs of the existing CassandraDatacenter will be listed and deleted mistakenly.
One potential approach to fix this is to label each PVC with the UID of its CassandraDatacenter at creation time, and to list PVCs by that UID in `listPVCs`, ensuring that we always delete the right PVCs.
Reproduction
We list concrete reproduction steps in an HA cluster below:
- Create a CassandraDatacenter `cdc`. PVCs will be created in the cluster.
- Delete `cdc`. Apiserver1 will send update events with a non-nil deletion timestamp to the controller, and the controller will trigger `deletePVCs()` to delete the related PVCs. Meanwhile, apiserver2 is partitioned, so its watch cache stops at the moment `cdc` is tagged with a deletion timestamp.
- Create a CassandraDatacenter with the same name `cdc` again. The CassandraDatacenter comes back with a different UID, and so do its PVCs. However, apiserver2 still holds the stale view that `cdc` has a non-nil deletion timestamp and is about to be deleted.
- The controller restarts after a node failure and talks to the stale apiserver2. Reading the stale update event from apiserver2 saying `cdc` has a deletion timestamp, the controller lists all PVCs belonging to the currently running `cdc` (as mentioned above) and deletes them all.
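The failure in the last step can be sketched with a toy simulation. The types below are illustrative stand-ins (the real operator lists `corev1.PersistentVolumeClaim` objects via controller-runtime, and the label key here is an assumption): name-based listing cannot distinguish generations of a same-named datacenter, so a stale deletion event for the old `cdc` selects the recreated `cdc`'s PVCs.

```go
package main

import "fmt"

// pvcMeta is a minimal stand-in for PVC metadata (illustration only).
type pvcMeta struct {
	Name      string
	Namespace string
	Labels    map[string]string
}

// listPVCsByName mimics the current listPVCs behavior: it selects PVCs
// by the datacenter's name and namespace only, ignoring UID.
func listPVCsByName(all []pvcMeta, dcName, ns string) []pvcMeta {
	var matched []pvcMeta
	for _, p := range all {
		// "cassandra.datastax.com/datacenter" is an assumed label key.
		if p.Namespace == ns && p.Labels["cassandra.datastax.com/datacenter"] == dcName {
			matched = append(matched, p)
		}
	}
	return matched
}

func main() {
	// This PVC belongs to the *recreated* datacenter "cdc" (new UID),
	// but the stale deletion event refers to the *old* "cdc" (old UID).
	live := []pvcMeta{
		{Name: "data-cdc-0", Namespace: "ns1",
			Labels: map[string]string{"cassandra.datastax.com/datacenter": "cdc"}},
	}
	// Name-based listing cannot tell the two generations apart, so the
	// stale event selects (and would delete) the new datacenter's PVC.
	fmt.Println(len(listPVCsByName(live, "cdc", "ns1"))) // 1
}
```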
Fix
We are willing to help fix this bug by issuing a PR.
As mentioned above, the bug can be avoided by tagging each PVC with the UID of its CassandraDatacenter and listing PVCs by UID. Each CassandraDatacenter will always have a different UID, even when it shares a name with a previous one. So in this case the PVCs belonging to the newly created CassandraDatacenter will not be deleted by the stale events of the old one.
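A minimal sketch of the proposed UID-based listing, again with illustrative stand-in types (the label key `cassandra.datastax.com/datacenter-uid` is a hypothetical convention, not the operator's actual key): once PVCs carry the owning datacenter's UID, a stale deletion event for the old UID matches nothing.

```go
package main

import "fmt"

// uidLabel is a hypothetical label key recording the owning
// CassandraDatacenter's UID on each PVC at creation time.
const uidLabel = "cassandra.datastax.com/datacenter-uid"

// pvcMeta is a minimal stand-in for PVC metadata (illustration only).
type pvcMeta struct {
	Name   string
	Labels map[string]string
}

// listPVCsByUID selects only PVCs labeled with the given datacenter UID.
func listPVCsByUID(all []pvcMeta, dcUID string) []pvcMeta {
	var matched []pvcMeta
	for _, p := range all {
		if p.Labels[uidLabel] == dcUID {
			matched = append(matched, p)
		}
	}
	return matched
}

func main() {
	// PVCs of the recreated datacenter carry the new UID.
	live := []pvcMeta{
		{Name: "data-cdc-0", Labels: map[string]string{uidLabel: "uid-new"}},
		{Name: "data-cdc-1", Labels: map[string]string{uidLabel: "uid-new"}},
	}
	// A stale deletion event carries the old UID: nothing matches, so no
	// live PVC is deleted. A genuine deletion of the new datacenter still
	// finds its own PVCs.
	fmt.Println(len(listPVCsByUID(live, "uid-old"))) // 0
	fmt.Println(len(listPVCsByUID(live, "uid-new"))) // 2
}
```

In the real operator this would translate to attaching the label when PVC templates are built and passing a UID label selector to the client's list call.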
┆Issue is synchronized with this Jira Story by Unito
┆Issue Number: CASS-60