Description
Affected Stackable version
25.7
Current and expected behavior
When Google Kubernetes Engine (GKE) restarts nodes, the Superset pod gets stuck in the Pending state. It refuses to start because of a failure to scale up. Presumably this also affects pods managed by other operators.
I traced this back to the listener PersistentVolumeClaim (PVC). It seems to be bound to the node(s) that no longer exist in the GKE cluster, because those nodes were restarted, removed, or otherwise replaced. Deleting the PVC causes the listener operator to re-sync, and then everything works again. But this requires manual intervention, which is not a good situation.
So whenever GKE swaps out nodes for upgrades or other infrastructure reasons, the listener and everything connected to it get stuck.
Possible solution
Update the operator code to delete the PVC when the list of nodes changes.
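A minimal sketch of the detection half of this idea follows. It is not the listener-operator's actual code. It assumes the node a PVC is pinned to can be read from the `volume.kubernetes.io/selected-node` annotation that the Kubernetes scheduler sets for `WaitForFirstConsumer` volumes; the operator may instead need to inspect the bound PV's `nodeAffinity`. The function name `find_stale_pvcs` and the PVC and node names in the example are hypothetical. PVCs are modeled as plain dicts shaped like Kubernetes objects so the logic can be shown without a cluster.

```python
from typing import Any

# Assumption: the node a listener PVC is pinned to is recorded in this
# scheduler-set annotation. The real operator may track it elsewhere
# (for example, in the bound PersistentVolume's nodeAffinity).
SELECTED_NODE_ANNOTATION = "volume.kubernetes.io/selected-node"


def find_stale_pvcs(pvcs: list[dict[str, Any]], live_nodes: set[str]) -> list[str]:
    """Return "namespace/name" for each PVC pinned to a node that no longer exists."""
    stale = []
    for pvc in pvcs:
        meta = pvc.get("metadata", {})
        node = meta.get("annotations", {}).get(SELECTED_NODE_ANNOTATION)
        # A PVC without a selected node is not pinned yet, so leave it alone.
        if node is not None and node not in live_nodes:
            stale.append(f"{meta.get('namespace', 'default')}/{meta['name']}")
    return stale


# Hypothetical example: "gke-node-old" was removed during a GKE node upgrade.
pvcs = [
    {"metadata": {"name": "listener-superset-0", "namespace": "default",
                  "annotations": {SELECTED_NODE_ANNOTATION: "gke-node-old"}}},
    {"metadata": {"name": "listener-superset-1", "namespace": "default",
                  "annotations": {SELECTED_NODE_ANNOTATION: "gke-node-new"}}},
]
print(find_stale_pvcs(pvcs, {"gke-node-new"}))  # ['default/listener-superset-0']
```

The operator would then delete each reported PVC, which is the same step as the manual workaround above, and let its normal reconciliation recreate it. Note that Kubernetes' `kubernetes.io/pvc-protection` finalizer delays the actual removal of a PVC until no pod is using it.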
Additional context
No response
Environment
v1.33.5-gke.1080000
Would you like to work on fixing this bug?
no