Application pods depending on listener get stuck in pending when GKE nodes restart #342

@jeffsweep

Description

Affected Stackable version

25.7

Current and expected behavior

When Google Kubernetes Engine (GKE) restarts nodes, the Superset pod gets stuck in the Pending state. It does not start, apparently because the cluster is unable to scale up. Presumably this also affects pods managed by other operators.

I traced this back to the listener PersistentVolumeClaim (PVC). It seems to be bound to one or more nodes that no longer exist in the GKE cluster because they were restarted or removed. Deleting the PVC causes the listener operator to re-sync, after which everything works again. But this requires manual intervention, which is not a good situation.
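One way to check this diagnosis before deleting anything is to compare the node affinity of each PersistentVolume against the nodes currently in the cluster. The following is a minimal sketch, not the operator's actual logic. It assumes that the listener volumes are pinned through a required node affinity on the `kubernetes.io/hostname` label, that the hostname label matches the node name, and that the input dictionaries have the shape of `kubectl get pv -o json` items. The function names are hypothetical.

```python
from typing import Iterable

# Well-known Kubernetes node label; assumed here to be what pins the volume.
HOSTNAME_LABEL = "kubernetes.io/hostname"


def pinned_nodes(pv: dict) -> set[str]:
    """Return the hostnames a PV is restricted to by its required node affinity."""
    terms = (
        pv.get("spec", {})
        .get("nodeAffinity", {})
        .get("required", {})
        .get("nodeSelectorTerms", [])
    )
    nodes: set[str] = set()
    for term in terms:
        for expr in term.get("matchExpressions", []):
            # Only "In" expressions on the hostname label name concrete nodes.
            if expr.get("key") == HOSTNAME_LABEL and expr.get("operator") == "In":
                nodes.update(expr.get("values", []))
    return nodes


def stale_claims(pvs: Iterable[dict], live_nodes: set[str]) -> list[str]:
    """Return "namespace/name" of claims whose PV is pinned only to missing nodes."""
    stale = []
    for pv in pvs:
        pins = pinned_nodes(pv)
        claim = pv.get("spec", {}).get("claimRef")
        # Unpinned or unbound volumes are never reported.
        if claim and pins and pins.isdisjoint(live_nodes):
            stale.append(f"{claim['namespace']}/{claim['name']}")
    return sorted(stale)
```

Feeding it the `items` list from `kubectl get pv -o json` and the set of names from `kubectl get nodes` would list the claims that currently need the manual deletion described above.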

So basically whenever GKE swaps out nodes due to upgrades or other infrastructure reasons, the listener and everything connected to it get stuck.

Possible solution

Update the operator code to delete the PVC when the list of nodes changes.
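As a rough illustration of that idea, the decision could look like the sketch below. It is written in Python for brevity and is not taken from the operator's code. The function name, the `pins_by_claim` mapping (claim to the set of nodes its volume is pinned to) and the way node sets are obtained are all assumptions made for the example.

```python
def claims_to_delete(
    pins_by_claim: dict[str, set[str]],
    previous_nodes: set[str],
    current_nodes: set[str],
) -> list[str]:
    """Pick claims to delete after the node list has changed.

    A claim is selected only if it is pinned to at least one node and none
    of its pinned nodes exist in the current node list.
    """
    # Nothing to do unless the node list actually changed.
    if previous_nodes == current_nodes:
        return []
    return sorted(
        claim
        for claim, pins in pins_by_claim.items()
        if pins and pins.isdisjoint(current_nodes)
    )
```

Evaluating against the current node list, rather than only against the nodes removed in the latest change, also catches claims that were missed during an earlier change. A claim pinned to several nodes is kept as long as one of them still exists.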

Additional context

No response

Environment

v1.33.5-gke.1080000

Would you like to work on fixing this bug?

no
