## Summary
The single-cluster operator binary (`operator/cmd/run/run.go`) ships two storage-cleanup controllers:

- `nodewatcher` → `pvcunbinder.Controller` (deletes PVCs of pods stuck `Pending` due to Node deletion)
- `decommissioning` → `decommissioning.Controller` (handles broker decommission + storage cleanup)
The multicluster operator binary (`operator/cmd/multicluster/multicluster.go`) only wires `pvcunbinder.MulticlusterController`. The decommissioning controller is not registered, so auto-decommission events leave behind orphaned PVCs (and, with `reclaimPolicy: Delete` storage classes, orphaned managed disks once the StatefulSet eventually shrinks too).
This feature request asks for the decommissioning controller to be wired into the multicluster operator the same way it is in the single-cluster operator, so that StretchCluster + NodePool deployments can rely on the operator for the same broker-lifecycle storage cleanup that single-cluster Redpanda chart users get out of the box.
## Why it matters
Concrete failure mode we hit running https://github.com/david-yu/redpanda-operator-stretch-beta's Demo B (regional failure + failover-region capacity injection):
- Region down → cluster auto-decommissions broker IDs 0, 1 (their replicas drained onto the failover region's brokers).
- Region restored (via `az aks stop`/`start`, EC2 instance recovery, etc.). Node objects come back, the StatefulSet schedules pods on the new nodes, and the pods bind to the same `datadir-redpanda-rp-east-{0,1}` PVCs because Azure managed disks survive `aks stop` intact.
- The redpanda containers start, find their old `node_uuid` in `/var/lib/redpanda/data`, and try to rejoin as the previously-decommissioned IDs. The cluster rejects them:

  ```
  bad_rejoin: trying to rejoin with same ID and UUID as a decommissioned node
  ```

  Pods then loop indefinitely at `1/2 Running`.
PVCUnbinder doesn't help here — it's keyed on pods stuck `Pending` (Node deletion + nodeAffinity break), but in `aks stop`/`start` the disks survive and the new pods immediately bind, so they never sit `Pending`. They just restart-loop on `bad_rejoin`.
The single-cluster decommissioning controller is the right component to handle this: when a broker is decommissioned (either by the cluster's own partition autobalancer, or by `rpk redpanda admin brokers decommission`), it should delete the broker's PVC. With `reclaimPolicy: Delete` on the StorageClass, the disk is reaped, and the next time the StatefulSet creates a pod with that ordinal, it gets a fresh PVC + disk + node UUID and joins as a new broker ID.
## Workaround today
Manual two-step recovery whenever a region restores after auto-decom:
```shell
kubectl --context rp-<lost-region> -n redpanda delete pvc datadir-redpanda-<pool>-<id>
kubectl --context rp-<lost-region> -n redpanda delete pod redpanda-<pool>-<id> --grace-period=0 --force
```
Operationally fragile and not how the single-cluster path documents it.
## Proposed change
Wire the existing `operator/internal/controller/decommissioning/` controller into `operator/cmd/multicluster/multicluster.go` alongside the PVCUnbinder, with the same flag surface used by the single-cluster path (`--additional-controllers=decommission` style, or unconditionally on, matching whatever the rest of the multicluster controllers do). The chart's `additionalCmdFlags` already passes through, so users can opt in via helm values.
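Assuming the flag surface mirrors the single-cluster path, the helm opt-in could look like the following values fragment (the `additionalCmdFlags` key is taken from the chart passthrough mentioned above; the exact key path is an assumption):

```yaml
# Hypothetical values.yaml snippet: pass the decommission opt-in flag through
# the chart's additionalCmdFlags passthrough (exact key path may differ).
additionalCmdFlags:
  - --additional-controllers=decommission
```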
Specifically:
- `cmd/multicluster/multicluster.go`: add an import of `operator/internal/controller/decommissioning` and a `SetupWithMultiClusterManager` (or equivalent) call alongside the existing PVCUnbinder block.
- Multicluster RBAC: extend the operator's ClusterRole/Role to include the same verbs the single-cluster decommissioning controller needs on PVCs and PVs (the chart already has `pvcunbinder.ClusterRole.yaml`; a sibling `decommission.ClusterRole.yaml` exists for the single-cluster path and should mirror it).
If there's a reason the decommissioning controller isn't multicluster-safe today (e.g. it assumes a single-cluster Redpanda CR, not a StretchCluster + NodePool pair), happy to scope the work — would be useful to know what blockers exist.
## Environment
- v26.2.1-beta.1 (multicluster build)
- v26.1.6
- Repo with full repro: https://github.com/david-yu/redpanda-operator-stretch-beta — Demo B exercises the auto-decommission + region restore flow.