
[FR] multicluster operator: wire the decommissioning controller (auto-PVC cleanup after broker decom) #1494

@david-yu


Summary

The single-cluster operator binary (operator/cmd/run/run.go) ships two storage-cleanup controllers:

  • pvcunbinder.Controller (deletes PVCs of pods stuck Pending due to Node deletion)
  • decommissioning.Controller (handles broker decom + storage cleanup)

The multicluster operator binary (operator/cmd/multicluster/multicluster.go) only wires pvcunbinder.MulticlusterController. The decommissioning controller is not registered, so auto-decommission events leave behind orphaned PVCs (and, even with reclaimPolicy: Delete storage classes, orphaned managed disks once the StatefulSet eventually shrinks, since nothing ever deletes the PVCs).

This feature request asks for the decommissioning controller to be wired into the multicluster operator the same way it is in the single-cluster operator, so that StretchCluster + NodePool deployments can rely on the operator for the same broker-lifecycle storage cleanup that single-cluster Redpanda chart users get out of the box.

Why it matters

Concrete failure mode we hit running Demo B (regional failure + failover-region capacity injection) from https://github.com/david-yu/redpanda-operator-stretch-beta:

  1. Region down → cluster auto-decommissions broker IDs 0, 1 (their replicas drained onto the failover region's brokers).
  2. Region restored (via az aks stop / start, EC2 instance recovery, etc.). Node objects come back, the StatefulSet schedules pods on the new nodes, and the pods bind to the same datadir-redpanda-rp-east-{0,1} PVCs because Azure managed disks survive aks stop intact.
  3. The redpanda containers start, find their old node_uuid in /var/lib/redpanda/data, and try to rejoin as the previously-decommissioned IDs. The cluster rejects them:
    bad_rejoin: trying to rejoin with same ID and UUID as a decommissioned node
    
    Pods then loop indefinitely at 1/2 Running.

PVCUnbinder doesn't help here — it's keyed on pods stuck Pending (Node deletion + nodeAffinity break), but in aks stop/start the disks survive and the new pods immediately bind, so they never sit Pending. They just restart-loop on bad_rejoin.

The single-cluster decommissioning controller is the right component to handle this: when a broker is decommissioned (either by the cluster's own partition autobalancer, or by rpk redpanda admin brokers decommission), it should delete the broker's PVC. With reclaimPolicy: Delete on the StorageClass, the disk is reaped, and the next time the StatefulSet creates a pod with that ordinal, it gets a fresh PVC + disk + node UUID and joins as a new broker ID.
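For reference, a minimal StorageClass sketch with the reclaim policy this relies on; the name and the Azure CSI provisioner below are illustrative (matching the AKS repro environment), not taken from the repro repo:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: redpanda-datadir           # hypothetical name
provisioner: disk.csi.azure.com    # Azure managed disks, as in the AKS repro
reclaimPolicy: Delete              # the backing disk is reaped once the PVC/PV is deleted
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true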

Workaround today

Manual two-step recovery whenever a region restores after auto-decom:

kubectl --context rp-<lost-region> -n redpanda delete pvc datadir-redpanda-<pool>-<id>
kubectl --context rp-<lost-region> -n redpanda delete pod redpanda-<pool>-<id> --grace-period=0 --force

This is operationally fragile and not how the single-cluster path documents recovery.

Proposed change

Wire the existing operator/internal/controller/decommissioning/ controller into operator/cmd/multicluster/multicluster.go alongside the PVCUnbinder, with the same flag surface used by the single-cluster path (--additional-controllers=decommission style, or unconditionally on, matching whatever the rest of the multicluster controllers do). The chart's additionalCmdFlags already passes through, so users can opt in via helm values.
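If the controller ends up gated behind a flag, the chart-side opt-in could look roughly like the sketch below. additionalCmdFlags is the pass-through mentioned above; the exact flag name and value are assumptions until the wiring is decided:

# operator chart values.yaml (flag name/value assumed, not final)
additionalCmdFlags:
  - --additional-controllers=decommission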

Specifically:

  • cmd/multicluster/multicluster.go: add an import of operator/internal/controller/decommissioning and a SetupWithMultiClusterManager (or equivalent) call alongside the existing PVCUnbinder block.
  • Multicluster RBAC: extend the operator's ClusterRole/Role to include the same verbs the single-cluster decommissioning controller needs on PVCs and PVs (the chart already has pvcunbinder.ClusterRole.yaml; a sibling decommission.ClusterRole.yaml exists for the single-cluster path and can be mirrored, as sketched below).
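A rough sketch of that RBAC addition, assuming the multicluster decommissioning controller needs roughly the same access as the single-cluster one (the rule list below is an assumption to be checked against decommission.ClusterRole.yaml, not copied from it):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: redpanda-operator-decommission   # hypothetical name
rules:
  # assumed: read pods/StatefulSets to map decommissioned broker IDs to ordinals
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["statefulsets"]
    verbs: ["get", "list", "watch"]
  # assumed: delete the decommissioned broker's PVC; the PV/disk then follows via reclaimPolicy
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch"]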

If there's a reason the decommissioning controller isn't multicluster-safe today (e.g. it assumes a single-cluster Redpanda CR, not a StretchCluster + NodePool pair), happy to scope the work — would be useful to know what blockers exist.

Environment

Repo with full repro: https://github.com/david-yu/redpanda-operator-stretch-beta — Demo B exercises the auto-decommission + region restore flow.
