Summary
When a 4th cluster is added to an existing 3-peer multicluster deployment, the new peer's operator pod comes up `Running` and joins the raft (often as `StateLeader`), but the NodePool and StretchCluster controllers never fire on that peer's local resources — they sit at `conditions[*].status: Unknown` with `reason: NotReconciled` / `message: "Waiting for controller"` indefinitely. A `kubectl rollout restart deployment <operator>` on the new peer immediately unblocks reconciliation. We've also observed a related variant of the same bug on the existing peers when raft elections drop.
This looks like a race between controller-runtime informer/cache initialization and multicluster raft membership settling — the controllers register before raft is ready, then never recover from the empty-cache state without a process restart.
Operator version: v26.2.1-beta.1 (multicluster build).
Reproduction
The full multicluster setup is captured at https://github.com/david-yu/redpanda-operator-stretch-beta — Demo B's failover-region capacity-injection flow exercises this path. Minimal repro:
1. Stand up the 3-peer stretch cluster per Step 1–7 in the README:
   - 3 K8s clusters (`rp-east`, `rp-west`, `rp-eu`) with cross-region pod-IP routability
   - `rpk k8s multicluster bootstrap --context rp-east --context rp-west --context rp-eu --namespace redpanda --loadbalancer`
   - `helm install <ctx> redpanda/operator --version 26.2.1-beta.1 --devel` per cluster, with matching `multicluster.peers` and `fullnameOverride: <ctx>` values
   - Apply a StretchCluster and one NodePool per cluster; wait until the StretchCluster reports `Ready=True` / `Healthy=True`.
2. Provision a 4th K8s cluster (`rp-failover`, separate region) with cross-region pod-IP + LB connectivity to the existing three.
3. Re-run bootstrap with all 4 contexts (idempotent on the existing 3, generates fresh state for `rp-failover`):

   ```shell
   rpk k8s multicluster bootstrap \
     --context rp-east --context rp-west --context rp-eu --context rp-failover \
     --namespace redpanda --loadbalancer
   ```
4. Render an `rp-failover` helm values file with `multicluster.name: rp-failover` and a 4-entry `multicluster.peers` block including all four LB addresses. Render matching 4-peer values for the existing three.
5. `helm install rp-failover redpanda/operator --version 26.2.1-beta.1 --devel -f values-rp-failover.yaml -n redpanda`. The operator pod becomes `1/1 Running`.
6. `helm upgrade` each of the existing 3 with the new 4-peer values.
7. Apply a StretchCluster and a NodePool (`replicas: 2`) on `rp-failover`:

   ```yaml
   apiVersion: cluster.redpanda.com/v1alpha2
   kind: StretchCluster
   metadata: { name: redpanda, namespace: redpanda }
   spec: ... # same shape as the existing 3 clusters
   ---
   apiVersion: cluster.redpanda.com/v1alpha2
   kind: NodePool
   metadata: { name: rp-failover, namespace: redpanda }
   spec:
     clusterRef: { group: cluster.redpanda.com, kind: StretchCluster, name: redpanda }
     replicas: 2
     image: { repository: redpandadata/redpanda, tag: v26.1.6 }
     services: { perPod: { remote: { enabled: true } } }
   ```
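For reference, the `rp-failover` values file mentioned above would look roughly like this. The peer-entry shape is an assumption — only `multicluster.name`, `multicluster.peers`, and `fullnameOverride` are named in this report — and the addresses are placeholders:

```yaml
# values-rp-failover.yaml (sketch; the per-peer fields are assumed, not
# taken from the chart's documented schema)
fullnameOverride: rp-failover
multicluster:
  name: rp-failover
  peers:                       # 4 entries, one per cluster, LB addresses
    - name: rp-east
      address: <east-lb-ip>
    - name: rp-west
      address: <west-lb-ip>
    - name: rp-eu
      address: <eu-lb-ip>
    - name: rp-failover
      address: <failover-lb-ip>
```

The existing three clusters get the same 4-entry `peers` list, with only `fullnameOverride` and `multicluster.name` differing per context.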
Observed (10+ minutes after step 7)
`rpk k8s multicluster status` reports the 4-peer mesh as fully healthy:
```
CLUSTER      OPERATOR  RAFT-STATE     LEADER       PEERS  UNHEALTHY  TLS  SECRETS
rp-east      Running   StateFollower  rp-failover  4      0          ok   ok
rp-west      Running   StateFollower  rp-failover  4      0          ok   ok
rp-eu        Running   StateFollower  rp-failover  4      0          ok   ok
rp-failover  Running   StateLeader    rp-failover  4      0          ok   ok

CROSS-CLUSTER:
  ✓ [unique-names] all node names are unique
  ✓ [peer-agreement] peer lists agree across all clusters
  ✓ [leader-agreement] leader agreement: rp-failover (term 4)
  ✓ [ca-consistency] all clusters share the same CA
```
…but the NodePool and StretchCluster on `rp-failover` are stuck:
```console
$ kubectl --context rp-failover -n redpanda get nodepool
NAME          BOUND     DEPLOYED
rp-failover   Unknown   Unknown

$ kubectl --context rp-failover -n redpanda get nodepool rp-failover -o yaml
status:
  conditions:
  - lastTransitionTime: "1970-01-01T00:00:00Z"
    message: Waiting for controller
    reason: NotReconciled
    status: Unknown
    type: Bound
  - lastTransitionTime: "1970-01-01T00:00:00Z"
    message: Waiting for controller
    reason: NotReconciled
    status: Unknown
    type: Deployed
  - lastTransitionTime: "1970-01-01T00:00:00Z"
    message: Waiting for controller
    reason: NotReconciled
    status: Unknown
    type: Quiesced
  - lastTransitionTime: "1970-01-01T00:00:00Z"
    message: Waiting for controller
    reason: NotReconciled
    status: Unknown
    type: Stable

$ kubectl --context rp-failover -n redpanda get stretchcluster redpanda \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
Ready=Unknown
Healthy=Unknown
LicenseValid=Unknown
ResourcesSynced=Unknown
ConfigurationApplied=Unknown
SpecSynced=Unknown
BootstrapUserSynced=Unknown
Quiesced=Unknown
Stable=Unknown
```
No StatefulSet is created and no broker pods exist. The operator's logs show only raft activity (vote requests, term advances, leader election); no `Reconciler/NodePool` or `Reconciler/StretchCluster` log lines appear.
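One way to confirm the controllers never started is to look for controller-runtime's standard per-controller startup line, which is only emitted after the informer cache syncs. A live check would pipe `kubectl logs` for the operator deployment; here a captured excerpt stands in so the check is self-contained:

```shell
# Stand-in for: kubectl --context rp-failover -n redpanda logs deploy/rp-failover
# (the raft lines below are illustrative, not verbatim operator output)
cat > /tmp/operator-excerpt.log <<'EOF'
raft: vote requested from rp-east term=3
raft: became leader term=4
EOF

# controller-runtime logs "Starting workers" once per controller after its
# cache has synced; if the line never appears, no reconciler ever ran.
if grep -q "Starting workers" /tmp/operator-excerpt.log; then
  echo "controllers started"
else
  echo "controllers never started"
fi
```

On the affected peer the `Starting workers` line is absent for both controllers, matching the missing `Reconciler/*` log lines above.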
Workaround
```shell
kubectl --context rp-failover -n redpanda rollout restart deployment rp-failover
```
Within ~30 s of the new operator pod coming up, the NodePool flips to `BOUND=True` / `DEPLOYED=True`, the StretchCluster transitions through `Ready=False` → `True`, the StatefulSet is created, and the broker pods come up and join the cluster with new broker IDs (5 and 6).
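Until the root cause is fixed, the workaround can be automated by gating the restart on the NodePool's `Bound` condition. A sketch — the kubectl usage is shown in comments so the decision logic here is self-contained; deployment and resource names are taken from this repro:

```shell
# Restart the operator only while the Bound condition is still Unknown.
needs_restart() {
  # $1: the Bound condition's status ("Unknown", "True", or "False")
  [ "$1" = "Unknown" ]
}

# Live usage (names from the repro above):
#   bound=$(kubectl --context rp-failover -n redpanda get nodepool rp-failover \
#     -o jsonpath='{.status.conditions[?(@.type=="Bound")].status}')
#   needs_restart "$bound" && kubectl --context rp-failover -n redpanda \
#     rollout restart deployment rp-failover

needs_restart Unknown && echo "would restart operator"
needs_restart True || echo "NodePool healthy, leaving operator alone"
```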
Variant we hit on existing peers
The same flow can also stall at the raft-join layer: when the new peer is added, it sometimes stays `StatePreCandidate` indefinitely, with the existing 3 reporting `unhealthy peers: <new-peer>` even though all 4 LB IPs are reachable in both directions and TLS verifies. A `kubectl rollout restart deployment` on the existing 3 operators (so they reload the 4-peer config and re-handshake) lets the new peer's election succeed within a few seconds. We hit this on a previous run of the same flow.
Expected
After `helm install` + `helm upgrade` on the new peer, the operator should reconcile its own local NodePool/StretchCluster resources without needing a manual restart. The same goes for raft join: when the existing peers' `multicluster.peers` is updated via `helm upgrade`, the new peer's election should converge without bouncing the existing operators.
Environment
- Operator: `redpanda/operator` @ v26.2.1-beta.1 (chart), multicluster build
- Redpanda: v26.1.6
- Kubernetes: AKS 1.34 (validated end-to-end on Azure: eastus / westus2 / centralus / eastus2 for failover) and GKE 1.35 RAPID
- Both reproductions fresh-bootstrapped from Terraform; no carried-over state.
What might help
- A startup ordering check: don't mark operator pod Ready until raft membership has settled and the local controllers' caches have synced for at least one tick.
- Defensive re-list of CRs after raft membership changes (or after the operator transitions between candidate/leader/follower for the first time).
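For the first item, one possible shape is folding the raft/cache state into the manager's readiness check rather than relying on the pod's default probe. The `/readyz` path and port below follow common controller-runtime scaffolding and are assumptions about how this could be wired, not current operator behavior:

```yaml
# Sketch: only pass readiness once (a) raft membership has settled and
# (b) every controller's informer cache has completed an initial sync.
# The readiness endpoint would aggregate both signals.
readinessProbe:
  httpGet:
    path: /readyz   # assumed controller-runtime-style health probe
    port: 8081
  periodSeconds: 5
  failureThreshold: 3
```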
Happy to provide additional logs / state dumps if useful.