
Operator-issued broker TLS certs reject strict hostname verification on the advertised RPC hostname (*.redpanda violates RFC-6125) #1499

@david-yu

Description


Summary

When multicluster.enabled: true (or any setup where the operator emits broker hostnames in the form <podname>.<headless-svc> — i.e. only two DNS labels), the operator-generated Certificate resource is issued with a SAN list whose wildcard entries have a single-label parent (*.redpanda, *.redpanda.svc). RFC 6125 §6.4.3 disallows wildcards on a single-label parent (analogous to *.com), and OpenSSL — which is what the Redpanda broker links against — enforces this. So even though brokers complete the TLS handshake, the RPC client drops the connection immediately afterwards because the server-cert hostname check fails:

verify error:num=62: hostname mismatch

In broker logs this surfaces as:

rpc - server.cc:175 - Error[applying protocol] remote address: <ip>:<port> -
  std::__1::system_error (error OpenSSL:167772358,
    Failed to establish SSL handshake:
    [error:0A0000C6:SSL routines::packet length too long,
     error:0A000139:SSL routines::record layer failure])

…on the listening side, and on the initiating side as a stream of cluster_bootstrap_info failed … rpc::errc:4 and Error dispatching socket write … Broken pipe. The brokers never form quorum and the StretchCluster stays Ready: False forever.

How to reproduce (minimal — no stretch / cross-cloud setup needed)

Single-cluster reproduction. The bug is in the operator's Certificate generation, not in the cross-cluster machinery — you only need multicluster mode toggled on, even if there's a single peer that points back at itself.

# 1. Install cert-manager (any recent version).
helm install cert-manager jetstack/cert-manager \
  -n cert-manager --create-namespace \
  --version v1.16.2 --set installCRDs=true

# 2. Install the operator in multicluster mode with a self-peer.
helm repo add redpanda-data https://charts.redpanda.com
helm install rp-self redpanda-data/operator -n redpanda --create-namespace \
  --version 26.2.1-beta.1 \
  --set fullnameOverride=rp-self \
  --set crds.enabled=true \
  --set multicluster.enabled=true \
  --set multicluster.name=rp-self \
  --set multicluster.apiServerExternalAddress=https://kubernetes.default.svc \
  --set multicluster.peers[0].name=rp-self \
  --set multicluster.peers[0].address=rp-self-multicluster-peer.redpanda.svc.cluster.local

# 3. Apply a StretchCluster with TLS on + cert-manager-issued CA.
cat <<EOF | kubectl apply -f -
apiVersion: cluster.redpanda.com/v1alpha2
kind: StretchCluster
metadata:
  name: redpanda
  namespace: redpanda
spec:
  rbac: { enabled: true }
  external: { enabled: false }
  networking: { crossClusterMode: flat }
  tls:
    enabled: true
    certs:
      default:
        caEnabled: true
        applyInternalDNSNames: true
EOF

cat <<EOF | kubectl apply -f -
apiVersion: cluster.redpanda.com/v1alpha2
kind: NodePool
metadata:
  name: rp-self
  namespace: redpanda
spec:
  clusterRef:
    group: cluster.redpanda.com
    kind: StretchCluster
    name: redpanda
  replicas: 3
  image:
    repository: redpandadata/redpanda
    tag: v26.1.6
EOF

# 4. Inspect the generated Certificate's SANs.
kubectl -n redpanda get secret redpanda-default-cert -o jsonpath='{.data.tls\.crt}' \
  | base64 -d \
  | openssl x509 -noout -ext subjectAltName

# Expected output (this is the bug):
#   X509v3 Subject Alternative Name: critical
#       DNS:redpanda.redpanda.svc.cluster.local,
#       DNS:redpanda.redpanda.svc,
#       DNS:redpanda.redpanda,
#       DNS:*.redpanda.redpanda.svc.cluster.local,
#       DNS:*.redpanda.redpanda.svc,
#       DNS:*.redpanda.redpanda,
#       DNS:*.redpanda.svc.cluster.local,
#       DNS:*.redpanda.svc,
#       DNS:*.redpanda                       ← single-label parent, RFC-6125 violation
#
# 5. Try strict hostname verification against the hostname brokers actually advertise.
ADVERTISED=redpanda-rp-self-0.redpanda
echo Q | openssl s_client \
  -connect ${ADVERTISED}:33145 \
  -servername ${ADVERTISED} \
  -CAfile <(kubectl -n redpanda get secret redpanda-default-cert -o jsonpath='{.data.ca\.crt}' | base64 -d) \
  -verify_hostname ${ADVERTISED} \
  2>&1 | grep -i 'verify error\|hostname'

# Expected:
#   verify error:num=62: hostname mismatch
#   Verify return code: 62 (hostname mismatch)

In stretch / multicluster mode, the brokers never reach Ready because every cluster_bootstrap_info RPC handshake hits this. In single-cluster non-multicluster mode the operator emits broker FQDNs with more labels (redpanda-0.redpanda.<ns>.svc.cluster.local), so the matching wildcard SAN (*.redpanda.<ns>.svc.cluster.local) has a multi-label parent that OpenSSL accepts, and the bug stays hidden.
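
The difference is just label depth, which a short sketch makes concrete (parent_labels is a hypothetical helper introduced here for illustration, counting the DNS labels to the right of the wildcard):

```python
def parent_labels(wildcard_san: str) -> int:
    # Count the DNS labels in the wildcard SAN's parent domain.
    return wildcard_san.removeprefix("*.").count(".") + 1

print(parent_labels("*.redpanda"))                             # 1 -- refused, like '*.com'
print(parent_labels("*.redpanda.redpanda.svc.cluster.local"))  # 5 -- deep enough, accepted
```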

Root cause

Two things conspire:

  1. The operator's flat cross-cluster networking mode writes both seed_servers[].host.address and advertised_rpc_api.address as <podname>.<headless-svc> (2 labels) so the same hostname resolves identically on every cluster. There is no namespace/svc/cluster-local suffix.
  2. The operator's applyInternalDNSNames: true cert-template emits wildcard SANs that mirror the headless-Service hierarchy (*.redpanda, *.redpanda.svc, *.redpanda.svc.cluster.local, etc.). The shortest of those — *.redpanda — is the only one with a parent that matches the 2-label advertised hostname, but its parent is a single label and OpenSSL refuses the match.

So the SANs the operator emits don't actually cover the hostname the operator's own RPC clients connect to.
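
The refused match can be sketched in Python; wildcard_matches is a hypothetical helper implementing the strict RFC 6125 §6.4.3 rule described above, not the broker's actual code (which delegates to OpenSSL's X509_check_host):

```python
def wildcard_matches(pattern: str, hostname: str) -> bool:
    """Strict RFC 6125 §6.4.3 wildcard check (illustrative sketch).

    A wildcard may only be the entire left-most label, and its parent
    domain must itself contain at least two labels -- so '*.redpanda'
    is refused the same way '*.com' would be.
    """
    if not pattern.startswith("*."):
        return pattern.lower() == hostname.lower()   # exact match only
    parent = pattern[2:].lower()
    if "." not in parent:            # single-label parent: refuse outright
        return False
    labels = hostname.lower().split(".", 1)
    return len(labels) == 2 and labels[1] == parent

# The hostname the operator's RPC clients actually dial:
advertised = "redpanda-rp-self-0.redpanda"
print(wildcard_matches("*.redpanda", advertised))               # False -- the bug
print(wildcard_matches("*.redpanda.svc", advertised + ".svc"))  # True
```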

Proposed fix: add explicit per-broker SANs to the Certificate template

When the operator generates the broker Certificate, in addition to the headless-Service wildcards it should add one explicit DNS SAN per broker matching what's written into advertised_rpc_api.address / seed_servers[].host.address. Concretely:

# Today (the bug):
dnsNames:
  - redpanda.redpanda.svc.cluster.local
  - redpanda.redpanda.svc
  - redpanda.redpanda
  - '*.redpanda.svc.cluster.local'
  - '*.redpanda.svc'
  - '*.redpanda'                         # ← unusable

# Proposed: also include the explicit per-broker hostnames (one per broker
# pod in this NodePool, plus one per peer broker in cross-cluster mode):
  - redpanda-rp-self-0.redpanda
  - redpanda-rp-self-1.redpanda
  - redpanda-rp-self-2.redpanda

Because the operator already knows every broker's pod name (the StatefulSet ordinal combined with the NodePool and Cluster names) at reconcile time, it has all the information needed to enumerate these SANs — the same place it already builds the seed_servers list.
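
A minimal sketch of that enumeration, assuming the <cluster>-<nodepool>-<ordinal>.<headless-svc> naming seen in this issue (broker_sans is a hypothetical helper, not the operator's actual code):

```python
def broker_sans(cluster: str, nodepool: str, headless_svc: str, replicas: int) -> list[str]:
    # One explicit DNS SAN per broker pod, mirroring advertised_rpc_api.address.
    # The naming convention is an assumption based on the hostnames in this issue.
    return [f"{cluster}-{nodepool}-{i}.{headless_svc}" for i in range(replicas)]

print(broker_sans("redpanda", "rp-self", "redpanda", 3))
# ['redpanda-rp-self-0.redpanda', 'redpanda-rp-self-1.redpanda', 'redpanda-rp-self-2.redpanda']
```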

This costs one extra DNS SAN per broker on a Cert that's regenerated infrequently — bounded and small.

Backwards compatibility (non-stretch / pre-26.2 broker)

The change is purely additive on the Certificate's dnsNames list and doesn't require any new field on the StretchCluster / Redpanda CR or any new behavior in the broker. Concretely:

  • Existing wildcards keep working. A non-stretch single-cluster deployment uses brokers that advertise their full FQDN (redpanda-0.redpanda.<ns>.svc.cluster.local — 6 labels). Wildcard SANs like *.redpanda.<ns>.svc.cluster.local (5-label parent) match those fine; that path is unaffected. We're only adding more SANs alongside the existing ones, not replacing them.
  • No broker-side config change required. A 26.1 (non-stretch) broker reading a cert with three extra DNS SANs verifies it the same way it does today — OpenSSL just sees a longer SAN list and walks it linearly until something matches the connection hostname. There's no new TLS config knob being introduced.
  • Reconcile site is the same one that already enumerates brokers. The operator code that builds seed_servers / advertised_rpc_api runs on every Redpanda and StretchCluster reconcile, including non-multicluster Redpanda CRs. Adding a parallel loop that emits one explicit SAN per pod doesn't open any new code path — it consumes an enumeration that's already produced.
  • Explicitly handles the multicluster.enabled + non-stretch combination. The minimal repro at the top of this issue is a single-cluster deployment with multicluster.enabled: true and a self-peer — non-stretch, but still hits the bug. The fix lands them in the working state too without any flag flip.

Net effect: existing non-stretch deployments see a cert with a few additional well-formed SANs; nothing changes for verification of the hostnames they were already verifying. The bug is fixed for stretch / multicluster / crossClusterMode: flat setups where the broker advertises a 2-label hostname today.

Workaround in the meantime

We're running with spec.tls.enabled: false at the StretchCluster layer, plus explicit spec.listeners.{kafka,admin,http,schemaRegistry,rpc}.tls.enabled: false on every listener (if listener TLS isn't disabled per-listener, the chart still emits kafka_api_tls etc. pointing at nonexistent cert paths). Confidentiality is provided by the underlay — in our cross-cloud scaffold, IPsec VPN tunnels between clouds — but a single-cluster non-stretch deployment doesn't have that fallback.
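
For reference, a sketch of the workaround spec. The field paths are the ones quoted above; the exact listener schema is assumed and may differ:

```yaml
spec:
  tls:
    enabled: false                               # StretchCluster-layer TLS off
  listeners:                                     # each listener ALSO disabled explicitly,
    kafka:          { tls: { enabled: false } }  # or the chart still emits kafka_api_tls
    admin:          { tls: { enabled: false } }  # etc. against nonexistent cert paths
    http:           { tls: { enabled: false } }
    schemaRegistry: { tls: { enabled: false } }
    rpc:            { tls: { enabled: false } }
```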

Environment

  • Operator chart: redpanda-data/operator @ 26.2.1-beta.1
  • Redpanda image: redpandadata/redpanda:v26.1.6
  • cert-manager: v1.16.2
  • Tested on EKS 1.34, GKE 1.35.3, AKS 1.34 — same behavior on all three, doesn't depend on the K8s flavor.
  • OpenSSL in the broker image follows upstream defaults; the 3.x series is strict about RFC 6125 single-label wildcard parents and rejects them by default.

Full reproduction scaffold: https://github.com/david-yu/redpanda-operator-stretch-cross-cloud-beta (cross-cloud). The same-cloud variant, https://github.com/david-yu/redpanda-operator-stretch-beta, appears to work because the broker pods land on the same Kubernetes context's local DNS, where the chart uses longer FQDNs; the cross-cluster flat mode's short-form hostnames are the trigger.
