From 7de3cb522d615626eeccc7dcd22a1204504304ea Mon Sep 17 00:00:00 2001
From: david-yu
Date: Mon, 20 Apr 2026 14:50:03 -0700
Subject: [PATCH 1/4] Document manual bootstrap user password resync after Helm-to-Operator migration

When the Redpanda Operator regenerates the bootstrap-user Secret, the
new password is not synced into Redpanda's SCRAM database, so rpk fails
to authenticate after pod restarts. Document the manual
`rpk acl user update` workflow that resynchronizes the password.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../pages/kubernetes/helm-to-operator.adoc | 43 +++++++++++++++++++
 1 file changed, 43 insertions(+)

diff --git a/modules/migrate/pages/kubernetes/helm-to-operator.adoc b/modules/migrate/pages/kubernetes/helm-to-operator.adoc
index 2d328e7cc6..7cc63bed17 100644
--- a/modules/migrate/pages/kubernetes/helm-to-operator.adoc
+++ b/modules/migrate/pages/kubernetes/helm-to-operator.adoc
@@ -201,6 +201,49 @@ NAME READY STATUS
 redpanda True Redpanda reconciliation succeeded
 ----
 
+== Resynchronize the bootstrap user password
+
+When the Redpanda Operator takes ownership of a SASL-enabled cluster, it manages a `-bootstrap-user` Secret that holds the superuser credentials. If this Secret is regenerated after migration (for example, if you delete a Helm-era Secret and expect the operator to recreate it with clean ownership), the operator writes a new random password into the Secret and the pod environment variables, but Redpanda's internal SCRAM database continues to hold the original password that the Helm chart set. On the next pod restart, `rpk` inside the pod reads the new password from its environment and fails to authenticate:
+
+[.no-copy]
+----
+SASL_AUTHENTICATION_FAILED: Invalid credentials
+----
+
+The Redpanda Operator does not resynchronize this password for you. You must update the SCRAM database manually, using the old password to authenticate and the new password from the regenerated Secret as the target.
+
+[NOTE]
+====
+Only perform these steps if `rpk` fails SASL authentication after migration. You also need the original superuser password that the Helm chart used. If you no longer have it, see xref:manage:kubernetes/security/authentication/k-authentication.adoc[] for recovery options.
+====
+
+. Open a shell in any broker pod. The pod's environment already exposes the new password from the regenerated Secret:
++
+```bash
+kubectl --namespace exec -it -0 -c redpanda -- bash
+```
+
+. Resynchronize the SCRAM database. Replace `` with the superuser password set by the Helm chart before migration:
++
+[,bash]
+----
+export RPK_NEW_PASS="$RPK_PASS" <1>
+export RPK_PASS="" <2>
+rpk acl user update $RPK_USER --mechanism $RPK_SASL_MECHANISM --new-password $RPK_NEW_PASS <3>
+export RPK_PASS=$RPK_NEW_PASS <4>
+rpk cluster info <5>
+----
++
+--
+<1> Save the new password that the operator wrote to the Secret. `$RPK_PASS` in the pod already points to this value.
+<2> Switch `rpk` to authenticate with the original Helm-era password so the next command can reach the admin API.
+<3> Update the superuser's password in Redpanda's SCRAM database to match the new password in the Secret.
+<4> Restore `$RPK_PASS` to the new password for subsequent commands.
+<5> Verify that authentication succeeds with the new password.
+--
+
+After the update, the SCRAM database and the bootstrap Secret agree on the password, and subsequent pod restarts continue to authenticate cleanly.
+
 == Roll back from Redpanda Operator to Helm
 
 If you migrated to the Redpanda Operator and want to revert to using only Helm, follow these steps to uninstall the Redpanda Operator:

From ee37e2fa97009ff17642bf0870291f81074eb8ec Mon Sep 17 00:00:00 2001
From: david-yu
Date: Mon, 20 Apr 2026 16:00:33 -0700
Subject: [PATCH 2/4] Add pre-check for per-pod RPK_PASS and kubelet Secret-cache warning

Verified on a local kind cluster that the manual resync workflow can
leave a multi-broker cluster in a mixed state: the SCRAM database ends
up at the new password while some pods still hold the original in env,
because kubelet caches immutable Secrets per node and does not reliably
re-read them when the Secret is deleted and recreated. Add a pre-check
step, node-drain guidance, and a warning about the inverted drift that
occurs if the check is skipped.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../pages/kubernetes/helm-to-operator.adoc | 24 +++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/modules/migrate/pages/kubernetes/helm-to-operator.adoc b/modules/migrate/pages/kubernetes/helm-to-operator.adoc
index 7cc63bed17..5d3aa139fd 100644
--- a/modules/migrate/pages/kubernetes/helm-to-operator.adoc
+++ b/modules/migrate/pages/kubernetes/helm-to-operator.adoc
@@ -217,6 +217,25 @@ The Redpanda Operator does not resynchronize this password for you. You must upd
 Only perform these steps if `rpk` fails SASL authentication after migration. You also need the original superuser password that the Helm chart used. If you no longer have it, see xref:manage:kubernetes/security/authentication/k-authentication.adoc[] for recovery options.
 ====
 
+. Confirm that every broker pod exposes the same `RPK_PASS` value.
The resync step in the next item updates the SCRAM database to whatever the pod's env currently holds, so every pod must agree on that value first:
++
+```bash
+for pod in $(kubectl --namespace get pods -l app.kubernetes.io/component=redpanda-statefulset -o name); do
+  echo "$pod: $(kubectl --namespace exec $pod -c redpanda -- printenv RPK_PASS)"
+done
+```
++
+--
+If every pod reports the same password, continue to the next step.
+
+If one or more pods report a different `RPK_PASS` value from the rest, the bootstrap Secret on those pods' nodes is still cached by the kubelet. The Redpanda Operator creates the `-bootstrap-user` Secret with `immutable: true`, and kubelet's local Secret cache does not always invalidate when an immutable Secret is deleted and re-created with the same name. To force affected pods to pick up the current Secret value:
+
+- Force-delete the pod: `kubectl --namespace delete pod --force --grace-period=0`.
+- If the env still disagrees after the pod is re-created, drain the node that hosted the pod so the StatefulSet schedules it elsewhere: `kubectl drain --ignore-daemonsets --delete-emptydir-data`.
+
+Repeat the check until every pod reports the same value before continuing.
+--
+
 . Open a shell in any broker pod. The pod's environment already exposes the new password from the regenerated Secret:
 +
 ```bash
@@ -244,6 +263,11 @@ rpk cluster info <5>
 
 After the update, the SCRAM database and the bootstrap Secret agree on the password, and subsequent pod restarts continue to authenticate cleanly.
 
+[WARNING]
+====
+If you skipped the pre-check in step 1 and any pod still had the original password in its environment when you ran the update, that pod's `rpk` now fails SASL authentication even though it worked before: the SCRAM database has moved to the new password while the pod's env still holds the old one. Re-run the pre-check across all pods and force-delete any pod whose env does not match the regenerated Secret.
+====
+
 == Roll back from Redpanda Operator to Helm
 
 If you migrated to the Redpanda Operator and want to revert to using only Helm, follow these steps to uninstall the Redpanda Operator:

From 1178c21ad3250450902a309a3989c297ca852851 Mon Sep 17 00:00:00 2001
From: david-yu
Date: Mon, 20 Apr 2026 16:06:10 -0700
Subject: [PATCH 3/4] Remove trailing WARNING block duplicating pre-check guidance

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 modules/migrate/pages/kubernetes/helm-to-operator.adoc | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/modules/migrate/pages/kubernetes/helm-to-operator.adoc b/modules/migrate/pages/kubernetes/helm-to-operator.adoc
index 5d3aa139fd..50b620fff7 100644
--- a/modules/migrate/pages/kubernetes/helm-to-operator.adoc
+++ b/modules/migrate/pages/kubernetes/helm-to-operator.adoc
@@ -263,11 +263,6 @@ rpk cluster info <5>
 
 After the update, the SCRAM database and the bootstrap Secret agree on the password, and subsequent pod restarts continue to authenticate cleanly.
 
-[WARNING]
-====
-If you skipped the pre-check in step 1 and any pod still had the original password in its environment when you ran the update, that pod's `rpk` now fails SASL authentication even though it worked before: the SCRAM database has moved to the new password while the pod's env still holds the old one. Re-run the pre-check across all pods and force-delete any pod whose env does not match the regenerated Secret.
-====
-
 == Roll back from Redpanda Operator to Helm
 
 If you migrated to the Redpanda Operator and want to revert to using only Helm, follow these steps to uninstall the Redpanda Operator:

From a07ff5399016d1231424c08faab34aad29437227 Mon Sep 17 00:00:00 2001
From: david-yu
Date: Tue, 21 Apr 2026 10:28:41 -0700
Subject: [PATCH 4/4] Address review comments on bootstrap-user resync section

Reframes the procedure as a recovery path rather than a routine
migration step, per Slack thread context from Andrew Stucki.

- Move the section under Troubleshooting as an H3 so the TOC no longer
  suggests it is a required migration step.
- Add a CAUTION explaining that the bootstrap user is designed to be
  long-lived and the Secret should not be regenerated.
- Add a TIP recommending that readers back up the current password
  before any operation that might regenerate the Secret.
- Explain the SCRAM-level reason the operator can't auto-resync
  (password change requires the old password; the Secret only holds one
  credential).
- Replace the broken recovery xref in the NOTE with realistic guidance
  (restore from backup or contact support).
- Quote shell variable expansions in the `rpk acl user update` and
  `export RPK_PASS` lines so special characters do not break the
  command.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .../pages/kubernetes/helm-to-operator.adoc | 58 ++++++++++++-------
 1 file changed, 37 insertions(+), 21 deletions(-)

diff --git a/modules/migrate/pages/kubernetes/helm-to-operator.adoc b/modules/migrate/pages/kubernetes/helm-to-operator.adoc
index 50b620fff7..f77b4ecae2 100644
--- a/modules/migrate/pages/kubernetes/helm-to-operator.adoc
+++ b/modules/migrate/pages/kubernetes/helm-to-operator.adoc
@@ -201,7 +201,28 @@ NAME READY STATUS
 redpanda True Redpanda reconciliation succeeded
 ----
 
-== Resynchronize the bootstrap user password
+== Roll back from Redpanda Operator to Helm
+
+If you migrated to the Redpanda Operator and want to revert to using only Helm, follow these steps to uninstall the Redpanda Operator:
+
+include::deploy:partial$kubernetes/guides/uninstall.adoc[tag=operator]
+
+After completing these steps, the Redpanda Operator is no longer managing your Helm deployment.
+
+== Troubleshooting
+
+While the deployment process can sometimes take a few minutes, a prolonged 'not ready' status may indicate an issue.
+
+include::troubleshoot:partial$errors-and-solutions.adoc[tags=deployment]
+
+For more troubleshooting steps, see xref:manage:kubernetes/troubleshooting/k-troubleshoot.adoc[Troubleshoot Redpanda in Kubernetes].
+
+=== Resynchronize the bootstrap user password
+
+[CAUTION]
+====
+The bootstrap user is designed to be set once at cluster creation and remain long-lived. Avoid regenerating the `-bootstrap-user` Secret when possible. Only use this procedure if the Secret has already been regenerated and you need to recover from the resulting authentication failure.
+====
 
 When the Redpanda Operator takes ownership of a SASL-enabled cluster, it manages a `-bootstrap-user` Secret that holds the superuser credentials. If this Secret is regenerated after migration (for example, if you delete a Helm-era Secret and expect the operator to recreate it with clean ownership), the operator writes a new random password into the Secret and the pod environment variables, but Redpanda's internal SCRAM database continues to hold the original password that the Helm chart set. On the next pod restart, `rpk` inside the pod reads the new password from its environment and fails to authenticate:
 
@@ -210,11 +231,22 @@ When the Redpanda Operator takes ownership of a SASL-enabled cluster, it manages
 SASL_AUTHENTICATION_FAILED: Invalid credentials
 ----
 
-The Redpanda Operator does not resynchronize this password for you. You must update the SCRAM database manually, using the old password to authenticate and the new password from the regenerated Secret as the target.
+The Redpanda Operator does not resynchronize this password for you: changing a SCRAM password requires authenticating with the old password, and the bootstrap Secret only tracks one credential at a time. You must update the SCRAM database manually, using the old password to authenticate and the new password from the regenerated Secret as the target.
+
+[TIP]
+====
+If you have not regenerated the bootstrap-user Secret yet, back up the current password first so you have it available if you need to run this procedure later:
+
+[,bash]
+----
+kubectl --namespace get secret -bootstrap-user \
+  -o jsonpath='{.data.password}' | base64 -d
+----
+====
 
 [NOTE]
 ====
-Only perform these steps if `rpk` fails SASL authentication after migration. You also need the original superuser password that the Helm chart used. If you no longer have it, see xref:manage:kubernetes/security/authentication/k-authentication.adoc[] for recovery options.
+Only perform these steps if `rpk` fails SASL authentication after migration. You also need the original superuser password that the Helm chart used. If you no longer have it, you may need to restore from backup or contact Redpanda support.
 ====
 
 . Confirm that every broker pod exposes the same `RPK_PASS` value. The resync step in the next item updates the SCRAM database to whatever the pod's env currently holds, so every pod must agree on that value first:
@@ -248,8 +280,8 @@ kubectl --namespace exec -it -0 -c redpanda -- bash
 ----
 export RPK_NEW_PASS="$RPK_PASS" <1>
 export RPK_PASS="" <2>
-rpk acl user update $RPK_USER --mechanism $RPK_SASL_MECHANISM --new-password $RPK_NEW_PASS <3>
-export RPK_PASS=$RPK_NEW_PASS <4>
+rpk acl user update "$RPK_USER" --mechanism "$RPK_SASL_MECHANISM" --new-password "$RPK_NEW_PASS" <3>
+export RPK_PASS="$RPK_NEW_PASS" <4>
 rpk cluster info <5>
 ----
 +
@@ -263,22 +295,6 @@ rpk cluster info <5>
 
 After the update, the SCRAM database and the bootstrap Secret agree on the password, and subsequent pod restarts continue to authenticate cleanly.
-== Roll back from Redpanda Operator to Helm
-
-If you migrated to the Redpanda Operator and want to revert to using only Helm, follow these steps to uninstall the Redpanda Operator:
-
-include::deploy:partial$kubernetes/guides/uninstall.adoc[tag=operator]
-
-After completing these steps, the Redpanda Operator is no longer managing your Helm deployment.
-
-== Troubleshooting
-
-While the deployment process can sometimes take a few minutes, a prolonged 'not ready' status may indicate an issue.
-
-include::troubleshoot:partial$errors-and-solutions.adoc[tags=deployment]
-
-For more troubleshooting steps, see xref:manage:kubernetes/troubleshooting/k-troubleshoot.adoc[Troubleshoot Redpanda in Kubernetes].
-
 === Open an issue
 
 If you cannot solve the issue or need assistance during the migration process, https://github.com/redpanda-data/redpanda-operator/issues/new/choose[open a GitHub issue^]. Before opening a new issue, search the existing issues on GitHub to see if someone has already reported a similar problem or if any relevant discussions can help you.
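
Review note on the pre-check introduced in patch 2/4: the documented loop prints each pod's `RPK_PASS` in clear text. The following is only a sketch of an alternative, not part of the patch series, that compares digests so the password never reaches the terminal. The helper name `all_values_match` and the `NAMESPACE` variable are hypothetical, and the commented usage assumes that `sha256sum` is available in the `redpanda` container, which this series does not confirm.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of kubectl or rpk: reads "<name> <value>" lines
# on stdin and succeeds only if every line carries the same second field.
all_values_match() {
  awk '
    { seen[$2] = 1 }            # remember each distinct value
    END {
      n = 0
      for (v in seen) n++       # count the distinct values
      exit (n == 1 ? 0 : 1)     # empty input also counts as a failure
    }
  '
}

# Assumed usage against a cluster. NAMESPACE is a placeholder you set yourself,
# and sha256sum is assumed to exist in the redpanda container:
#
#   for pod in $(kubectl --namespace "$NAMESPACE" get pods \
#       -l app.kubernetes.io/component=redpanda-statefulset -o name); do
#     digest=$(kubectl --namespace "$NAMESPACE" exec "$pod" -c redpanda -- \
#       sh -c 'printf %s "$RPK_PASS" | sha256sum' | cut -d" " -f1)
#     echo "$pod $digest"
#   done | all_values_match && echo "all pods agree"
```

A non-zero exit status from the helper corresponds to the "one or more pods report a different `RPK_PASS` value" branch of the pre-check, where the force-delete and drain guidance applies.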