operator: user adoption and add credential sync for externally-managed secrets #1438
This PR is stale because it has been open 5 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Azure Kubernetes Service —

| # | Scenario | Result |
|---|---|---|
| 1 | Adoption of a pre-existing Redpanda user: pre-create `appuser` via the admin REST API with SCRAM-SHA-256, then apply a User CR with `type: scram-sha-256`, `noGenerate: true`, `syncCredentials: true` referencing an ESO-synced Secret | PASS: `Synced=True`, `managedUser: true`, `managedAcls: true`; SCRAM-SHA-256 auth using the ESO-supplied password works |
| 2 | Credential rotation via Key Vault: `az keyvault secret set` to a new value, observe the chain push the change through to Redpanda | PASS: ESO refreshed the K8s Secret in ~29 s; the new password authenticates 18 s after the K8s Secret update; the old password is rejected with `SASL_AUTHENTICATION_FAILED` |

Changes vs the previous run

| | Previous run | This run |
|---|---|---|
| `auth.sasl.secretRef` | unset (chart default `redpanda-users`, never created; pod stuck `Init:0/3`) | `redpanda-superusers`, pre-created with empty `superusers.txt`; `auth.sasl.users: []` |
| Workaround Secret | manually created `redpanda-users` with `users.txt: kubernetes-controller:<pwd>:SCRAM-SHA-256` and rescheduled `rp-0` | none; pod went 2/2 Ready on first boot |
| `User.spec.authentication.type` | `scram-sha-512` (mismatched the chart's bootstrap default of SCRAM-SHA-256) | `scram-sha-256`, which matches the `BootstrapUser.GetMechanism()` default in `charts/redpanda/values.go:1410-1415` |
| Pre-created `appuser` algorithm | SCRAM-SHA-512 | SCRAM-SHA-256 |

The chart's bootstrap user (kubernetes-controller) defaults to SCRAM-SHA-256 today (env RPK_SASL_MECHANISM=SCRAM-SHA-256 confirmed on rp-0), so the User CR's type: scram-sha-256 keeps everything on a single mechanism.
Environment
Region: eastus
Resource Group: claude-pr1438-retest-<redacted> (deleted at end of run)
AKS: pr1438-aks-<redacted> — 3 x Standard_D2s_v3, K8s v1.34.6, OIDC + Workload Identity enabled
ACR: <redacted>.azurecr.io
Key Vault: pr1438-kv-<redacted>
Operator image: <ACR>/redpanda-operator:pr1438@sha256:cd362757d296e512ee38b57ec5ba805b19e5fb036cbb3ff42fcc0177a14e26ff (built from 22b3a27b)
ESO: v0.20.4
cert-manager: v1.17.2
Operator chart: from this PR's branch (helm install …operator/chart)
Step-by-step (delta vs the previous run)
The only Redpanda-side changes are the SASL bootstrap and the User CR mechanism. Everything else (Workload Identity, ESO ClusterSecretStore, image build/push, operator chart install) is identical to the previous comment.
# Pre-create the superusers Secret per the docs.
# Empty superusers.txt is valid — kubernetes-controller is appended automatically
# by BootstrapUser.Username() in charts/redpanda/values.go.
kubectl -n redpanda create secret generic redpanda-superusers \
--from-literal=superusers.txt=''
# Apply the Redpanda CR with the secretRef pattern.
kubectl -n redpanda apply -f - <<'YAML'
apiVersion: cluster.redpanda.com/v1alpha2
kind: Redpanda
metadata: {name: rp}
spec:
  chartRef: {}
  clusterSpec:
    statefulset:
      replicas: 1
      sideCars: {controllers: {enabled: true, createRBAC: true}}
    resources: {cpu: {cores: 1}, memory: {container: {max: 2Gi}}}
    storage: {persistentVolume: {enabled: true, size: 10Gi}}
    auth:
      sasl:
        enabled: true
        secretRef: redpanda-superusers   # docs-recommended
        users: []                        # empty: chart will not generate the secret
        # BootstrapUser defaults: username=kubernetes-controller, mechanism=SCRAM-SHA-256
    tls: {enabled: false}
    listeners: {kafka: {tls: {enabled: false}}, admin: {tls: {enabled: false}}}
YAML

rp-0 reaches 2/2 Ready in ~86 s with no manual Secret intervention.

# Confirm the bootstrap mechanism actually wired to SCRAM-SHA-256:
kubectl -n redpanda exec rp-0 -c redpanda -- sh -c 'echo $RPK_USER $RPK_SASL_MECHANISM'
# kubernetes-controller SCRAM-SHA-256

Test 1: Adoption (pre-existing user, SCRAM-SHA-256)

BOOT_PWD=$(kubectl -n redpanda get secret rp-bootstrap-user -o jsonpath='{.data.password}' | base64 -d)
# Pre-create appuser with SCRAM-SHA-256 BEFORE any User CR exists
kubectl -n redpanda exec rp-0 -c redpanda -- curl -fsS -u "kubernetes-controller:${BOOT_PWD}" \
-X POST -H "Content-Type: application/json" \
-d '{"username":"appuser","password":"<INITIAL_PWD>","algorithm":"SCRAM-SHA-256"}' \
http://rp.redpanda.svc.cluster.local:9644/v1/security/users
# Existing users right after this: ["appuser","kubernetes-controller"]
kubectl -n redpanda apply -f - <<'YAML'
apiVersion: cluster.redpanda.com/v1alpha2
kind: User
metadata: {name: appuser}
spec:
  cluster: {clusterRef: {name: rp}}
  authentication:
    type: scram-sha-256          # matches chart BootstrapUser default
    password:
      valueFrom:
        secretKeyRef: {name: appuser-password, key: password}
      noGenerate: true           # PR #1438: ESO owns the Secret
    syncCredentials: true        # PR #1438: re-read on every reconcile
  authorization:
    acls:
    - type: allow
      resource: {type: topic, name: test-topic}
      operations: [Read, Write, Describe, Create]
YAML

Result:

status:
  conditions:
  - lastTransitionTime: "2026-05-01T16:54:44Z"
    message: 'Successfully synced "appuser" to cluster.'
    observedGeneration: 1
    reason: Synced
    status: "True"
    type: Synced
  managedAcls: true
  managedUser: true   # <-- pre-existing user adopted (PR fix)
  observedGeneration: 1

$ rpk cluster info -X user=appuser -X pass='<INITIAL_PWD>' -X sasl.mechanism=SCRAM-SHA-256 ...
CLUSTER
=======
redpanda.<cluster-id>
BROKERS
=======
ID HOST PORT
0* rp-0.rp.redpanda.svc.cluster.local. 9093

Test 2: Credential rotation via Azure Key Vault

az keyvault secret set --vault-name "$KV" --name redpanda-user-password --value '<ROTATED_PWD>'

Verifiable timeline (raw timestamps):

| Time (UTC) | Event | Source |
|---|---|---|
| 16:55:25Z | Rotation initiated (`az keyvault secret set`) | test script |
| 16:55:30Z | Key Vault confirms write | az response |
| 16:55:59Z | K8s `Secret/appuser-password` updated to `<ROTATED_PWD>` (resourceVersion 5757 → 6973); ~29 s after KV write at `refreshInterval: 1m` | poll of `kubectl get secret` |
| 16:56:00Z | `UserReconciler.Reconcile` fires the next iteration | operator log |
| 16:56:17Z | First poll-loop iteration that observes Redpanda accepting `<ROTATED_PWD>`; ~18 s after the K8s Secret update | test script |
| 16:56:17Z | `rpk cluster info` with the old password `<INITIAL_PWD>` returns `SASL_AUTHENTICATION_FAILED: Invalid credentials` | test script |

User CR after rotation (`Synced=True` is preserved: the controller upserted the new SCRAM credential without flipping the condition, so `lastTransitionTime` stays at the Test 1 sync time):

status:
  conditions:
  - lastTransitionTime: "2026-05-01T16:54:44Z"
    message: 'Successfully synced "appuser" to cluster.'
    reason: Synced
    status: "True"
    type: Synced
  managedAcls: true
  managedUser: true
  observedGeneration: 1

Caveats / observations not specific to this PR

- Bootstrap-user `redpanda-users` Secret pre-mount (resolved by docs pattern). The previous comment hit `MountVolume.SetUp failed for volume "users" : secret "redpanda-users" not found`. That was caused by the chart's `auth.sasl.secretRef` defaulting to `"redpanda-users"` when the user enables SASL but doesn't override it (see `charts/redpanda/chart/values.yaml:164` and the `users` volume in `charts/redpanda/helpers.go:139-145, 218-227`). Following the docs and pre-creating a Secret with `auth.sasl.secretRef: <name>` + `users: []` is the correct shape and removes the workaround entirely.
- Bootstrap-user mechanism mismatch (resolved by aligning to SCRAM-SHA-256). The previous comment hit `SASL_AUTHENTICATION_FAILED` on the User controller's Kafka path because the User CR was `type: scram-sha-512` while `BootstrapUser.GetMechanism()` defaults to `SCRAM-SHA-256` (`charts/redpanda/values.go:1410-1415`). With `type: scram-sha-256`, both the chart's bootstrap user and the User CR are on a single mechanism and the Kafka path stays clean.
- Reconcile cadence with `syncCredentials: true` is much tighter than expected. With only one User CR in the cluster, `UserReconciler.Reconcile` ran ~39 times/min (≈ once every 1.5 s) for the entire run. The framework's periodic timer is set to 5 min (`PeriodicallyReconcile(5 * time.Minute)`), so this isn't the periodic loop. It looks like the always-upsert path under `syncCredentials: true` is keeping itself enqueued, possibly via the new Secret watch reacting to a self-write or via the apply patch dirtying the resource version each cycle. The credential rotation in Test 2 still propagated correctly, but at this rate the Kafka admin path is being exercised every ~1.5 s per User CR. Resolved upstream and rebased afterwards.

Cleanup
az group delete --name claude-pr1438-retest-<redacted> --yes --no-wait

(Done: RG delete initiated before posting this update.)
🤖 Generated with Claude Code
…naged secrets

Fixes #1354.

The User controller previously only created SCRAM credentials when the user did not already exist in Redpanda, which meant applying a User CR for a pre-existing user left it permanently unmanaged (status.managedUser=false). This also meant password rotation via Secret updates was never reconciled.

Phase 1 (adoption): Remove the !hasUser gate so that UpsertSCRAM (which is idempotent) handles both new and existing users whenever spec.authentication is declared.

Phase 2 (credential sync): Add spec.authentication.syncCredentials (opt-in bool). When enabled, each reconciliation cycle re-reads the password from the referenced Secret and upserts it to Redpanda, enabling external rotation via ESO or similar tools.

Phase 3 (Secret watch): Index Users by their referenced password Secret and add a Watches handler so that external Secret changes (e.g. from ESO) trigger immediate reconciliation instead of waiting for the 5-minute periodic cycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…dentials field Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
22b3a27 to 044cf76
Rebased onto main: picks up the hot-reconcile-loop fix

Rebased. The ~39 reconciles/min cadence I flagged in the previous comment is fixed by #1460. Verified locally:

go-licenses fetches license URLs at generate time by following the gopkg.in
go-import meta redirect to GitHub. The fetcher's per-URL timeout is short, so
on slow networks (and in CI) some or all of the five gopkg.in/* deps degrade
to "Unknown" and `git diff --exit-code` fails the lint step.
These five deps are stable and rarely bumped. Hardcode the URL pattern in the
template using {{ .Version }} so generation is deterministic regardless of
whether the network call succeeds.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Ready for review; tested end to end.
Closes #1354
Summary
The User controller has two related issues that prevent importing existing Redpanda users and block credential rotation via external secret management systems (like ESO + Azure Key Vault):

- `SyncResource` only calls `Create` when `!hasUser`, so applying a User CR for a pre-existing Redpanda user leaves it permanently at `status.managedUser=false`.
- Password rotation via Secret updates is never reconciled, because credentials are only written when the user is first created.

This PR fixes both issues across three phases:

Phase 1: Fix user adoption

Changed: `SyncResource` branching logic in `user_controller.go`

The old code had a `!hasUser && shouldManageUser` gate that prevented adoption of existing users. The fix changes this to `shouldManageUser && !hasManagedUser`, which triggers the `Create`/upsert path regardless of whether the user already exists in Redpanda. This works because the underlying `AlterUserSCRAMs` with `UpsertSCRAM` is idempotent: it handles both create and update.

No opt-in needed: declaring `spec.authentication` is already the signal that the operator should manage the user.

Phase 2: Ongoing credential sync (opt-in)

New field: `spec.authentication.syncCredentials` (bool, default `false`)

When enabled, each reconciliation cycle re-reads the password from the referenced Secret and upserts credentials to Redpanda. This enables password rotation via external systems like ESO.

New method: `users.Client.Update()` reads the current password from the Secret via `Password.Fetch()` (no generation or Secret creation) and upserts to Redpanda.

Phase 3: Immediate reconciliation on Secret changes

New index: Users are indexed by the password Secret they reference (`spec.authentication.password.valueFrom.secretKeyRef.name`)

New watch: A `Watches(&corev1.Secret{}, ...)` handler maps Secret changes to referencing User CRs and enqueues them. This means ESO-driven Secret updates trigger immediate reconciliation instead of waiting up to 5 minutes.

How to migrate existing users to operator management

Step-by-step: ESO + Azure Key Vault workflow

1. Ensure credentials exist in Azure Key Vault: the username/password pair must already be provisioned.
2. Configure ESO to sync to a K8s Secret:
3. Apply the User CR referencing the ESO-managed Secret:
4. Verify adoption:
5. To rotate credentials: Update the secret in Azure Key Vault. ESO will sync the new value to the K8s Secret, the operator will detect the Secret change (via the new watch) and immediately reconcile, pushing the new password to Redpanda.

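
The Secret-to-User lookup that step 5 relies on can be pictured with a plain map. This is a dependency-free sketch rather than the operator's code: the real change uses a controller-runtime field index and a `Watches` handler, whereas the types `key` and `user` and the functions `buildIndex` and `requestsFor` below are hypothetical stand-ins. It also assumes the Secret reference resolves in the User's own namespace.

```go
package main

import (
	"fmt"
	"sort"
)

// key identifies a namespaced object (a stand-in for a namespaced name).
type key struct{ Namespace, Name string }

// user is a pared-down stand-in for the User CR. SecretName mirrors
// spec.authentication.password.valueFrom.secretKeyRef.name.
type user struct {
	Key        key
	SecretName string // empty when no password Secret is referenced
}

// buildIndex maps each referenced Secret to the Users that reference it.
// Assumption: the Secret lives in the same namespace as the User.
func buildIndex(users []user) map[key][]key {
	idx := make(map[key][]key)
	for _, u := range users {
		if u.SecretName == "" {
			continue // nothing to index for this User
		}
		s := key{Namespace: u.Key.Namespace, Name: u.SecretName}
		idx[s] = append(idx[s], u.Key)
	}
	return idx
}

// requestsFor returns the Users to enqueue when the given Secret changes,
// sorted by name so the result is deterministic.
func requestsFor(idx map[key][]key, secret key) []key {
	out := append([]key(nil), idx[secret]...)
	sort.Slice(out, func(i, j int) bool { return out[i].Name < out[j].Name })
	return out
}

func main() {
	idx := buildIndex([]user{
		{Key: key{"redpanda", "appuser"}, SecretName: "appuser-password"},
		{Key: key{"redpanda", "other"}, SecretName: "other-password"},
		{Key: key{"redpanda", "no-secret"}},
	})
	// An ESO write to appuser-password should enqueue only appuser.
	fmt.Println(requestsFor(idx, key{"redpanda", "appuser-password"}))
	// A Secret that no User references enqueues nothing.
	fmt.Println(requestsFor(idx, key{"redpanda", "unrelated"}))
}
```

Keying the index by Secret keeps the handler cheap: a Secret event costs one lookup instead of a scan over every User CR in the cluster.
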
Step-by-step: Manual migration (no ESO)

1. Create a K8s Secret with the current password:

       kubectl create secret generic my-user-password \
         --from-literal=password='current-password'

2. Apply the User CR (same as step 3 above, but `syncCredentials` is optional since you're managing the Secret manually).
3. To rotate: Update the Secret, then either wait for the 5-minute periodic reconcile or trigger a manual reconcile by annotating the User CR.

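
The branching described under Phase 1 and Phase 2 can be modelled as a small decision function. This is a simplified sketch under stated assumptions, not the code in `user_controller.go`: it covers only the credential-writing branches, the names `action`, `oldDecide`, and `newDecide` are hypothetical, and `hasManagedUser` is assumed to mirror `status.managedUser`.

```go
package main

import "fmt"

// action names the credential operation a reconcile would perform.
type action string

const (
	actionNone   action = "none"
	actionCreate action = "create/upsert"
	actionUpdate action = "update"
)

// oldDecide models the pre-PR gate: credentials are written only when the
// user does not exist in Redpanda yet, so existing users are never adopted.
func oldDecide(hasUser, shouldManageUser bool) action {
	if !hasUser && shouldManageUser {
		return actionCreate
	}
	return actionNone
}

// newDecide models the post-PR branching. hasManagedUser is assumed to
// mirror status.managedUser; syncCredentials is the new opt-in field.
func newDecide(shouldManageUser, hasManagedUser, syncCredentials bool) action {
	switch {
	case shouldManageUser && !hasManagedUser:
		// Idempotent upsert: works whether or not the user already exists.
		return actionCreate
	case shouldManageUser && syncCredentials:
		// Re-read the Secret and upsert on every reconcile.
		return actionUpdate
	default:
		return actionNone
	}
}

func main() {
	// Pre-existing Redpanda user, User CR applied for the first time.
	fmt.Println("old gate:", oldDecide(true, true))
	fmt.Println("new gate:", newDecide(true, false, false))
	// Already managed, with syncCredentials enabled.
	fmt.Println("sync on: ", newDecide(true, true, true))
	// Already managed, syncCredentials left at its default.
	fmt.Println("sync off:", newDecide(true, true, false))
}
```

In this model a user that is already managed only has its password re-pushed when `syncCredentials` is true; with the field left at its default, the function falls through to no credential write.
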
Key flags

- `password.noGenerate: true`
- `authentication.syncCredentials: true`

Files changed

- `operator/api/redpanda/v1alpha2/user_types.go`: add `SyncCredentials` field, `ShouldSyncCredentials()`, `GetPasswordSecretName()` helpers
- `operator/internal/controller/redpanda/user_controller.go`: fix adoption logic, add credential sync branch, add Secret index + watch
- `operator/pkg/client/users/client.go`: add `Update()` method
- `operator/internal/controller/redpanda/user_controller_test.go`: add `TestUserAdoptExisting` and `TestUserCredentialSync` tests

TODO

- `make generate` to regenerate apply configurations for the new `SyncCredentials` field

Test plan

- `TestUserAdoptExisting`: pre-creates a user via Kafka admin API, then applies a User CR and verifies `managedUser=true` and the new password works
- `TestUserCredentialSync`: creates a user with `syncCredentials: true`, rotates the password Secret, reconciles, and verifies the new password authenticates
- `TestUserReconcile` table tests continue to pass (adoption case added to table)

🤖 Generated with Claude Code