Fix upgrade deadlock: move authkey creation to init container #26

Merged
kervel merged 1 commit into main from fix/authkey-init-container
Feb 26, 2026

@kervel commented Feb 26, 2026

Summary

  • Replaced the Helm post-install/post-upgrade hook Job for client authkey creation with an init container on the client pod, eliminating the chicken-and-egg deadlock during upgrades
  • Extended ensure-client-key.sh with dual-mode support (AUTH_FILE_PATH env var) so the same script serves both the init container and the optional CronJob
  • Added pods/exec RBAC permissions to the client ServiceAccount for the init container to interact with the headscale server pod
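
The dual-mode behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the contents of ensure-client-key.sh: the function name deliver_authkey, the SECRET_NAME variable, and the authkey data key are all hypothetical, and only the AUTH_FILE_PATH switch comes from this PR.

```shell
#!/bin/sh
# Hypothetical sketch of dual-mode key delivery (not the chart's actual script).
# Assumptions: deliver_authkey, SECRET_NAME and the "authkey" data key are
# illustrative names; only AUTH_FILE_PATH is taken from the PR description.

deliver_authkey() {
  key="$1"
  if [ -n "${AUTH_FILE_PATH:-}" ]; then
    # Init-container mode: write the key to a file on a shared volume
    # (for example an emptyDir) with restrictive permissions.
    ( umask 077; printf '%s' "$key" > "$AUTH_FILE_PATH" )
  else
    # CronJob mode: create or update the Secret in place.
    kubectl create secret generic "${SECRET_NAME:-client-authkey}" \
      --from-literal=authkey="$key" \
      --dry-run=client -o yaml | kubectl apply -f -
  fi
}
```

Keeping both paths in one function means the key-generation and reuse logic upstream of it does not need to know which mode it is running in.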

Problem

When upgrading the chart with --wait or --atomic, Helm waits for all pods to become ready before running post-upgrade hooks. The client pod needs a valid authkey to become ready, but the authkey is created by the post-upgrade hook: a classic deadlock. If the existing authkey Secret has expired (e.g. the old 1h default), the upgrade times out and, with --atomic, rolls back.

Approach

Inspired by #7 by @tekulvw, which proposed moving authkey creation to an init container. This PR takes that core idea and integrates it with the existing idempotent key management script (reusing valid keys, long-lived expiry, CronJob compatibility) rather than replacing it.

Credits to @tekulvw for the init container approach in #7.

What changed

| File | Change |
|---|---|
| templates/create-client-secret-job.yaml | Deleted (the Helm hook causing the deadlock) |
| templates/client-deployment.yaml | Added ensure-authkey init container, replaced Secret volume with emptyDir |
| templates/client-job-script-configmap.yaml | Added AUTH_FILE_PATH dual-mode support |
| templates/client-rbac.yaml | Added pods get/list/watch + pods/exec create |
| hack/kind-smoke.sh | Updated tests for init container instead of hook Job |
| Chart.yaml | Version bump to 0.2.0 |

Upgrade path

  • Existing client-authkey Secret persists; init container reuses valid keys
  • Old hook Job is auto-cleaned by its hook-delete-policy
  • No manual intervention needed

Test plan

  • helm lint headscale/ passes
  • hack/kind-smoke.sh --with-client passes (single-node, client deployment)
  • hack/kind-smoke.sh --with-client-daemonset passes (multi-node, 3-node kind cluster, client daemonset)
  • Verify init container completes: kubectl get pod -l app.kubernetes.io/component=client -o jsonpath='{.items[0].status.initContainerStatuses[?(@.name=="ensure-authkey")].state}'
  • Verify no post-install/post-upgrade hooks in rendered templates
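
The last check in the list above could be scripted along these lines. This is a sketch under assumptions: has_post_hooks is a hypothetical helper name, and it only looks for the standard helm.sh/hook annotation in whatever manifests are piped into it.

```shell
#!/bin/sh
# Hypothetical helper: reads rendered manifests on stdin and exits 0 if any
# helm.sh/hook annotation mentions post-install or post-upgrade.
# It deliberately does not match helm.sh/hook-delete-policy lines.
has_post_hooks() {
  grep -Eq '^[[:space:]]*"?helm\.sh/hook"?:.*post-(install|upgrade)'
}

# Example usage (any values needed to enable the client are omitted here):
#   if helm template headscale/ | has_post_hooks; then
#     echo "post-install/post-upgrade hooks still present" >&2
#     exit 1
#   fi
```

A plain grep keeps the check dependency-free; a stricter version could parse the YAML and inspect metadata.annotations per resource.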

🤖 Generated with Claude Code

…alysis

Move authkey provisioning from a Helm post-install/post-upgrade Job to an
init container on the client pod, eliminating race conditions and simplifying
the lifecycle. Add RBAC for pods/exec needed by the init container. Add a
risk analysis checklist to the DaemonSet mode README section covering subnet
overlap, recovery planning, and acceptDns.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

@kervel merged commit dbf1c8e into main Feb 26, 2026
1 check passed