fix: clean up Temporal server-side versioning data on TWD deletion #240
anujagrawal380 wants to merge 6 commits into temporalio:main
PTAL @carlydf
jaypipes left a comment:
@anujagrawal380 awesome contribution, thank you so much for this PR! A couple of really minor comments below, but overall excellent work.
Thanks, resolved both comments!
jaypipes left a comment:
rock on :) nice work on this @anujagrawal380!
carlydf left a comment:
we need integration tests for this before merging to main / including it in a release
@carlydf Added the integration tests. PTAL
Hi @anujagrawal380, could you fix the linters? Would love to include this in our next release.
…blocking unversioned workers Signed-off-by: Anuj Agrawal <anujagrawal380@gmail.com>
Force-pushed from 22607bf to 5a2e844.
…ion finalizer
- Add 5-minute deletionCleanupTimeout to prevent TWD stuck in Terminating state indefinitely if Temporal server is unavailable
- Return errors from version/deployment deletion to trigger requeue until versions actually clear (pollers disappear as pods terminate)
- Add update/patch verbs and finalizers RBAC marker for TemporalConnections
- Fix comment-spacing lint on new kubebuilder:rbac markers
Force-pushed from d6a305c to 9fd0c74.
@carlydf @jaypipes Added a few more minor improvements here: 9fd0c74. PTAL!
- Adds a finalizer to TemporalWorkerDeployment to run Temporal server-side cleanup before K8s deletion
- Adds a finalizer to TemporalConnection to prevent it from being deleted while any TWD still references it

Problem

When a TemporalWorkerDeployment CRD is deleted (e.g., switching back to plain Deployments), the Temporal server retains the build ID routing configuration. The matching service continues routing new tasks to the deleted build ID's physical queue, while unversioned workers poll a different physical queue. Tasks sit in Scheduled state indefinitely with no errors.

A secondary race condition exists: Helm deletes both the TemporalConnection and the TWD in the same upgrade. Without the connection, the controller cannot talk to Temporal to clean up. This is solved by adding a finalizer to the TemporalConnection that blocks its deletion until all referencing TWDs are gone.

Changes
internal/controller/worker_controller.go:

TWD finalizer (temporal.io/worker-deployment-cleanup):
- On deletion, runs handleDeletion(), which:
  - Sets the current version to unversioned (BuildID: ""), the critical step that unblocks task dispatch
  - Deletes versions with SkipDrainage: true

TemporalConnection finalizer (temporal.io/connection-in-use):
- Added to the TemporalConnection during normal TWD reconciliation via ensureConnectionFinalizer()
- Removed via removeConnectionFinalizerIfUnused() during TWD deletion, after checking that no other TWDs in the same namespace reference the connection

RBAC updates:
- Added update;patch verbs for temporalconnections (was get;list;watch)
- Added update verb for temporalconnections/finalizers

Deletion flow
Issue #55
Closes #166