Description
When a TemporalWorkerDeployment (TWD) custom resource is deleted from Kubernetes (e.g., when switching from TWD back to a plain Deployment), the controller does not clean up the Temporal server-side deployment versioning data. The build ID routing configuration persists on the Temporal server indefinitely, so new unversioned workers cannot pick up tasks.
Root Cause
The Reconcile method in worker_controller.go handles TWD deletion by simply returning on NotFound:
    return ctrl.Result{}, client.IgnoreNotFound(err)

No finalizer is used, so there is no opportunity to clean up Temporal server-side state before the custom resource is garbage collected. The Temporal matching service continues to route new workflow tasks to the deleted build ID, while unversioned workers poll the _unversioned_ physical queue and receive nothing.
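The mismatch described above can be illustrated with a toy model. This is only a sketch of the reported symptom, not Temporal's matching implementation: `matchingModel`, `schedule`, `poll`, and the `build-v1` build ID are all hypothetical names made up for the example.

```go
package main

import "fmt"

// unversionedQueue names the physical queue that unversioned workers poll.
const unversionedQueue = "_unversioned_"

// matchingModel is a toy stand-in for the routing behavior described in the
// issue. It is NOT Temporal's implementation; every name here is hypothetical.
type matchingModel struct {
	currentBuildID string              // routing config on the server; "" means unversioned
	queues         map[string][]string // physical queue key -> pending task IDs
}

func newMatchingModel(currentBuildID string) *matchingModel {
	return &matchingModel{currentBuildID: currentBuildID, queues: map[string][]string{}}
}

// queueFor maps a build ID to its physical queue key.
func queueFor(buildID string) string {
	if buildID == "" {
		return unversionedQueue
	}
	return buildID
}

// schedule places a new task on the queue selected by the routing config.
func (m *matchingModel) schedule(taskID string) {
	key := queueFor(m.currentBuildID)
	m.queues[key] = append(m.queues[key], taskID)
}

// poll returns the next task visible to a worker with the given build ID
// ("" for an unversioned worker), or false if its queue is empty.
func (m *matchingModel) poll(workerBuildID string) (string, bool) {
	key := queueFor(workerBuildID)
	pending := m.queues[key]
	if len(pending) == 0 {
		return "", false
	}
	m.queues[key] = pending[1:]
	return pending[0], true
}

func main() {
	// Stale routing config left behind after the TWD resource was deleted.
	m := newMatchingModel("build-v1")
	m.schedule("wf-task-1")

	// The replacement worker is unversioned, so it polls the other queue.
	_, ok := m.poll("")
	fmt.Println("unversioned worker got a task:", ok) // prints: unversioned worker got a task: false
}
```

In this model the task only becomes visible to the unversioned worker once `currentBuildID` is cleared, which corresponds to the server-side cleanup this issue asks for.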
Impact
- New workflows scheduled on the affected task queue are stuck in `Scheduled` state indefinitely
- Workers are running and polling but cannot pick up any tasks
- KEDA sees queued tasks and keeps workers scaled up, but they never process anything
- The issue is silent -- no errors in worker logs, just zero task pickup
- Every environment that ever had TWD enabled and switched back is potentially affected
Steps to Reproduce
- Deploy a worker using `TemporalWorkerDeployment` (TWD enabled) -- the controller registers build IDs with the Temporal server
- Disable TWD in Helm and redeploy -- Helm deletes the TWD custom resource and creates a plain `Deployment`
- Trigger a workflow on the affected task queue
- Observe: the worker starts and polls, but the workflow task remains in `Scheduled` state
Expected Behavior
When a TWD custom resource is deleted, the controller should clean up Temporal server-side versioning data so that unversioned workers can pick up tasks on the same task queue.
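One common way to get this behavior is a Kubernetes finalizer: the controller adds it while the resource is live, and on deletion it runs cleanup before removing it. The sketch below models that flow using only the standard library. The finalizer name, the `workerDeployment` struct, and the `versioningCleaner` interface are all assumptions for illustration; they are not the controller's real types, and the Temporal call that would actually remove the versioning data is deliberately left abstract.

```go
package main

import (
	"errors"
	"fmt"
)

// cleanupFinalizer is a hypothetical finalizer name, not one the controller defines.
const cleanupFinalizer = "example.com/temporal-versioning-cleanup"

// workerDeployment models only the metadata the finalizer flow needs.
type workerDeployment struct {
	TaskQueue  string
	Finalizers []string
	Deleting   bool // stands in for a non-zero metadata.deletionTimestamp
}

// versioningCleaner is a hypothetical abstraction over whichever Temporal
// API call removes server-side versioning data for a task queue.
type versioningCleaner interface {
	CleanupVersioning(taskQueue string) error
}

func hasFinalizer(d *workerDeployment, name string) bool {
	for _, f := range d.Finalizers {
		if f == name {
			return true
		}
	}
	return false
}

func removeFinalizer(d *workerDeployment, name string) {
	kept := d.Finalizers[:0]
	for _, f := range d.Finalizers {
		if f != name {
			kept = append(kept, f)
		}
	}
	d.Finalizers = kept
}

// reconcile adds the finalizer to live resources. For resources being deleted
// it runs cleanup first and removes the finalizer only if cleanup succeeded,
// so a failed cleanup is retried on the next reconcile.
func reconcile(d *workerDeployment, c versioningCleaner) error {
	if !d.Deleting {
		if !hasFinalizer(d, cleanupFinalizer) {
			d.Finalizers = append(d.Finalizers, cleanupFinalizer)
		}
		return nil
	}
	if !hasFinalizer(d, cleanupFinalizer) {
		return nil // nothing left for this controller to do
	}
	if err := c.CleanupVersioning(d.TaskQueue); err != nil {
		return fmt.Errorf("cleanup versioning for %q: %w", d.TaskQueue, err)
	}
	removeFinalizer(d, cleanupFinalizer)
	return nil
}

// fakeCleaner records cleaned task queues; it exists only for the demo.
type fakeCleaner struct {
	cleaned []string
	fail    bool
}

func (f *fakeCleaner) CleanupVersioning(taskQueue string) error {
	if f.fail {
		return errors.New("temporal unavailable")
	}
	f.cleaned = append(f.cleaned, taskQueue)
	return nil
}

func main() {
	d := &workerDeployment{TaskQueue: "orders"}
	c := &fakeCleaner{}
	_ = reconcile(d, c) // live resource: finalizer is added
	d.Deleting = true   // user or Helm deletes the resource
	err := reconcile(d, c)
	fmt.Println(err, c.cleaned, d.Finalizers)
}
```

In a real controller-runtime reconciler the same steps would typically use `controllerutil.AddFinalizer` and `controllerutil.RemoveFinalizer` followed by an update of the object. Keeping the finalizer in place on cleanup failure means the resource stays in a terminating state until the Temporal server is reachable, which is a trade-off worth deciding explicitly.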