Reject polls from recently-shutdown workers to prevent task theft #9545
Merged
When ShutdownWorker cancels a worker's polls via CancelOutstandingWorkerPolls, the SDK's graceful shutdown path may re-poll before fully stopping. This zombie re-poll can sync-match with retry tasks (e.g., activity retries dispatched by the timer queue), which the dying worker silently drops, causing the task to sit until timeout.
Add a TTL cache of recently-shutdown WorkerInstanceKeys to the matching engine. Polls arriving from workers in this cache are rejected immediately with an empty response, preventing zombie re-polls from stealing tasks.
Made-with: Cursor
Made-with: Cursor
dnr reviewed Mar 19, 2026
The shutdown rejection is independent of pollerID tracking and should run unconditionally based on workerInstanceKey.
Made-with: Cursor
dnr approved these changes Mar 21, 2026
What changed?
Add a TTL cache of recently-shutdown WorkerInstanceKeys to the matching engine. When CancelOutstandingWorkerPolls is called during ShutdownWorker, the worker's key is recorded in this cache. Subsequent polls carrying that key are rejected immediately with an empty response.
Why?
When ShutdownWorker cancels a worker's polls, the SDK's graceful shutdown path may re-poll before fully stopping. This zombie re-poll can sync-match with retry tasks (e.g., activity retries dispatched by the timer queue), which the dying worker silently drops, causing the task to sit until timeout. The cache prevents these zombie polls from being matched with real tasks.
How did you test it?
Potential risks
cancelOutstandingWorkerPolls fan-out which covers all partitions. If the partition count changes between cancellation and re-poll, a node that wasn't fanned out to won't have the cache entry. This is an unlikely edge case during a shutdown sequence.
WorkerInstanceKey (new SDK versions). No impact on existing SDK versions.
Made with Cursor