Replica: fix(webhooks): add scheduled task to unlock stale webhook request locks #67

Open
lucaforni wants to merge 3 commits into main-modalsource from
cymulatereouven-postal-fix/tidy-stale-webhook-request-locks

@lucaforni

This PR replicates the original PR: postalserver#3546

Original author: @cymulatereouven
Original branch: fix/tidy-stale-webhook-request-locks
Original repository: cymulatereouven/postal


Summary

  • Problem: When a worker process crashes or is killed (common in Kubernetes during rolling updates, OOM kills, and pod evictions), the locks it held on webhook_requests rows are never released. Unlike QueuedMessage, which has TidyQueuedMessagesTask to clean up stale locks, WebhookRequest had no equivalent cleanup mechanism, which left webhook delivery permanently blocked.
  • Fix: Adds a TidyWebhookRequestsTask scheduled task (runs hourly) that finds webhook requests locked for more than 1 hour and unlocks them so they can be retried.
  • Adds a with_stale_lock scope on WebhookRequest (mirrors the existing pattern on QueuedMessage)
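
The selection and unlock logic described above can be sketched in plain Ruby. This is a minimal in-memory illustration, not the code from this PR: the real scope and task operate on ActiveRecord models, and `WebhookRequestRow`, `STALE_LOCK_AGE`, and the two helper methods below are hypothetical names chosen for the example.

```ruby
# In-memory sketch only; the PR itself uses an ActiveRecord scope and task.
STALE_LOCK_AGE = 60 * 60 # seconds; the 1-hour threshold from the summary

# Hypothetical stand-in for a webhook_requests row.
WebhookRequestRow = Struct.new(:id, :locked_by, :locked_at)

# Returns the rows whose lock was taken more than STALE_LOCK_AGE before `now`.
def with_stale_lock(rows, now: Time.now)
  rows.select { |row| row.locked_at && row.locked_at < now - STALE_LOCK_AGE }
end

# Clears the lock on every stale row and returns the ids that were unlocked.
def tidy_webhook_requests(rows, now: Time.now)
  with_stale_lock(rows, now: now).map do |row|
    row.locked_by = nil
    row.locked_at = nil
    row.id
  end
end

now = Time.now
rows = [
  WebhookRequestRow.new(1, "worker-a", now - 2 * 60 * 60), # locked 2 hours ago
  WebhookRequestRow.new(2, "worker-b", now - 5 * 60),      # locked 5 minutes ago
  WebhookRequestRow.new(3, nil, nil)                       # not locked
]

unlocked_ids = tidy_webhook_requests(rows, now: now)
```

Passing `now` explicitly keeps the threshold comparison deterministic, which is convenient when checking behavior around the 1-hour boundary.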

Changes

| File | Change |
| --- | --- |
| app/models/webhook_request.rb | Added with_stale_lock scope (locks older than 1 hour) |
| app/scheduled_tasks/tidy_webhook_requests_task.rb | New scheduled task to unlock stale webhook requests |
| app/lib/worker/process.rb | Registered TidyWebhookRequestsTask in the worker TASKS array |

Why unlock instead of destroy?

TidyQueuedMessagesTask destroys stale queued messages because they represent outbound email delivery attempts that are likely no longer relevant. Webhook requests, however, carry event notifications that downstream systems may still need, so we unlock them to allow a retry rather than silently dropping them.
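
The difference between the two cleanup strategies can be illustrated with a small plain-Ruby sketch. It assumes a simplified in-memory list of entries; `Entry`, `tidy_by_destroying`, and `tidy_by_unlocking` are hypothetical names and do not appear in the codebase.

```ruby
# Hypothetical in-memory entry; the real records are database rows.
Entry = Struct.new(:id, :locked_by)

# Destroy strategy: stale entries are removed and can never be delivered.
def tidy_by_destroying(entries, stale_ids)
  entries.reject { |entry| stale_ids.include?(entry.id) }
end

# Unlock strategy: stale entries stay in the list with their lock cleared,
# so a later worker can still deliver them.
def tidy_by_unlocking(entries, stale_ids)
  entries.map do |entry|
    stale_ids.include?(entry.id) ? Entry.new(entry.id, nil) : entry
  end
end

entries = [Entry.new(1, "dead-worker"), Entry.new(2, nil)]
stale_ids = [1]

after_destroy = tidy_by_destroying(entries, stale_ids).map(&:id)
after_unlock = tidy_by_unlocking(entries, stale_ids).map(&:id)
```

With the destroy strategy the event carried by entry 1 is lost, while the unlock strategy keeps it available for a retry.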

Test plan

  • Verify WebhookRequest.with_stale_lock returns only records with locked_at older than 1 hour
  • Verify TidyWebhookRequestsTask unlocks stale records (sets locked_by and locked_at to nil)
  • Verify previously-stuck webhook requests are picked up again by ProcessWebhookRequestsJob after unlock
  • Confirm no impact on actively-locked (non-stale) webhook requests
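
The last two items of the test plan can be expressed as a small plain-Ruby simulation. This is a sketch under stated assumptions: `Row`, `unlock_stale`, and `claim_unlocked` are hypothetical stand-ins, and `claim_unlocked` only approximates the pick-up step of ProcessWebhookRequestsJob by claiming every row that has no lock.

```ruby
ONE_HOUR = 3600 # seconds

# Hypothetical stand-in for a webhook_requests row.
Row = Struct.new(:id, :locked_by, :locked_at)

# Clears locks that are older than one hour relative to `now`.
def unlock_stale(rows, now)
  rows.each do |row|
    next unless row.locked_at && row.locked_at < now - ONE_HOUR

    row.locked_by = nil
    row.locked_at = nil
  end
end

# Assumed pick-up behavior: lock every currently unlocked row for `worker`
# and return the ids that were claimed.
def claim_unlocked(rows, worker, now)
  rows.select { |row| row.locked_by.nil? }.each do |row|
    row.locked_by = worker
    row.locked_at = now
  end.map(&:id)
end

now = Time.now
stuck = Row.new(1, "dead-worker", now - 3 * ONE_HOUR) # lock left by a crashed worker
active = Row.new(2, "live-worker", now - 60)          # lock held by a healthy worker
rows = [stuck, active]

claimed_before = claim_unlocked(rows, "worker-new", now) # nothing is claimable yet
unlock_stale(rows, now)
claimed_after = claim_unlocked(rows, "worker-new", now)  # the stuck row is claimable
```

In a real test suite the same checks would run against the database-backed models rather than in-memory structs.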

🤖 Generated with Claude Code

adamcooke and others added 3 commits February 1, 2026 14:48
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
When a worker process crashes or is killed (common in Kubernetes with
rolling updates, OOM kills, etc.), locks held on webhook_requests are
never released. Unlike QueuedMessage which has TidyQueuedMessagesTask
to clean up stale locks, WebhookRequest had no equivalent mechanism,
causing webhook delivery to be permanently blocked.

This adds:
- A `with_stale_lock` scope on WebhookRequest (locks older than 1 hour)
- A TidyWebhookRequestsTask scheduled task that runs hourly to unlock
  stale webhook requests so they can be retried

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

3 participants