feat: async user deletion via background worker #3798
Replace synchronous user deletion (which CASCADE-locked 60 FK tables for 10-20 minutes) with an async flow:

1. deleteUser() now sets flags.inDeletion=true and invalidates sessions
2. CDC detects the flag change and triggers api.v1.user-deletion-requested
3. A new worker cleans up child tables one at a time, then hard-deletes
4. Boot endpoint blocks login for users marked inDeletion
5. Zombie cron marks users for deletion instead of deleting directly
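The first and fourth steps of this flow can be sketched as follows. This is a minimal in-memory illustration, not the PR's code: the `users` and `sessions` maps and the `markUserForDeletion` and `canLogIn` helpers are hypothetical stand-ins for the real repository calls and the boot endpoint.

```typescript
// Hypothetical, simplified shapes; the real entities live in the API codebase.
interface UserFlags {
  inDeletion?: boolean;
}

interface User {
  id: string;
  flags: UserFlags;
}

// In-memory stand-ins for the user table and the session store (assumptions).
const users = new Map<string, User>();
const sessions = new Map<string, Set<string>>(); // userId -> session tokens

// Step 1: flag the user and drop their sessions. No child table is touched
// here, so the call returns quickly; cleanup is left to a later worker.
function markUserForDeletion(userId: string): boolean {
  const user = users.get(userId);
  if (!user) return false;
  user.flags = { ...user.flags, inDeletion: true };
  sessions.delete(userId);
  return true;
}

// Step 4: the boot path refuses login while deletion is pending.
function canLogIn(userId: string): boolean {
  const user = users.get(userId);
  return user !== undefined && user.flags.inDeletion !== true;
}
```

The design point is that the user-facing call only flips a flag and clears sessions; everything that touches FK tables happens later, outside the request.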
🍹 The Update (preview) for dailydotdev/api/prod (at 5cd6dc0) was successful.

✨ Neo Explanation
Routine deployment of commit `d4f56edb` across all API services and cron jobs, with DB and ClickHouse migrations running as new one-shot Jobs, plus a new Pub/Sub subscription for user deletion cleanup.

✅ Low Risk
This is a standard application deployment rolling out a new image version (

🔵 Info: A new GCP Pub/Sub subscription
🔵 Info: The migration Jobs run with

Resource Changes
Name Type Operation
~ vpc-native-update-trending-cron kubernetes:batch/v1:CronJob update
~ vpc-native-bg-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-channel-digests-cron kubernetes:batch/v1:CronJob update
~ vpc-native-post-analytics-history-day-clickhouse-cron kubernetes:batch/v1:CronJob update
~ vpc-native-sync-subscription-with-cio-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-zombie-images-cron kubernetes:batch/v1:CronJob update
~ vpc-native-temporal-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-personalized-digest-cron kubernetes:batch/v1:CronJob update
~ vpc-native-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-clean-channel-highlights-cron kubernetes:batch/v1:CronJob update
~ vpc-native-materialize-yearly-best-post-archives-cron kubernetes:batch/v1:CronJob update
+ vpc-native-api-db-migration-d4f56edb kubernetes:batch/v1:Job create
~ vpc-native-rotate-weekly-quests-cron kubernetes:batch/v1:CronJob update
~ vpc-native-user-profile-analytics-history-clickhouse-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-highlighted-views-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-stale-user-transactions-cron kubernetes:batch/v1:CronJob update
- vpc-native-api-db-migration-426a0fdb kubernetes:batch/v1:Job delete
~ vpc-native-post-analytics-clickhouse-cron kubernetes:batch/v1:CronJob update
~ vpc-native-materialize-monthly-best-post-archives-cron kubernetes:batch/v1:CronJob update
~ vpc-native-calculate-top-readers-cron kubernetes:batch/v1:CronJob update
~ vpc-native-channel-highlights-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-achievement-rarity-cron kubernetes:batch/v1:CronJob update
~ vpc-native-private-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-clean-expired-better-auth-sessions-cron kubernetes:batch/v1:CronJob update
~ vpc-native-clean-gifted-plus-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-views-cron kubernetes:batch/v1:CronJob update
~ vpc-native-user-posts-analytics-refresh-cron kubernetes:batch/v1:CronJob update
~ vpc-native-validate-active-users-cron kubernetes:batch/v1:CronJob update
~ vpc-native-generate-search-invites-cron kubernetes:batch/v1:CronJob update
~ vpc-native-hourly-notification-cron kubernetes:batch/v1:CronJob update
- vpc-native-api-clickhouse-migration-426a0fdb kubernetes:batch/v1:Job delete
~ vpc-native-check-analytics-report-cron kubernetes:batch/v1:CronJob update
~ vpc-native-worker-job-deployment kubernetes:apps/v1:Deployment update
~ vpc-native-update-source-public-threshold-cron kubernetes:batch/v1:CronJob update
~ vpc-native-expire-super-agent-trial-cron kubernetes:batch/v1:CronJob update
~ vpc-native-personalized-digest-deployment kubernetes:apps/v1:Deployment update
+ api-sub-api.user-deletion-cleanup gcp:pubsub/subscription:Subscription create
~ vpc-native-user-profile-updated-sync-cron kubernetes:batch/v1:CronJob update
~ vpc-native-update-tag-materialized-views-cron kubernetes:batch/v1:CronJob update
~ vpc-native-ws-deployment kubernetes:apps/v1:Deployment update
+ vpc-native-api-clickhouse-migration-d4f56edb kubernetes:batch/v1:Job create
... and 12 other changes
rebelchris
left a comment
Does it not still lock once it hits the cdc worker then?
}

// Delete user's resume if exists
await deleteResumeByUserId(userId);
Why not put these in the cleanup deletion as well?
Yep, it's possible and better.
Good point — moved resume and employment agreement deletion to the worker. They're not time-sensitive and don't need to block the user-facing response.
await con
  .getRepository(UserTransaction)
  .update({ receiverId: userId }, { receiverId: ghostUser.id });
await con
Why do we have separate statements for ArticlePost and Post? Also, we should set the author to the ghost user, not null.
Removed the separate ArticlePost line — the Post.update({ authorId: userId }, { authorId: ghostUser.id }) already handles all post types including articles. No more null, everything goes to ghost.
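To illustrate why one update keyed on `authorId` is enough, here is a small in-memory sketch. It assumes every post type shares the same `authorId` column; the `Post` shape, the `PostType` values, and `reassignAuthor` are hypothetical and only mimic what `Post.update({ authorId: userId }, { authorId: ghostUser.id })` is described as doing.

```typescript
// Hypothetical post shape: every post type shares one `authorId` column.
type PostType = "article" | "share" | "freeform";

interface Post {
  id: string;
  type: PostType;
  authorId: string | null;
}

// Reassign all of a user's posts to the ghost user in a single pass,
// regardless of post type. Returns the number of posts that were updated.
function reassignAuthor(posts: Post[], userId: string, ghostId: string): number {
  let updated = 0;
  for (const post of posts) {
    if (post.authorId === userId) {
      post.authorId = ghostId; // ghost user, never null
      updated += 1;
    }
  }
  return updated;
}
```

Because the filter is on the author rather than on the post type, a type-specific statement for articles would match only rows this pass has already rewritten.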
  .getRepository(ArticlePost)
  .update({ authorId: userId }, { authorId: null });
await con.getRepository(DigestPost).delete({ authorId: userId });
await con
Removed. The SourceUser is already ghosted (userId → ghost), and the general Post.update({ authorId: userId }, { authorId: ghostUser.id }) catches all posts regardless of sourceId. The separate sourceId update was redundant.
@rebelchris No, because it's neither a single transaction nor a single statement, so each statement acquires locks on one table at a time rather than holding ~60 locks at once.
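The per-table approach can be sketched with an in-memory model. This is an illustration under stated assumptions, not the worker's implementation: `Tables`, `deleteBatch`, `cleanupUser`, and the default batch size are all hypothetical, and each `deleteBatch` call stands in for one short, independent DELETE statement.

```typescript
// Hypothetical in-memory model: table name -> rows that reference a user.
type Row = { id: number; userId: string };
type Tables = Map<string, Row[]>;

// Remove at most `batchSize` of the user's rows from one table and report how
// many were removed. In a real database this would be a single short DELETE
// that only needs locks on this one table.
function deleteBatch(tables: Tables, table: string, userId: string, batchSize: number): number {
  const rows = tables.get(table) ?? [];
  const doomed = new Set(
    rows
      .filter((row) => row.userId === userId)
      .slice(0, batchSize)
      .map((row) => row.id),
  );
  tables.set(table, rows.filter((row) => !doomed.has(row.id)));
  return doomed.size;
}

// Visit child tables one at a time, draining each in batches before moving on.
// Returns the number of rows removed per table.
function cleanupUser(
  tables: Tables,
  order: string[],
  userId: string,
  batchSize = 1000,
): Record<string, number> {
  const removed: Record<string, number> = {};
  for (const table of order) {
    let total = 0;
    let n: number;
    do {
      n = deleteBatch(tables, table, userId, batchSize);
      total += n;
    } while (n > 0); // stop once a pass finds nothing left to delete
    removed[table] = total;
  }
  return removed;
}
```

Since no statement spans more than one table and nothing wraps the loop in a transaction, a slow table delays only its own cleanup.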
- Move resume/employment deletion to worker (not time-sensitive)
- Add grantById ghost reassignment for ReputationEvent
- Remove redundant ArticlePost null update (Post update handles all)
- Remove redundant sourceId+authorId update (SourceUser already ghosted)
- Use log.debug for batch delete progress
Ensures zombie users (which bypass deleteUser) also get their sessions cleaned up before the final user DELETE.
CDC test payloads pass flags as objects, while Debezium sends them as JSON strings. Handle both cases. Also fix zombie cron test expectations to match the 4-user fixture.
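Handling both payload shapes might look like the sketch below. The helper names (`parseFlags`, `isPlainObject`, `deletionRequested`) and the `UserFlags` shape are assumptions for illustration; the actual CDC handler in the PR may be structured differently.

```typescript
// Hypothetical flags shape; only `inDeletion` matters for this sketch.
interface UserFlags {
  inDeletion?: boolean;
  [key: string]: unknown;
}

// Narrow an unknown value to a non-null, non-array object.
function isPlainObject(value: unknown): value is UserFlags {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Accept flags either as an object (test payloads) or as a JSON string
// (as Debezium sends them). Anything unparseable becomes an empty object.
function parseFlags(raw: unknown): UserFlags {
  if (typeof raw === "string") {
    try {
      const parsed: unknown = JSON.parse(raw);
      return isPlainObject(parsed) ? parsed : {};
    } catch {
      return {};
    }
  }
  return isPlainObject(raw) ? raw : {};
}

// True only when the change flips `inDeletion` from not-set to true, so a
// later, unrelated update to an already-flagged user does not re-trigger.
function deletionRequested(before: unknown, after: unknown): boolean {
  return parseFlags(before).inDeletion !== true && parseFlags(after).inDeletion === true;
}
```

Normalizing at the edge keeps the rest of the handler free of shape checks.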
Summary
- deleteUser() now sets flags.inDeletion=true, invalidates sessions, and returns immediately
- userDeletionCleanup worker handles child table cleanup one table at a time
- inDeletion

Test plan

- __tests__/users.ts delete tests pass (19/19)
- __tests__/cron/cleanZombieUsers.ts passes (3/3)