feat: async user deletion via background worker#3798

Merged
idoshamun merged 5 commits into main from feat/async-user-deletion
Apr 15, 2026

Conversation

@idoshamun
Member

Summary

  • Replace synchronous user deletion (CASCADE-locked 60 FK tables for 10-20min, blocking logins) with an async flow
  • deleteUser() now sets flags.inDeletion=true, invalidates sessions, and returns immediately
  • New userDeletionCleanup worker handles child table cleanup one table at a time
  • Boot endpoint blocks login for users marked inDeletion
  • Zombie cron marks users for deletion instead of deleting directly

Test plan

  • Build passes
  • Lint passes (0 warnings)
  • __tests__/users.ts delete tests pass (19/19)
  • __tests__/cron/cleanZombieUsers.ts passes (3/3)
  • Companion PR: dailydotdev/streams#93 (topic must be deployed first)
  • Follow-up: migration to remove CASCADE from user FK constraints

Replace synchronous user deletion (which CASCADE-locked 60 FK tables
for 10-20 minutes) with an async flow:

1. deleteUser() now sets flags.inDeletion=true and invalidates sessions
2. CDC detects the flag change and triggers api.v1.user-deletion-requested
3. A new worker cleans up child tables one at a time, then hard-deletes
4. Boot endpoint blocks login for users marked inDeletion
5. Zombie cron marks users for deletion instead of deleting directly
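
The five steps above can be sketched with in-memory stand-ins. This is only an illustration of the control flow, not the code in this PR: `userTable`, `sessionTable`, `deletionQueue`, `canLogin`, and `runCleanupWorker` are hypothetical names, the queue stands in for the `api.v1.user-deletion-requested` topic, and everything is synchronous to keep the sketch short.

```typescript
// Hypothetical in-memory stand-ins; the real flow uses the database, CDC, and Pub/Sub.
type UserRow = { id: string; flags: { inDeletion?: boolean } };

const userTable = new Map<string, UserRow>();
const sessionTable = new Map<string, string>(); // sessionId -> userId
const deletionQueue: string[] = []; // stands in for api.v1.user-deletion-requested

// Step 1: mark the user, invalidate sessions, and return without touching child tables.
function deleteUser(userId: string): void {
  const user = userTable.get(userId);
  if (!user) return;
  user.flags.inDeletion = true;
  sessionTable.forEach((owner, sessionId) => {
    if (owner === userId) sessionTable.delete(sessionId);
  });
  // Step 2: in the PR, CDC detects the flag change and publishes the event.
  // Here the publish is simulated by pushing onto a local queue.
  deletionQueue.push(userId);
}

// Step 4: the boot endpoint refuses users that are marked for deletion.
function canLogin(userId: string): boolean {
  const user = userTable.get(userId);
  return user !== undefined && user.flags.inDeletion !== true;
}

// Step 3: the worker runs the cleanup steps one at a time, then hard-deletes the user.
function runCleanupWorker(cleanupSteps: Array<(userId: string) => void>): void {
  let userId = deletionQueue.shift();
  while (userId !== undefined) {
    for (const step of cleanupSteps) step(userId);
    userTable.delete(userId);
    userId = deletionQueue.shift();
  }
}
```

In this sketch the user row survives until the worker finishes, which is why the login check has to look at the flag rather than at whether the row exists.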
@pulumi

pulumi bot commented Apr 15, 2026

🍹 The Update (preview) for dailydotdev/api/prod (at 5cd6dc0) was successful.

✨ Neo Explanation

Routine deployment of commit `d4f56edb` across all API services and cron jobs, with DB and ClickHouse migrations running as new one-shot Jobs, plus a new Pub/Sub subscription for user deletion cleanup. ✅ Low Risk

This is a standard application deployment rolling out a new image version (426a0fdb → d4f56edb) across all services. The migration Jobs are replaced by design: their logical names include the commit hash, so old Jobs are deleted and new ones are created for the incoming version. Both a TypeORM DB migration and a ClickHouse migration will run against the new image before (or alongside) the updated deployments going live.

🔵 Info — A new GCP Pub/Sub subscription api-sub-api.user-deletion-cleanup is being created, indicating this release adds a new user deletion cleanup workflow that consumes from an existing topic.

🔵 Info — The migration Jobs run with restartPolicy: Never, so if either migration fails it will not retry automatically — check Job completion status in the cluster after deploy to confirm both migrations succeeded before treating the rollout as healthy.

Resource Changes

    Name                                                       Type                                  Operation
~   vpc-native-update-trending-cron                            kubernetes:batch/v1:CronJob           update
~   vpc-native-bg-deployment                                   kubernetes:apps/v1:Deployment         update
~   vpc-native-channel-digests-cron                            kubernetes:batch/v1:CronJob           update
~   vpc-native-post-analytics-history-day-clickhouse-cron      kubernetes:batch/v1:CronJob           update
~   vpc-native-sync-subscription-with-cio-cron                 kubernetes:batch/v1:CronJob           update
~   vpc-native-clean-zombie-images-cron                        kubernetes:batch/v1:CronJob           update
~   vpc-native-temporal-deployment                             kubernetes:apps/v1:Deployment         update
~   vpc-native-personalized-digest-cron                        kubernetes:batch/v1:CronJob           update
~   vpc-native-deployment                                      kubernetes:apps/v1:Deployment         update
~   vpc-native-clean-channel-highlights-cron                   kubernetes:batch/v1:CronJob           update
~   vpc-native-materialize-yearly-best-post-archives-cron      kubernetes:batch/v1:CronJob           update
+   vpc-native-api-db-migration-d4f56edb                       kubernetes:batch/v1:Job               create
~   vpc-native-rotate-weekly-quests-cron                       kubernetes:batch/v1:CronJob           update
~   vpc-native-user-profile-analytics-history-clickhouse-cron  kubernetes:batch/v1:CronJob           update
~   vpc-native-update-highlighted-views-cron                   kubernetes:batch/v1:CronJob           update
~   vpc-native-clean-stale-user-transactions-cron              kubernetes:batch/v1:CronJob           update
-   vpc-native-api-db-migration-426a0fdb                       kubernetes:batch/v1:Job               delete
~   vpc-native-post-analytics-clickhouse-cron                  kubernetes:batch/v1:CronJob           update
~   vpc-native-materialize-monthly-best-post-archives-cron     kubernetes:batch/v1:CronJob           update
~   vpc-native-calculate-top-readers-cron                      kubernetes:batch/v1:CronJob           update
~   vpc-native-channel-highlights-cron                         kubernetes:batch/v1:CronJob           update
~   vpc-native-update-achievement-rarity-cron                  kubernetes:batch/v1:CronJob           update
~   vpc-native-private-deployment                              kubernetes:apps/v1:Deployment         update
~   vpc-native-clean-expired-better-auth-sessions-cron         kubernetes:batch/v1:CronJob           update
~   vpc-native-clean-gifted-plus-cron                          kubernetes:batch/v1:CronJob           update
~   vpc-native-update-views-cron                               kubernetes:batch/v1:CronJob           update
~   vpc-native-user-posts-analytics-refresh-cron               kubernetes:batch/v1:CronJob           update
~   vpc-native-validate-active-users-cron                      kubernetes:batch/v1:CronJob           update
~   vpc-native-generate-search-invites-cron                    kubernetes:batch/v1:CronJob           update
~   vpc-native-hourly-notification-cron                        kubernetes:batch/v1:CronJob           update
-   vpc-native-api-clickhouse-migration-426a0fdb               kubernetes:batch/v1:Job               delete
~   vpc-native-check-analytics-report-cron                     kubernetes:batch/v1:CronJob           update
~   vpc-native-worker-job-deployment                           kubernetes:apps/v1:Deployment         update
~   vpc-native-update-source-public-threshold-cron             kubernetes:batch/v1:CronJob           update
~   vpc-native-expire-super-agent-trial-cron                   kubernetes:batch/v1:CronJob           update
~   vpc-native-personalized-digest-deployment                  kubernetes:apps/v1:Deployment         update
+   api-sub-api.user-deletion-cleanup                          gcp:pubsub/subscription:Subscription  create
~   vpc-native-user-profile-updated-sync-cron                  kubernetes:batch/v1:CronJob           update
~   vpc-native-update-tag-materialized-views-cron              kubernetes:batch/v1:CronJob           update
~   vpc-native-ws-deployment                                   kubernetes:apps/v1:Deployment         update
+   vpc-native-api-clickhouse-migration-d4f56edb               kubernetes:batch/v1:Job               create
... and 12 other changes

Contributor

@rebelchris rebelchris left a comment

Does it not still lock once it hits the cdc worker then?

Comment thread src/common/user.ts Outdated
}

// Delete user's resume if exists
await deleteResumeByUserId(userId);
Contributor

Why not put these in the cleanup deletion as well?

Member Author

Yep, that's possible and better.

Member Author

Good point — moved resume and employment agreement deletion to the worker. They're not time-sensitive and don't need to block the user-facing response.

Comment thread src/workers/userDeletionCleanup.ts Outdated
Comment thread src/workers/userDeletionCleanup.ts
Comment thread src/workers/userDeletionCleanup.ts
Comment thread src/workers/userDeletionCleanup.ts Outdated
await con
.getRepository(UserTransaction)
.update({ receiverId: userId }, { receiverId: ghostUser.id });
await con
Member Author

Why do we have separate statements for ArticlePost and Post? Also, we should set the author to the ghost user, not null.

Member Author

Removed the separate ArticlePost line — the Post.update({ authorId: userId }, { authorId: ghostUser.id }) already handles all post types including articles. No more null, everything goes to ghost.

.getRepository(ArticlePost)
.update({ authorId: userId }, { authorId: null });
await con.getRepository(DigestPost).delete({ authorId: userId });
await con
Member Author

Why do we need this?

Member Author

Removed. The SourceUser is already ghosted (userId → ghost), and the general Post.update({ authorId: userId }, { authorId: ghostUser.id }) catches all posts regardless of sourceId. The separate sourceId update was redundant.

@idoshamun
Member Author

Does it not still lock once it hits the cdc worker then?

@rebelchris No, because the cleanup is neither a single transaction nor a single statement, so each statement acquires a lock on one table at a time rather than ~60 locks at once.
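
To make the per-table locking point concrete, here is a minimal sketch of the idea, assuming a hypothetical `deleteBatch` callback that removes up to `limit` rows for a user from one table and returns how many it removed. The names, the batch size, and the return shape are illustrative and are not taken from `userDeletionCleanup.ts`.

```typescript
// Hypothetical callback: deletes up to `limit` rows owned by `userId` from a
// single table and returns the number of rows it removed.
type DeleteBatch = (userId: string, limit: number) => number;
type TableCleanup = { table: string; deleteBatch: DeleteBatch };

// Issues short, independent deletes against one table until nothing is left.
// Each call is its own statement, so its locks are released when it finishes.
function deleteInBatches(userId: string, deleteBatch: DeleteBatch, batchSize: number): number {
  let total = 0;
  let removed = deleteBatch(userId, batchSize);
  while (removed > 0) {
    total += removed;
    removed = deleteBatch(userId, batchSize);
  }
  return total;
}

// Tables are processed sequentially instead of in one cascading transaction.
function cleanupTables(
  userId: string,
  steps: TableCleanup[],
  batchSize = 1000, // illustrative default, not a value from the PR
): Array<{ table: string; deleted: number }> {
  return steps.map((step) => ({
    table: step.table,
    deleted: deleteInBatches(userId, step.deleteBatch, batchSize),
  }));
}
```

The trade-off in this shape is that the cleanup is no longer atomic: a failure midway leaves some tables cleaned and others not, so the steps need to be safe to re-run.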

- Move resume/employment deletion to worker (not time-sensitive)
- Add grantById ghost reassignment for ReputationEvent
- Remove redundant ArticlePost null update (Post update handles all)
- Remove redundant sourceId+authorId update (SourceUser already ghosted)
- Use log.debug for batch delete progress

Ensures zombie users (which bypass deleteUser) also get their
sessions cleaned up before the final user DELETE.

CDC test payloads pass flags as objects, while Debezium sends them as JSON
strings. Handle both cases. Also fix zombie cron test expectations
to match the 4-user fixture.
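
Handling both payload shapes could look roughly like the sketch below. The CDC handler itself is not shown in this conversation, so `parseFlags`, `toFlags`, and the `UserFlags` type are hypothetical names, and falling back to empty flags on malformed input is an assumption rather than the PR's behavior.

```typescript
type UserFlags = { inDeletion?: boolean };

// Reads the one flag this flow cares about from an already-decoded value.
function toFlags(value: unknown): UserFlags {
  if (typeof value !== "object" || value === null) return {};
  const candidate = value as { inDeletion?: unknown };
  return candidate.inDeletion === true ? { inDeletion: true } : {};
}

// Accepts flags either as an object (test payloads) or as a JSON string
// (the shape the commit message attributes to Debezium).
function parseFlags(raw: unknown): UserFlags {
  if (typeof raw !== "string") return toFlags(raw);
  try {
    return toFlags(JSON.parse(raw));
  } catch {
    // Assumption: malformed JSON is treated as "no flags set".
    return {};
  }
}
```

A handler would then compare `parseFlags(before.flags).inDeletion` with `parseFlags(after.flags).inDeletion` to decide whether the flag has just been set.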
@idoshamun idoshamun merged commit 51fa551 into main Apr 15, 2026
9 checks passed
@idoshamun idoshamun deleted the feat/async-user-deletion branch April 15, 2026 10:17
3 participants