Add celery and celery-beat to delete partial files every 48 hours #234
Add celery and celery-beat to delete partial files every 48 hours #234paigewilliams merged 23 commits intodevelopfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR introduces Celery (worker + beat) into the TEKDB Django project to periodically delete expired resumable-upload chunk files (intended every 48 hours), using Redis as the broker and django-celery-results / django-celery-beat for admin visibility and scheduling control.
Changes:
- Add Celery app/task (
delete_expired_chunks) and schedule it via Django settings; add a basic test for the cleanup behavior. - Add Docker compose services for
redis,celery, andcelery-beat, plus a web healthcheck used for startup ordering. - Add Vagrant/systemd provisioning artifacts to run Redis + Celery worker/beat as services.
Reviewed changes
Copilot reviewed 14 out of 16 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| docker/docker-compose.yaml | Adds redis/celery/celery-beat services to dev compose and wires broker env. |
| docker/docker-compose.prod.local.yaml | Adds redis/celery/celery-beat services to prod-local compose. |
| docker/common.yaml | Defines redis/celery/celery-beat service templates and adds a web healthcheck. |
| deployment/celery-worker.service | New systemd unit for celery worker in Vagrant. |
| deployment/celery-beat.service | New systemd unit for celery beat (DatabaseScheduler) in Vagrant. |
| TEKDB/scripts/provision_celery.sh | Provision script to install Redis and enable/start celery systemd units. |
| TEKDB/requirements.txt | Adds celery, redis, django-celery-results, django-celery-beat dependencies. |
| TEKDB/media/init.py | Adds placeholder file under media directory. |
| TEKDB/TEKDB/tests/test_tasks.py | Adds tests for chunk cleanup behavior. |
| TEKDB/TEKDB/tekdb_filebrowser.py | Minor lambda formatting adjustment (parentheses). |
| TEKDB/TEKDB/tasks.py | Adds delete_expired_chunks Celery task and helper formatting function. |
| TEKDB/TEKDB/settings.py | Registers celery results/beat apps and adds Celery configuration + beat schedule. |
| TEKDB/TEKDB/celery.py | Adds Celery app initialization and autodiscovery. |
| TEKDB/TEKDB/admin.py | Adjusts django-celery-results admin registration (unregister GroupResult). |
| TEKDB/TEKDB/init.py | Imports Celery app on Django startup. |
| TEKDB/TEKDB/management/commands/delete_expired_chunks.py | Adds a management command module file (currently empty). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
|
||
|
|
There was a problem hiding this comment.
bytes_to_readable can return None for very large values because the loop has no fallback return after the last unit. Ensure the function always returns a string (e.g., add a final return for the largest unit).
| # Fallback: for extremely large values, use the largest unit. | |
| return f"{num_bytes:.2f} P{suffix}" |
| redis: | ||
| extends: | ||
| file: common.yaml | ||
| service: redis | ||
| celery: | ||
| extends: | ||
| file: common.yaml | ||
| service: celery | ||
| celery-beat: | ||
| extends: | ||
| file: common.yaml | ||
| service: celery-beat |
There was a problem hiding this comment.
PR description indicates celery/beat should be part of the Docker deployment to run the 48h cleanup, but only docker-compose.yaml (dev) and docker-compose.prod.local.yaml include the redis/celery services. The actual deployed compose file (docker/docker-compose.prod.yaml, used in .github/workflows/deploy-ec2-staging.yml) currently does not include redis/celery/celery-beat, so the cleanup task won’t run in that environment.
| path, | ||
| filter_func=lambda fo: not fo.has_media_record() | ||
| and fo.filename not in self.files_folders_to_ignore(), | ||
| filter_func=lambda fo: ( |
There was a problem hiding this comment.
changes to this file are just ruff formatting fixes
| @@ -0,0 +1,18 @@ | |||
| [Unit] | |||
There was a problem hiding this comment.
pulled much of this from these docs: https://docs.celeryq.dev/en/stable/userguide/daemonizing.html#daemonization
| RUN chmod +x /usr/local/bin/entrypoint.sh | ||
|
|
||
| # Create a non-root user and give it ownership of the app directory | ||
| RUN addgroup --system appgroup && adduser --system --ingroup appgroup appuser \ |
There was a problem hiding this comment.
Added to resolve warning of SecurityWarning: You're running the worker with superuser privileges: this is absolutely not recommended! in the celery container.
| volumes: | ||
| - static_volume:/vol/static/static:ro | ||
| - media_volume:/vol/static/media:ro | ||
| redis: |
There was a problem hiding this comment.
I like that it is possible, and so easy, to extend into other yaml files.
There was a problem hiding this comment.
agreed! makes it super easy to have different configs for different environments, but keep things DRY.
rhodges
left a comment
There was a problem hiding this comment.
Not only is it a great new feature, but this makes an easy job of including several very powerful tools that will be welcome when adding features in the future! I am curious how this will impact hardware usage, but in past experience Redis and Celery have proven themselves to be quite performant.
| context: ../TEKDB/ | ||
| dockerfile: ../TEKDB/Dockerfile | ||
| restart: unless-stopped | ||
| depends_on: |
There was a problem hiding this comment.
Why was this dependency removed?
There was a problem hiding this comment.
ah, I see the inheritance now - cool!
| file_path = os.path.join(root, filename) | ||
| try: | ||
| mtime = os.path.getmtime(file_path) | ||
| if mtime >= cutoff_timestamp: |
There was a problem hiding this comment.
This is the one piece of logic I most wanted to see in this review, and not only is it here, it's clear, readable, and well-written!
asana task
Adds celery, celery-beat and django-celery-results, django-celery-beat to delete partial uploads every 48 hours, with the goal of not bloating servers with partial uploads. Deletion logic exists in the
delete_expired_chunkstask. Uses redis as the message broker.django-celery-resultsdisplays the task results in the admin interface. This should help with transparency and observability.django-celery-beatalso displays the intervals at which tasks run. The thinking here is that it allows tribes to have more control over background tasks, and if they want them or not.For running with docker, adds a celery worker, celery-beat and redis container. For running with Vagrant it runs daemonized. The only step to add in the install instructions is
sudo bash /usr/local/apps/TEKDB/TEKDB/scripts/provision_celery.shto run after database migrations.Screenshots