A Dockerized Rails service that ingests GitHub Push events, stores structured and raw data in PostgreSQL, and enriches events with actor and repository metadata.
- Docker and Docker Compose
`cp .env.example .env`

Credentials are prefilled and ready to use. Edit `.env` if you need different values.
`docker compose up --build`

This starts PostgreSQL, Valkey, and a Sidekiq worker. Ingestion begins automatically within ~30 seconds, with no additional commands needed. After that, the worker runs hourly at minute 5 of each hour.
`docker compose run --rm ingest`

Use this for a one-off run outside the automatic schedule. It is idempotent, so it is safe to run at any time.
`docker compose run --rm test`

The test container creates and migrates its database automatically before running specs.
Watch the worker with `docker compose logs -f worker`:
[Worker] Starting ingestion run
[Ingestion] Fetching events from GitHub API...
[Ingestion] Found X PushEvents out of Y total events
[Ingestion] Saved PushEvent 12345 from user/repo
...
[Ingestion] Complete: X created, 0 duplicates skipped, 0 failed
[Worker] Ingestion: X created, 0 skipped, 0 failed
[Enrichment] N events need enrichment
[Enrichment] Fetching actor 583231 from https://api.github.com/users/...
[Enrichment] Fetching repository 123456 from https://api.github.com/repos/...
[Enrichment] Complete
[Worker] Run complete. Total PushEvents: X
Or all services at once: `docker compose logs -f`
`docker compose exec worker bundle exec rails runner "puts 'PushEvents: ' + PushEvent.count.to_s; puts 'Actors: ' + Actor.count.to_s; puts 'Repositories: ' + Repository.count.to_s"`

| Table | Purpose |
|---|---|
| `push_events` | Core event data with queryable columns (`repo_github_id`, `push_id`, `ref`, `head`, `before`) plus the raw JSON payload |
| `actors` | Enriched GitHub user data (login, avatar, profile) |
| `repositories` | Enriched GitHub repository data (name, description, language) |
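As a rough illustration of how the queryable `push_events` columns could be derived from a raw event, here is a minimal sketch in plain Ruby. It assumes the event hash carries `type`, `repo.id`, and a `payload` with `push_id`, `ref`, `head`, and `before`. The method name `push_event_attributes` and the `github_id` and `raw_payload` keys are hypothetical and may not match the actual model; the sample event values are made up.

```ruby
require "json"

# Hypothetical mapper, not the project's actual code.
# Returns a hash of column values for a PushEvent, or nil for other event types.
def push_event_attributes(event)
  return nil unless event["type"] == "PushEvent"

  payload = event.fetch("payload", {})
  {
    github_id: event["id"],                   # assumed column name
    repo_github_id: event.dig("repo", "id"),
    push_id: payload["push_id"],
    ref: payload["ref"],
    head: payload["head"],
    before: payload["before"],
    raw_payload: JSON.generate(event)         # assumed column name for the raw JSON
  }
end

# Made-up sample event for illustration only.
sample = {
  "id" => "12345",
  "type" => "PushEvent",
  "repo" => { "id" => 123456, "name" => "user/repo" },
  "payload" => { "push_id" => 1, "ref" => "refs/heads/main", "head" => "abc", "before" => "def" }
}

attrs = push_event_attributes(sample)
puts attrs[:ref]
```

Keeping the raw JSON next to the extracted columns means fields that were not promoted to columns can still be recovered later.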
- Build: ~2-3 minutes (first time, pulls images + installs gems)
- First automatic ingestion: ~30 seconds after `docker compose up --build`
- Subsequent automatic runs: every hour at minute 5 (`5 * * * *`)
- Manual ingestion: ~10-30 seconds via `docker compose run --rm ingest`
- Tests: ~5-10 seconds
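To make the `5 * * * *` schedule concrete, the sketch below computes the next run time for a given moment. It is an illustration only: `next_run_at` is a hypothetical helper, it assumes the schedule is evaluated in UTC, and the actual worker relies on its scheduler rather than code like this.

```ruby
# Hypothetical helper: next time matching "minute 5 of every hour" (5 * * * *).
# Assumes UTC; a run due exactly at `now` is treated as already started.
def next_run_at(now)
  now = now.utc
  top_of_hour = Time.utc(now.year, now.month, now.day, now.hour)
  candidate = top_of_hour + 5 * 60
  candidate > now ? candidate : candidate + 3600
end

puts next_run_at(Time.utc(2024, 1, 1, 10, 3))
puts next_run_at(Time.utc(2024, 1, 1, 10, 30))
```

A container started at 10:03 would see its next scheduled run at 10:05, while one started at 10:30 would wait until 11:05.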
The service is a Rails API-only app (no web UI) with three service objects: `GithubApiClient` handles HTTP and rate-limit tracking, `EventIngestionService` filters and persists PushEvents, and `EnrichmentService` fetches actor and repository data. A Sidekiq worker orchestrates the pipeline on a schedule, backed by Valkey as the job queue and PostgreSQL as the system of record.
See docs/design_brief.md for full architecture details, tradeoffs, and rate limit strategy.
The service is idempotent: running `docker compose run --rm ingest` again skips already-seen events and already-enriched actors and repositories, so it is safe to run repeatedly.
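The idempotent behavior can be sketched with an in-memory stand-in for the `push_events` table. This is a simplification under stated assumptions: `InMemoryEventStore` and `ingest` are hypothetical names, and the real service presumably relies on a database-level uniqueness check rather than a `Set`.

```ruby
require "set"

# In-memory stand-in for the push_events table (hypothetical, for illustration).
class InMemoryEventStore
  def initialize
    @seen_ids = Set.new
  end

  # Returns :created the first time an event id is seen, :skipped afterwards.
  def save(event)
    @seen_ids.add?(event.fetch("id")) ? :created : :skipped
  end
end

# Saves every event and returns a count per outcome.
def ingest(events, store)
  events.map { |event| store.save(event) }.tally
end

store = InMemoryEventStore.new
events = [{ "id" => "1" }, { "id" => "2" }]

p ingest(events, store) # first run: both events are created
p ingest(events, store) # second run: both events are skipped
```

Running the same batch twice changes nothing on the second pass, which is the property that makes a manual ingest safe at any time.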
The GitHub API allows 60 unauthenticated requests per hour. The service tracks remaining requests and stops enrichment when the limit is reached. Events are always saved regardless of enrichment status.
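One way to track the remaining quota is to read the `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers that GitHub returns with each response. The sketch below is a minimal illustration: `RateLimitTracker` is a hypothetical class, and the actual `GithubApiClient` may track limits differently.

```ruby
# Hypothetical tracker, not the project's actual implementation.
class RateLimitTracker
  attr_reader :remaining, :reset_at

  # Starts from the unauthenticated limit of 60 requests per hour.
  def initialize(limit: 60)
    @remaining = limit
    @reset_at = nil
  end

  # Updates state from a response's headers (a Hash; key case is ignored).
  def update(headers)
    normalized = headers.transform_keys { |key| key.to_s.downcase }
    if (value = normalized["x-ratelimit-remaining"])
      @remaining = Integer(value, 10)
    end
    if (value = normalized["x-ratelimit-reset"])
      @reset_at = Time.at(Integer(value, 10)).utc # epoch seconds
    end
    self
  end

  def exhausted?
    @remaining <= 0
  end
end

tracker = RateLimitTracker.new
tracker.update("X-RateLimit-Remaining" => "0", "X-RateLimit-Reset" => "1700000000")
puts "Stopping enrichment until #{tracker.reset_at}" if tracker.exhausted?
```

Checking `exhausted?` before each enrichment request lets the pipeline stop early while leaving already-saved events untouched.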
`docker compose down`

To also remove the database volume:

`docker compose down -v`