batalabs/strongmind-github-events

GitHub Events Ingestion Service

A Dockerized Rails service that ingests GitHub Push events, stores structured and raw data in PostgreSQL, and enriches events with actor and repository metadata.

Prerequisites

  • Docker and Docker Compose

Quick Start

1. Configure environment

cp .env.example .env

Credentials are prefilled and ready to use. Edit .env if you need different values.

2. Build and start the system

docker compose up --build

This starts PostgreSQL, Valkey, and a Sidekiq worker. Ingestion begins automatically within about 30 seconds, with no additional commands needed. After that, the worker runs at minute 5 of every hour.

3. Run ingestion manually (optional)

docker compose run --rm ingest

Use this for a one-off run outside the automatic schedule. Ingestion is idempotent, so it is safe to run at any time.

4. Run tests

docker compose run --rm test

The test container creates and migrates its database automatically before running specs.

How to Verify It's Working

Expected Logs

Watch the worker with docker compose logs -f worker:

[Worker] Starting ingestion run
[Ingestion] Fetching events from GitHub API...
[Ingestion] Found X PushEvents out of Y total events
[Ingestion] Saved PushEvent 12345 from user/repo
...
[Ingestion] Complete: X created, 0 duplicates skipped, 0 failed
[Worker] Ingestion: X created, 0 skipped, 0 failed
[Enrichment] N events need enrichment
[Enrichment] Fetching actor 583231 from https://api.github.com/users/...
[Enrichment] Fetching repository 123456 from https://api.github.com/repos/...
[Enrichment] Complete
[Worker] Run complete. Total PushEvents: X

Or all services at once: docker compose logs -f

Check the database

docker compose exec worker bundle exec rails runner "puts 'PushEvents: ' + PushEvent.count.to_s; puts 'Actors: ' + Actor.count.to_s; puts 'Repositories: ' + Repository.count.to_s"

Database tables

| Table | Purpose |
| --- | --- |
| push_events | Core event data with queryable columns (repo_github_id, push_id, ref, head, before) + raw JSON payload |
| actors | Enriched GitHub user data (login, avatar, profile) |
| repositories | Enriched GitHub repository data (name, description, language) |

Time to results

  • Build: ~2-3 minutes (first time, pulls images + installs gems)
  • First automatic ingestion: ~30 seconds after docker compose up --build
  • Subsequent automatic runs: Every hour at minute 5 (5 * * * *)
  • Manual ingestion: ~10-30 seconds via docker compose run --rm ingest
  • Tests: ~5-10 seconds
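
To make the schedule concrete, here is a minimal sketch of how the cron expression 5 * * * * resolves to the next run time. The helper name `next_scheduled_run` is hypothetical and is not part of this repository; the actual scheduling is handled by the Sidekiq worker.

```ruby
# Hypothetical helper (not from this repository): returns the first
# "minute 5" timestamp strictly after `now`, mirroring 5 * * * *.
def next_scheduled_run(now)
  now = now.utc
  # Minute 5 of the current hour.
  candidate = Time.utc(now.year, now.month, now.day, now.hour, 5, 0)
  # If that moment has already passed (or is right now), move to the next hour.
  candidate > now ? candidate : candidate + 3600
end

puts next_scheduled_run(Time.now)
```

A container started at 10:03 UTC would therefore see its first scheduled run at 10:05, while one started at 10:30 would wait until 11:05.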

Architecture

Rails API-only app (no web UI) with three service objects: GithubApiClient handles HTTP and rate-limit tracking, EventIngestionService filters and persists PushEvents, and EnrichmentService fetches actor and repository data. A Sidekiq worker orchestrates the pipeline on a schedule, backed by Valkey as the job queue and PostgreSQL as the system of record.
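
As a rough illustration of the filtering step, the sketch below picks PushEvents out of a GitHub events response and maps each one to the queryable columns of push_events. The method names and the `github_id` and `actor_github_id` keys are assumptions made for this example; the real EventIngestionService may be structured differently.

```ruby
require "json"

# Hypothetical row builder: maps one raw GitHub event hash to the queryable
# columns listed for push_events, keeping the raw JSON alongside.
# github_id and actor_github_id are assumed column names.
def push_event_row(event)
  payload = event.fetch("payload", {})
  {
    github_id: event["id"],
    actor_github_id: event.dig("actor", "id"),
    repo_github_id: event.dig("repo", "id"),
    push_id: payload["push_id"],
    ref: payload["ref"],
    head: payload["head"],
    before: payload["before"],
    raw: JSON.generate(event)
  }
end

# Keeps only PushEvents and converts each one to a row hash.
def extract_push_events(events)
  events.select { |e| e["type"] == "PushEvent" }.map { |e| push_event_row(e) }
end

# Made-up sample data shaped like the GitHub events API response.
sample = [
  { "id" => "1", "type" => "PushEvent",
    "actor" => { "id" => 10 }, "repo" => { "id" => 20 },
    "payload" => { "push_id" => 30, "ref" => "refs/heads/main",
                   "head" => "abc", "before" => "def" } },
  { "id" => "2", "type" => "WatchEvent",
    "actor" => { "id" => 11 }, "repo" => { "id" => 21 }, "payload" => {} }
]

rows = extract_push_events(sample)
puts "Found #{rows.size} PushEvents out of #{sample.size} total events"
```

Storing the raw payload next to the extracted columns keeps fields that are not yet modeled available for later use without re-fetching from GitHub.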

See docs/design_brief.md for full architecture details, tradeoffs, and rate limit strategy.

Running ingestion multiple times

The service is idempotent: running docker compose run --rm ingest again skips already-seen events and already-enriched actors and repositories, so it is safe to run repeatedly.
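
The idea behind the duplicate skipping can be sketched with an in-memory store keyed by event ID. The `EventStore` class is hypothetical; in the actual service the same guarantee would more likely come from a uniqueness constraint in PostgreSQL, which this README does not detail.

```ruby
require "set"

# Hypothetical in-memory stand-in for a table with a unique key on the
# GitHub event ID. Not the repository's actual persistence code.
class EventStore
  attr_reader :created, :skipped

  def initialize
    @seen = Set.new
    @created = 0
    @skipped = 0
  end

  # Returns true if the event was newly stored, false for a duplicate.
  def save(event_id)
    if @seen.add?(event_id) # Set#add? returns nil when the ID is already present
      @created += 1
      true
    else
      @skipped += 1
      false
    end
  end
end

store = EventStore.new
# Two ingestion runs over the same three events.
2.times { %w[101 102 103].each { |id| store.save(id) } }
puts "#{store.created} created, #{store.skipped} duplicates skipped"
```

The second run adds nothing new, which is the behavior that makes repeated manual runs safe.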

Rate Limits

The GitHub API allows 60 unauthenticated requests per hour per IP address. The service tracks the remaining quota and stops enrichment once it is exhausted. Events are always saved regardless of enrichment status.
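
A minimal sketch of this behavior, assuming the remaining quota is read from GitHub's `x-ratelimit-remaining` response header. The `RateLimitTracker` class and `enrich_all` method are hypothetical names for illustration; see docs/design_brief.md for the strategy the service actually uses.

```ruby
# Hypothetical tracker (not the repository's GithubApiClient): remembers the
# remaining quota reported by the most recent response.
class RateLimitTracker
  attr_reader :remaining

  def initialize(limit = 60)
    @remaining = limit
  end

  # `headers` is assumed to be a Hash with lowercase header names.
  def update(headers)
    value = headers["x-ratelimit-remaining"]
    @remaining = Integer(value) if value
  end

  def exhausted?
    @remaining <= 0
  end
end

# Enriches IDs one at a time and stops as soon as the quota runs out.
# The block performs the fetch and returns the response headers.
def enrich_all(ids, tracker)
  enriched = []
  ids.each do |id|
    break if tracker.exhausted?
    tracker.update(yield(id))
    enriched << id
  end
  enriched
end

# Simulated fetch: each call consumes one request from a quota of 2.
tracker = RateLimitTracker.new(2)
calls = 0
done = enrich_all([583231, 123456, 999], tracker) do |_id|
  calls += 1
  { "x-ratelimit-remaining" => (2 - calls).to_s }
end
puts "Enriched #{done.size} of 3 before the limit was reached"
```

Items left unenriched remain candidates for the next run, when the hourly quota has reset.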

Stopping the system

docker compose down

To also remove the database volume:

docker compose down -v
