LuisJG8/github_etl

The goal of this project is to gather and process all GitHub repository data: calls to the GitHub API collect the data, and a queue system then processes it. End result example: https://console.cloud.google.com/marketplace/product/github/github-repos?project=hopeful-host-433510-a3

The data would be used to build an OpenSearch app where you can search for any repo and see its data.

The data would also be used to train an ML model so that users can learn any repository with an LLM as a guide.

About

Batch ETL pipeline that processes 3TB+ of GitHub repository data, with Celery and RabbitMQ as the queue system.
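
The queue-based flow described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it swaps Celery and RabbitMQ for Python's in-process `queue.Queue` and threads so it runs without a broker, and `fetch_repo`, `transform`, and `run_pipeline` are hypothetical names. `fetch_repo` is a stand-in that builds a record locally instead of calling the GitHub API.

```python
import queue
import threading


def fetch_repo(full_name):
    """Hypothetical extract step: stands in for a GitHub API call.

    A real pipeline would issue an HTTP request here; this stub only
    splits the "owner/name" string so the sketch runs offline.
    """
    owner, name = full_name.split("/", 1)
    return {"full_name": full_name, "owner": owner, "name": name}


def transform(record):
    """Hypothetical transform step: keep only the fields to be loaded."""
    return {"id": record["full_name"].lower(), "owner": record["owner"]}


def worker(tasks, results):
    """Consume repository names from the task queue until a sentinel arrives."""
    while True:
        full_name = tasks.get()
        if full_name is None:  # sentinel: no more work for this worker
            break
        results.put(transform(fetch_repo(full_name)))


def run_pipeline(repo_names, num_workers=4):
    """Fan repository names out to workers and collect the processed rows."""
    tasks = queue.Queue()
    results = queue.Queue()
    workers = [
        threading.Thread(target=worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for thread in workers:
        thread.start()
    for name in repo_names:
        tasks.put(name)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker
    for thread in workers:
        thread.join()

    rows = []
    while not results.empty():  # safe: all workers have finished
        rows.append(results.get())
    return sorted(rows, key=lambda row: row["id"])


if __name__ == "__main__":
    for row in run_pipeline(["LuisJG8/github_etl", "python/cpython"]):
        print(row)
```

In a Celery-based setup, the worker function would presumably become a task, and RabbitMQ would take the place of the in-memory task queue, so that workers can run on separate machines.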
