diff --git a/README.md b/README.md
index b9137c5..ee64ffe 100644
--- a/README.md
+++ b/README.md
@@ -7,83 +7,43 @@
 - Haixin Liu
 - Hanghai Li
 
-## Project Description
+## Project Overview
 
-This project analyzes MTA Daily Ridership Data to examine COVID-19 recovery patterns across different transit modes in New York City. We explore how subway, bus, and commuter rail ridership has changed over time and compare the recovery rates of different transportation methods.
+This project analyzes MTA daily ridership trends in New York City to understand how different transit services have recovered since COVID-19. Our Streamlit dashboard compares subway, bus, LIRR, and Metro-North ridership over time and uses NYC COVID-19 case data as additional context.
 
 ## Research Questions
 
-1. How has MTA ridership recovered since COVID-19 across different transit modes?
-2. Which transit modes have recovered faster - subway, bus, or commuter rail?
-3. Are there seasonal patterns in the ridership recovery?
+1. How have different MTA services recovered since COVID-19?
+2. How do weekday and weekend ridership patterns differ?
+3. How do changes in COVID-19 cases relate to changes in transit ridership?
 
-## Dataset
+## Data Sources
 
-- **Source:** [MTA Daily Ridership Data](https://data.ny.gov/Transportation/MTA-Daily-Ridership-Data-Beginning-2020/vxuj-8kew)
-- **Updated:** Daily
+- **MTA Daily Ridership Data**
+  https://data.ny.gov/Transportation/MTA-Daily-Ridership-Data-Beginning-2020/vxuj-8kew
 
-## Setup
-
-### 1. Clone and install
-
-```bash
-git clone https://github.com/advanced-computing/bouncing-penguin.git
-cd bouncing-penguin
-python -m venv .venv
-source .venv/bin/activate # Windows: .venv\Scripts\activate
-pip install -r requirements.txt
-```
-
-### 2. Configure BigQuery credentials
-
-The app reads MTA data from BigQuery. You need a service account key to connect.
-
-1. Get the service account key JSON for `streamlit@sipa-adv-c-bouncing-penguin.iam.gserviceaccount.com` from your team or GCP Console (IAM & Admin > Service Accounts > Keys).
-2. Create the secrets file:
-
-```bash
-mkdir -p .streamlit
-```
-
-3. Create `.streamlit/secrets.toml` with the following structure, filling in values from the JSON key:
+- **NYC COVID-19 Daily Cases**
+  https://data.cityofnewyork.us/Health/Coronavirus-Data/rc75-m7u3
 
-```toml
-[gcp_service_account]
-type = "service_account"
-project_id = "sipa-adv-c-bouncing-penguin"
-private_key_id = ""
-client_email = "streamlit@sipa-adv-c-bouncing-penguin.iam.gserviceaccount.com"
-client_id = ""
-auth_uri = "https://accounts.google.com/o/oauth2/auth"
-token_uri = "https://oauth2.googleapis.com/token"
-private_key = ""
-```
+## Repository Structure
 
-### 3. Load data into BigQuery (optional)
+- `streamlit_app.py` - homepage and project introduction
+- `pages/1_MTA_Ridership.py` - main MTA ridership analysis
+- `pages/2_Second_Dataset.py` - NYC COVID-19 context page
+- `utils.py` - helper functions for cleaning and plotting
+- `validation.py` - Pandera schema validation
+- `tests/` - unit tests for utility and validation code
+- `load_data_to_bq.py` - script for loading data into BigQuery
 
-If the BigQuery table doesn't exist yet, run the data loading script:
-
-```bash
-python load_data_to_bq.py
-```
-
-This fetches MTA ridership data from the NYC Open Data API and uploads it to BigQuery. You will be prompted to authenticate with your Google account.
-
-### 4. Run the app
-
-```bash
-streamlit run streamlit_app.py
-```
-
-The app will open at `http://localhost:8501`.
-
-## Live App
+## Setup
 
-[bouncing-penguin-forever.streamlit.app](https://bouncing-penguin-forever.streamlit.app)
+1. Clone this repository: `git clone https://github.com/advanced-computing/bouncing-penguin.git`
+2. Create virtual environment: `python -m venv .venv`
+3. Activate virtual environment:
+   - Mac/Linux: `source .venv/bin/activate`
+   - Windows: `.venv\Scripts\activate`
+4. Install dependencies: `pip install -r requirements.txt`
 
 ## Usage
 
-- **Dashboard tab**: Interactive visualizations of MTA ridership recovery trends, weekday vs weekend comparisons, holiday impacts, and year-over-year analysis.
-- **Proposal tab**: Project background, research questions, methodology, and preliminary findings.
-- **MTA Ridership page**: Simplified ridership charts.
-- **Second Dataset page**: NYC COVID-19 case data for context.
+Open `mta_ridership_project.ipynb` in Jupyter Notebook or VS Code to run the analysis.