Merged
`README.md`: 92 changes (26 additions, 66 deletions)
````diff
@@ -7,83 +7,43 @@
 - Haixin Liu
 - Hanghai Li
 
-## Project Description
+## Project Overview
 
-This project analyzes MTA Daily Ridership Data to examine COVID-19 recovery patterns across different transit modes in New York City. We explore how subway, bus, and commuter rail ridership has changed over time and compare the recovery rates of different transportation methods.
+This project analyzes MTA daily ridership trends in New York City to understand how different transit services have recovered since COVID-19. Our Streamlit dashboard compares subway, bus, LIRR, and Metro-North ridership over time and uses NYC COVID-19 case data as additional context.
 
 ## Research Questions
 
-1. How has MTA ridership recovered since COVID-19 across different transit modes?
-2. Which transit modes have recovered faster - subway, bus, or commuter rail?
-3. Are there seasonal patterns in the ridership recovery?
+1. How have different MTA services recovered since COVID-19?
+2. How do weekday and weekend ridership patterns differ?
+3. How do changes in COVID-19 cases relate to changes in transit ridership?
 
-## Dataset
+## Data Sources
 
-- **Source:** [MTA Daily Ridership Data](https://data.ny.gov/Transportation/MTA-Daily-Ridership-Data-Beginning-2020/vxuj-8kew)
-- **Updated:** Daily
+- **MTA Daily Ridership Data**
+https://data.ny.gov/Transportation/MTA-Daily-Ridership-Data-Beginning-2020/vxuj-8kew
 
-## Setup
-
-### 1. Clone and install
-
-```bash
-git clone https://github.com/advanced-computing/bouncing-penguin.git
-cd bouncing-penguin
-python -m venv .venv
-source .venv/bin/activate # Windows: .venv\Scripts\activate
-pip install -r requirements.txt
-```
-
-### 2. Configure BigQuery credentials
-
-The app reads MTA data from BigQuery. You need a service account key to connect.
-
-1. Get the service account key JSON for `streamlit@sipa-adv-c-bouncing-penguin.iam.gserviceaccount.com` from your team or GCP Console (IAM & Admin > Service Accounts > Keys).
-2. Create the secrets file:
-
-```bash
-mkdir -p .streamlit
-```
-
-3. Create `.streamlit/secrets.toml` with the following structure, filling in values from the JSON key:
+- **NYC COVID-19 Daily Cases**
+https://data.cityofnewyork.us/Health/Coronavirus-Data/rc75-m7u3
 
-```toml
-[gcp_service_account]
-type = "service_account"
-project_id = "sipa-adv-c-bouncing-penguin"
-private_key_id = "<from JSON>"
-client_email = "streamlit@sipa-adv-c-bouncing-penguin.iam.gserviceaccount.com"
-client_id = "<from JSON>"
-auth_uri = "https://accounts.google.com/o/oauth2/auth"
-token_uri = "https://oauth2.googleapis.com/token"
-private_key = "<from JSON>"
-```
+## Repository Structure
 
-### 3. Load data into BigQuery (optional)
+- `streamlit_app.py` - homepage and project introduction
+- `pages/1_MTA_Ridership.py` - main MTA ridership analysis
+- `pages/2_Second_Dataset.py` - NYC COVID-19 context page
+- `utils.py` - helper functions for cleaning and plotting
+- `validation.py` - Pandera schema validation
+- `tests/` - unit tests for utility and validation code
+- `load_data_to_bq.py` - script for loading data into BigQuery
 
-If the BigQuery table doesn't exist yet, run the data loading script:
-
-```bash
-python load_data_to_bq.py
-```
-
-This fetches MTA ridership data from the NYC Open Data API and uploads it to BigQuery. You will be prompted to authenticate with your Google account.
-
-### 4. Run the app
-
-```bash
-streamlit run streamlit_app.py
-```
-
-The app will open at `http://localhost:8501`.
-
-## Live App
+## Setup
 
-[bouncing-penguin-forever.streamlit.app](https://bouncing-penguin-forever.streamlit.app)
+1. Clone this repository: `git clone https://github.com/advanced-computing/bouncing-penguin.git`
+2. Create virtual environment: `python -m venv .venv`
+3. Activate virtual environment:
+- Mac/Linux: `source .venv/bin/activate`
+- Windows: `.venv\Scripts\activate`
+4. Install dependencies: `pip install -r requirements.txt`
 
 ## Usage
 
-- **Dashboard tab**: Interactive visualizations of MTA ridership recovery trends, weekday vs weekend comparisons, holiday impacts, and year-over-year analysis.
-- **Proposal tab**: Project background, research questions, methodology, and preliminary findings.
-- **MTA Ridership page**: Simplified ridership charts.
-- **Second Dataset page**: NYC COVID-19 case data for context.
+Open `mta_ridership_project.ipynb` in Jupyter Notebook or VS Code to run the analysis.
````
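The README lists `load_data_to_bq.py` as the script that loads data into BigQuery. The pre-merge setup text says that this script fetches the ridership data from the open-data API.

The sketch below shows only that fetch step, under several assumptions:

- The dataset is served through the Socrata SODA JSON endpoint `https://data.ny.gov/resource/vxuj-8kew.json`. The ID `vxuj-8kew` comes from the Data Sources URL.
- The endpoint pages with `$limit`/`$offset`.
- The date field is named `date`.

`build_page_url` and `fetch_all_rows` are hypothetical helpers, not the project's script.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed SODA endpoint; the dataset ID vxuj-8kew is taken from the README's data-source URL.
BASE_URL = "https://data.ny.gov/resource/vxuj-8kew.json"


def build_page_url(limit: int, offset: int) -> str:
    """Return the URL for one page of results (hypothetical helper).

    Ordering by a stable field keeps offset-based paging consistent;
    the field name "date" is an assumption about the API schema.
    """
    params = {"$limit": limit, "$offset": offset, "$order": "date"}
    return f"{BASE_URL}?{urlencode(params)}"


def fetch_all_rows(page_size: int = 1000) -> list[dict]:
    """Download every row by paging until an empty page comes back (needs network access)."""
    rows: list[dict] = []
    offset = 0
    while True:
        with urlopen(build_page_url(page_size, offset)) as response:
            page = json.load(response)
        if not page:  # an empty JSON array means there are no more rows
            break
        rows.extend(page)
        offset += page_size
    return rows
```

`urlencode` percent-encodes the `$` in each parameter name as `%24`, which the server decodes back to `$`.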
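The first research question asks how different MTA services have recovered since COVID-19. One simple summary is the yearly average of each service's ridership, expressed as a percentage of a comparable pre-pandemic day.

This is a plain-Python sketch under two assumptions: each row already carries such a percentage, and rows have been reshaped into `(date, service, percent)` tuples. The column names, the reshaping, the helper `yearly_recovery`, and the toy numbers are all illustrative.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical row shape: (day, service name, percent of a comparable pre-pandemic day).
Row = tuple[date, str, float]


def yearly_recovery(rows: list[Row]) -> dict[str, dict[int, float]]:
    """Average recovery percentage per service per calendar year, rounded to 0.1."""
    buckets: dict[str, dict[int, list[float]]] = defaultdict(lambda: defaultdict(list))
    for day, service, pct in rows:
        buckets[service][day.year].append(pct)
    return {
        service: {year: round(mean(values), 1) for year, values in sorted(by_year.items())}
        for service, by_year in buckets.items()
    }


if __name__ == "__main__":
    # Invented numbers for illustration only, not project results.
    sample = [
        (date(2021, 3, 1), "Subway", 40.0),
        (date(2021, 3, 2), "Subway", 44.0),
        (date(2023, 3, 1), "Subway", 70.0),
        (date(2023, 3, 1), "LIRR", 80.0),
    ]
    print(yearly_recovery(sample))  # {'Subway': {2021: 42.0, 2023: 70.0}, 'LIRR': {2023: 80.0}}
```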
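The second research question compares weekday and weekend ridership. Here is a minimal sketch, assuming the daily totals for one service sit in a `dict` keyed by `datetime.date`. The helper name and the toy numbers are illustrative only.

```python
from datetime import date
from statistics import mean


def weekday_weekend_means(daily: dict[date, float]) -> tuple[float, float]:
    """Return (weekday mean, weekend mean) of daily ridership.

    date.weekday() numbers Monday as 0 and Sunday as 6, so 5 and 6 are the weekend.
    Raises statistics.StatisticsError if either group has no days.
    """
    weekday = [riders for day, riders in daily.items() if day.weekday() < 5]
    weekend = [riders for day, riders in daily.items() if day.weekday() >= 5]
    return mean(weekday), mean(weekend)


if __name__ == "__main__":
    # Invented totals: 2024-01-05 is a Friday and 2024-01-08 a Monday.
    sample = {
        date(2024, 1, 5): 3_000_000,
        date(2024, 1, 6): 2_000_000,
        date(2024, 1, 7): 1_800_000,
        date(2024, 1, 8): 3_200_000,
    }
    print(weekday_weekend_means(sample))
```

Dividing the weekend mean by the weekday mean gives a single ratio per service. That ratio can be compared across services or years.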
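The third research question relates changes in COVID-19 cases to changes in ridership. One way to sketch this has two steps:

1. Convert both series to period-over-period percent changes.
2. Compute a Pearson correlation between the two change series.

This assumes both series are already aligned to the same weekly periods. The helpers are hypothetical and the numbers invented. A correlation describes co-movement only, not cause.

```python
from math import sqrt


def pct_changes(series: list[float]) -> list[float]:
    """Period-over-period percent change (e.g. week to week); output is one element shorter."""
    return [(curr - prev) / prev * 100 for prev, curr in zip(series, series[1:])]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    # Invented weekly figures for illustration only.
    weekly_cases = [100, 200, 400, 200]
    weekly_riders = [1000, 900, 700, 900]
    r = pearson(pct_changes(weekly_cases), pct_changes(weekly_riders))
    print(f"correlation of weekly changes: {r:.2f}")
```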