Lab 10 Writeup

BigQuery Data Loading

This project now uses BigQuery for every dataset shown in the Streamlit app.

Dataset 1: MTA Daily Ridership

Source: https://data.ny.gov/resource/vxuj-8kew
BigQuery table: sipa-adv-c-bouncing-penguin.mta_data.daily_ridership
Loading type: batch full refresh
Why: the dataset is small, updated on a daily cadence, and easy to keep consistent by reloading the full table instead of managing row-by-row updates.

Dataset 2: NYC COVID-19 Daily Cases

Source: https://data.cityofnewyork.us/resource/rc75-m7u3
BigQuery table: sipa-adv-c-bouncing-penguin.mta_data.nyc_covid_cases
Loading type: batch full refresh
Why: this table is also small enough for a daily refresh, and full replacement keeps the historical series in sync without extra incremental-loading logic.

Loader Script

The repository includes load_data_to_bq.py, which:

Authenticates with Google BigQuery
Creates the mta_data dataset if it does not already exist
Pulls source data from both Open Data APIs
Cleans date and numeric fields before upload
Replaces the target BigQuery tables
Verifies each upload with row counts and date ranges

Run it with:

python load_data_to_bq.py --dataset all

You can also load a single table:

python load_data_to_bq.py --dataset mta
python load_data_to_bq.py --dataset covid

App Changes for BigQuery

The Streamlit app no longer reads API responses directly inside page files.

utils.py now provides shared BigQuery helpers for both datasets
streamlit_app.py reads MTA data from BigQuery
pages/1_MTA_Ridership.py reads MTA data from BigQuery
pages/2_Second_Dataset.py reads COVID data from BigQuery

This keeps all pages aligned with the lab requirement that every dataset come from BigQuery.

Performance Work

To improve load time and make performance visible:

Each page uses a custom display_load_time() context manager and shows total load time in the UI
BigQuery results are cached with Streamlit caching
Queries select only the columns used by the app instead of SELECT *
Repeated client setup is cached with a shared BigQuery client helper
Basic data cleaning is centralized in utils.py so pages do less work on every rerun
The homepage dashboard is split into lighter sections so each view renders only the charts needed for that section
Default chart selections were reduced to fewer transit modes so the initial render sends fewer Plotly traces

These changes improve both initial and subsequent page loads, while keeping the code easier to maintain.

Local Verification Steps

Run python load_data_to_bq.py --dataset all
Run streamlit run streamlit_app.py
Open each page and confirm the caption shows the page load time
Record the screen while loading the main page and both sub-pages

Assumption

I interpreted "repeat the middle steps from Part 5" as: load the datasets into BigQuery, point the app at BigQuery tables, and document the table-level setup in the repository.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Lab 10 Writeup

BigQuery Data Loading

Dataset 1: MTA Daily Ridership

Dataset 2: NYC COVID-19 Daily Cases

Loader Script

App Changes for BigQuery

Performance Work

Local Verification Steps

Assumption

FilesExpand file tree

LAB_10_WRITEUP.md

Latest commit

History

LAB_10_WRITEUP.md

File metadata and controls

Lab 10 Writeup

BigQuery Data Loading

Dataset 1: MTA Daily Ridership

Dataset 2: NYC COVID-19 Daily Cases

Loader Script

App Changes for BigQuery

Performance Work

Local Verification Steps

Assumption