8 changes: 7 additions & 1 deletion dataloading.md
@@ -1,4 +1,10 @@
# Electricity Price Data
* For each month from 2017/01 to the latest available month, download the zip archive from the source site via its API, extract every CSV file in the archive, and concatenate them.
* Create the table in BigQuery from the first month (2017/01) only.
* Append data to the table from 2017/02 up to the latest available month.
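
The monthly steps above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the project's actual loader: the helper names `month_range`, `rows_from_zip`, and `load_mode` are hypothetical, the archive is assumed to hold UTF-8 CSV files that share one header, and the download and BigQuery calls are left out.

```python
import csv
import io
import zipfile
from datetime import date


def month_range(start: date, end: date):
    """Yield (year, month) pairs from start to end, inclusive."""
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        yield year, month
        month += 1
        if month > 12:
            year, month = year + 1, 1


def rows_from_zip(zip_bytes: bytes) -> list:
    """Extract every CSV file in the archive and concatenate their rows."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue  # skip anything that is not a CSV file
            with archive.open(name) as raw:
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                rows.extend(csv.DictReader(text))
    return rows


def load_mode(year: int, month: int, first=(2017, 1)) -> str:
    """Return a hypothetical label: 'create' for the first month, else 'append'."""
    return "create" if (year, month) == first else "append"
```

A driver would loop over `month_range(date(2017, 1, 1), date.today())`, fetch each month's archive, pass the bytes to `rows_from_zip`, and create or append to the table according to `load_mode`.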

# Henry Hub Natural Gas Data
* Use a truncate-and-load (full refresh) strategy.
* For each run, fetch the full daily Henry Hub series from the EIA API, starting from 1993/12/24 up to the latest available date.
* Keep the relevant fields, clean the data, and replace the BigQuery table each time using `if_exists="replace"`.
* This strategy is appropriate because the Henry Hub dataset is relatively small and stable, so a full refresh is simpler and more reliable than incremental loading. It avoids tracking the last loaded date, handling reruns, and preventing duplicate rows. As a result, the BigQuery table always contains a single clean, up-to-date copy of the source data.
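
A truncate-and-load run might look like the sketch below. It rests on several assumptions not stated above: each API record is a dict with `period` (an ISO date string) and `value` keys, the cleaned columns are named `date` and `price`, and the table is written with `pandas_gbq.to_gbq`. The helper names `clean_records` and `replace_table` are hypothetical, and the EIA request itself is omitted.

```python
from datetime import date

SERIES_START = date(1993, 12, 24)  # first date requested from the API


def clean_records(records):
    """Keep date and price, drop missing or unparsable rows, sort by date."""
    cleaned = {}
    for rec in records:
        period, value = rec.get("period"), rec.get("value")
        if period is None or value in (None, ""):
            continue
        try:
            day = date.fromisoformat(period)
            price = float(value)
        except (TypeError, ValueError):
            continue  # skip rows that cannot be parsed
        if day >= SERIES_START:
            cleaned[day] = price  # one row per date, so repeats collapse
    return [{"date": d.isoformat(), "price": p} for d, p in sorted(cleaned.items())]


def replace_table(rows, destination_table, project_id):
    """Overwrite the whole table with the freshly cleaned rows (assumes pandas-gbq)."""
    import pandas as pd
    import pandas_gbq

    frame = pd.DataFrame(rows, columns=["date", "price"])
    # if_exists="replace" drops and recreates the table on every run.
    pandas_gbq.to_gbq(frame, destination_table, project_id=project_id, if_exists="replace")
```

Because every run rebuilds the table from the full series, rerunning the job cannot produce duplicate rows, and no last-loaded-date state has to be stored.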