Background:
We need quick, easily reproducible custom filtering that assigns a single hs_version to each year, so that less data has to move from KNB to local storage to duckdb. In the next release of ARTIS data (v1.1.0), the consumption tables are substantially larger. For our KNB to duckdb solution to work for the internal team and end users, we need to be efficient with file volumes.
For analysis purposes, we often filter ARTIS data to specific hs_version and year pairs to create a time series dataset. A pre-filtered dataset will be a more direct product for users and will save download and duckdb build time.
Task:
# Filter to single hs_version / year pairings
filter(
  # Use HS96 from 1996-2003 (inclusive)
  ((hs_version == "HS96") & (year <= 2003)) |
  # Use HS02 from 2004-2009 (inclusive)
  ((hs_version == "HS02") & (year >= 2004 & year <= 2009)) |
  # Use HS07 from 2010-2012 (inclusive)
  ((hs_version == "HS07") & (year >= 2010 & year <= 2012)) |
  # Use HS12 from 2013-2020 (inclusive)
  ((hs_version == "HS12") & (year >= 2013 & year <= 2020))
)
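As a rough illustration of how this filter could be applied and checked, here is a minimal sketch in R with dplyr. The helper name `filter_hs_year_pairs` and the toy data frame are assumptions made up for this example; they are not part of the ARTIS codebase and the rows are not real ARTIS data. Only the `hs_version` and `year` columns are assumed to exist.

```r
library(dplyr)

# Hypothetical helper (name is an assumption): keeps one hs_version per year
# using the same pairings as the filter in this issue.
filter_hs_year_pairs <- function(df) {
  df %>%
    filter(
      ((hs_version == "HS96") & (year <= 2003)) |
      ((hs_version == "HS02") & (year >= 2004 & year <= 2009)) |
      ((hs_version == "HS07") & (year >= 2010 & year <= 2012)) |
      ((hs_version == "HS12") & (year >= 2013 & year <= 2020))
    )
}

# Toy rows for illustration only, including overlapping hs_version/year pairs.
toy <- data.frame(
  hs_version = c("HS96", "HS02", "HS02", "HS07", "HS12", "HS12"),
  year       = c(2003L, 2003L, 2004L, 2010L, 2012L, 2013L),
  stringsAsFactors = FALSE
)

result <- filter_hs_year_pairs(toy)

# Sanity check: after filtering, each year maps to exactly one hs_version.
pairs_per_year <- result %>%
  distinct(year, hs_version) %>%
  count(year)
stopifnot(all(pairs_per_year$n == 1))
```

Wrapping the condition in a function is just one option; it keeps the pairing rules in a single place so the same filter can be reused before data is written to local storage or loaded into duckdb.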