Merged
6 changes: 5 additions & 1 deletion Makefile
@@ -1,4 +1,4 @@
-.PHONY: all format test install download upload docker documentation data publish-local-area clean build paper clean-paper presentations database database-refresh promote-database promote-dataset
+.PHONY: all format test install download upload docker documentation data calibrate publish-local-area clean build paper clean-paper presentations database database-refresh promote-database promote-dataset

 HF_CLONE_DIR ?= $(HOME)/huggingface/policyengine-us-data

@@ -97,6 +97,10 @@ data: download
 	python policyengine_us_data/datasets/cps/small_enhanced_cps.py
 	python policyengine_us_data/datasets/cps/local_area_calibration/create_stratified_cps.py

+calibrate: data
+	python -m policyengine_us_data.calibration.unified_calibration \
+		--puf-dataset policyengine_us_data/storage/puf_2024.h5
+
 publish-local-area:
 	python policyengine_us_data/datasets/cps/local_area_calibration/publish_local_area.py

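The new `calibrate` target passes `--puf-dataset` to the calibration module, and the changelog also names `--skip-puf` and `--skip-source-impute`. As a rough sketch of how such flags could gate the optional steps, assuming a plain `argparse` interface: only the three flag names come from this PR. `build_parser`, `planned_steps`, the step labels, and the step order are hypothetical and may not match `unified_calibration.py`.

```python
import argparse


def build_parser():
    """Hypothetical parser; only the three flag names come from the PR."""
    parser = argparse.ArgumentParser(prog="unified_calibration")
    parser.add_argument("--puf-dataset", default=None,
                        help="Path to a PUF .h5 file used for cloning/imputation")
    parser.add_argument("--skip-puf", action="store_true",
                        help="Skip the PUF clone step even if a dataset is given")
    parser.add_argument("--skip-source-impute", action="store_true",
                        help="Skip the ACS/SIPP/SCF source imputation step")
    return parser


def planned_steps(args):
    """Return the step labels that would run (assumed order, for illustration)."""
    steps = []
    if not args.skip_source_impute:
        steps.append("source_impute")
    # PUF cloning needs a dataset path and must not be explicitly skipped.
    if args.puf_dataset is not None and not args.skip_puf:
        steps.append("puf_clone")
    steps.append("calibrate")
    return steps


# Mirrors the Makefile invocation above.
make_args = build_parser().parse_args(
    ["--puf-dataset", "policyengine_us_data/storage/puf_2024.h5"]
)
make_steps = planned_steps(make_args)
```

Keeping both optional steps behind explicit flags means a bare invocation without `--puf-dataset` would still run calibration in this sketch. Whether the real module behaves that way is not shown in the diff.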
12 changes: 12 additions & 0 deletions changelog_entry.yaml
@@ -0,0 +1,12 @@
+- bump: minor
+  changes:
+    added:
+      - PUF clone + QRF imputation module (puf_impute.py) with state_fips predictor and stratified subsample preserving top 0.5% by AGI
+      - ACS re-imputation module (source_impute.py) with state predictor; SIPP/SCF imputation without state (surveys lack state identifiers)
+      - PUF and source impute integration into unified calibration pipeline (--puf-dataset, --skip-puf, --skip-source-impute flags)
+      - 21 new tests for puf_impute and source_impute modules
+      - DC_STATEHOOD=1 environment variable set in storage/__init__.py to ensure DC is included in state-based processing
+    changed:
+      - Refactored extended_cps.py to delegate to puf_impute.puf_clone_dataset() (443 -> 75 lines)
+      - PUF QRF training uses stratified subsample (20K target) instead of random subsample(10_000), force-including high-income tail
+      - unified_calibration.py pipeline now supports optional source imputation and PUF cloning steps
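
The changelog describes the QRF training sample as a stratified subsample with a 20K target that force-includes the top 0.5% of records by AGI. The sketch below shows one way to build such a sample using only the standard library. The function name, its signature, and the uniform draw over the remaining rows are assumptions for illustration, not the code in `puf_impute.py`.

```python
import random


def stratified_subsample(agi, target_n=20_000, top_frac=0.005, seed=0):
    """Return sorted row indices: every row in the top `top_frac` by AGI,
    plus a uniform random draw from the remaining rows up to `target_n`."""
    n = len(agi)
    if n <= target_n:
        # Nothing to subsample; keep every row.
        return list(range(n))
    # Rank rows by AGI, highest first, and force-include the top slice.
    order = sorted(range(n), key=lambda i: agi[i], reverse=True)
    n_top = min(max(1, int(n * top_frac)), target_n)
    top, rest = order[:n_top], order[n_top:]
    # Fill the remaining slots uniformly at random, without replacement.
    rng = random.Random(seed)
    sampled = rng.sample(rest, target_n - n_top)
    return sorted(top + sampled)


# Toy data: 100,000 rows whose AGI equals the row index.
demo_agi = [float(i) for i in range(100_000)]
demo_idx = stratified_subsample(demo_agi)
```

Compared with a purely random draw of 10,000 rows, this keeps every high-income record in the training set, so the imputation model sees the full upper tail rather than only the few top rows a uniform draw would be expected to select.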