Merged
1 change: 1 addition & 0 deletions Makefile
@@ -61,6 +61,7 @@ database:
	python policyengine_us_data/db/etl_snap.py
	python policyengine_us_data/db/etl_state_income_tax.py
	python policyengine_us_data/db/etl_irs_soi.py
	python policyengine_us_data/db/etl_pregnancy.py
	python policyengine_us_data/db/validate_database.py

database-refresh:
1 change: 1 addition & 0 deletions changelog.d/563.added
@@ -0,0 +1 @@
Impute pregnancy in CPS microdata using CDC VSRR birth counts and Census ACS female population, with calibration targets per state.
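
The changelog entry does not say how birth counts and female population become a pregnancy rate. One plausible reading, offered here as an assumption and not as the PR's actual method, is that annual births are scaled by the fraction of a year a pregnancy lasts and then divided by the number of women aged 15-44. The function name, the `gestation_fraction` parameter, and its default of 39/52 are hypothetical.

```python
def pregnancy_prevalence(births_12mo, women_15_44, gestation_fraction=39 / 52):
    """Approximate point-in-time pregnancy prevalence for one state.

    ASSUMPTION: prevalence = annual births * (gestation length as a
    fraction of a year) / female population aged 15-44. This is an
    illustrative formula, not taken from etl_pregnancy.py.
    """
    if women_15_44 <= 0:
        raise ValueError("women_15_44 must be positive")
    return births_12mo * gestation_fraction / women_15_44


# Round, made-up inputs chosen only to show the arithmetic.
example_rate = pregnancy_prevalence(births_12mo=3_600_000, women_15_44=65_000_000)
```

With round inputs of this size the result comes out near 0.04, the same order of magnitude as the `national_rate = 0.041` fallback used in `cps.py`.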
20 changes: 20 additions & 0 deletions policyengine_us_data/datasets/cps/cps.py
@@ -294,6 +294,26 @@ def add_takeup(self):
    imputed_risk = rng.random(n_persons) < wic_risk_rate_by_person
    data["is_wic_at_nutritional_risk"] = receives_wic | imputed_risk

    # Pregnancy: stochastically assign is_pregnant to women 15-44
    # using CDC/Census-derived state-level pregnancy rates.
    # CPS does not ask about pregnancy; calibration will fine-tune.
    from policyengine_us_data.db.etl_pregnancy import (
        get_state_pregnancy_rates,
    )

    pregnancy_rates = get_state_pregnancy_rates()
    national_rate = 0.041  # fallback
    pregnancy_rate_by_person = np.array(
        [pregnancy_rates.get(s, national_rate) for s in person_states]
    )
    ages = data["age"]
    is_female = data["is_female"]
    is_eligible = is_female & (ages >= 15) & (ages <= 44)
    rng = seeded_rng("is_pregnant")
    data["is_pregnant"] = is_eligible & (
        rng.random(n_persons) < pregnancy_rate_by_person
    )

    # Voluntary tax filing: some people file even when not required and not
    # seeking a refund. EITC take-up already captures refund-seeking behavior
    # (if you take up EITC, you file). This variable captures people who file
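
The pregnancy hunk above compares one uniform random draw per person against that person's state rate and masks the result by eligibility. A minimal standard-library sketch of the same idea follows. It is not the PR's code: `impute_pregnancy` and its parameters are hypothetical, and `random.Random(seed)` stands in for the repository's `seeded_rng` helper, whose implementation is not shown in this diff.

```python
import random


def impute_pregnancy(ages, is_female, states, state_rates, fallback_rate, seed=0):
    """Return one boolean per person: True if imputed as pregnant.

    Illustrative stand-in for the vectorised NumPy logic in the diff.
    """
    rng = random.Random(seed)  # ASSUMPTION: stand-in for seeded_rng("is_pregnant")
    flags = []
    for age, female, state in zip(ages, is_female, states):
        # Draw for every person, eligible or not, so that a change in one
        # person's eligibility does not shift the draws of everyone after them.
        draw = rng.random()
        rate = state_rates.get(state, fallback_rate)  # unknown state -> fallback
        eligible = bool(female) and 15 <= age <= 44
        flags.append(eligible and draw < rate)
    return flags


# Made-up inputs: "ZZ" is not in the rate table, so it uses the fallback.
flags = impute_pregnancy(
    ages=[10, 20, 30, 50, 25],
    is_female=[True, True, False, True, True],
    states=["CA", "CA", "TX", "NY", "ZZ"],
    state_rates={"CA": 0.04, "TX": 0.05, "NY": 0.04},
    fallback_rate=0.041,
    seed=42,
)
```

Only people who pass the eligibility mask can ever be flagged, and a fixed seed makes the assignment reproducible across runs.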
4 changes: 3 additions & 1 deletion policyengine_us_data/db/DATABASE_GUIDE.md
@@ -32,7 +32,8 @@ make promote-database # Copy DB + raw inputs to HuggingFace clone
| 6 | `etl_snap.py` | USDA FNS + Census ACS | SNAP participation (admin state-level, survey district-level) |
| 7 | `etl_state_income_tax.py` | No | State income tax collections (Census STC FY2023, hardcoded) |
| 8 | `etl_irs_soi.py` | IRS | Tax variables, EITC by child count, AGI brackets, conditional strata |
- | 9 | `validate_database.py` | No | Checks all target variables exist in policyengine-us |
+ | 9 | `etl_pregnancy.py` | CDC VSRR + Census ACS | Pregnancy prevalence by state (provisional birth counts) |
+ | 10 | `validate_database.py` | No | Checks all target variables exist in policyengine-us |

### Raw Input Caching

@@ -146,6 +147,7 @@ Strata are categorized by their **constraints**, not by a separate group ID fiel
| `adjusted_gross_income` | Income/AGI brackets |
| `snap` | SNAP recipient strata |
| `medicaid_enrolled` | Medicaid enrollment strata |
| `is_pregnant` | Pregnancy prevalence strata |
| `eitc_child_count` | EITC recipients by qualifying children |
| `state_income_tax` | State-level income tax collections |
| `aca_ptc` | ACA Premium Tax Credit strata |
1 change: 1 addition & 0 deletions policyengine_us_data/db/create_field_valid_values.py
@@ -74,6 +74,7 @@ def populate_field_valid_values(session: Session) -> None:
        ("source", "USDA FNS SNAP", "administrative"),
        ("source", "Census ACS S2201", "survey"),
        ("source", "Census STC", "administrative"),
        ("source", "CDC VSRR Natality", "administrative"),
        ("source", "PolicyEngine", "hardcoded"),
    ]
