Add end-to-end test for calibration database build pipeline by MaxGhenis · Pull Request #556 · PolicyEngine/policyengine-us-data

MaxGhenis · 2026-02-26T00:22:07Z

Summary

Adds test_database_build.py that runs every ETL script in sequence (matching make database order) against a HuggingFace-hosted stratified CPS dataset
Validates the resulting SQLite database has correct tables, national targets, state income tax coverage (42+ states, CA > $100B), 435+ congressional district strata, valid policyengine-us variables, and 1000+ total targets
Catches API mismatches, broken imports, and data-loading errors before they reach production. These are the kind of issues seen in #492 (Add state income tax revenue as calibration target) and #500 (Fix etl_state_income_tax.py API mismatches)
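As a rough illustration of the validation step described above, the sketch below checks an SQLite database for the expected tables, a few national target variables, and a minimum target count. It is not the code in test_database_build.py: the function name `check_database`, the `variable` column on `targets`, and the way the threshold is applied are all assumptions made for the example.

```python
import sqlite3

# Table names and variable names are taken from the PR description.
EXPECTED_TABLES = {"strata", "stratum_constraints", "targets"}
KEY_NATIONAL_VARIABLES = {"snap", "social_security", "ssi"}


def check_database(conn, min_targets=1000):
    """Return a list of problems found; an empty list means all checks passed.

    Assumes (hypothetically) that `targets` has a text column named `variable`.
    """
    problems = []

    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    tables = {name for (name,) in rows}

    missing_tables = EXPECTED_TABLES - tables
    if missing_tables:
        problems.append(f"missing tables: {sorted(missing_tables)}")

    # The remaining checks only make sense if the targets table exists.
    if "targets" in tables:
        (count,) = conn.execute("SELECT COUNT(*) FROM targets").fetchone()
        if count < min_targets:
            problems.append(f"too few targets: {count} < {min_targets}")

        found = {
            variable
            for (variable,) in conn.execute(
                "SELECT DISTINCT variable FROM targets"
            )
        }
        missing_variables = KEY_NATIONAL_VARIABLES - found
        if missing_variables:
            problems.append(f"missing variables: {sorted(missing_variables)}")

    return problems
```

Returning a list of problems rather than raising on the first failure means a single run can report every broken check at once, which is convenient when several ETL scripts change together.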

Test plan

CI passes (the test itself takes ~2 minutes to run all 9 ETL scripts)
Verify test catches intentional breakage (e.g. rename a function in db_metadata.py)
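The second item can be tried out in miniature. The sketch below writes a stand-in helper module and a stand-in ETL script to a temporary directory, then runs the script in a subprocess and reports whether it exited cleanly. The module name `db_metadata_demo`, the function `get_source`, and the script `etl_demo.py` are invented for the example and do not reflect the real contents of db_metadata.py.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def script_passes(helper_source: str) -> bool:
    """Run a tiny stand-in ETL script against the given helper module source."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        # Hypothetical stand-in for db_metadata.py.
        (tmp_path / "db_metadata_demo.py").write_text(helper_source)
        # Hypothetical stand-in for an ETL script that imports the helper.
        (tmp_path / "etl_demo.py").write_text(
            "from db_metadata_demo import get_source\n"
            "print(get_source())\n"
        )
        result = subprocess.run(
            [sys.executable, "etl_demo.py"],
            cwd=tmp_path,
            capture_output=True,
            text=True,
        )
        return result.returncode == 0


# The original helper exposes the name the script imports.
original = "def get_source():\n    return 'demo'\n"
# Renaming the function breaks the import, so the script exits non-zero.
renamed = "def get_source_v2():\n    return 'demo'\n"
```

With these definitions, `script_passes(original)` should succeed and `script_passes(renamed)` should fail with an import error, which is the signal an end-to-end test relies on.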

🤖 Generated with Claude Code

Runs all ETL scripts (create_database_tables, create_initial_strata, etl_national_targets, etl_age, etl_medicaid, etl_snap, etl_state_income_tax, etl_irs_soi, validate_database) in sequence and validates the resulting SQLite database for:

- Expected tables (strata, stratum_constraints, targets)
- National targets include key variables (snap, social_security, ssi)
- State income tax targets cover 42+ states with CA > $100B
- Congressional district strata for 435+ districts
- All target variables exist in policyengine-us
- Total target count > 1000

This prevents API mismatches and import errors from going undetected when ETL scripts are modified.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
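The sequencing described in this commit message could be sketched roughly as follows. The script names and their order come from the message itself; the `run_pipeline` and `run_script` helpers, the `.py` suffix, and the idea of passing a script directory are assumptions for illustration, not the structure of the actual test.

```python
import subprocess
import sys
from pathlib import Path

# Order as listed in the commit message.
ETL_SCRIPTS = (
    "create_database_tables",
    "create_initial_strata",
    "etl_national_targets",
    "etl_age",
    "etl_medicaid",
    "etl_snap",
    "etl_state_income_tax",
    "etl_irs_soi",
    "validate_database",
)


def run_script(path):
    """Run one script in a fresh interpreter; raises CalledProcessError on failure."""
    subprocess.run([sys.executable, str(path)], check=True)


def run_pipeline(script_dir, scripts=ETL_SCRIPTS, runner=run_script):
    """Run each script in order, stopping at the first failure.

    `script_dir` is a hypothetical location; the real layout may differ.
    Returns the names of the scripts that completed.
    """
    completed = []
    for name in scripts:
        runner(Path(script_dir) / f"{name}.py")
        completed.append(name)
    return completed
```

Taking the runner as a parameter keeps the ordering logic testable without launching subprocesses, while `check=True` in the default runner makes any non-zero exit stop the pipeline immediately.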

MaxGhenis force-pushed the add-database-build-test branch from c33c39f to 94bf74f Compare February 26, 2026 00:26

MaxGhenis force-pushed the add-database-build-test branch from 94bf74f to be5b15f Compare February 26, 2026 00:29

MaxGhenis merged commit b8e4e40 into main Feb 26, 2026
6 checks passed

MaxGhenis merged 1 commit into main from add-database-build-test
