Proposal: new repo for general survey microdata calculator #698

@MaxGhenis

Context

Marina Gindelsky at BEA uses PolicyEngine to calculate taxes and transfers on CPS ASEC microdata that she's scaled to NIPA totals. Currently she goes through policyengine-taxsim's TAXSIM format, which is limiting in two ways: TAXSIM's input format doesn't cover transfer variables (SNAP, Medicaid, etc.), and it requires lossy translation from CPS's native structure.

More broadly, many researchers and agencies have their own microdata (CPS, ACS, SCF, SIPP, custom surveys) and want to run PolicyEngine calculations on it. The current options are:

  1. policyengine-taxsim — limited to TAXSIM's input/output format
  2. policyengine-us Simulation API — powerful but requires manually mapping every variable to PE's entity structure
  3. policyengine-us-data — builds our canonical dataset, not designed for user-supplied data
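
To make the cost of option 2 concrete, here is a minimal sketch of the manual mapping it involves: flat person rows for one household are reshaped into the nested "situation" dictionary that the policyengine-us Simulation takes. The row keys (`person_id`, `age`, `wages`), the helper name `build_situation`, and the single-tax-unit assumption are illustrative and not part of any existing package; other entity groups (families, marital units) are omitted for brevity and may need to be added.

```python
from __future__ import annotations

from typing import Any

YEAR = "2024"  # assumed simulation year


def build_situation(person_rows: list[dict[str, Any]], state: str) -> dict[str, Any]:
    """Reshape flat person rows for ONE household into a nested situation dict."""
    people = {}
    for row in person_rows:
        # Each input variable is keyed by period, then holds the value.
        people[row["person_id"]] = {
            "age": {YEAR: row["age"]},
            "employment_income": {YEAR: row["wages"]},
        }
    members = list(people)
    # Simplifying assumption: everyone shares one tax unit and one SPM unit.
    # Real survey households often split into several of each.
    return {
        "people": people,
        "tax_units": {"tax_unit": {"members": members}},
        "spm_units": {"spm_unit": {"members": members}},
        "households": {
            "household": {"members": members, "state_name": {YEAR: state}}
        },
    }


rows = [
    {"person_id": "adult", "age": 35, "wages": 30_000},
    {"person_id": "child", "age": 6, "wages": 0},
]
situation = build_situation(rows, state="CA")

# With policyengine-us installed, the dict could then be passed on, e.g.:
#   from policyengine_us import Simulation
#   snap = Simulation(situation=situation).calculate("snap", 2024)
```

Every additional survey variable and every additional entity split means more hand-written code of this kind, which is the work a prebuilt connector would absorb.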

Proposal: policyengine-survey-calculator (or similar name)

A new repo/package that:

  1. Accepts common survey microdata formats with prebuilt connectors:

    • CPS ASEC (Marina's immediate need)
    • ACS
    • SCF
    • SIPP
    • Flat CSV with a documented schema
  2. Maps survey variables to PolicyEngine's input variables: each connector knows how to translate (e.g., CPS WSAL_VAL -> PE employment_income, CPS household structure -> PE tax units/SPM units)

  3. Runs PolicyEngine calculations and returns results merged back onto the original microdata

  4. Exports results in the original survey format or flat files
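
The four steps above could be sketched roughly as follows. This is a hypothetical interface, not an existing API: the `Connector` class, `run`, `to_csv`, and the `pe_` output prefix are all invented for illustration, and `stand_in_calculate` is a placeholder that just adds two inputs so the example runs without policyengine-us installed. It does not implement any tax or benefit rule. The CPS column names (PH_SEQ, A_AGE, WSAL_VAL, SEMP_VAL) are assumed to match the ASEC person record.

```python
from __future__ import annotations

import csv
import io
from dataclasses import dataclass
from typing import Callable


@dataclass
class Connector:
    """Hypothetical connector: knows how one survey's columns map to PE inputs."""

    name: str
    column_map: dict[str, str]  # survey column -> PolicyEngine input variable

    def to_pe_inputs(self, row: dict[str, str]) -> dict[str, float]:
        missing = [col for col in self.column_map if col not in row]
        if missing:
            raise KeyError(f"{self.name}: missing columns {missing}")
        return {pe: float(row[src]) for src, pe in self.column_map.items()}


def run(connector: Connector, rows: list[dict[str, str]],
        calculate: Callable[[dict[str, float]], dict[str, float]]) -> list[dict]:
    """Map each row, calculate, and merge results back onto the original row."""
    merged = []
    for row in rows:
        results = calculate(connector.to_pe_inputs(row))
        merged.append({**row, **{f"pe_{k}": v for k, v in results.items()}})
    return merged


def to_csv(rows: list[dict]) -> str:
    """Export merged rows as a flat CSV string."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def stand_in_calculate(inputs: dict[str, float]) -> dict[str, float]:
    # Placeholder only: a real version would call PolicyEngine here.
    earned = inputs["employment_income"] + inputs["self_employment_income"]
    return {"earned_income": earned}


cps = Connector(
    name="cps_asec",
    column_map={
        "A_AGE": "age",
        "WSAL_VAL": "employment_income",
        "SEMP_VAL": "self_employment_income",
    },
)

raw = "PH_SEQ,A_AGE,WSAL_VAL,SEMP_VAL\n1,35,30000,2000\n2,52,0,15000\n"
survey_rows = list(csv.DictReader(io.StringIO(raw)))
merged_rows = run(cps, survey_rows, stand_in_calculate)
output_csv = to_csv(merged_rows)
```

Passing the calculation in as a function keeps the connector independent of the rules engine, so the same mapping code could serve both this package and a TAXSIM adapter. A real connector would also need to build entity structure (tax units, SPM units) rather than treating each row independently.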

Separation of concerns

| Repo | Responsibility |
|------|----------------|
| policyengine-us | Tax/benefit rules only |
| policyengine-us-data | Building the best unified microdata file (Enhanced CPS) |
| policyengine-taxsim | TAXSIM format emulation specifically |
| New repo | "Bring your own survey data, get PE calculations back" |

The CPS connector code could be shared with policyengine-us-data (e.g., as a dependency or shared utility), since both need to parse CPS ASEC structure into PE entities.

Open questions

  • Repo name: policyengine-survey-calculator? policyengine-surveys? policyengine-microdata?
  • Should policyengine-taxsim eventually become a thin wrapper around this + a TAXSIM format adapter?
  • How much connector logic can be shared with policyengine-us-data's CPS processing?
  • Should this support UK surveys too (FRS, SPI) or stay US-focused initially?

Immediate motivation

Marina at BEA needs transfer variable calculations (SNAP, Medicaid, SSI, etc.) on her CPS-based microdata. The TAXSIM format can't express these inputs/outputs. A CPS connector that accepts her data directly would solve this cleanly.
