Add WID pre-tax national income and standardized inequality summary stats #7462

@MaxGhenis

Summary

Create variables for standard income concepts used by major inequality databases (WID, LIS, OECD), starting with WID's pre-tax national income. Also add standardized summary statistic computation (Gini, percentile shares, etc.) that matches the methodology of these external sources.

Motivation

PolicyEngine (PE) computes household_net_income and household_market_income, but these don't map directly to the income concepts used by the World Inequality Database (WID), OECD, or LIS. This makes it hard to:

  1. Validate against external benchmarks — e.g., OWID reports US Gini of 0.587 (2023) using WID pre-tax national income, but PE's net income Gini is 0.565 and market income Gini is 0.643. Neither matches.
  2. Compare cross-country — WID's methodology is consistent across countries, so implementing it in PE would enable apples-to-apples comparisons.
  3. Forecast for prediction markets — There are active prediction markets on OWID Gini values that PE microsim could inform, but only if we compute the same income concept.

WID pre-tax national income definition

From WID methodology:

  • Pre-tax national income = factor income (labor + capital) + pensions + unemployment insurance, before taxes and all other transfers
  • Unit: Equal-split among adults (20+) in the same household — i.e., total household pre-tax national income divided equally among adult members
  • Population: Adults aged 20+

Key differences from PE's current variables:

  • Includes pension/UI income (unlike pure market income)
  • Excludes means-tested transfers (unlike net income)
  • Equal-split among adults only (not per-capita, not equivalized)
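
To make the difference between the three units concrete, here is a small sketch for a single household. The function name `income_per_unit` and the example household are hypothetical, and the equivalized figure uses the square-root scale (income divided by the square root of household size) only as one common example of equivalization. It assumes the household has at least one member.

```python
import math


def income_per_unit(household_income, ages, adult_age=20):
    """Compare three ways of assigning household income to individuals.

    Hypothetical helper for illustration; assumes `ages` is non-empty.
    """
    n_people = len(ages)
    n_adults = sum(1 for age in ages if age >= adult_age)
    return {
        # WID-style unit: split equally among adults aged 20+ only.
        "equal_split_adults": household_income / n_adults if n_adults else None,
        # Per-capita: split among everyone, children included.
        "per_capita": household_income / n_people,
        # Square-root equivalence scale, shown only for contrast.
        "sqrt_equivalized": household_income / math.sqrt(n_people),
    }


# Two adults and one child sharing 90,000 of pre-tax national income.
units = income_per_unit(90_000, ages=[41, 39, 10])
print(units)
```

In this example each adult is assigned 45,000 under the equal-split unit versus 30,000 per capita, which is why a Gini computed on one unit is not comparable to a Gini computed on another.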

Proposed implementation

New variables in policyengine-us

# Person-level
wid_pretax_national_income_person  # = (labor + capital + pensions + UI) / n_adults_in_household

Components:

  • employment_income + self_employment_income (labor)
  • capital_gains + dividend_income + interest_income + rental_income (capital)
  • social_security + pension_income (pensions)
  • unemployment_compensation (UI)
  • Sum divided by the number of adults (20+) in the household, consistent with the equal-split unit in the WID definition
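
The component list above could be prototyped outside the PolicyEngine variable framework roughly as follows. This is a plain-Python sketch, not the policyengine-us implementation: the record layout (dicts with `household_id` and `age` keys) is hypothetical, and assigning 0.0 to people under 20 is an assumption made here so the output stays person-level. Those people would then be dropped by the adult population filter when computing statistics.

```python
from collections import defaultdict

# Component names taken from the list above.
COMPONENTS = (
    "employment_income", "self_employment_income",            # labor
    "capital_gains", "dividend_income",
    "interest_income", "rental_income",                       # capital
    "social_security", "pension_income",                      # pensions
    "unemployment_compensation",                              # UI
)
ADULT_AGE = 20


def wid_pretax_national_income_person(people):
    """Return one equal-split income value per person record.

    `people` is a list of dicts (hypothetical layout); missing
    components are treated as zero.
    """
    household_total = defaultdict(float)
    household_adults = defaultdict(int)
    for person in people:
        hh = person["household_id"]
        household_total[hh] += sum(person.get(c, 0.0) for c in COMPONENTS)
        if person["age"] >= ADULT_AGE:
            household_adults[hh] += 1

    result = []
    for person in people:
        hh = person["household_id"]
        if person["age"] >= ADULT_AGE:
            # An adult's own household always has at least one adult.
            result.append(household_total[hh] / household_adults[hh])
        else:
            # Assumption: under-20s get 0.0 and are filtered out later.
            result.append(0.0)
    return result


people = [
    {"household_id": 1, "age": 45, "employment_income": 60_000.0, "dividend_income": 2_000.0},
    {"household_id": 1, "age": 43, "self_employment_income": 18_000.0},
    {"household_id": 1, "age": 12},
    {"household_id": 2, "age": 70, "social_security": 24_000.0, "pension_income": 6_000.0},
]
incomes = wid_pretax_national_income_person(people)
print(incomes)
```

Note that income earned by household members under 20 still enters the household total in this sketch. Whether that matches WID's treatment would need to be checked against the methodology.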

Consider for policyengine-core or policyengine.py

If the income concept definition is common across countries (which it is for WID), the variable template or summary stat functions could live in a shared package:

  • wid_pretax_national_income as a cross-country variable pattern
  • Standardized gini(variable, population_filter, weighting) that matches WID's methodology
  • Percentile share computation matching WID conventions
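
As a starting point for the shared summary statistics, here is a sketch of a weighted Gini and a top-share function that take a population filter. The signatures are hypothetical and loosely follow the gini(variable, population_filter, weighting) shape proposed above. The Gini is a plain trapezoidal Lorenz-curve calculation on survey weights, which is not necessarily the procedure WID uses, and it assumes total income is positive.

```python
def _filtered(values, weights, include=None):
    """Pair values with weights, dropping excluded or zero-weight records."""
    if include is None:
        include = [True] * len(values)
    return [(v, w) for v, w, keep in zip(values, weights, include) if keep and w > 0]


def gini(values, weights, include=None):
    """Weighted Gini from the area under the Lorenz curve (trapezoid rule)."""
    pairs = sorted(_filtered(values, weights, include))
    total_w = sum(w for _, w in pairs)
    total_y = sum(v * w for v, w in pairs)
    if total_w == 0 or total_y == 0:
        return 0.0
    cum_y = 0.0
    twice_area = 0.0
    for v, w in pairs:
        lorenz_before = cum_y / total_y
        cum_y += v * w
        # Population share of this record times the sum of the Lorenz
        # ordinates on either side gives twice the trapezoid's area.
        twice_area += (w / total_w) * (lorenz_before + cum_y / total_y)
    return 1.0 - twice_area


def top_share(values, weights, top_fraction, include=None):
    """Share of total income held by the top `top_fraction` of the population."""
    pairs = sorted(_filtered(values, weights, include), reverse=True)
    total_w = sum(w for _, w in pairs)
    total_y = sum(v * w for v, w in pairs)
    if total_w == 0 or total_y == 0:
        return 0.0
    remaining = top_fraction * total_w
    top_y = 0.0
    for v, w in pairs:
        take = min(w, remaining)  # split the record that straddles the cutoff
        top_y += v * take
        remaining -= take
        if remaining <= 0:
            break
    return top_y / total_y


incomes = [0.0, 0.0, 0.0, 1.0, 5.0]
weights = [1.0, 1.0, 1.0, 1.0, 1.0]
is_adult = [True, True, True, True, False]  # e.g. age >= 20
print(gini(incomes, weights, include=is_adult))
print(top_share(incomes, weights, 0.25, include=is_adult))
```

Keeping the filter as an explicit argument makes it easy to run the same statistic on adults 20+ for WID comparisons and on other populations for LIS or OECD concepts.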

Acceptance criteria

  • wid_pretax_national_income_person variable computes correctly
  • Equal-split among adults 20+ in the household
  • Gini coefficient on this variable is close to OWID's reported 0.587 for 2023
  • Document methodology and any remaining discrepancies
