Add Borusyak-Jaravel-Spiess (2024) Imputation DiD estimator by igerber · Pull Request #141 · igerber/diff-diff

igerber · 2026-02-08T23:08:09Z

Summary

Implement ImputationDiD estimator: the efficient imputation approach for staggered DiD from Borusyak, Jaravel & Spiess (2024)
Three-step estimation: OLS on untreated observations → impute counterfactuals → aggregate treatment effects
Conservative clustered variance (Theorem 3, Equation 7) with configurable auxiliary model partition
Event study and group aggregation, pre-trend test (Equation 9), Proposition 5 handling for no never-treated units
Optional multiplier bootstrap, covariate support via within-transformation
54 tests, API docs, tutorial notebook, R benchmark scripts, full documentation updates

Methodology references (required if estimator / math changes)

Method name(s): Imputation DiD (Borusyak-Jaravel-Spiess)
Paper / source link(s): Borusyak, K., Jaravel, X., & Spiess, J. (2024). Revisiting Event-Study Designs: Robust and Efficient Estimation. Review of Economic Studies, 91(6), 3253-3285.
Any intentional deviations from the source (and why):
- Bootstrap inference: Not proposed in the paper (which uses only analytical inference via Theorem 3). Added optional multiplier bootstrap for consistency with other staggered estimators (CallawaySantAnna, SunAbraham). Documented in REGISTRY.md.
- v_it weight derivation: The paper's Supplementary Proposition A3 (not in extraction range) provides the explicit formula. The FE-only closed form was reconstructed from Theorem 3's general form. Documented in REGISTRY.md.

Validation

Tests added/updated: tests/test_imputation.py (54 tests covering basic fit, aggregation modes, variance estimation, pre-trend test, bootstrap, cross-validation vs CS/SA, edge cases)
Full test suite: 1150 passed, 30 skipped, 0 failures
Backtest / simulation / notebook evidence: docs/tutorials/11_imputation_did.ipynb, R benchmark scripts at benchmarks/R/benchmark_didimputation.R

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

Implement ImputationDiD, the efficient imputation estimator for staggered DiD from Borusyak, Jaravel & Spiess (2024) "Revisiting Event-Study Designs: Robust and Efficient Estimation", Review of Economic Studies. Core features: - Three-step estimation: OLS on untreated → impute counterfactuals → aggregate - Conservative clustered variance (Theorem 3) with configurable auxiliary model partition (cohort_horizon, cohort, horizon) - Event study and group aggregation with balance_e and horizon_max options - Pre-trend test (Equation 9) independent of treatment effect estimation - Proposition 5: NaN for unidentified long-run effects without never-treated - Optional multiplier bootstrap for consistency with other staggered estimators - Covariate support via within-transformation Includes 54 tests, API docs, tutorial notebook, R benchmark scripts, methodology registry entry, and README/ROADMAP/CLAUDE.md updates. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-02-08T23:14:46Z

Overall Assessment: ⛔ Blocker

Executive Summary

ImputationDiD’s conservative variance deviates from the Methodology Registry: τ̃_g is computed as an unweighted mean instead of v_it‑weighted, which can misstate SEs for event‑study/group targets. diff_diff/imputation.py:L1239-L1294
Step 1 is not OLS for unbalanced panels (uses simple demeaning/means), conflicting with the paper/Registry requirement and the “unbalanced panels supported” claim. diff_diff/imputation.py:L922-L993
Rank‑condition coverage for untreated periods isn’t enforced; missing FE are handled inconsistently between imputation and variance, risking silent NaNs and wrong SEs. diff_diff/imputation.py:L625-L689, diff_diff/imputation.py:L1025-L1039, diff_diff/imputation.py:L1260-L1267
Bootstrap inference ignores the user’s cluster parameter and always resamples by unit. diff_diff/imputation.py:L1773-L1812
Event‑study plotting may misplace the reference period for ImputationDiD with anticipation>0. diff_diff/visualization.py:L405-L431

Methodology

Severity: P0 | Impact: ImputationDiD variance uses unweighted τ̃_g despite Registry specifying v_it‑weighted averages; this breaks Theorem 3’s auxiliary residuals and can bias SEs for event‑study/group aggregations. | Fix: In _compute_auxiliary_residuals_treated, compute τ̃_g as sum(v_it * τ̂_it) / sum(v_it) within each partition group (guard zero weights), and remove the “simple average” comment. diff_diff/imputation.py:L1239-L1294
Severity: P0 | Impact: Step 1 is not OLS on untreated observations for unbalanced panels (uses unit/time means and double‑demeaning), so imputation can be biased even though the estimator claims unbalanced support. | Fix: Replace mean‑based FE recovery with an actual OLS/absorbed‑FE solution on Ω0 (e.g., solve with unit/time dummies or a proper two‑way FE projection for unbalanced panels) for both covariate and no‑covariate paths. diff_diff/imputation.py:L922-L993
Severity: P1 | Impact: Rank‑condition checks are incomplete (no check for post‑treatment periods lacking untreated units); missing FE yield NaNs in τ̂ but variance uses fillna(0), producing inconsistent inference. | Fix: Add explicit unit/time coverage checks and honor rank_deficient_action (warn/error/silent); align missing‑FE handling between _impute_treatment_effects and _compute_auxiliary_residuals_treated (either drop affected obs and renormalize weights or set all inference to NaN). diff_diff/imputation.py:L625-L689, diff_diff/imputation.py:L1025-L1039, diff_diff/imputation.py:L1260-L1267
Severity: P1 | Impact: Bootstrap inference ignores cluster when it differs from unit, so bootstrap SEs/p‑values are wrong for clustered inference. | Fix: Build bootstrap weights at cluster_var granularity (not unit) and map to observations accordingly. diff_diff/imputation.py:L1773-L1812

Code Quality

Severity: P2 | Impact: plot_event_study reference‑period inference only checks n_groups==0; ImputationDiD uses n_obs==0, so reference is wrong when anticipation>0. | Fix: Treat n_obs==0 as a reference marker for ImputationDiD results in _extract_plot_data. diff_diff/visualization.py:L405-L431

Performance

Severity: P3 | Impact: _compute_v_untreated_with_covariates builds dense dummy matrices, which is O(N·(U+T)) memory/time and may not scale to large panels. | Fix: Use sparse matrices or a projection/within‑transform approach that avoids explicit dummies. diff_diff/imputation.py:L1181-L1236

Maintainability

No material issues found in the diff.

Tech Debt

No material issues found in the diff.

Security

No material issues found in the diff.

Documentation/Tests

Severity: P2 | Impact: Tests don’t cover rank‑condition failures for missing untreated time periods, cluster!=unit bootstrap behavior, or plotting reference detection with anticipation>0; unbalanced‑panel coverage only asserts the code runs, not correctness. | Fix: Add targeted tests for those scenarios (e.g., a panel with a treated period lacking untreated obs; bootstrap with cluster set to a higher level; plot reference inference for anticipation=1). tests/test_imputation.py:L444-L461, tests/test_imputation.py:L618-L710

…ring - Fix 1 (P0): Replace one-pass demeaning with iterative Gauss-Seidel for exact FE on unbalanced panels (add _iterative_fe helper) - Fix 2 (P0): Use v_it-weighted tau_tilde_g in auxiliary residuals per Eq 8 - Fix 3 (P1): Add rank-condition checks after Step 1, fix fillna(0.0)->NaN consistency in auxiliary residuals, handle NaN in v*eps product - Fix 4 (P1): Bootstrap generates weights per cluster_var (not always unit), add cluster column validation - Fix 5 (P2): plot_event_study detects reference via n_obs==0 (ImputationDiD) - Fix 6 (P3): Use scipy.sparse for FE dummy matrices in covariate v_it - Fix 7: Add 6 new tests, update REGISTRY.md edge cases and docstrings Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

igerber · 2026-02-09T00:55:29Z

/ai-review

github-actions · 2026-02-09T01:00:50Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 3e93375db98e044aad07c7f1a4ddb6aaef130792

Overall assessment: ⚠️ Needs changes

Executive summary:

ImputationDiD’s core 3‑step estimator and Thm 3 variance structure appear aligned with the Methodology Registry, but there are three methodology/inference mismatches that need fixes.
The pre‑trend test uses one‑pass demeaning, which is not exact for unbalanced panels (registry requires iterative FE).
Overall ATT SEs are computed with weights over all treated observations even when tau_hat is NaN for some treated obs, so the variance targets a different estimand than the reported ATT.
balance_e is applied to event‑study aggregation but ignored in bootstrap event‑study inference, so bootstrap SEs/p‑values are computed on a different sample.

Methodology

P1 — Pre‑trend test FE demeaning is not exact for unbalanced panels. Impact: F‑stat/p‑value can be biased when the panel is unbalanced, conflicting with the registry’s requirement to use iterative FE (Gauss‑Seidel) for exact two‑way FE. Concrete fix: reuse the iterative demeaning logic from _fit_untreated_model (or run solve_ols with explicit unit/time dummies) for both y and X before the Wald test. (diff_diff/imputation.py:L1786-L1822)
P1 — Overall ATT variance uses uniform weights over all treated obs while the ATT averages only finite tau_hat. Impact: when rank conditions fail (missing FE) or Prop. 5 forces NaNs, the SE targets a different estimand and can be understated. Concrete fix: zero out weights for non‑finite tau_hat and renormalize to sum to 1, or return NaN for overall ATT/SE when any treated obs is unimputable. (diff_diff/imputation.py:L748-L773)
P1 — balance_e affects event‑study aggregation but not bootstrap inference. Impact: bootstrap SEs/p‑values are computed on all treated obs at horizon h, not the balanced cohort subset, so inference targets a different estimand. Concrete fix: persist the balanced cohort mask (or recompute it) inside _run_bootstrap and apply it to h_mask when forming bootstrap event‑study effects; add a regression test. (diff_diff/imputation.py:L1472-L1499, diff_diff/imputation.py:L1942-L1954)

Code Quality

No issues found.

Performance

No issues found.

Maintainability

No issues found.

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

P2 — Missing tests for key parameter interactions and edge‑case inference. Impact: regressions in bootstrap/event‑study inference or rank‑deficiency handling may slip through. Concrete fix: add tests in tests/test_imputation.py to cover balance_e + n_bootstrap > 0, and to assert overall ATT/SE behavior when some tau_hat are NaN due to rank condition violations.

…tstrap balance_e - Extract _iterative_demean as static method; fix _pretrend_test to use iterative demeaning (exact for unbalanced panels) instead of one-pass - Overall ATT SE: zero-weight NaN tau_hat observations so variance matches the finite-only ATT estimand; guard n_valid==0 returning NaN SE - Extract _compute_balanced_cohort_mask helper; apply in bootstrap event-study loop so bootstrap SEs respect balance_e filtering - Add 4 new tests covering all three fixes - Update REGISTRY.md edge cases documentation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

igerber · 2026-02-09T01:58:49Z

/ai-review

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions · 2026-02-09T02:08:48Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 7495f0305a5fe5e0f46a4e5468f0d956170622c9

Overall assessment: ⚠️ Needs changes

Executive summary

ImputationDiD implementation tracks the Registry (Step 1/2/3, Theorem 3 variance, Prop 5 handling, NaN propagation) and is well covered by tests.
P1: Missing absorbing‑treatment/first_treat consistency validation, which is a required assumption in the Methodology Registry and can silently invalidate estimates.
No other material methodology, edge‑case, or security issues found in the diff.

Methodology

P1 — Missing absorbing‑treatment validation. Impact: The Registry requires absorbing treatment (once treated, always treated). fit uses row‑level first_treat directly to build D_it without checking it’s constant per unit; if first_treat varies within a unit (or encodes reversals), the estimator and SEs are invalid with no warning. Concrete fix: add a unit‑level check after type conversion to ensure first_treat is constant within unit (and optionally warn/error on violations), similar to the absorbing‑treatment warning in MultiPeriodDiD; add a regression test for non‑constant first_treat. diff_diff/imputation.py:622-676

Code Quality

No issues found.

Performance

No issues found.

Maintainability

No issues found.

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

No issues found; tests are extensive.

…putationDiD Emit UserWarning when first_treat varies within a unit, which violates the absorbing treatment assumption. The estimator proceeds but warns users that results may be unreliable. Adds test and REGISTRY.md documentation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

igerber · 2026-02-09T11:55:58Z

/ai-review

github-actions · 2026-02-09T12:02:59Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 3f361b64d54401ffa7b49609ce89ffcb6413ba82

Overall assessment: ⚠️ Needs changes

Executive summary

P1: Multiplier bootstrap uses uncentered tau_hat rather than influence‑function/centered contributions, so bootstrap SEs/p‑values are likely incorrect for overall, event‑study, and group effects.
P2: treatment_effects["weight"] doesn’t renormalize when some tau_hat are NaN, so downstream re‑aggregation can disagree with the estimator’s ATT.
P2: Covariate variance path forms a dense (A0'A0) matrix that can blow up memory for large panels; consider sparse/iterative solves or guardrails.
P2: If covariates are rank‑deficient and dropped in Step 1, the variance path still includes them, which can make SEs inconsistent with the fitted model.

Methodology

Severity: P1 | Impact: Bootstrap inference is not aligned with the estimator’s influence function; it weights raw tau_hat with Rademacher draws (uncentered), which ignores first‑stage estimation error and produces non‑zero variance even in constant‑effect/no‑noise settings. This can materially misstate uncertainty for all bootstrap outputs. | Fix: Implement multiplier bootstrap on the Theorem‑3 influence function: compute psi_it = v_it * epsilon_tilde_it, sum within clusters, multiply cluster sums by Rademacher weights, and use the resulting draws for SE/CI/p‑values; or re‑fit via cluster bootstrap. diff_diff/imputation.py:L2015-L2069
Severity: P2 | Impact: With rank‑deficient covariates, Step 1 drops/zeroes some coefficients, but the covariate variance path still uses the full covariate design, so v_untreated can be inconsistent with the actual fitted model, distorting SEs. | Fix: Track the kept covariate columns from solve_ols (or a boolean mask) and build A_0/A_1 using only those; alternatively, honor rank_deficient_action here and error/warn consistently. diff_diff/imputation.py:L1420-L1441

Code Quality

Severity: P2 | Impact: treatment_effects["weight"] is always 1/n_treated, even when some treated observations have tau_hat = NaN and the estimator renormalizes over finite values; users re‑aggregating from this table will get a different ATT. | Fix: Set weights to 0 for NaN tau_hat and renormalize to sum to 1 over finite observations, or explicitly document that this column is nominal. diff_diff/imputation.py:L867-L871

Performance

Severity: P2 | Impact: A0tA0 = (A_0.T @ A_0).toarray() scales as O((U+T+K)^2) and becomes prohibitive for large panels, risking OOM. | Fix: Solve the normal equations with sparse solvers (e.g., spsolve/iterative CG on A_0.T @ A_0 without densifying) or add guardrails/warnings when p is large. diff_diff/imputation.py:L1434-L1441

Maintainability

No issues found.

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

Severity: P3 | Impact: New tests cover bootstrap reproducibility but not correctness; the mis‑centering/influence‑function issue would not be detected. | Fix: Add a test where the analytical SE is known (or residuals are zero) and assert bootstrap SE ≈ 0 or matches the influence‑function variance. tests/test_imputation.py

…riance, weight/covariate fixes - Fix bootstrap to use Theorem 3 influence function (psi_i = sum_t v_it * epsilon_tilde_it) instead of perturbing raw tau_hat, correctly accounting for first-stage estimation error - Add sparse solver (spsolve) for variance normal equations to avoid O(p^2) memory blow-up, with dense lstsq fallback - Exclude rank-deficient covariates from variance design matrices A_0/A_1 via kept_cov_mask - Renormalize treatment_effects weights: 0 for NaN tau_hat, 1/n_valid for finite - Add 6 new tests: zero-noise bootstrap SE, sparse solver, dense fallback, weight consistency, rank-deficient covariates, bootstrap precomputation failure warning - Update REGISTRY.md edge cases documentation Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

igerber · 2026-02-09T13:05:50Z

/ai-review

github-actions · 2026-02-09T13:13:08Z

🔁 AI review rerun (requested by @igerber)

Head SHA: d62934225e7bc64f9747b94ae4ab974d40320721

Overall assessment: ⚠️ Needs changes

Executive summary

P1: balance_e does not enforce pre‑treatment balance as specified in the Methodology Registry/docs; it only checks horizons present in treated observations.
The core Imputation DiD steps (Omega₀ OLS → impute → Theorem 3 variance) otherwise look aligned with the registry and docstrings.
Maintainability: aggregation/bootstrap weight logic is duplicated and could drift.
Tests are extensive, but there is no assertion that balance_e actually enforces pre‑treatment balance.

Methodology

P1 | Impact: balance_e is documented to restrict cohorts to those observed at all relative times in [-balance_e, max_h], but the implementation only considers horizons among treated observations (typically h >= 0). This allows cohorts missing required pre‑treatment periods to pass the “balanced” filter, affecting event‑study estimates and SEs (and their bootstrap counterparts). | Concrete fix: Compute cohort horizons using all observations for treated units (including pre‑treatment periods), and build required_range explicitly over the full integer range [-balance_e, max_h] (or include negative horizons explicitly) before masking treated observations.
Locations: diff_diff/imputation.py:L1105-L1138, diff_diff/imputation.py:L1664-L1678.

Code Quality

No issues found.

Performance

No issues found.

Maintainability

P3 | Impact: Weight construction for event‑study and group aggregations is duplicated between _aggregate_event_study/_aggregate_group and _precompute_bootstrap_psi, which risks analytical and bootstrap paths diverging if logic changes later. | Concrete fix: Extract a shared helper to compute weights/masks per horizon or cohort and reuse it in both aggregation and bootstrap precompute.
Locations: diff_diff/imputation.py:L1640-L1802, diff_diff/imputation.py:L2095-L2158.

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

P2 | Impact: Current balance_e test only checks horizon counts, so it would not catch missing pre‑treatment balance (the P1 issue). | Concrete fix: Add a test that removes a pre‑treatment period for one cohort and asserts balance_e drops that cohort (and that bootstrap uses the same mask).
Location: tests/test_imputation.py:L275-L300.

…riods _compute_balanced_cohort_mask had two bugs preventing pre-treatment balance checking: (1) required_range filtered from post-treatment-only all_horizons, missing negative values; (2) cohort horizons were built from Omega_1 (post-treatment), so negative rel_times were never seen. Fix: add _build_cohort_rel_times() to scan the full panel, and use range(-balance_e, max_h+1) for the required set. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

igerber · 2026-02-09T14:07:07Z

/ai-review

github-actions · 2026-02-09T14:15:00Z

🔁 AI review rerun (requested by @igerber)

Head SHA: e3a69d394994145f4e637770fd65bf33a089ef26

Overall Assessment

⚠️ Needs changes

Executive Summary

P1: Non-constant first_treat handling conflicts with the Methodology Registry and can violate the absorbing-treatment assumption; needs enforcement or doc correction.
P3: Bootstrap weight types are documented beyond what’s implemented (rademacher-only); clarify or add parameter.
P3: Weight construction is duplicated between aggregation and bootstrap paths, risking drift.
Tests not run in this review.

Methodology

Severity: P1 | Impact: The estimator only warns on non-constant first_treat but still uses per-row values to build _treated and _rel_time, which can create non‑absorbing treatment paths and invalidate identification; this contradicts the Registry’s statement that it proceeds using the unit-level first observed value. | Concrete fix: After the warning, coerce df[first_treat] to a per‑unit value (e.g., transform("first") or validated earliest positive) before computing _treated/_rel_time, or raise on inconsistency; then align the Registry text accordingly. diff_diff/imputation.py:L632-L671, docs/methodology/REGISTRY.md:L520-L531

Code Quality

No issues found in changed code paths.

Performance

Severity: P3 | Impact: Event‑study SEs recompute conservative variance per horizon, which rebuilds and solves the variance system each time; this can become expensive with many horizons/covariates. | Concrete fix: Cache A_0/A_1 and the A_0'A_0 factorization, or add a batch path for multiple horizon weights. diff_diff/imputation.py:L1772-L1800

Maintainability

Severity: P3 | Impact: Weight construction is duplicated between aggregation and bootstrap precompute, risking inconsistencies if one side changes. | Concrete fix: Extract shared weight builder helpers for overall/horizon/group and reuse in both call sites. diff_diff/imputation.py:L2122-L2164

Tech Debt

No issues found.

Security

No issues found.

Documentation/Tests

Severity: P3 | Impact: Bootstrap results docstring advertises "mammen"/"webb" weights but implementation hardcodes rademacher, which can mislead users. | Concrete fix: Add a weight_type parameter wired to _generate_bootstrap_weights_batch, or update docstrings/docs to state rademacher-only. diff_diff/imputation.py:L46-L50, diff_diff/imputation.py:L2288-L2291
Severity: P3 | Impact: Test status unknown for this review. | Concrete fix: Run pytest tests/test_imputation.py (and ideally the full suite) before merge.

Merge main: resolve conflict in benchmarks/run_benchmarks.py

41c03ab

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add Borusyak-Jaravel-Spiess (2024) Imputation DiD estimator#141

Add Borusyak-Jaravel-Spiess (2024) Imputation DiD estimator#141
igerber wants to merge 7 commits intomainfrom
imputation-estimator

igerber commented Feb 8, 2026

Uh oh!

github-actions bot commented Feb 8, 2026

Uh oh!

igerber commented Feb 9, 2026

Uh oh!

github-actions bot commented Feb 9, 2026

Uh oh!

igerber commented Feb 9, 2026

Uh oh!

github-actions bot commented Feb 9, 2026

Uh oh!

igerber commented Feb 9, 2026

Uh oh!

github-actions bot commented Feb 9, 2026

Uh oh!

igerber commented Feb 9, 2026

Uh oh!

github-actions bot commented Feb 9, 2026

Uh oh!

igerber commented Feb 9, 2026

Uh oh!

github-actions bot commented Feb 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented Feb 8, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions bot commented Feb 8, 2026

Uh oh!

igerber commented Feb 9, 2026

Uh oh!

github-actions bot commented Feb 9, 2026

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

igerber commented Feb 9, 2026

Uh oh!

github-actions bot commented Feb 9, 2026

Uh oh!

igerber commented Feb 9, 2026

Uh oh!

github-actions bot commented Feb 9, 2026

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

igerber commented Feb 9, 2026

Uh oh!

github-actions bot commented Feb 9, 2026

Overall assessment: ⚠️ Needs changes

Executive summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

igerber commented Feb 9, 2026

Uh oh!

github-actions bot commented Feb 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant