12 changes: 12 additions & 0 deletions CLAUDE.md
@@ -97,6 +97,16 @@ cross-platform compilation - no OpenBLAS or Intel MKL installation required.
- Alternative to Callaway-Sant'Anna with different weighting scheme
- Useful robustness check when both estimators agree

- **`diff_diff/imputation.py`** - Borusyak-Jaravel-Spiess imputation DiD estimator:
- `ImputationDiD` - Borusyak et al. (2024) efficient imputation estimator for staggered DiD
- `ImputationDiDResults` - Results with overall ATT, event study, group effects, pre-trend test
- `ImputationBootstrapResults` - Multiplier bootstrap inference results
- `imputation_did()` - Convenience function
- Steps: (1) OLS on untreated obs for unit+time FE, (2) impute counterfactual Y(0), (3) aggregate
- Conservative variance (Theorem 3) with `aux_partition` parameter for SE tightness
- Pre-trend test (Equation 9) via `results.pretrend_test()`
- Proposition 5: NaN for unidentified long-run horizons without never-treated units
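The three imputation steps listed above can be sketched in plain numpy (a schematic on simulated data, not the library's implementation; `lstsq` on unit/time dummies stands in for the fixed-effects regression):

```python
import numpy as np

# Simulated staggered panel: 6 units x 6 periods, true effect tau = 2.0
rng = np.random.default_rng(0)
n_units, n_periods, tau = 6, 6, 2.0
unit_fe = rng.normal(size=n_units)
time_fe = rng.normal(size=n_periods)
first_treat = np.array([0, 0, 0, 3, 3, 4])  # 0 = never treated

units = np.repeat(np.arange(n_units), n_periods)
times = np.tile(np.arange(n_periods), n_units)
treated = (first_treat[units] > 0) & (times >= first_treat[units])
y = unit_fe[units] + time_fe[times] + tau * treated

# Step 1: OLS on untreated observations only, unit + time dummies
X = np.hstack([np.eye(n_units)[units], np.eye(n_periods)[times]])
beta, *_ = np.linalg.lstsq(X[~treated], y[~treated], rcond=None)

# Step 2: impute the counterfactual Y(0) for every observation
y0_hat = X @ beta

# Step 3: aggregate unit-level effects into the overall ATT
att = float(np.mean(y[treated] - y0_hat[treated]))
print(round(att, 6))  # recovers tau exactly in this noiseless example
```

With no noise the untreated regression fits exactly, so the imputed counterfactuals and the ATT are exact; with noise, the same three steps yield the BJS point estimate.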

- **`diff_diff/triple_diff.py`** - Triple Difference (DDD) estimator:
- `TripleDifference` - Ortiz-Villavicencio & Sant'Anna (2025) estimator for DDD designs
- `TripleDifferenceResults` - Results with ATT, SEs, cell means, diagnostics
@@ -255,6 +265,7 @@ cross-platform compilation - no OpenBLAS or Intel MKL installation required.
Standalone estimators (each has own get_params/set_params):
├── CallawaySantAnna
├── SunAbraham
├── ImputationDiD
├── TripleDifference
├── TROP
├── SyntheticDiD
@@ -364,6 +375,7 @@ Tests mirror the source modules:
- `tests/test_estimators.py` - Tests for DifferenceInDifferences, TWFE, MultiPeriodDiD, SyntheticDiD
- `tests/test_staggered.py` - Tests for CallawaySantAnna
- `tests/test_sun_abraham.py` - Tests for SunAbraham interaction-weighted estimator
- `tests/test_imputation.py` - Tests for ImputationDiD (Borusyak et al. 2024) estimator
- `tests/test_triple_diff.py` - Tests for Triple Difference (DDD) estimator
- `tests/test_trop.py` - Tests for Triply Robust Panel (TROP) estimator
- `tests/test_bacon.py` - Tests for Goodman-Bacon decomposition
112 changes: 111 additions & 1 deletion README.md
@@ -70,7 +70,7 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
- **Wild cluster bootstrap**: Valid inference with few clusters (<50) using Rademacher, Webb, or Mammen weights
- **Panel data support**: Two-way fixed effects estimator for panel designs
- **Multi-period analysis**: Event-study style DiD with period-specific treatment effects
- **Staggered adoption**: Callaway-Sant'Anna (2021) and Sun-Abraham (2021) estimators for heterogeneous treatment timing
- **Staggered adoption**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), and Borusyak-Jaravel-Spiess (2024) imputation estimators for heterogeneous treatment timing
- **Triple Difference (DDD)**: Ortiz-Villavicencio & Sant'Anna (2025) estimators with proper covariate handling
- **Synthetic DiD**: Combined DiD with synthetic control for improved robustness
- **Triply Robust Panel (TROP)**: Factor-adjusted DiD with synthetic weights (Athey et al. 2025)
@@ -879,6 +879,54 @@ print(f"Sun-Abraham ATT: {sa_results.overall_att:.3f}")
# If results differ substantially, investigate heterogeneity
```

### Borusyak-Jaravel-Spiess Imputation Estimator

The Borusyak et al. (2024) imputation estimator is the **efficient** estimator for staggered DiD under parallel trends, producing ~50% shorter confidence intervals than Callaway-Sant'Anna and 2-3.5x shorter than Sun-Abraham under homogeneous treatment effects.

```python
from diff_diff import ImputationDiD, imputation_did

# Basic usage
est = ImputationDiD()
results = est.fit(data, outcome='outcome', unit='unit',
                  time='period', first_treat='first_treat')
results.print_summary()

# Event study
results = est.fit(data, outcome='outcome', unit='unit',
                  time='period', first_treat='first_treat',
                  aggregate='event_study')

# Pre-trend test (Equation 9)
pt = results.pretrend_test(n_leads=3)
print(f"F-stat: {pt['f_stat']:.3f}, p-value: {pt['p_value']:.4f}")

# Convenience function
results = imputation_did(data, 'outcome', 'unit', 'period', 'first_treat',
                         aggregate='all')
```

```python
ImputationDiD(
    anticipation=0,                  # Number of anticipation periods
    alpha=0.05,                      # Significance level
    cluster=None,                    # Cluster variable (defaults to unit)
    n_bootstrap=0,                   # Bootstrap iterations (0=analytical inference)
    seed=None,                       # Random seed
    horizon_max=None,                # Max event-study horizon
    aux_partition="cohort_horizon",  # Variance partition: "cohort_horizon", "cohort", "horizon"
)
```

**When to use Imputation DiD vs Callaway-Sant'Anna:**

| Aspect | Imputation DiD | Callaway-Sant'Anna |
|--------|---------------|-------------------|
| Efficiency | Most efficient under homogeneous effects | Less efficient but more robust to heterogeneity |
| Control group | Always uses all untreated obs | Choice of never-treated or not-yet-treated |
| Inference | Conservative variance (Theorem 3) | Multiplier bootstrap |
| Pre-trends | Built-in F-test (Equation 9) | Separate testing |
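The built-in pre-trend test is a joint test that all pre-treatment lead coefficients are zero. A generic Wald-style sketch of the idea (illustrative numbers; this is not the library's Equation 9 implementation):

```python
import numpy as np

# Hypothetical lead estimates at event times e = -3, -2, -1 (illustrative)
leads = np.array([0.03, -0.05, 0.02])
V = np.diag([0.04, 0.03, 0.05]) ** 2   # covariance of the lead estimates (SEs squared)

# Wald statistic: ~ chi2(q) under H0 "all leads are zero"; F-scale by q = 3
wald = float(leads @ np.linalg.solve(V, leads))
f_stat = wald / leads.size
print(f"Wald: {wald:.3f}, F: {f_stat:.3f}")  # small values: no evidence of pre-trends
```

`results.pretrend_test(n_leads=3)` performs the analogous joint test using the lead estimates and covariance from the imputation fit.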

### Triple Difference (DDD)

Triple Difference (DDD) is used when treatment requires satisfying two criteria: belonging to a treated **group** AND being in an eligible **partition**. The `TripleDifference` class implements the methodology from Ortiz-Villavicencio & Sant'Anna (2025), which correctly handles covariate adjustment (unlike naive implementations).
@@ -2000,6 +2048,60 @@ SunAbraham(
| `print_summary(alpha)` | Print summary to stdout |
| `to_dataframe(level)` | Convert to DataFrame ('event_study' or 'cohort') |

### ImputationDiD

```python
ImputationDiD(
    anticipation=0,                  # Periods of anticipation effects
    alpha=0.05,                      # Significance level for CIs
    cluster=None,                    # Column for cluster-robust SEs
    n_bootstrap=0,                   # Bootstrap iterations (0 = analytical)
    seed=None,                       # Random seed
    rank_deficient_action='warn',    # 'warn', 'error', or 'silent'
    horizon_max=None,                # Max event-study horizon
    aux_partition='cohort_horizon',  # Variance partition
)
```

**fit() Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `data` | DataFrame | Panel data |
| `outcome` | str | Outcome variable column name |
| `unit` | str | Unit identifier column |
| `time` | str | Time period column |
| `first_treat` | str | First treatment period column (0 for never-treated) |
| `covariates` | list | Covariate column names |
| `aggregate` | str | Aggregation: None, "event_study", "group", "all" |
| `balance_e` | int | Balance event study to this many pre-treatment periods |

### ImputationDiDResults

**Attributes:**

| Attribute | Description |
|-----------|-------------|
| `overall_att` | Overall average treatment effect on the treated |
| `overall_se` | Standard error (conservative, Theorem 3) |
| `overall_t_stat` | T-statistic |
| `overall_p_value` | P-value for H0: ATT = 0 |
| `overall_conf_int` | Confidence interval |
| `event_study_effects` | Dict of relative time -> effect dict (if `aggregate='event_study'` or `'all'`) |
| `group_effects` | Dict of cohort -> effect dict (if `aggregate='group'` or `'all'`) |
| `treatment_effects` | DataFrame of unit-level imputed treatment effects |
| `n_treated_obs` | Number of treated observations |
| `n_untreated_obs` | Number of untreated observations |

**Methods:**

| Method | Description |
|--------|-------------|
| `summary(alpha)` | Get formatted summary string |
| `print_summary(alpha)` | Print summary to stdout |
| `to_dataframe(level)` | Convert to DataFrame ('observation', 'event_study', 'group') |
| `pretrend_test(n_leads)` | Run pre-trend F-test (Equation 9) |

### TripleDifference

```python
@@ -2464,6 +2566,14 @@ The `HonestDiD` module implements sensitivity analysis methods for relaxing the

### Multi-Period and Staggered Adoption

- **Borusyak, K., Jaravel, X., & Spiess, J. (2024).** "Revisiting Event-Study Designs: Robust and Efficient Estimation." *Review of Economic Studies*, 91(6), 3253-3285. [https://doi.org/10.1093/restud/rdae007](https://doi.org/10.1093/restud/rdae007)

This paper introduces the imputation estimator implemented in our `ImputationDiD` class:
- **Efficient imputation**: OLS on untreated observations → impute counterfactuals → aggregate
- **Conservative variance**: Theorem 3 clustered variance estimator with auxiliary model
- **Pre-trend test**: Estimated independently of the treatment effects (Equation 9)
- **Efficiency gains**: ~50% shorter CIs than Callaway-Sant'Anna under homogeneous effects

- **Callaway, B., & Sant'Anna, P. H. C. (2021).** "Difference-in-Differences with Multiple Time Periods." *Journal of Econometrics*, 225(2), 200-230. [https://doi.org/10.1016/j.jeconom.2020.12.001](https://doi.org/10.1016/j.jeconom.2020.12.001)

- **Sant'Anna, P. H. C., & Zhao, J. (2020).** "Doubly Robust Difference-in-Differences Estimators." *Journal of Econometrics*, 219(1), 101-122. [https://doi.org/10.1016/j.jeconom.2020.06.003](https://doi.org/10.1016/j.jeconom.2020.06.003)
12 changes: 3 additions & 9 deletions ROADMAP.md
@@ -10,7 +10,7 @@ For past changes and release history, see [CHANGELOG.md](CHANGELOG.md).

diff-diff v2.1.1 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis:

- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Synthetic DiD, Triple Difference (DDD), TROP
- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), TROP
- **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance
- **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition
- **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022)
@@ -24,15 +24,9 @@ diff-diff v2.1.1 is a **production-ready** DiD library with feature parity with

High-value additions building on our existing foundation.

### Borusyak-Jaravel-Spiess Imputation Estimator
### ~~Borusyak-Jaravel-Spiess Imputation Estimator~~ ✅ Implemented (v2.2)

More efficient than Callaway-Sant'Anna when treatment effects are homogeneous across groups/time. Uses imputation rather than aggregation.

- Imputes untreated potential outcomes using pre-treatment data
- More efficient under homogeneous effects assumption
- Can handle unbalanced panels more naturally

**Reference**: Borusyak, Jaravel, and Spiess (2024). *Review of Economic Studies*.
Implemented as `ImputationDiD` (see `diff_diff/imputation.py`). Includes conservative variance (Theorem 3), event study and group aggregation, pre-trend test (Equation 9), multiplier bootstrap, and Proposition 5 handling for no never-treated units.

### Gardner's Two-Stage DiD (did2s)

160 changes: 160 additions & 0 deletions benchmarks/R/benchmark_didimputation.R
@@ -0,0 +1,160 @@
#!/usr/bin/env Rscript
# Benchmark: Imputation DiD Estimator (R `didimputation` package)
#
# Compares against diff_diff.ImputationDiD (Borusyak, Jaravel & Spiess 2024).
#
# Usage:
# Rscript benchmark_didimputation.R --data path/to/data.csv --output path/to/results.json

library(didimputation)
library(fixest)
library(jsonlite)
library(data.table)

# Parse command line arguments
args <- commandArgs(trailingOnly = TRUE)

parse_args <- function(args) {
  result <- list(
    data = NULL,
    output = NULL
  )

  i <- 1
  while (i <= length(args)) {
    if (args[i] == "--data") {
      result$data <- args[i + 1]
      i <- i + 2
    } else if (args[i] == "--output") {
      result$output <- args[i + 1]
      i <- i + 2
    } else {
      i <- i + 1
    }
  }

  if (is.null(result$data) || is.null(result$output)) {
    stop("Usage: Rscript benchmark_didimputation.R --data <path> --output <path>")
  }

  return(result)
}

config <- parse_args(args)

# Load data
message(sprintf("Loading data from: %s", config$data))
data <- fread(config$data)

# Ensure proper column types
data[, unit := as.integer(unit)]
data[, time := as.integer(time)]

# R's didimputation package expects first_treat=0 or NA for never-treated units
# Our Python implementation uses first_treat=0 for never-treated, which matches
data[, first_treat := as.integer(first_treat)]
message(sprintf("Never-treated units (first_treat=0): %d", data[first_treat == 0, uniqueN(unit)]))

# Determine event study horizons from the data
# Compute relative time for treated units
treated_data <- data[first_treat > 0]
if (nrow(treated_data) > 0) {
  treated_data[, rel_time := time - first_treat]
  min_horizon <- min(treated_data$rel_time)
  max_horizon <- max(treated_data$rel_time)
  # Post-treatment horizons only (for event study)
  post_horizons <- sort(unique(treated_data$rel_time[treated_data$rel_time >= 0]))
  all_horizons <- sort(unique(treated_data$rel_time))
  message(sprintf("Horizon range: [%d, %d]", min_horizon, max_horizon))
  message(sprintf("Post-treatment horizons: %s", paste(post_horizons, collapse = ", ")))
}

# Run benchmark - Overall ATT (static)
message("Running did_imputation (static)...")
start_time <- Sys.time()

static_result <- did_imputation(
  data = data,
  yname = "outcome",
  gname = "first_treat",
  tname = "time",
  idname = "unit",
  cluster_var = "unit"
)

static_time <- as.numeric(difftime(Sys.time(), start_time, units = "secs"))
message(sprintf("Static estimation completed in %.3f seconds", static_time))

# Extract overall ATT
overall_att <- static_result$estimate[1]
overall_se <- static_result$std.error[1]
message(sprintf("Overall ATT: %.6f (SE: %.6f)", overall_att, overall_se))

# Run benchmark - Event study
message("Running did_imputation (event study)...")
es_start_time <- Sys.time()

es_result <- did_imputation(
  data = data,
  yname = "outcome",
  gname = "first_treat",
  tname = "time",
  idname = "unit",
  horizon = TRUE,
  cluster_var = "unit"
)

es_time <- as.numeric(difftime(Sys.time(), es_start_time, units = "secs"))
message(sprintf("Event study estimation completed in %.3f seconds", es_time))

total_time <- static_time + es_time

# Format event study results
event_study <- data.frame(
  event_time = as.integer(gsub("tau", "", es_result$term)),
  att = es_result$estimate,
  se = es_result$std.error
)

message("Event study effects:")
for (i in seq_len(nrow(event_study))) {
  message(sprintf("  h=%d: ATT=%.4f (SE=%.4f)",
                  event_study$event_time[i],
                  event_study$att[i],
                  event_study$se[i]))
}

# Format output
results <- list(
  estimator = "didimputation::did_imputation",

  # Overall ATT
  overall_att = overall_att,
  overall_se = overall_se,

  # Event study
  event_study = event_study,

  # Timing
  timing = list(
    static_seconds = static_time,
    event_study_seconds = es_time,
    total_seconds = total_time
  ),

  # Metadata
  metadata = list(
    r_version = R.version.string,
    didimputation_version = as.character(packageVersion("didimputation")),
    n_units = length(unique(data$unit)),
    n_periods = length(unique(data$time)),
    n_obs = nrow(data)
  )
)

# Write output
message(sprintf("Writing results to: %s", config$output))
dir.create(dirname(config$output), recursive = TRUE, showWarnings = FALSE)
write_json(results, config$output, auto_unbox = TRUE, pretty = TRUE, digits = 10)

message(sprintf("Completed in %.3f seconds", total_time))
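The JSON written above feeds a Python comparison against `ImputationDiD`. A minimal sketch of that check (field names follow the `results` list above; the numeric values are illustrative, and the agreement tolerances are assumptions, not the benchmark harness itself):

```python
import json

# A fragment of the JSON the R script writes (illustrative values)
r_json = '{"estimator": "didimputation::did_imputation", "overall_att": 1.9987, "overall_se": 0.1043}'
r_results = json.loads(r_json)

# Corresponding numbers from diff_diff.ImputationDiD on the same data (illustrative)
py_att, py_se = 1.9991, 0.1045

att_diff = abs(r_results["overall_att"] - py_att)
se_rel = abs(r_results["overall_se"] - py_se) / r_results["overall_se"]
print(f"ATT diff: {att_diff:.2e}, SE rel. diff: {se_rel:.2%}")
assert att_diff < 1e-2 and se_rel < 0.05  # assumed agreement tolerance
```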
1 change: 1 addition & 0 deletions benchmarks/R/requirements.R
@@ -7,6 +7,7 @@
required_packages <- c(
# Core DiD packages
"did", # Callaway-Sant'Anna (2021) staggered DiD
"didimputation", # Borusyak, Jaravel & Spiess (2024) imputation DiD
"HonestDiD", # Rambachan & Roth (2023) sensitivity analysis
"fixest", # Fast TWFE and basic DiD
