12 changes: 12 additions & 0 deletions CLAUDE.md
@@ -97,6 +97,16 @@ cross-platform compilation - no OpenBLAS or Intel MKL installation required.
- Alternative to Callaway-Sant'Anna with different weighting scheme
- Useful robustness check when both estimators agree

- **`diff_diff/imputation.py`** - Borusyak-Jaravel-Spiess imputation DiD estimator:
- `ImputationDiD` - Borusyak et al. (2024) efficient imputation estimator for staggered DiD
- `ImputationDiDResults` - Results with overall ATT, event study, group effects, pre-trend test
- `ImputationBootstrapResults` - Multiplier bootstrap inference results
- `imputation_did()` - Convenience function
- Steps: (1) OLS on untreated obs for unit+time FE, (2) impute counterfactual Y(0), (3) aggregate
- Conservative variance (Theorem 3) with `aux_partition` parameter for SE tightness
- Pre-trend test (Equation 9) via `results.pretrend_test()`
- Proposition 5: NaN for unidentified long-run horizons without never-treated units
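The three imputation steps listed above can be sketched in plain numpy (a schematic on simulated data, not the library's implementation; `lstsq` on unit/time dummies stands in for the fixed-effects regression):

```python
import numpy as np

# Simulated staggered panel: 6 units x 6 periods, true effect tau = 2.0
rng = np.random.default_rng(0)
n_units, n_periods, tau = 6, 6, 2.0
unit_fe = rng.normal(size=n_units)
time_fe = rng.normal(size=n_periods)
first_treat = np.array([0, 0, 0, 3, 3, 4])  # 0 = never treated

units = np.repeat(np.arange(n_units), n_periods)
times = np.tile(np.arange(n_periods), n_units)
treated = (first_treat[units] > 0) & (times >= first_treat[units])
y = unit_fe[units] + time_fe[times] + tau * treated

# Step 1: OLS on untreated observations only, unit + time dummies
X = np.hstack([np.eye(n_units)[units], np.eye(n_periods)[times]])
beta, *_ = np.linalg.lstsq(X[~treated], y[~treated], rcond=None)

# Step 2: impute the counterfactual Y(0) for every observation
y0_hat = X @ beta

# Step 3: aggregate unit-level effects into the overall ATT
att = float(np.mean(y[treated] - y0_hat[treated]))
print(round(att, 6))  # recovers tau exactly in this noiseless example
```

With no noise the untreated regression fits exactly, so the imputed counterfactuals and the ATT are exact; with noise, the same three steps yield the BJS point estimate.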

- **`diff_diff/triple_diff.py`** - Triple Difference (DDD) estimator:
- `TripleDifference` - Ortiz-Villavicencio & Sant'Anna (2025) estimator for DDD designs
- `TripleDifferenceResults` - Results with ATT, SEs, cell means, diagnostics
@@ -255,6 +265,7 @@ cross-platform compilation - no OpenBLAS or Intel MKL installation required.
Standalone estimators (each has own get_params/set_params):
├── CallawaySantAnna
├── SunAbraham
├── ImputationDiD
├── TripleDifference
├── TROP
├── SyntheticDiD
@@ -364,6 +375,7 @@ Tests mirror the source modules:
- `tests/test_estimators.py` - Tests for DifferenceInDifferences, TWFE, MultiPeriodDiD, SyntheticDiD
- `tests/test_staggered.py` - Tests for CallawaySantAnna
- `tests/test_sun_abraham.py` - Tests for SunAbraham interaction-weighted estimator
- `tests/test_imputation.py` - Tests for ImputationDiD (Borusyak et al. 2024) estimator
- `tests/test_triple_diff.py` - Tests for Triple Difference (DDD) estimator
- `tests/test_trop.py` - Tests for Triply Robust Panel (TROP) estimator
- `tests/test_bacon.py` - Tests for Goodman-Bacon decomposition
112 changes: 111 additions & 1 deletion README.md
@@ -70,7 +70,7 @@ Signif. codes: '***' 0.001, '**' 0.01, '*' 0.05, '.' 0.1
- **Wild cluster bootstrap**: Valid inference with few clusters (<50) using Rademacher, Webb, or Mammen weights
- **Panel data support**: Two-way fixed effects estimator for panel designs
- **Multi-period analysis**: Event-study style DiD with period-specific treatment effects
- **Staggered adoption**: Callaway-Sant'Anna (2021) and Sun-Abraham (2021) estimators for heterogeneous treatment timing
- **Staggered adoption**: Callaway-Sant'Anna (2021), Sun-Abraham (2021), and Borusyak-Jaravel-Spiess (2024) imputation estimators for heterogeneous treatment timing
- **Triple Difference (DDD)**: Ortiz-Villavicencio & Sant'Anna (2025) estimators with proper covariate handling
- **Synthetic DiD**: Combined DiD with synthetic control for improved robustness
- **Triply Robust Panel (TROP)**: Factor-adjusted DiD with synthetic weights (Athey et al. 2025)
@@ -879,6 +879,54 @@ print(f"Sun-Abraham ATT: {sa_results.overall_att:.3f}")
# If results differ substantially, investigate heterogeneity
```

### Borusyak-Jaravel-Spiess Imputation Estimator

The Borusyak et al. (2024) imputation estimator is the **efficient** estimator for staggered DiD under parallel trends, producing ~50% shorter confidence intervals than Callaway-Sant'Anna and 2-3.5x shorter than Sun-Abraham under homogeneous treatment effects.

```python
from diff_diff import ImputationDiD, imputation_did

# Basic usage
est = ImputationDiD()
results = est.fit(data, outcome='outcome', unit='unit',
                  time='period', first_treat='first_treat')
results.print_summary()

# Event study
results = est.fit(data, outcome='outcome', unit='unit',
                  time='period', first_treat='first_treat',
                  aggregate='event_study')

# Pre-trend test (Equation 9)
pt = results.pretrend_test(n_leads=3)
print(f"F-stat: {pt['f_stat']:.3f}, p-value: {pt['p_value']:.4f}")

# Convenience function
results = imputation_did(data, 'outcome', 'unit', 'period', 'first_treat',
                         aggregate='all')
```

```python
ImputationDiD(
    anticipation=0,                  # Number of anticipation periods
    alpha=0.05,                      # Significance level
    cluster=None,                    # Cluster variable (defaults to unit)
    n_bootstrap=0,                   # Bootstrap iterations (0=analytical inference)
    seed=None,                       # Random seed
    horizon_max=None,                # Max event-study horizon
    aux_partition="cohort_horizon",  # Variance partition: "cohort_horizon", "cohort", "horizon"
)
```

**When to use Imputation DiD vs Callaway-Sant'Anna:**

| Aspect | Imputation DiD | Callaway-Sant'Anna |
|--------|---------------|-------------------|
| Efficiency | Most efficient under homogeneous effects | Less efficient but more robust to heterogeneity |
| Control group | Always uses all untreated obs | Choice of never-treated or not-yet-treated |
| Inference | Conservative variance (Theorem 3) | Multiplier bootstrap |
| Pre-trends | Built-in F-test (Equation 9) | Separate testing |
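The built-in pre-trend test is a joint test that all pre-treatment lead coefficients are zero. A generic Wald-style sketch of the idea (illustrative numbers; this is not the library's Equation 9 implementation):

```python
import numpy as np

# Hypothetical lead estimates at event times e = -3, -2, -1 (illustrative)
leads = np.array([0.03, -0.05, 0.02])
V = np.diag([0.04, 0.03, 0.05]) ** 2   # covariance of the lead estimates (SEs squared)

# Wald statistic: ~ chi2(q) under H0 "all leads are zero"; F-scale by q = 3
wald = float(leads @ np.linalg.solve(V, leads))
f_stat = wald / leads.size
print(f"Wald: {wald:.3f}, F: {f_stat:.3f}")  # small values: no evidence of pre-trends
```

`results.pretrend_test(n_leads=3)` performs the analogous joint test using the lead estimates and covariance from the imputation fit.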

### Triple Difference (DDD)

Triple Difference (DDD) is used when treatment requires satisfying two criteria: belonging to a treated **group** AND being in an eligible **partition**. The `TripleDifference` class implements the methodology from Ortiz-Villavicencio & Sant'Anna (2025), which correctly handles covariate adjustment (unlike naive implementations).
@@ -2000,6 +2048,60 @@ SunAbraham(
| `print_summary(alpha)` | Print summary to stdout |
| `to_dataframe(level)` | Convert to DataFrame ('event_study' or 'cohort') |

### ImputationDiD

```python
ImputationDiD(
    anticipation=0,                  # Periods of anticipation effects
    alpha=0.05,                      # Significance level for CIs
    cluster=None,                    # Column for cluster-robust SEs
    n_bootstrap=0,                   # Bootstrap iterations (0 = analytical)
    seed=None,                       # Random seed
    rank_deficient_action='warn',    # 'warn', 'error', or 'silent'
    horizon_max=None,                # Max event-study horizon
    aux_partition='cohort_horizon',  # Variance partition
)
```

**fit() Parameters:**

| Parameter | Type | Description |
|-----------|------|-------------|
| `data` | DataFrame | Panel data |
| `outcome` | str | Outcome variable column name |
| `unit` | str | Unit identifier column |
| `time` | str | Time period column |
| `first_treat` | str | First treatment period column (0 for never-treated) |
| `covariates` | list | Covariate column names |
| `aggregate` | str | Aggregation: None, "event_study", "group", "all" |
| `balance_e` | int | Balance event study to this many pre-treatment periods |

### ImputationDiDResults

**Attributes:**

| Attribute | Description |
|-----------|-------------|
| `overall_att` | Overall average treatment effect on the treated |
| `overall_se` | Standard error (conservative, Theorem 3) |
| `overall_t_stat` | T-statistic |
| `overall_p_value` | P-value for H0: ATT = 0 |
| `overall_conf_int` | Confidence interval |
| `event_study_effects` | Dict of relative time -> effect dict (if `aggregate='event_study'` or `'all'`) |
| `group_effects` | Dict of cohort -> effect dict (if `aggregate='group'` or `'all'`) |
| `treatment_effects` | DataFrame of unit-level imputed treatment effects |
| `n_treated_obs` | Number of treated observations |
| `n_untreated_obs` | Number of untreated observations |

**Methods:**

| Method | Description |
|--------|-------------|
| `summary(alpha)` | Get formatted summary string |
| `print_summary(alpha)` | Print summary to stdout |
| `to_dataframe(level)` | Convert to DataFrame ('observation', 'event_study', 'group') |
| `pretrend_test(n_leads)` | Run pre-trend F-test (Equation 9) |

### TripleDifference

```python
@@ -2464,6 +2566,14 @@ The `HonestDiD` module implements sensitivity analysis methods for relaxing the

### Multi-Period and Staggered Adoption

- **Borusyak, K., Jaravel, X., & Spiess, J. (2024).** "Revisiting Event-Study Designs: Robust and Efficient Estimation." *Review of Economic Studies*, 91(6), 3253-3285. [https://doi.org/10.1093/restud/rdae007](https://doi.org/10.1093/restud/rdae007)

This paper introduces the imputation estimator implemented in our `ImputationDiD` class:
- **Efficient imputation**: OLS on untreated observations → impute counterfactuals → aggregate
- **Conservative variance**: Theorem 3 clustered variance estimator with auxiliary model
- **Pre-trend test**: Estimated independently of the treatment effects (Equation 9)
- **Efficiency gains**: ~50% shorter CIs than Callaway-Sant'Anna under homogeneous effects

- **Callaway, B., & Sant'Anna, P. H. C. (2021).** "Difference-in-Differences with Multiple Time Periods." *Journal of Econometrics*, 225(2), 200-230. [https://doi.org/10.1016/j.jeconom.2020.12.001](https://doi.org/10.1016/j.jeconom.2020.12.001)

- **Sant'Anna, P. H. C., & Zhao, J. (2020).** "Doubly Robust Difference-in-Differences Estimators." *Journal of Econometrics*, 219(1), 101-122. [https://doi.org/10.1016/j.jeconom.2020.06.003](https://doi.org/10.1016/j.jeconom.2020.06.003)
12 changes: 3 additions & 9 deletions ROADMAP.md
@@ -10,7 +10,7 @@ For past changes and release history, see [CHANGELOG.md](CHANGELOG.md).

diff-diff v2.1.1 is a **production-ready** DiD library with feature parity with R's `did` + `HonestDiD` + `synthdid` ecosystem for core DiD analysis:

- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Synthetic DiD, Triple Difference (DDD), TROP
- **Core estimators**: Basic DiD, TWFE, MultiPeriod, Callaway-Sant'Anna, Sun-Abraham, Borusyak-Jaravel-Spiess Imputation, Synthetic DiD, Triple Difference (DDD), TROP
- **Valid inference**: Robust SEs, cluster SEs, wild bootstrap, multiplier bootstrap, placebo-based variance
- **Assumption diagnostics**: Parallel trends tests, placebo tests, Goodman-Bacon decomposition
- **Sensitivity analysis**: Honest DiD (Rambachan-Roth), Pre-trends power analysis (Roth 2022)
@@ -24,15 +24,9 @@ diff-diff v2.1.1 is a **production-ready** DiD library with feature parity with

High-value additions building on our existing foundation.

### Borusyak-Jaravel-Spiess Imputation Estimator
### ~~Borusyak-Jaravel-Spiess Imputation Estimator~~ ✅ Implemented (v2.2)

More efficient than Callaway-Sant'Anna when treatment effects are homogeneous across groups/time. Uses imputation rather than aggregation.

- Imputes untreated potential outcomes using pre-treatment data
- More efficient under homogeneous effects assumption
- Can handle unbalanced panels more naturally

**Reference**: Borusyak, Jaravel, and Spiess (2024). *Review of Economic Studies*.
Implemented as `ImputationDiD` (see `diff_diff/imputation.py`). Includes conservative variance (Theorem 3), event study and group aggregation, pre-trend test (Equation 9), multiplier bootstrap, and Proposition 5 handling for no never-treated units.

### Gardner's Two-Stage DiD (did2s)

160 changes: 160 additions & 0 deletions benchmarks/R/benchmark_didimputation.R
@@ -0,0 +1,160 @@
#!/usr/bin/env Rscript
# Benchmark: Imputation DiD Estimator (R `didimputation` package)
#
# Compares against diff_diff.ImputationDiD (Borusyak, Jaravel & Spiess 2024).
#
# Usage:
# Rscript benchmark_didimputation.R --data path/to/data.csv --output path/to/results.json

library(didimputation)
library(fixest)
library(jsonlite)
library(data.table)

# Parse command line arguments
args <- commandArgs(trailingOnly = TRUE)

parse_args <- function(args) {
  result <- list(
    data = NULL,
    output = NULL
  )

  i <- 1
  while (i <= length(args)) {
    if (args[i] == "--data") {
      result$data <- args[i + 1]
      i <- i + 2
    } else if (args[i] == "--output") {
      result$output <- args[i + 1]
      i <- i + 2
    } else {
      i <- i + 1
    }
  }

  if (is.null(result$data) || is.null(result$output)) {
    stop("Usage: Rscript benchmark_didimputation.R --data <path> --output <path>")
  }

  return(result)
}

config <- parse_args(args)

# Load data
message(sprintf("Loading data from: %s", config$data))
data <- fread(config$data)

# Ensure proper column types
data[, unit := as.integer(unit)]
data[, time := as.integer(time)]

# R's didimputation package expects first_treat=0 or NA for never-treated units
# Our Python implementation uses first_treat=0 for never-treated, which matches
data[, first_treat := as.integer(first_treat)]
message(sprintf("Never-treated units (first_treat=0): %d", data[first_treat == 0, uniqueN(unit)]))

# Determine event study horizons from the data
# Compute relative time for treated units
treated_data <- data[first_treat > 0]
if (nrow(treated_data) > 0) {
  treated_data[, rel_time := time - first_treat]
  min_horizon <- min(treated_data$rel_time)
  max_horizon <- max(treated_data$rel_time)
  # Post-treatment horizons only (for event study)
  post_horizons <- sort(unique(treated_data$rel_time[treated_data$rel_time >= 0]))
  all_horizons <- sort(unique(treated_data$rel_time))
  message(sprintf("Horizon range: [%d, %d]", min_horizon, max_horizon))
  message(sprintf("Post-treatment horizons: %s", paste(post_horizons, collapse = ", ")))
}

# Run benchmark - Overall ATT (static)
message("Running did_imputation (static)...")
start_time <- Sys.time()

static_result <- did_imputation(
  data = data,
  yname = "outcome",
  gname = "first_treat",
  tname = "time",
  idname = "unit",
  cluster_var = "unit"
)

static_time <- as.numeric(difftime(Sys.time(), start_time, units = "secs"))
message(sprintf("Static estimation completed in %.3f seconds", static_time))

# Extract overall ATT
overall_att <- static_result$estimate[1]
overall_se <- static_result$std.error[1]
message(sprintf("Overall ATT: %.6f (SE: %.6f)", overall_att, overall_se))

# Run benchmark - Event study
message("Running did_imputation (event study)...")
es_start_time <- Sys.time()

es_result <- did_imputation(
  data = data,
  yname = "outcome",
  gname = "first_treat",
  tname = "time",
  idname = "unit",
  horizon = TRUE,
  cluster_var = "unit"
)

es_time <- as.numeric(difftime(Sys.time(), es_start_time, units = "secs"))
message(sprintf("Event study estimation completed in %.3f seconds", es_time))

total_time <- static_time + es_time

# Format event study results
event_study <- data.frame(
  event_time = as.integer(gsub("tau", "", es_result$term)),
  att = es_result$estimate,
  se = es_result$std.error
)

message("Event study effects:")
for (i in seq_len(nrow(event_study))) {
  message(sprintf("  h=%d: ATT=%.4f (SE=%.4f)",
                  event_study$event_time[i],
                  event_study$att[i],
                  event_study$se[i]))
}

# Format output
results <- list(
  estimator = "didimputation::did_imputation",

  # Overall ATT
  overall_att = overall_att,
  overall_se = overall_se,

  # Event study
  event_study = event_study,

  # Timing
  timing = list(
    static_seconds = static_time,
    event_study_seconds = es_time,
    total_seconds = total_time
  ),

  # Metadata
  metadata = list(
    r_version = R.version.string,
    didimputation_version = as.character(packageVersion("didimputation")),
    n_units = length(unique(data$unit)),
    n_periods = length(unique(data$time)),
    n_obs = nrow(data)
  )
)

# Write output
message(sprintf("Writing results to: %s", config$output))
dir.create(dirname(config$output), recursive = TRUE, showWarnings = FALSE)
write_json(results, config$output, auto_unbox = TRUE, pretty = TRUE, digits = 10)

message(sprintf("Completed in %.3f seconds", total_time))
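The JSON written above feeds a Python comparison against `ImputationDiD`. A minimal sketch of that check (field names follow the `results` list above; the numeric values are illustrative, and the agreement tolerances are assumptions, not the benchmark harness itself):

```python
import json

# A fragment of the JSON the R script writes (illustrative values)
r_json = '{"estimator": "didimputation::did_imputation", "overall_att": 1.9987, "overall_se": 0.1043}'
r_results = json.loads(r_json)

# Corresponding numbers from diff_diff.ImputationDiD on the same data (illustrative)
py_att, py_se = 1.9991, 0.1045

att_diff = abs(r_results["overall_att"] - py_att)
se_rel = abs(r_results["overall_se"] - py_se) / r_results["overall_se"]
print(f"ATT diff: {att_diff:.2e}, SE rel. diff: {se_rel:.2%}")
assert att_diff < 1e-2 and se_rel < 0.05  # assumed agreement tolerance
```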
1 change: 1 addition & 0 deletions benchmarks/R/requirements.R
@@ -7,6 +7,7 @@
required_packages <- c(
# Core DiD packages
"did", # Callaway-Sant'Anna (2021) staggered DiD
"didimputation", # Borusyak, Jaravel & Spiess (2024) imputation DiD
"HonestDiD", # Rambachan & Roth (2023) sensitivity analysis
"fixest", # Fast TWFE and basic DiD
