Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
63 changes: 59 additions & 4 deletions METHODOLOGY_REVIEW.md
Original file line number Diff line number Diff line change
Expand Up @@ -153,14 +153,69 @@ Each estimator in diff-diff should be periodically reviewed to ensure:
| Module | `twfe.py` |
| Primary Reference | Wooldridge (2010), Ch. 10 |
| R Reference | `fixest::feols()` |
| Status | Not Started |
| Last Review | - |
| Status | **Complete** |
| Last Review | 2026-02-08 |

**Verified Components:**
- [x] Within-transformation algebra: `y_it - ȳ_i - ȳ_t + ȳ` matches hand calculation (rtol=1e-12)
- [x] ATT matches manual demeaned OLS (rtol=1e-10)
- [x] ATT matches `DifferenceInDifferences` on 2-period data (rtol=1e-10)
- [x] Covariates are also within-transformed (sum to zero within unit/time groups)
- [x] R comparison: ATT matches `fixest::feols(y ~ treated:post | unit + post, cluster=~unit)` (rtol<0.1%)
- [x] R comparison: Cluster-robust SE match (rtol<1%)
- [x] R comparison: P-value match (atol<0.01)
- [x] R comparison: CI bounds match (rtol<1%)
- [x] R comparison: ATT and SE match with covariate (same tolerances)
- [x] Edge case: Staggered treatment triggers `UserWarning`
- [x] Edge case: Auto-clusters at unit level (SE matches explicit `cluster="unit"`)
- [x] Edge case: DF adjustment for absorbed FE matches manual `solve_ols()` with `df_adjustment`
- [x] Edge case: Covariate collinear with interaction raises `ValueError` ("cannot be identified")
- [x] Edge case: Covariate collinearity warns but ATT remains finite
- [x] Edge case: `rank_deficient_action="error"` raises `ValueError`
- [x] Edge case: `rank_deficient_action="silent"` emits no warnings
- [x] Edge case: Unbalanced panel produces valid results (finite ATT, positive SE)
- [x] Edge case: Missing unit column raises `ValueError`
- [x] Integration: `decompose()` returns `BaconDecompositionResults`
- [x] SE: Cluster-robust SE >= HC1 SE
- [x] SE: VCoV positive semi-definite
- [x] Wild bootstrap: Valid inference (finite SE, p-value in [0,1])
- [x] Wild bootstrap: All weight types (rademacher, mammen, webb) produce valid inference
- [x] Wild bootstrap: `inference="wild_bootstrap"` routes correctly
- [x] Params: `get_params()` returns all inherited parameters
- [x] Params: `set_params()` modifies attributes
- [x] Results: `summary()` contains "ATT"
- [x] Results: `to_dict()` contains att, se, t_stat, p_value, n_obs
- [x] Results: residuals + fitted = demeaned outcome (not raw)
- [x] Edge case: Multi-period time emits UserWarning advising binary post indicator
- [x] Edge case: Non-{0,1} binary time emits UserWarning (ATT still correct)
- [x] Edge case: ATT invariant to time encoding ({0,1} vs {2020,2021} produces identical results)

**Key Implementation Detail:**
The interaction term `D_i × Post_t` must be within-transformed (demeaned) alongside the outcome,
consistent with the Frisch-Waugh-Lovell (FWL) theorem: all regressors and the outcome must be
projected out of the fixed effects space. R's `fixest::feols()` does this automatically when
variables appear to the left of the `|` separator.

**Corrections Made:**
- (None yet)
- **Bug fix: interaction term must be within-transformed** (found during review). The previous
implementation used raw (un-demeaned) `D_i × Post_t` in the demeaned regression. This gave
correct results only for 2-period panels where `post == period`. For multi-period panels
(e.g., 4 periods with binary `post`), the raw interaction had incorrect correlation with
demeaned Y, producing ATT approximately 1/3 of the true value. Fixed by applying the same
within-transformation to the interaction term before regression. This matches R's
`fixest::feols()` behavior. (`twfe.py` lines 99-113)

**Outstanding Concerns:**
- (None yet)
- **Multi-period `time` parameter**: Multi-period time values (e.g., 1,2,3,4) produce
`treated × period_number` instead of `treated × post_indicator`, which is not the standard
D_it treatment indicator. A `UserWarning` is emitted when `time` has >2 unique values.
For binary time with non-{0,1} values (e.g., {2020, 2021}), the ATT is mathematically
correct (the within-transformation absorbs the scaling), but a warning recommends 0/1
encoding for clarity. Users with multi-period data should create a binary `post` column.
- **Staggered treatment warning**: The warning only fires when `time` has >2 unique values
(i.e., actual period numbers). With binary `time="post"`, all treated units appear to start
treatment at `time=1`, making staggering undetectable. Users with staggered designs should
use `decompose()` or `CallawaySantAnna` directly for proper diagnostics.

---

Expand Down
132 changes: 132 additions & 0 deletions benchmarks/R/benchmark_twfe.R
Original file line number Diff line number Diff line change
@@ -0,0 +1,132 @@
#!/usr/bin/env Rscript
# Benchmark: Two-Way Fixed Effects (R `fixest` package with absorbed FE)
#
# This uses fixest::feols() with absorbed unit + post FE and unit-level clustering,
# matching the Python TwoWayFixedEffects estimator's approach.
#
# Usage:
# Rscript benchmark_twfe.R --data path/to/data.csv --output path/to/results.json

library(fixest)
library(jsonlite)
library(data.table)

# Parse command line arguments
args <- commandArgs(trailingOnly = TRUE)

parse_args <- function(args) {
result <- list(
data = NULL,
output = NULL
)

i <- 1
while (i <= length(args)) {
if (args[i] == "--data") {
result$data <- args[i + 1]
i <- i + 2
} else if (args[i] == "--output") {
result$output <- args[i + 1]
i <- i + 2
} else {
i <- i + 1
}
}

if (is.null(result$data) || is.null(result$output)) {
stop("Usage: Rscript benchmark_twfe.R --data <path> --output <path>")
}

return(result)
}

config <- parse_args(args)

# Load data
message(sprintf("Loading data from: %s", config$data))
data <- fread(config$data)

# Run benchmark
message("Running TWFE estimation with absorbed FE...")
start_time <- Sys.time()

# TWFE with absorbed unit + post fixed effects, clustered at unit level
# This matches Python's TwoWayFixedEffects:
# - Within-transformation removes unit and time (post) FE
# - Cluster-robust SE at unit level (automatic)
model <- feols(
outcome ~ treated:post | unit + post,
data = data,
cluster = ~unit
)

estimation_time <- as.numeric(difftime(Sys.time(), start_time, units = "secs"))

# Extract results
coef_name <- "treated:post"
coefs <- coef(model)
ses <- se(model)
pvals <- pvalue(model)
ci <- confint(model)

# Find the treatment effect coefficient
if (coef_name %in% names(coefs)) {
att <- coefs[coef_name]
att_se <- ses[coef_name]
att_pval <- pvals[coef_name]
att_ci <- ci[coef_name, ]
} else {
# Try alternative name formats
idx <- grep("treated.*post|post.*treated", names(coefs))
if (length(idx) > 0) {
att <- coefs[idx[1]]
att_se <- ses[idx[1]]
att_pval <- pvals[idx[1]]
att_ci <- ci[idx[1], ]
coef_name <- names(coefs)[idx[1]]
} else {
stop("Could not find treatment effect coefficient")
}
}

# Format output
results <- list(
estimator = "fixest::feols (absorbed FE)",
cluster = "unit",

# Treatment effect
att = unname(att),
se = unname(att_se),
pvalue = unname(att_pval),
ci_lower = unname(att_ci[1]),
ci_upper = unname(att_ci[2]),
coef_name = coef_name,

# Model statistics
model_stats = list(
r_squared = summary(model)$r2,
adj_r_squared = summary(model)$adj.r2,
n_obs = model$nobs
),

# Timing
timing = list(
estimation_seconds = estimation_time,
total_seconds = estimation_time
),

# Metadata
metadata = list(
r_version = R.version.string,
fixest_version = as.character(packageVersion("fixest")),
n_units = length(unique(data$unit)),
n_periods = length(unique(data$post)),
n_obs = nrow(data)
)
)

# Write output
message(sprintf("Writing results to: %s", config$output))
write_json(results, config$output, auto_unbox = TRUE, pretty = TRUE, digits = 15)

message(sprintf("Completed in %.3f seconds", estimation_time))
129 changes: 129 additions & 0 deletions benchmarks/python/benchmark_twfe.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,129 @@
#!/usr/bin/env python3
"""
Benchmark: TwoWayFixedEffects (diff-diff TwoWayFixedEffects class).

This benchmarks the actual TwoWayFixedEffects class with within-transformation,
as opposed to benchmark_basic.py which uses DifferenceInDifferences with formula.

Usage:
python benchmark_twfe.py --data path/to/data.csv --output path/to/results.json
"""

import argparse
import json
import os
import sys
from pathlib import Path

# IMPORTANT: Parse --backend and set environment variable BEFORE importing diff_diff
def _get_backend_from_args():
"""Parse --backend argument without importing diff_diff."""
parser = argparse.ArgumentParser(add_help=False)
parser.add_argument("--backend", default="auto", choices=["auto", "python", "rust"])
args, _ = parser.parse_known_args()
return args.backend

_requested_backend = _get_backend_from_args()
if _requested_backend in ("python", "rust"):
os.environ["DIFF_DIFF_BACKEND"] = _requested_backend

# NOW import diff_diff and other dependencies (will see the env var)
import pandas as pd

# Add parent to path for imports
sys.path.insert(0, str(Path(__file__).parent.parent.parent))

from diff_diff import TwoWayFixedEffects, HAS_RUST_BACKEND
from benchmarks.python.utils import Timer


def parse_args():
parser = argparse.ArgumentParser(description="Benchmark TwoWayFixedEffects estimator")
parser.add_argument("--data", required=True, help="Path to input CSV data")
parser.add_argument("--output", required=True, help="Path to output JSON results")
parser.add_argument(
"--backend", default="auto", choices=["auto", "python", "rust"],
help="Backend to use: auto (default), python (pure Python), rust (Rust backend)"
)
return parser.parse_args()


def get_actual_backend() -> str:
"""Return the actual backend being used based on HAS_RUST_BACKEND."""
return "rust" if HAS_RUST_BACKEND else "python"


def main():
args = parse_args()

actual_backend = get_actual_backend()
print(f"Using backend: {actual_backend}")

# Load data
print(f"Loading data from: {args.data}")
data = pd.read_csv(args.data)

# Run benchmark using TwoWayFixedEffects (within-transformation approach)
print("Running TWFE estimation...")

twfe = TwoWayFixedEffects(robust=True) # auto-clusters at unit level

with Timer() as timer:
results = twfe.fit(
data,
outcome="outcome",
treatment="treated",
time="post",
unit="unit",
)

att = results.att
se = results.se
pvalue = results.p_value
ci = results.conf_int

total_time = timer.elapsed

# Build output
output = {
"estimator": "diff_diff.TwoWayFixedEffects",
"backend": actual_backend,
"cluster": "unit",
# Treatment effect
"att": float(att),
"se": float(se),
"pvalue": float(pvalue),
"ci_lower": float(ci[0]),
"ci_upper": float(ci[1]),
# Model statistics
"model_stats": {
"n_obs": len(data),
"n_units": len(data["unit"].unique()),
"n_periods": len(data["post"].unique()),
},
# Timing
"timing": {
"estimation_seconds": total_time,
"total_seconds": total_time,
},
# Metadata
"metadata": {
"n_units": len(data["unit"].unique()),
"n_periods": len(data["post"].unique()),
"n_obs": len(data),
},
}

# Write output
print(f"Writing results to: {args.output}")
output_path = Path(args.output)
output_path.parent.mkdir(parents=True, exist_ok=True)
with open(output_path, "w") as f:
json.dump(output, f, indent=2)

print(f"Completed in {total_time:.3f} seconds")
return output


if __name__ == "__main__":
main()
Loading