Factorial Difference-in-Differences (FDID) for MATLAB

This repository contains an object-oriented MATLAB implementation of the Factorial Difference-in-Differences (FDID) model, following the theoretical framework presented in:

Xu, Zhao, and Ding (2024). Factorial Difference-in-Differences. arXiv:2407.11937v2

Overview

The FDID framework provides an identification strategy for panel data where:

- Universal Exposure: An event (the "Treatment") affects all units simultaneously, so there is no clean control group.
- Baseline Modulator: A baseline factor ($G$) modulates the impact of the event across different units.

This implementation accurately distinguishes between Effect Modification (what canonical DID recovers) and Causal Moderation (the true causal interaction of the baseline factor, identifiable under the Factorial Parallel Trends assumption).

Components

The replication is divided into four main files:

MonteCarloSim.m: A rigorous Data Generating Process (DGP) simulator for FDID panel structures.
- Generates simulated datasets controlling effect modifications ($\tau_{em}$) vs. causal moderations ($\tau_{inter}$).
- Injects canonical and factorial parallel trends violations, heteroskedasticity, and AR(1) serial correlation.
FdidEstimator.m: A computationally efficient TWFE-based estimator.
- Uses the within transformation to absorb unit and time fixed effects, which scales to large-$N$ panels without building large dummy-variable (dummyvar) matrices.
- Incorporates robust pinv solvers to handle TWFE collinearity.
- By default, calculates Unit-Level Cluster-Robust Standard Errors (CRSE) with finite-sample corrections.
run_monte_carlo.m: Evaluation script reporting the estimator's bias, RMSE, and behavior under valid vs. invalid factorial assumptions across 200 replications.
run_empirical_fdid.m: Empirical application mock-up. Demonstrates the usage syntax on a proxy dataset modeling the "Clans and Calamity" Great Famine study.
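
The within transformation mentioned for FdidEstimator.m can be sketched as follows. This is a minimal illustration for a balanced panel, not the code in FdidEstimator.m; the sizes, variable names, and toy outcome are assumptions made for the example.

```matlab
% Two-way within transformation on a small balanced panel (illustrative only).
N = 4;  T = 3;                          % hypothetical number of units and periods
id   = repelem((1:N)', T);              % unit index, sorted by unit then time
time = repmat((1:T)', N, 1);            % period index
rng(1);
y = randn(N*T, 1) + 2*id + 0.5*time;    % toy outcome with unit and time effects

unitMean  = accumarray(id,   y, [N 1], @mean);   % N-by-1 unit means
timeMean  = accumarray(time, y, [T 1], @mean);   % T-by-1 period means
grandMean = mean(y);

% Balanced panels only: subtract unit and period means, add back the grand mean.
yTilde = y - unitMean(id) - timeMean(time) + grandMean;

% After the transformation every unit mean and period mean is numerically zero.
maxUnitMean = max(abs(accumarray(id,   yTilde, [N 1], @mean)));
maxTimeMean = max(abs(accumarray(time, yTilde, [T 1], @mean)));
fprintf('max |unit mean| = %.2e, max |period mean| = %.2e\n', maxUnitMean, maxTimeMean);
```

Applying the same transformation to each regressor and then running least squares on the transformed variables gives the fixed-effects slopes without ever forming unit or time dummies. For unbalanced panels this one-pass formula is no longer exact, and an iterative demeaning scheme would be needed.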

Mathematical Framework

The FDID setup relies on a panel data structure where an event occurs at time $T_0$. All units are exposed to the event ($Z_{it}=1$ for $t \ge T_0$). The baseline factor $G_i \in \{0, 1\}$ divides the units into two groups.

The potential outcomes are modeled as:

$$ Y_{it}(g, z) = \alpha_i + \beta_t + \beta_{GZ} g z + \beta_{X} X_i z + \beta_{GZX} g X_i z + \epsilon_{it}(g, z) $$

The Two-Way Fixed Effects (TWFE) regression estimated by FdidEstimator is:

$$ Y_{it} = \mu_i + \lambda_t + \beta_{GZ} (G_i Z_{it}) + \beta_X (X_i Z_{it}) + \beta_{GZX} (G_i X_i Z_{it}) + \epsilon_{it} $$

Where:

- $\mu_i$ and $\lambda_t$ are unit and time fixed effects.
- $\beta_{GZ}$ captures the Causal Moderation ($\tau_{inter}$) if the Factorial Parallel Trends assumption holds and covariates are centered.
- $\beta_{GZ}$ captures only the Effect Modification ($\tau_{em}$) if only canonical Parallel Trends holds.
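
To make $\beta_{GZ}$ concrete: in a balanced panel with no covariates, the TWFE coefficient on $G_i Z_{it}$ equals the difference-in-differences of group-by-period means. The sketch below checks this numerically on toy data. It does not use FdidEstimator; every name and parameter value in it is an assumption made for the example.

```matlab
% TWFE coefficient on G*Z versus the 2x2 difference in means (toy data).
N = 6;  T = 4;  T0 = 3;                     % hypothetical panel, event at period 3
id   = repelem((1:N)', T);
time = repmat((1:T)', N, 1);
g    = double(id > N/2);                    % baseline factor: second half of units has G = 1
z    = double(time >= T0);                  % universal exposure from T0 onward
rng(2);
unitFx = randn(N, 1);                       % unit fixed effects
timeFx = randn(T, 1);                       % time fixed effects
y = unitFx(id) + timeFx(time) + 2.0*g.*z + 0.1*randn(N*T, 1);

% (a) Difference-in-differences of group-by-period means
m = @(gv, zv) mean(y(g == gv & z == zv));
didMeans = (m(1,1) - m(1,0)) - (m(0,1) - m(0,0));

% (b) TWFE coefficient via two-way demeaning of y and G*Z
V = [y, g.*z];
for k = 1:2
    v  = V(:, k);
    um = accumarray(id,   v, [N 1], @mean);
    tm = accumarray(time, v, [T 1], @mean);
    V(:, k) = v - um(id) - tm(time) + mean(v);
end
betaGZ = V(:, 2) \ V(:, 1);                 % least squares on the demeaned variables

fprintf('DID of means = %.6f, TWFE beta_GZ = %.6f\n', didMeans, betaGZ);
```

The two numbers agree up to floating-point error. Whether that common value can be read as $\tau_{em}$ or $\tau_{inter}$ depends on the identifying assumptions, not on the arithmetic.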

Usage Guide

1. Generating Data with `MonteCarloSim`

The MonteCarloSim class provides an interface to generate synthetic FDID data.

% Initialize simulator
sim = MonteCarloSim();
sim.NumUnits = 1000;
sim.NumPeriods = 4;
sim.EventTime = 3;

% Configure parameters
sim.TauInter = 2.0;       % True causal moderation
sim.TauEm = 2.0;          % Equal to TauInter here; set a different value to violate Factorial PT
sim.HasHeteroskedasticity = true;

% Generate data
[data, trueParams] = sim.generate();
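
For readers who want to see what a panel of this shape looks like without running MonteCarloSim, the following is a rough standalone sketch. It is not the class's actual data generating process: the coefficient values, the AR(1) parameter, and the form of heteroskedasticity are assumptions, and the column names simply mirror those passed to FdidEstimator in the next section.

```matlab
% Hypothetical FDID-style panel with AR(1), group-dependent error scale.
N = 200;  T = 4;  T0 = 3;                % sizes playing the role of NumUnits, NumPeriods, EventTime
tauInter = 2.0;  rho = 0.5;              % interaction effect and AR(1) coefficient (assumed)
rng(42);
g      = double(rand(N, 1) > 0.5);       % baseline factor G_i
x      = randn(N, 1);                    % baseline covariate X_i
unitFx = randn(N, 1);                    % unit effects (N-by-1)
timeFx = 0.3 * (1:T);                    % common time effects (1-by-T)
zt     = double((1:T) >= T0);            % exposure path shared by all units (1-by-T)

% AR(1) errors whose innovation scale depends on G
sigma = 1 + 0.5*g;
e = zeros(N, T);
e(:, 1) = sigma .* randn(N, 1);
for t = 2:T
    e(:, t) = rho*e(:, t-1) + sigma .* randn(N, 1);
end

% N-by-T outcomes; g*zt and x*zt are outer products giving G_i*Z_t and X_i*Z_t
Y = unitFx + timeFx + tauInter*(g*zt) + 0.5*(x*zt) + e;

% Reshape to long format with one row per unit-period
[tt, ii] = meshgrid(1:T, 1:N);           % ii(i,t) = i, tt(i,t) = t
Gm = repmat(g, 1, T);  Zm = repmat(zt, N, 1);  Xm = repmat(x, 1, T);
data = table(ii(:), tt(:), Y(:), Gm(:), Zm(:), Xm(:), ...
    'VariableNames', {'id', 'time', 'y', 'g', 'z', 'x'});
disp(head(data, 4));
```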

2. Estimating with `FdidEstimator`

The FdidEstimator fits the TWFE model on the panel data, using within-transformation to efficiently absorb fixed effects.

% Initialize the estimator
% Arguments: (data, idVar, timeVar, outcomeVar, baseFactorVar, exposureVar, covariatesList)
estimator = FdidEstimator(data, "id", "time", "y", "g", "z", ["x"]);

% Fit the model (calculates coefficients, robust standard errors, t-stats, p-values)
estimator = estimator.fit();

% Display the formatted results table
estimator.displayResults();

% Access specific coefficients or p-values programmatically
betaGZ = estimator.Coef.GZ;
pValGZ = estimator.PValue.GZ;
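
The standard errors behind these p-values are described above as unit-clustered with finite-sample corrections, but the exact formula is not shown here. As a hedged illustration only, the sketch below computes one common cluster-robust sandwich estimator on stand-in data. The correction factor `adj` (a Stata-style adjustment) and all variable names are assumptions, not a description of FdidEstimator internals.

```matlab
% Cluster-robust (sandwich) covariance on stand-in regressors.
rng(3);
nClusters = 30;  T = 4;                   % clusters (units) and periods, hypothetical
cl = repelem((1:nClusters)', T);          % cluster id for each observation
n  = nClusters*T;
X  = [randn(n, 1), randn(n, 1)];          % stand-ins for within-transformed regressors
u0 = randn(nClusters, 1);                 % cluster-level shock inducing within-unit correlation
y  = X*[1; -0.5] + u0(cl) + randn(n, 1);

bread = pinv(X'*X);                       % pseudo-inverse tolerates collinear columns
b     = bread * (X'*y);                   % least-squares coefficients
res   = y - X*b;                          % residuals
k     = size(X, 2);

% "Meat": sum over clusters of the outer product of cluster score vectors
meat = zeros(k);
for c = 1:nClusters
    idx  = (cl == c);
    s    = X(idx, :)' * res(idx);         % k-by-1 score for cluster c
    meat = meat + s*s';
end

adj   = (nClusters/(nClusters-1)) * ((n-1)/(n-k));   % assumed small-sample factor
Vcr   = adj * bread * meat * bread;       % cluster-robust covariance matrix
se    = sqrt(diag(Vcr));
tstat = b ./ se;
disp(table(b, se, tstat));
```

With few clusters, the choice of correction factor and reference distribution can change inference noticeably, so it is worth checking which convention a given implementation uses.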

Empirical Application Example

% Set up paths and run the empirical mock script
run_empirical_fdid

Expected Output: Formatted regression tables showing the Causal Interaction Estimate (the GZ term).

Monte Carlo Simulation Example

% Run the Monte Carlo test (Note: takes a few seconds to process 200 iterations and draw KDE charts)
run_monte_carlo
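
The summary statistics such a script reports can be computed along the following lines. This is a simplified stand-in, not the contents of run_monte_carlo.m: it uses a plain difference in mean changes as the estimator, and the replication count, sample size, and true effect are assumed values.

```matlab
% Bias and RMSE of an estimator across Monte Carlo replications (toy setup).
rng(7);
R = 200;  N = 100;  tau = 2.0;          % replications, units, true effect (assumed)
g = double((1:N)' > N/2);               % fixed group indicator
est = zeros(R, 1);
for r = 1:R
    dY = 1.0 + tau*g + randn(N, 1);     % pre-to-post change: common shift + group gap + noise
    est(r) = mean(dY(g == 1)) - mean(dY(g == 0));
end

bias  = mean(est) - tau;                % average estimation error
rmse  = sqrt(mean((est - tau).^2));     % root mean squared error
mcVar = var(est, 1);                    % variance across replications (normalized by R)
fprintf('bias = %.4f, RMSE = %.4f\n', bias, rmse);
```

A useful sanity check is the decomposition RMSE^2 = bias^2 + variance, which holds exactly when the variance is normalized by the number of replications.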

Critical Considerations & Best Practices

When applying the FDID framework, researchers must exercise extreme caution regarding two core theoretical caveats identified in the literature:

1. The Factorial Parallel Trends Assumption

The GZ coefficient of the TWFE regression converges to the Effect Modification ($\tau_{em}$). It represents the Causal Moderation ($\tau_{inter}$) only if the Factorial Parallel Trends assumption holds.

- What this means: Under any hypothetical state of the world (e.g., if the event never occurred), the trajectory of outcomes for the $G=1$ group must precisely match that of the $G=0$ group.
- Failure Consequence: If high-$G$ units and low-$G$ units have differing natural trajectories (due to historical or geographic advantages), the estimate contains severe Selection Bias, rendering the causal interpretation invalid.
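
The failure consequence can be illustrated with noiseless group means. In this toy calculation, the $G=1$ group has an extra pre-to-post drift `delta` that has nothing to do with the event, and the difference-in-differences picks it up one for one. All numbers are made up for the example.

```matlab
% A differential trend passes straight into the difference-in-differences.
tau   = 2.0;     % true interaction effect in this toy example
delta = 0.7;     % extra drift of the G = 1 group unrelated to the event
drift = 0.5;     % drift common to both groups
eventEffect0 = 1.0;                       % effect of the event on the G = 0 group

pre  = [1.0; 3.0];                        % mean outcomes before the event (rows: G = 0, G = 1)
post = pre + drift + eventEffect0 + [0; tau] + [0; delta];

did = (post(2) - pre(2)) - (post(1) - pre(1));
fprintf('DID = %.2f, truth = %.2f, bias = %.2f\n', did, tau, did - tau);
```

No amount of data removes this bias, because it comes from the group means themselves rather than from sampling noise.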

2. The Danger of Over-Controlling Covariates ($X$)

To rescue the Factorial Parallel Trends assumption, researchers often condition on baseline covariates $X$. However, indiscriminately adding covariates in an FDID TWFE framework can be disastrous:

- Efficiency Loss & Multicollinearity: Every added covariate $X$ requires controlling for $X \times Z$ and ideally the three-way interaction $G \times X \times Z$, so each covariate adds two regressors and consumes degrees of freedom quickly. Highly correlated covariates push the Gram matrix of the regressors towards singularity, inflating standard errors and causing numerical instability.
- Bad Controls (Colliders & Mediators): Never control for "post-treatment" variables (factors that could themselves be affected by the event $Z$). Doing so opens endogenous pathways (Collider Bias) or absorbs the mechanism of action (Over-control Bias), contaminating both $\tau_{inter}$ and $\tau_{em}$.
- Best Practice: Only include strictly pre-determined, essential confounders that theoretically dictate both the $G$ group assignment and the underlying time trend. Furthermore, FdidEstimator automatically centers all provided covariates ($X_i - \bar{X}$) so that the main $\beta_{GZ}$ term anchors to the sample average causal interaction.
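
Why centering matters can be seen in a two-period simplification, where first-differencing removes the unit effects and the regression of the outcome change on $G$, $X$, and $G \times X$ plays the role of the TWFE model. The sketch below uses noiseless toy data with assumed coefficients. It is an illustration of the centering argument, not FdidEstimator code.

```matlab
% Effect of centering X on the coefficient of G (noiseless toy data).
rng(5);
N  = 50;
G  = double((1:N)' > N/2);                  % baseline factor
X  = 5 + randn(N, 1);                       % covariate with a non-zero mean
dY = 0.5 + 1.0*G + 0.3*X + 0.4*(G.*X);      % pre-to-post change in the outcome

% Uncentered covariate: the G coefficient is the interaction effect at X = 0
bRaw = [ones(N, 1), G, X, G.*X] \ dY;

% Centered covariate: the G coefficient is the interaction effect at mean(X)
Xc   = X - mean(X);
bCen = [ones(N, 1), G, Xc, G.*Xc] \ dY;

fprintf('G coefficient, raw X: %.4f; centered X: %.4f\n', bRaw(2), bCen(2));
fprintf('1.0 + 0.4*mean(X) = %.4f\n', 1.0 + 0.4*mean(X));
```

With raw $X$, the coefficient on $G$ describes a unit with $X = 0$, which may lie far outside the data. With centered $X$, it equals the effect averaged over the sample distribution of $X$.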

Setup & Requirements

Environment: MATLAB R2017a or higher (uses string arrays with double-quoted string literals).
Dependencies: Requires the Statistics and Machine Learning Toolbox (normrnd, ksdensity, etc.).
