
🔎 What This Paper Shows
Propensity score matching (PSM), a widely used preprocessing tool for causal inference, frequently does the opposite of its intended goal: it can increase imbalance, reduce efficiency, heighten model dependence, and introduce bias.
📊 Why PSM Fails
- The core problem is methodological: PSM attempts to approximate a completely randomized experiment. Other matching methods instead approximate a fully blocked randomized experiment, which is typically more efficient.
- Because PSM targets complete randomization, it is uniquely blind to the large portion of covariate imbalance that can be removed by approximating full blocking with alternative matching approaches.
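To make the contrast between the two experimental ideals concrete, here is a minimal simulation sketch, not taken from the paper. It assumes a single standard-normal covariate `x`, 100 units, and 500 replications; all function names and settings are hypothetical choices for illustration. Complete randomization assigns treatment while ignoring `x`, whereas the blocked design pairs units with adjacent `x` values and treats one unit per pair.

```python
import random
import statistics


def mean_diff(x, treated):
    """Difference in covariate means between treated and control units."""
    t = [x[i] for i in range(len(x)) if i in treated]
    c = [x[i] for i in range(len(x)) if i not in treated]
    return statistics.mean(t) - statistics.mean(c)


def complete_randomization(x, rng):
    """Assign exactly half of the units to treatment, ignoring x."""
    return set(rng.sample(range(len(x)), len(x) // 2))


def blocked_randomization(x, rng):
    """Pair units with adjacent x values, then treat one unit per pair at random."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    treated = set()
    for k in range(0, len(order) - 1, 2):
        treated.add(rng.choice((order[k], order[k + 1])))
    return treated


def average_abs_imbalance(assign, n_units=100, n_reps=500, seed=0):
    """Average absolute difference in means of x over simulated experiments."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_reps):
        # Hypothetical data: one standard-normal covariate per unit.
        x = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
        total += abs(mean_diff(x, assign(x, rng)))
    return total / n_reps


complete_imbalance = average_abs_imbalance(complete_randomization)
blocked_imbalance = average_abs_imbalance(blocked_randomization)
print(f"complete: {complete_imbalance:.3f}  blocked: {blocked_imbalance:.3f}")
```

Under these assumptions the blocked design should leave far less imbalance on `x`, because each pair differs only by the small gap between adjacent values. This is the portion of imbalance that a method targeting complete randomization does not try to remove.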
✅ Key Findings
- PSM often increases imbalance rather than reducing it.
- PSM can worsen statistical efficiency and increase model dependence, meaning that estimates vary more with the choice of outcome model.
- In data already balanced enough to resemble complete randomization, either originally or after pruning, PSM behaves like random matching, which can increase imbalance even relative to the raw data.
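The last finding rests on the fact that pruning observations at random tends to raise imbalance. The sketch below illustrates this under stated assumptions and is not the paper's own simulation. It assumes treated and control units drawn from the same standard-normal covariate distribution, so the raw data are balanced in expectation, and a hypothetical "random matching" step that keeps a random subset of each arm. All names and sample sizes are illustrative.

```python
import random
import statistics


def abs_mean_diff(treated_x, control_x):
    """Absolute difference in covariate means between the two arms."""
    return abs(statistics.mean(treated_x) - statistics.mean(control_x))


def simulate(n_per_arm=200, n_kept=25, n_reps=500, seed=0):
    """Average imbalance before and after random pruning of both arms."""
    rng = random.Random(seed)
    raw_total = 0.0
    pruned_total = 0.0
    for _ in range(n_reps):
        # Both arms share one distribution: balanced in expectation.
        treated_x = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        control_x = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        raw_total += abs_mean_diff(treated_x, control_x)
        # Random matching stand-in: keep a random subset, ignoring x entirely.
        kept_treated = rng.sample(treated_x, n_kept)
        kept_control = rng.sample(control_x, n_kept)
        pruned_total += abs_mean_diff(kept_treated, kept_control)
    return raw_total / n_reps, pruned_total / n_reps


raw_imbalance, pruned_imbalance = simulate()
print(f"raw: {raw_imbalance:.3f}  after random pruning: {pruned_imbalance:.3f}")
```

With fewer units in each arm, the difference in means becomes noisier, so the average imbalance after random pruning should exceed that of the raw data. A matching method that uses covariate information when pruning avoids this cost; one that effectively prunes at random does not.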
🔍 What This Means for Practice
- These results indicate that researchers should prefer other matching methods that approximate full blocking when the goal is to reduce imbalance and improve causal estimates.
- Propensity scores are not without value, however: they remain useful for purposes other than matching.
📌 Takeaway
Rethinking the default use of PSM is crucial: matching strategies that target blocked designs typically deliver better balance and more reliable causal inferences than propensity score matching.