When Propensity Score Matching Hurts: Why Better Matching Beats PSM

"Why Propensity Scores Should Not Be Used for Matching" was authored by Gary King and Richard Nielsen. It was published by Cambridge University Press in Political Analysis in 2019.

🔎 What This Paper Shows

Propensity score matching (PSM), a widely used preprocessing tool for causal inference, frequently accomplishes the opposite of its intended goal: it can increase imbalance, reduce efficiency, heighten model dependence, and introduce bias.

📊 Why PSM Fails

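The weakness lies in the experimental design PSM tries to emulate: PSM attempts to approximate a completely randomized experiment, which balances covariates only on average across repeated assignments. Other matching methods instead approximate a fully blocked randomized experiment, which balances the blocked covariates within each sample and is typically more efficient.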
Because PSM targets complete randomization, it is uniquely blind to the large portion of covariate imbalance that can be removed by approximating full blocking with alternative matching approaches.
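
The gap between the two designs can be illustrated with a small simulation. This is a minimal sketch, not the authors' own analysis: it assumes a single standard-normal covariate, 100 units per experiment, and hypothetical helper names (mean_abs_imbalance, complete_randomization, blocked_randomization) invented for this example. It compares the average absolute treated-control difference in covariate means under each assignment scheme.

```python
import random
import statistics


def mean_abs_imbalance(assign, n_units=100, n_reps=1000, seed=0):
    """Average |mean(x | treated) - mean(x | control)| over simulated experiments."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_reps):
        # One pre-treatment covariate per unit (assumed standard normal).
        x = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
        treated = assign(x, rng)  # set of treated unit indices
        x_t = [x[i] for i in range(n_units) if i in treated]
        x_c = [x[i] for i in range(n_units) if i not in treated]
        gaps.append(abs(statistics.fmean(x_t) - statistics.fmean(x_c)))
    return statistics.fmean(gaps)


def complete_randomization(x, rng):
    """Treat a random half of the units, ignoring the covariate."""
    return set(rng.sample(range(len(x)), len(x) // 2))


def blocked_randomization(x, rng):
    """Pair units with adjacent covariate values, then treat one unit per pair."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    treated = set()
    for k in range(0, len(order) - 1, 2):
        # Coin flip decides which member of the pair is treated.
        treated.add(order[k + rng.randint(0, 1)])
    return treated


complete_gap = mean_abs_imbalance(complete_randomization)
blocked_gap = mean_abs_imbalance(blocked_randomization)
print(f"complete randomization: mean |imbalance| = {complete_gap:.3f}")
print(f"blocked randomization:  mean |imbalance| = {blocked_gap:.3f}")
```

Under these assumptions the blocked design should show far smaller covariate imbalance, because each treated unit has a control with a nearly identical covariate value. A method that targets only complete randomization has no reason to pursue that remaining improvement.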

✅ Key Findings

PSM often increases imbalance rather than reducing it.
PSM can worsen statistical efficiency and make estimates more sensitive to the choice of outcome model (model dependence).
In data that are already balanced enough to resemble complete randomization, either originally or after pruning, PSM behaves like random matching and can increase imbalance even relative to the raw data.
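
Why random matching can make balance worse is easy to see in a toy simulation. The sketch below is an illustration under stated assumptions, not a reproduction of the paper's results: treatment is unrelated to a single standard-normal covariate (so the raw data already resemble complete randomization), and "random matching" is modeled as keeping a random subset of treated and control units. The function names and sample sizes are hypothetical.

```python
import random
import statistics


def abs_mean_gap(x_t, x_c):
    """Absolute difference in covariate means between treated and control."""
    return abs(statistics.fmean(x_t) - statistics.fmean(x_c))


def simulate_random_pruning(n_per_arm=200, n_kept=20, n_reps=1000, seed=0):
    """Compare average imbalance in the raw data and after random pruning."""
    rng = random.Random(seed)
    raw, pruned = [], []
    for _ in range(n_reps):
        # Both arms come from the same distribution: balanced on average.
        x_t = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        x_c = [rng.gauss(0.0, 1.0) for _ in range(n_per_arm)]
        raw.append(abs_mean_gap(x_t, x_c))
        # Random matching: keep n_kept units per arm without regard to x.
        kept_t = rng.sample(x_t, n_kept)
        kept_c = rng.sample(x_c, n_kept)
        pruned.append(abs_mean_gap(kept_t, kept_c))
    return statistics.fmean(raw), statistics.fmean(pruned)


raw_gap, pruned_gap = simulate_random_pruning()
print(f"raw data:          mean |imbalance| = {raw_gap:.3f}")
print(f"random pruning:    mean |imbalance| = {pruned_gap:.3f}")
```

Discarding units at random shrinks the sample without bringing the groups any closer on the covariate, so the expected imbalance grows as the retained sample gets smaller.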

🔍 What This Means for Practice

These results indicate that researchers should prefer other matching methods that approximate full blocking when the goal is to reduce imbalance and improve causal estimates.
Propensity scores are not without value, however: the critique targets their use for matching, and they still have other productive uses.

📌 Takeaway

Rethinking the default use of PSM is crucial: matching strategies that target blocked designs typically deliver better balance and more reliable causal inferences than propensity score matching.