
🔎 What Was Investigated
Matching for causal inference is well understood for low-dimensional data, but standard approaches break down for text documents. High dimensionality makes exact matching infeasible, propensity scores produce incomparable matches, and assessing match quality becomes difficult. The study frames text matching as two design choices, the text representation and the distance metric, and asks how these choices affect both the quantity and the quality of matches.
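To make the two design choices concrete, here is a minimal sketch that treats the representation and the distance metric as interchangeable functions. It is an illustration only, not the procedure evaluated in the study: the bag-of-words representation, the cosine distance, the caliper value, the function names, and the toy documents are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Representation choice (assumed): lowercase term-frequency vector."""
    return Counter(text.lower().split())


def cosine_distance(u, v):
    """Distance choice (assumed): 1 - cosine similarity of two term-count vectors."""
    dot = sum(count * v[term] for term, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # treat empty documents as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def match_documents(treated, control, represent, distance, caliper):
    """Pair each treated document with its nearest control document.

    Matching is with replacement. Returns (treated_index, control_index,
    distance) tuples, keeping only pairs whose distance is within the caliper.
    """
    control_vecs = [represent(doc) for doc in control]
    matches = []
    for i, doc in enumerate(treated):
        vec = represent(doc)
        j, d = min(
            ((j, distance(vec, cv)) for j, cv in enumerate(control_vecs)),
            key=lambda pair: pair[1],
        )
        if d <= caliper:
            matches.append((i, j, d))
    return matches


# Toy documents, invented for illustration.
treated_docs = ["senate passes budget bill", "local team wins championship game"]
control_docs = ["house passes budget bill", "new recipe for apple pie"]

matches = match_documents(
    treated_docs, control_docs, bag_of_words, cosine_distance, caliper=0.5
)
print(matches)
```

Swapping `bag_of_words` or `cosine_distance` for another function changes which pairs survive. Tightening the caliper typically yields fewer but closer matches, which is the quantity-versus-quality trade-off the study examines.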
🧪 How Methods Were Compared
Text-matching procedures were compared in a systematic multifactor evaluation experiment with human subjects. Key features of the evaluation:
📈 Key Findings
🧩 Demonstrations of Use
Two applications illustrate practical benefits of the identified best method:
🔚 Why This Matters
The work provides a practical framework for matching documents by separating representation and distance choices, identifies methods that improve subjective match quality, and offers a predictive tool to approximate human match judgments. Together, these make text-based causal inference more reliable and easier to evaluate without extensive manual labeling.

"Matching With Text Data: An Experimental Evaluation of Methods for Matching Documents and of Measuring Match Quality" was authored by Reagan Mozer, Luke Miratrix, Aaron Russell Kaufman and L. Jason Anastasopoulos. It was published by Cambridge in Pol. An. in 2020.
