
Merging large administrative data sets is challenging due to missing identifiers and inaccurate records. This paper introduces a new algorithm for probabilistic record linkage that handles these issues efficiently at scale.
Data & Methods: We developed a fast, scalable algorithm based on the canonical probabilistic model of record linkage. The method accommodates millions of observations while accounting for missing identifiers and inaccurate records.
Simulation Studies: We evaluated the algorithm extensively in simulation studies built on realistic scenarios to assess its reliability.
Real Applications: Case studies demonstrate its use in merging campaign contribution records, survey datasets, and voter files. An open-source implementation is available for researchers.
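
To make the probabilistic model mentioned under Data & Methods concrete, here is a minimal sketch in Python. It assumes that the "canonical probabilistic model" is a Fellegi-Sunter-style two-class mixture (match vs. non-match) over field-by-field agreement patterns, fit by EM, and that a missing field is simply skipped in the likelihood. The function name `fit_match_model`, the starting values, and the toy data are all hypothetical; this is an illustration of the general technique, not the authors' open-source implementation.

```python
def fit_match_model(patterns, n_iter=500, tol=1e-9, eps=1e-6):
    """EM for a two-class mixture over agreement patterns.

    patterns: list of tuples, one per record pair; each entry is
    1 (fields agree), 0 (fields disagree), or None (field missing).
    Returns (lam, m, u, post): estimated match share, per-field
    P(agree | match), per-field P(agree | non-match), and each pair's
    match probability from the final E-step.
    """
    k = len(patterns[0])
    lam = 0.1          # assumed starting share of true matches
    m = [0.9] * k      # assumed starting P(agree | match)
    u = [0.1] * k      # assumed starting P(agree | non-match)
    post = [0.0] * len(patterns)

    def clip(p):
        # Keep probabilities away from exactly 0 or 1.
        return min(max(p, eps), 1.0 - eps)

    for _ in range(n_iter):
        # E-step: posterior match probability for every pair.
        for i, g in enumerate(patterns):
            pm, pu = lam, 1.0 - lam
            for j, a in enumerate(g):
                if a is None:
                    continue  # a missing field contributes no evidence
                pm *= m[j] if a else 1.0 - m[j]
                pu *= u[j] if a else 1.0 - u[j]
            post[i] = pm / (pm + pu)

        # M-step: re-estimate parameters from observed fields only.
        new_lam = clip(sum(post) / len(post))
        for j in range(k):
            obs = [(w, g[j]) for w, g in zip(post, patterns) if g[j] is not None]
            m_den = sum(w for w, _ in obs)
            u_den = sum(1.0 - w for w, _ in obs)
            if m_den > 0:
                m[j] = clip(sum(w * a for w, a in obs) / m_den)
            if u_den > 0:
                u[j] = clip(sum((1.0 - w) * a for w, a in obs) / u_den)

        converged = abs(new_lam - lam) < tol
        lam = new_lam
        if converged:
            break
    return lam, m, u, post


# Toy agreement patterns for three hypothetical fields
# (for example name, birth year, address); counts are made up.
patterns = (
    [(1, 1, 1)] * 20 + [(1, 1, None)] * 5 + [(1, 0, 1)] * 5
    + [(0, 0, 0)] * 150 + [(1, 0, 0)] * 15 + [(0, 1, 0)] * 15
    + [(0, 0, None)] * 10
)

lam, m, u, post = fit_match_model(patterns)
print("estimated match share:", round(lam, 3))
print("P(match | all fields agree):", round(post[0], 3))
```

Pairs whose posterior exceeds a chosen threshold would then be treated as matches. Working with agreement patterns rather than raw records is one reason this family of models can scale, since identical patterns can be counted once and weighted; the sketch above does not implement that optimization.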

Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records was authored by Ted Enamorado, Benjamin Fifield and Kosuke Imai. It was published by Cambridge University Press in the American Political Science Review (APSR) in 2019.
