
Probabilistic Model Streamlines Large-Scale Data Merging

Tags: probabilistic model, record linkage, administrative data, large-scale merging, Methodology, @APSR, Dataverse

Merging large administrative data sets is challenging due to missing identifiers and inaccurate records. This paper introduces a new algorithm for probabilistic record linkage that handles these issues efficiently at scale.

Data & Methods: We developed a fast, scalable algorithm that implements the canonical probabilistic model of record linkage. The method handles millions of observations while:

  • Accounting for missing data
  • Accounting for measurement error
  • Incorporating auxiliary information
  • Adjusting for merge uncertainty in post-merge analyses
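
To make the idea of a probabilistic linkage model concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes the canonical model is the Fellegi-Sunter mixture, in which each candidate record pair is either a match or a non-match and fields agree independently given that status. It also assumes binary agree/disagree comparisons and that missing comparisons can simply be skipped. The function name `fellegi_sunter_em` and all parameter names are hypothetical.

```python
import numpy as np


def fellegi_sunter_em(gamma, n_iter=200, tol=1e-8):
    """Fit a two-class mixture to field-agreement patterns with EM.

    gamma: (n_pairs, n_fields) array; 1 = fields agree, 0 = disagree,
           np.nan = comparison missing (assumed ignorable and skipped).
    Returns (lam, m, u, xi): the estimated match share, per-field agreement
    probabilities among matches (m) and non-matches (u), and each pair's
    posterior match probability (xi).
    """
    gamma = np.asarray(gamma, dtype=float)
    observed = ~np.isnan(gamma)
    g = np.where(observed, gamma, 0.0)  # placeholder 0 where missing
    n_fields = g.shape[1]

    # Illustrative starting values: matches mostly agree, non-matches rarely.
    lam = 0.1
    m = np.full(n_fields, 0.9)
    u = np.full(n_fields, 0.1)
    eps = 1e-6

    for _ in range(n_iter):
        # E-step: posterior match probability, using observed fields only.
        log_m = (observed * (g * np.log(m) + (1 - g) * np.log(1 - m))).sum(axis=1)
        log_u = (observed * (g * np.log(u) + (1 - g) * np.log(1 - u))).sum(axis=1)
        num = np.log(lam) + log_m
        den = np.logaddexp(num, np.log(1 - lam) + log_u)
        xi = np.exp(num - den)

        # M-step: weighted agreement rates over observed comparisons.
        w_match = xi[:, None] * observed
        w_non = (1 - xi)[:, None] * observed
        new_lam = np.clip(xi.mean(), eps, 1 - eps)
        new_m = np.clip((w_match * g).sum(axis=0) / (w_match.sum(axis=0) + eps), eps, 1 - eps)
        new_u = np.clip((w_non * g).sum(axis=0) / (w_non.sum(axis=0) + eps), eps, 1 - eps)

        change = max(abs(new_lam - lam), np.abs(new_m - m).max(), np.abs(new_u - u).max())
        lam, m, u = new_lam, new_m, new_u
        if change < tol:
            break

    return lam, m, u, xi
```

The posterior probabilities in `xi` are what allow merge uncertainty to be carried into later analyses, for example by weighting each candidate pair instead of keeping only pairs above a hard cutoff. A scalable implementation would also avoid materializing every candidate pair, which this sketch does not attempt.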

Simulation Studies: We evaluated the algorithm's performance in comprehensive simulation studies built on realistic scenarios.

Real Applications: Case studies demonstrate its use in merging campaign contribution records, survey datasets, and voter files. An open-source implementation is available for researchers.

Using a Probabilistic Model to Assist Merging of Large-scale Administrative Records was authored by Ted Enamorado, Benjamin Fifield, and Kosuke Imai. It was published by Cambridge University Press in the American Political Science Review (APSR) in 2019.
American Political Science Review