Insights from the Field

ZIP Codes Work for White/Black Imputation; Blocks Matter for Asian and Hispanic


BISG
Geocoding
ZIP codes
Census blocks
Georgia
Methodology
Pol. An.
2 R files
7 datasets
23 other files
1 text file
Dataverse
Minmaxing of Bayesian Improved Surname Geocoding and Geography Level Ups in Predicting Race was authored by Jesse Clark, John Curiel and Tyler Steelman. It was published by Cambridge University Press in Political Analysis in 2022.

📌 Overview

Racial identification often must be inferred from ecological data, a process that is vulnerable to bias and error. Bayesian Improved Surname Geocoding (BISG) greatly improves those inferences by combining surname and geographic demographic data, but the geographic unit used varies widely in practice and the trade-offs are not well quantified. This letter validates BISG on Georgia's voter file, compares geocoded and nongeocoded approaches, and introduces ZIP codes as an intermediate geography for BISG.
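The way BISG combines the two sources can be sketched in a few lines. This is a minimal illustration of the standard BISG Bayes update, not the authors' code: the posterior for each race is proportional to P(race | surname) times P(geography | race), normalized across races. The function name, the race categories, and every probability below are made-up assumptions for illustration only.

```python
from typing import Dict

# Hypothetical category labels; real surname and Census tables define their own.
RACES = ("white", "black", "hispanic", "asian", "other")


def bisg_posterior(
    p_race_given_surname: Dict[str, float],
    p_geo_given_race: Dict[str, float],
) -> Dict[str, float]:
    """Combine a surname prior with a geographic likelihood via Bayes' rule.

    p_race_given_surname: P(race | surname), e.g. from a surname list.
    p_geo_given_race:     P(living in this geography | race), e.g. the share
                          of each group's population found in a ZIP code or block.
    """
    unnormalized = {
        race: p_race_given_surname[race] * p_geo_given_race[race]
        for race in RACES
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("surname prior and geography likelihood never overlap")
    return {race: value / total for race, value in unnormalized.items()}


# Made-up inputs: a surname that is mostly White nationally, observed in a
# geography where Black residents are over-represented.
surname_prior = {"white": 0.60, "black": 0.30, "hispanic": 0.05, "asian": 0.03, "other": 0.02}
geo_likelihood = {"white": 0.001, "black": 0.004, "hispanic": 0.001, "asian": 0.001, "other": 0.001}

posterior = bisg_posterior(surname_prior, geo_likelihood)
most_likely = max(posterior, key=posterior.get)
```

In this toy case the geographic evidence outweighs the surname prior, so the most likely category shifts from White to Black. The finer the geography, the sharper P(geography | race) tends to be, which is why the choice of geographic unit matters.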

📊 What Was Compared

Comparison: Geocoded Versus Nongeocoded BISG on a State Voter File

  • Data: Georgia voter file used as the validation dataset.
  • Methods: BISG applied under multiple geography levels and procedures: surname-only estimation, county-level approximations, nongeocoded ZIP-code-based estimation, and geocoded census-block-level BISG.
  • Aim: Quantify accuracy trade-offs across geography levels and assess missingness and bias implications of each approach.
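A validation of this kind can be sketched as follows, assuming each voter record carries a self-reported race and, for a given geography level, either a BISG posterior or nothing when no geographic match was possible. The record layout and the `evaluate` helper are hypothetical and the data are invented; the letter's own metrics may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

Posterior = Dict[str, float]


def evaluate(
    records: Iterable[Tuple[str, Optional[Posterior]]],
) -> Tuple[Dict[str, float], float]:
    """Return (accuracy by self-reported race, overall missingness rate).

    Each record is (self_reported_race, posterior), where posterior is None
    when the voter could not be matched at this geography level.
    """
    correct = defaultdict(int)
    scored = defaultdict(int)
    missing = 0
    total = 0
    for truth, posterior in records:
        total += 1
        if posterior is None:
            missing += 1
            continue
        scored[truth] += 1
        # Classify by the highest-probability category.
        predicted = max(posterior, key=posterior.get)
        if predicted == truth:
            correct[truth] += 1
    accuracy = {race: correct[race] / scored[race] for race in scored}
    missing_rate = missing / total if total else 0.0
    return accuracy, missing_rate


# Invented toy records for one geography level.
toy_records = [
    ("white", {"white": 0.8, "black": 0.2}),
    ("black", {"white": 0.6, "black": 0.4}),
    ("black", {"white": 0.1, "black": 0.9}),
    ("asian", None),  # no geographic match at this level
]
accuracy, missing_rate = evaluate(toy_records)
```

Running the same function once per geography level gives the two quantities being traded off: per-group accuracy and the share of records that cannot be scored at all.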

🔍 Key Findings

  • ZIP-code BISG (without precise geocoding) is an acceptable alternative for estimating White and Black racial identification.
  • Census-block geocoded BISG yields the most accurate imputations for Asian and Hispanic voters, outperforming ZIP-code and larger-area approaches for these groups.
  • The choice of geography involves trade-offs between accuracy, data availability, and missingness; smaller geographies reduce bias for some groups but are more likely to be missing or unavailable.
  • Results identify a sequence of BISG practices that maximize correct racial identification while minimizing data missingness and bias across groups.
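One way to picture such a sequence is a fallback cascade that uses the finest geography available for each record and drops to a coarser one only when the finer estimate is missing. The ordering below (block, then ZIP code, then county, then surname only) and the `first_available` helper are assumptions made for illustration; the summary above does not spell out the exact sequence the letter recommends.

```python
from typing import Dict, Optional, Sequence, Tuple

Posterior = Dict[str, float]

# Hypothetical ordering from finest to coarsest geography.
FALLBACK_ORDER = ("block", "zip", "county", "surname")


def first_available(
    estimates: Dict[str, Optional[Posterior]],
    order: Sequence[str] = FALLBACK_ORDER,
) -> Tuple[str, Posterior]:
    """Return (level, posterior) for the first level that has an estimate."""
    for level in order:
        posterior = estimates.get(level)
        if posterior is not None:
            return level, posterior
    raise LookupError("no estimate available at any level")


# Invented example: the address could not be geocoded to a block,
# but a ZIP-code estimate exists.
voter_estimates = {
    "block": None,
    "zip": {"white": 0.7, "black": 0.3},
    "surname": {"white": 0.6, "black": 0.4},
}
level_used, chosen = first_available(voter_estimates)
```

Recording which level was used for each record makes it possible to check afterwards whether accuracy differs systematically between geocoded and fallback cases.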

⚖️ Why It Matters

  • Practical guidance is provided for researchers and practitioners who must impute race from surnames and geography. When geocoding is unavailable, ZIP-code-level BISG can suffice for many analyses focused on White and Black populations; analyses centered on Asian or Hispanic populations should prioritize census-block geocoding where possible.
  • The findings clarify the efficiency and limitations of common BISG implementations and offer a data-driven basis for selecting geography levels in race-imputation tasks.