🔎 The problem addressed
Ecological inference—inferring turnout and vote choice for racial groups from aggregate election returns and neighborhood racial composition—can generate aggregation bias that distorts race-specific turnout and vote-share estimates. A different strategy is to predict individual-level ethnicity from voter registration records and then aggregate those predictions.
🧾 How ethnicity was predicted
- Bayes's rule was used to combine the Census Bureau's Surname List with information drawn from geocoded voter registration records.
- The approach produces probabilistic ethnicity assignments for individual registrants by merging surname-based priors with location-based evidence.
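The Bayesian update described above can be illustrated with a minimal sketch. It assumes that surname and residence location are conditionally independent given ethnicity, so that the posterior is proportional to P(ethnicity | surname) × P(location | ethnicity). The function name, group labels, and all numbers are hypothetical and serve only as illustration; they are not taken from the Census Surname List or from the released software.

```python
def posterior_ethnicity(surname_prior, geo_likelihood):
    """Combine a surname-based prior with location evidence via Bayes's rule.

    surname_prior:  P(ethnicity | surname) for each group (hypothetical input)
    geo_likelihood: P(location | ethnicity) for each group (hypothetical input)
    Assumes surname and location are independent given ethnicity.
    """
    unnormalized = {
        group: surname_prior[group] * geo_likelihood.get(group, 0.0)
        for group in surname_prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("no group has positive posterior weight")
    # Normalize so the posterior probabilities sum to one.
    return {group: weight / total for group, weight in unnormalized.items()}


# Illustrative values only: a surname that is mostly White in the surname
# list, for a registrant living in an area where Black residents are
# relatively concentrated.
surname_prior = {"white": 0.60, "black": 0.30, "latino": 0.10}
geo_likelihood = {"white": 0.001, "black": 0.004, "latino": 0.001}

posterior = posterior_ethnicity(surname_prior, geo_likelihood)
most_likely = max(posterior, key=posterior.get)
```

In this toy case the location evidence outweighs the surname prior, so the most probable group shifts from White to Black. This is the kind of correction that surname-only prediction cannot make.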
📊 How the method was evaluated
- Validation used approximately nine million Florida voter registration records where self-reported ethnicity is available.
- Predicted ethnicities were compared to self-reports to measure true and false positive rates, and predictions were used to estimate turnout by race for comparison with standard ecological inference estimates.
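The comparison against self-reports can be sketched as follows. The sketch assumes each registrant receives a single predicted group (for example, the group with the highest posterior probability) and uses the standard definitions: the true positive rate is the share of a group's actual members who are predicted to belong to it, and the false positive rate is the share of non-members who are predicted to belong to it. The function name and the six-record sample are hypothetical.

```python
def classification_rates(predicted, reported, group):
    """Return (true positive rate, false positive rate) for one group.

    predicted: predicted group label per registrant
    reported:  self-reported group label per registrant
    """
    tp = fp = fn = tn = 0
    for pred, truth in zip(predicted, reported):
        if pred == group and truth == group:
            tp += 1
        elif pred == group:
            fp += 1
        elif truth == group:
            fn += 1
        else:
            tn += 1
    # Guard against groups with no members (or no non-members) in the sample.
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr


# Hypothetical toy sample of six registrants.
predicted = ["black", "white", "latino", "black", "white", "white"]
reported = ["black", "white", "latino", "white", "black", "white"]

tpr_black, fpr_black = classification_rates(predicted, reported, "black")
```

Here one of the two self-reported Black registrants is predicted correctly, and one of the four non-Black registrants is wrongly predicted as Black.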
✅ Key findings
- False positive rates were reduced to 6% for Black voters and 3% for Latino voters.
- True positive rates remained above 80% across groups.
- Turnout-by-race estimates derived from the predictions exhibited substantially lower bias and root mean squared error than standard ecological inference estimates.
🛠️ Practical output
Open-source software is provided to implement the proposed methodology, enabling replication and application to other voter files.
💡 Why it matters
Predicting individual ethnicity from voter registration records offers a practical, data-driven alternative to traditional ecological inference, reducing aggregation bias in race-specific turnout estimates with clear applications for political behavior research and voting-rights litigation.






