Machine Coded Data? Not Always Better Than Human Coding for Underreporting Bias

Computational Methods, Machine Learning, Measurement Error, News Media, Repression, Methodology, @PSR&M, Dataverse

This research investigates a common problem in textual political science data: underreporting bias. News sources often fail to report state repression events, and similar omissions can occur when human coders process those texts.

Cook et al.'s method estimates the extent of unreported repression by comparing coverage across multiple sources, using human-coded Agence France-Presse and Associated Press news reports as examples.
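To make the idea of comparing sources concrete, here is a minimal sketch in Python. It is not Cook et al.'s estimator. It swaps in the much simpler two-source capture-recapture (Lincoln-Petersen) calculation, which assumes the two sources report events independently of each other. The function name, the event IDs, and the counts are all hypothetical and chosen only for illustration.

```python
def two_source_estimate(source_a, source_b):
    """Lincoln-Petersen capture-recapture estimate from two event lists.

    Illustrative stand-in only: this is NOT the Cook et al. estimator.
    Assumes the two sources report events independently, which real
    news wires may well violate.
    """
    a, b = set(source_a), set(source_b)
    both = a & b
    if not both:
        raise ValueError("sources share no events; estimate is undefined")

    # Estimated true number of events: |A| * |B| / |A and B|
    estimated_total = len(a) * len(b) / len(both)
    observed = len(a | b)  # events reported by at least one source
    return {
        "estimated_total": estimated_total,
        "observed": observed,
        "estimated_unreported": estimated_total - observed,
    }


# Hypothetical event IDs reported by each wire service.
afp_events = {1, 2, 3, 4, 5, 6}
ap_events = {4, 5, 6, 7, 8}

result = two_source_estimate(afp_events, ap_events)
print(result)
```

With these made-up inputs, the two sources jointly report 8 distinct events, share 3 of them, and the calculation suggests roughly 10 events in total, so about 2 would have gone unreported by both. The less the sources overlap relative to their size, the larger the implied number of missing events.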

The authors applied this technique to machine-coded data from the World-Integrated Crisis Early Warning System dataset. Both models (human-coded and machine-coded) were then validated against an external measure of human rights protections in Africa, and the estimator was also applied to machine- and human-coded event data on human rights violations in Colombia.

The findings indicate that underreporting bias affects machine-coded and human-coded data to a similar degree in both the Africa and Colombia applications.

This means researchers must actively account for potentially missing events, whether they work with human-coded news reports or machine-coded texts.

"The Prevalence and Severity of Underreporting Bias in Machine and Human Coded Data" was authored by Benjamin Bagozzi, Patrick Brandt, John Freeman, Jennifer Holmes, Alisha Kim, Agustin Palao Mendizabal, and Carly Potz-Nielsen. It was published by Cambridge University Press in Political Science Research and Methods (PSR&M) in 2019.