
How a 'Binary Snowball' Method Boosts Precision in News Topic Coding

text classification | Comparative Agendas | Machine Learning | binary classification | Hungary | Methodology | @Pol. An. | Dataverse

📌 What was attempted:

Presents a machine-learning solution aimed at matching the gold standard of double-blind human coding for content analysis in comparative politics. The goal was to assign each front-page article of a leading Hungarian daily, based on its full text, to one of 21 policy topics from the Comparative Agendas Project codebook.

🗞️ What was analyzed (data and target):

  • Front-page articles from a leading Hungarian daily newspaper
  • Full-text documents assigned to 21 policy topics using the Comparative Agendas Project codebook

🔍 How the hybrid binary snowball approach worked:

  • Combined supervised machine learning with limited human coding effort
  • Converted the multiclass (21-way) problem into a series of binary classification tasks
  • Used a snowball procedure that augmented the training set with machine-classified observations after each successful round and also between corpora
  • Designed specifically to handle strongly imbalanced topic classes while minimizing human labor
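
The steps above can be illustrated with a minimal sketch in Python. It is not the authors' implementation: the token-weight scorer is a toy stand-in for a real supervised classifier, and the function names, the `threshold` value, the topic labels, and the sample texts are all hypothetical. Any human validation step is omitted. The sketch only shows the general idea of treating one topic as a binary (topic vs. rest) task and feeding confidently machine-labeled articles back into the training set after each round.

```python
from collections import Counter


def train_binary(docs, is_topic):
    """Toy one-vs-rest model: per-token weight = share of positive docs
    containing the token minus share of negative docs containing it."""
    pos, neg = Counter(), Counter()
    for text, flag in zip(docs, is_topic):
        (pos if flag else neg).update(set(text.lower().split()))
    n_pos = sum(is_topic) or 1
    n_neg = (len(is_topic) - sum(is_topic)) or 1
    return {t: pos[t] / n_pos - neg[t] / n_neg for t in set(pos) | set(neg)}


def score(model, text):
    """Mean token weight of a document; unseen tokens count as 0."""
    tokens = set(text.lower().split())
    if not tokens:
        return 0.0
    return sum(model.get(t, 0.0) for t in tokens) / len(tokens)


def binary_snowball(labeled, unlabeled, topic, threshold=0.3, max_rounds=5):
    """Repeatedly train a topic-vs-rest model and move confidently
    classified documents from the unlabeled pool into the training set."""
    train = list(labeled)   # (text, topic_code) pairs coded by humans
    pool = list(unlabeled)  # texts without a label
    accepted = []
    for _ in range(max_rounds):
        docs = [text for text, _ in train]
        flags = [code == topic for _, code in train]
        model = train_binary(docs, flags)
        confident = [d for d in pool if score(model, d) >= threshold]
        if not confident:
            break  # nothing new was labeled, so the snowball stops
        accepted.extend(confident)
        train.extend((d, topic) for d in confident)
        pool = [d for d in pool if d not in confident]
    return accepted, pool


# Hypothetical hand-coded seed set and unlabeled pool.
labeled = [
    ("parliament passes health budget for hospitals", "health"),
    ("hospitals report doctor shortage", "health"),
    ("football team wins league final", "other"),
    ("new tax law debated in parliament", "other"),
]
unlabeled = [
    "hospitals doctor strike",
    "doctor strike nurses",
    "league final postponed",
]

accepted, remaining = binary_snowball(labeled, unlabeled, topic="health")
print(accepted)
print(remaining)
```

In this toy run the second article clears the threshold only after the first has been added to the training set, which is the snowball effect. The third article is never labeled, reflecting the limited coverage that comes with a strict acceptance rule.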

🧾 Key results:

  • Precision exceeded 80% for most topic codes
  • Precision was higher than is customary for human coders and for most computer-assisted coding projects
  • High precision came with limited coverage: fewer than 60% of articles were labeled by the system
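
To make the two reported quantities concrete, here is a small sketch of how per-topic precision and coverage could be computed when a system is allowed to leave articles unlabeled. The function names, topic labels, and data are illustrative assumptions, not figures or code from the article. `None` stands for an article the system declined to label.

```python
from collections import defaultdict


def per_topic_precision(gold, predicted):
    """For each assigned topic: correct assignments / all assignments.
    Articles with a prediction of None (no label) are skipped."""
    assigned = defaultdict(int)
    correct = defaultdict(int)
    for g, p in zip(gold, predicted):
        if p is None:
            continue
        assigned[p] += 1
        if g == p:
            correct[p] += 1
    return {topic: correct[topic] / assigned[topic] for topic in assigned}


def coverage(predicted):
    """Share of articles that received any label at all."""
    return sum(p is not None for p in predicted) / len(predicted)


# Illustrative data: five articles, two left unlabeled by the system.
gold = ["health", "health", "defense", "energy", "energy"]
predicted = ["health", None, "defense", "health", None]

precision_by_topic = per_topic_precision(gold, predicted)
share_labeled = coverage(predicted)
print(precision_by_topic)
print(share_labeled)
```

Precision is computed only over the articles that were labeled, so a system can raise it by abstaining on uncertain cases, at the cost of a lower share of labeled articles.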

⚖️ Why this matters:

  • Demonstrates a practical workflow that trades broader coverage for higher precision when human resources are constrained
  • Offers a scalable option for high-precision topic labeling in comparative politics, with explicit trade-offs between label quality and the share of articles labeled
The Multiclass Classification of Newspaper Articles with Machine Learning: The Hybrid Binary Snowball Approach was authored by Miklós Sebők and Zoltán Kacsuk. It was published by Cambridge University Press in Political Analysis (Pol. An.) in 2021.
Political Analysis