
Small Choices, Big Differences: How Corpus and Coding Change News Tone Measures

text classification · supervised learning · corpus selection · coding design · New York Times · Methodology @ Pol. An. · 17 R files · 17 datasets · Dataverse

🔎 What This Guide Does

This guide walks through the consequential decisions required before producing automated measures from news text, combining theoretical discussion with empirical tests. A running example—measuring the tone of New York Times coverage of the economy—illustrates how everyday choices reshape the data and the inferences researchers draw from it.

🧾 Running Example: Measuring NYT Economic Coverage

  • Uses New York Times articles about the economy as the empirical case to demonstrate practical implications.
  • Examines how different corpus construction and coding choices affect measures of tone.

🧭 How Choices Were Tested and Compared

  • Both theoretical arguments and empirical comparisons are used to assess the impact of each methodological decision.
  • Key dimensions evaluated include corpus selection, unit of analysis for coding, allocation of coding effort, and classification method (supervised algorithms versus dictionaries).
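To make the corpus-selection comparison concrete, here is a minimal Python sketch contrasting the two retrieval strategies the guide evaluates: relying on an archive's predefined subject categories versus filtering by researcher-chosen keywords. The mini-archive, category labels, and keyword list below are invented for illustration; they are not the paper's actual data or search terms.

```python
import re

# Hypothetical mini-archive: each article has text plus an archive-assigned
# subject category (all values are illustrative).
articles = [
    {"id": 1, "text": "Unemployment fell as the economy added jobs.", "category": "Economy"},
    {"id": 2, "text": "The senator discussed tax policy and inflation.", "category": "Politics"},
    {"id": 3, "text": "A new art exhibit opened downtown.", "category": "Culture"},
    {"id": 4, "text": "Markets rallied on strong GDP growth figures.", "category": "Business"},
]

# Strategy A: trust the archive's predefined subject category.
category_corpus = [a["id"] for a in articles if a["category"] == "Economy"]

# Strategy B: retrieve by researcher-defined keywords (hypothetical list).
KEYWORDS = re.compile(r"\b(economy|unemployment|inflation|gdp|jobs)\b", re.IGNORECASE)
keyword_corpus = [a["id"] for a in articles if KEYWORDS.search(a["text"])]

print(category_corpus)  # [1]
print(keyword_corpus)   # [1, 2, 4]
```

Even in this toy example the two strategies return different corpora: the archive's "Economy" label misses economically relevant articles filed under "Politics" and "Business", which is the kind of inconsistency that motivates the keyword-search recommendation.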

📌 Key Findings

  • Two reasonable approaches to corpus selection can produce radically different corpora, changing downstream measures and conclusions.
  • Keyword searches are recommended over predefined subject categories provided by news archives, because archive categories can yield inconsistent or misleading corpora.
  • Coding article segments (larger text chunks) provides clear benefits compared to sentence-level coding.
  • Given a fixed total number of codings, it is better to code more unique documents than to assign more coders per document.
  • Supervised machine learning classifiers outperform dictionary-based approaches on multiple criteria.
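The supervised-versus-dictionary contrast can be sketched in miniature. Everything below is illustrative: the labeled segments, the tone word lists, and the tiny Naive Bayes classifier stand in for the paper's actual NYT corpus and classifiers. The sketch shows the structural difference: a dictionary scores text against a fixed word list, while a supervised model learns word weights from human-coded examples.

```python
from collections import Counter
import math

# Toy labeled segments (invented for illustration).
train = [
    ("the economy added jobs and growth surged", "pos"),
    ("strong hiring and rising wages lifted markets", "pos"),
    ("layoffs mounted as the recession deepened", "neg"),
    ("unemployment rose and markets slumped badly", "neg"),
]

# --- Dictionary approach: count hits against fixed word lists. ---
POS_WORDS = {"growth", "strong", "rising", "surged"}
NEG_WORDS = {"layoffs", "recession", "slumped", "unemployment"}

def dictionary_tone(text):
    words = text.lower().split()
    score = sum(w in POS_WORDS for w in words) - sum(w in NEG_WORDS for w in words)
    return "pos" if score >= 0 else "neg"

# --- Supervised approach: a tiny Naive Bayes fit to the labeled data. ---
def train_nb(data):
    counts = {"pos": Counter(), "neg": Counter()}
    for text, label in data:
        counts[label].update(text.lower().split())
    return counts

def nb_tone(counts, text):
    vocab = set(counts["pos"]) | set(counts["neg"])
    scores = {}
    for label, c in counts.items():
        total = sum(c.values()) + len(vocab)  # Laplace smoothing
        scores[label] = sum(
            math.log((c[w] + 1) / total) for w in text.lower().split()
        )
    return max(scores, key=scores.get)

model = train_nb(train)
print(dictionary_tone("hiring surged as wages kept rising"),
      nb_tone(model, "hiring surged as wages kept rising"))  # → pos pos
```

The supervised model picks up tone-bearing words like "hiring" and "wages" from the coded examples even though they never appear in the dictionary, which is one mechanism behind the finding that trained classifiers outperform fixed word lists.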

💡 Practical Recommendations and Takeaway

  • Prioritize careful corpus construction (favor keyword-based retrieval), choose segment-level units for coding when appropriate, allocate coding resources to increase document coverage, and favor supervised learning with human validation.
  • Thoughtfulness and human validation remain essential; automated classification is easy to run but can mislead when methodological choices go unexamined.
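The allocation recommendation has a simple variance intuition. Under a standard two-level decomposition (between-document variance plus per-coding coder noise; the specific variance values below are hypothetical, not from the paper), spreading a fixed coding budget across more unique documents shrinks the dominant between-document term:

```python
def tone_variance(budget, coders_per_doc, var_doc=1.0, var_coder=0.5):
    """Variance of the corpus-level mean tone under a fixed coding budget.

    budget          total number of codings available
    coders_per_doc  coders assigned to each document
    var_doc         between-document variance in true tone (assumed value)
    var_coder       per-coding noise variance (assumed value)
    """
    docs = budget // coders_per_doc  # unique documents covered
    return var_doc / docs + var_coder / (docs * coders_per_doc)

# With 600 total codings: one coder per document beats three per document.
print(tone_variance(600, 1))  # ≈ 0.0025
print(tone_variance(600, 3))  # ≈ 0.0058
```

Extra coders per document only reduce the coder-noise term, while coding more unique documents reduces both terms, matching the finding that breadth of documents beats depth of coders for a corpus-level measure.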
Automated Text Classification of News Articles: A Practical Guide was authored by Pablo Barberá, Amber E. Boydstun, Suzanna Linn, Ryan McMahon, and Jonathan Nagler. It was published in Political Analysis (Cambridge University Press) in 2021.
Find on: Google Scholar | JSTOR | CUP