
🔎 What This Guide Does
This guide walks through the consequential decisions required before producing automated measures from news text, combining theoretical discussion with empirical tests. A running example—measuring the tone of New York Times coverage of the economy—illustrates how seemingly routine choices reshape the resulting data and the inferences researchers draw from them.
🧾 Running Example: Measuring NYT Economic Coverage
- Uses New York Times articles about the economy as the empirical case to demonstrate practical implications.
- Examines how different corpus construction and coding choices affect measures of tone.
🧭 How Choices Were Tested and Compared
- Both theoretical arguments and empirical comparisons are used to assess impacts of methodological decisions.
- Key dimensions evaluated include corpus selection, unit of analysis for coding, allocation of coding effort, and classification method (supervised algorithms versus dictionaries).
📌 Key Findings
- Two reasonable approaches to corpus selection can produce radically different corpora, changing downstream measures and conclusions.
- Keyword searches are recommended over predefined subject categories provided by news archives, because archive categories can yield inconsistent or misleading corpora.
- Coding article segments (larger text chunks) provides clear benefits compared to sentence-level coding.
- Given a fixed total number of codings, it is better to code more unique documents than to assign more coders per document.
- Supervised machine learning classifiers outperform dictionary-based approaches on multiple criteria.
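The fixed-budget finding above can be illustrated with a simple variance decomposition. Assuming the target is the corpus-level mean tone, that true tone varies across documents, and that coder errors are independent (the variance values and budget below are hypothetical, not taken from the guide), the variance of the estimated mean under n documents with k coders each is σ²_doc/n + σ²_coder/(n·k). With a fixed budget B = n·k, the coder-noise term is constant, so spreading codings over more documents always reduces error:

```python
# Sketch of the coding-budget tradeoff, assuming a fixed budget of
# B = n_docs * coders_per_doc codings and independent coder noise.
# Variance components are hypothetical, purely for illustration.
SIGMA2_DOC = 1.0     # variance of true tone across documents
SIGMA2_CODER = 0.5   # variance of an individual coder's error

def var_of_mean(n_docs, coders_per_doc):
    """Variance of the estimated corpus-mean tone:
    document-sampling noise plus averaged coder noise."""
    return SIGMA2_DOC / n_docs + SIGMA2_CODER / (n_docs * coders_per_doc)

# Same budget of 1000 codings, two allocations:
spread = var_of_mean(1000, 1)   # 1000 docs, each coded once
stacked = var_of_mean(200, 5)   # 200 docs, five coders each

print(f"1000 docs x 1 coder:  {spread:.5f}")
print(f" 200 docs x 5 coders: {stacked:.5f}")
# The coder-noise term SIGMA2_CODER / (n_docs * coders_per_doc) equals
# SIGMA2_CODER / budget in both allocations, so only the
# document-sampling term SIGMA2_DOC / n_docs differs.
```

Under these assumptions, single-coding more documents dominates multi-coding fewer ones whenever the goal is a corpus-level estimate; extra coders per document only shrink the term that the fixed budget already caps.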
💡 Practical Recommendations and Takeaway
- Prioritize careful corpus construction (favor keyword-based retrieval), choose segment-level units for coding when appropriate, allocate coding resources to increase document coverage, and favor supervised learning with human validation.
- Thoughtfulness and human validation remain essential; automated classification is easy to run but can mislead when methodological choices go unexamined.
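The contrast between the two classification approaches can be sketched in miniature. The toy headlines, word lists, and labels below are hypothetical stand-ins, not material from the guide: a dictionary scorer applies fixed positive/negative word lists, while a supervised classifier (here a minimal multinomial Naive Bayes with add-one smoothing) learns tone from human-coded examples instead:

```python
from collections import Counter, defaultdict
import math

# Hypothetical human-coded training examples (invented for illustration).
TRAIN = [
    ("jobs report shows strong hiring and wage growth", "positive"),
    ("unemployment falls as factories expand output", "positive"),
    ("markets rally on upbeat consumer spending data", "positive"),
    ("layoffs mount as recession fears deepen", "negative"),
    ("inflation surges and growth slows sharply", "negative"),
    ("housing market slumps amid rising foreclosures", "negative"),
]

# Dictionary approach: count matches against fixed word lists.
POS_WORDS = {"strong", "growth", "rally", "expand", "upbeat"}
NEG_WORDS = {"layoffs", "recession", "slumps", "surges", "fears"}

def dictionary_tone(text):
    words = text.lower().split()
    score = sum(w in POS_WORDS for w in words) - sum(w in NEG_WORDS for w in words)
    return "positive" if score >= 0 else "negative"

# Supervised approach: multinomial Naive Bayes trained on the coded examples.
def train_nb(examples):
    counts = defaultdict(Counter)   # label -> word frequencies
    priors = Counter()              # label -> document counts
    vocab = set()
    for text, label in examples:
        priors[label] += 1
        for w in text.lower().split():
            counts[label][w] += 1
            vocab.add(w)
    return counts, priors, vocab

def classify_nb(model, text):
    counts, priors, vocab = model
    n_docs = sum(priors.values())
    best, best_lp = None, -math.inf
    for label in priors:
        total = sum(counts[label].values())
        lp = math.log(priors[label] / n_docs)
        for w in text.lower().split():
            # Add-one smoothing so unseen words do not zero out a class.
            lp += math.log((counts[label][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = label, lp
    return best

model = train_nb(TRAIN)
print(dictionary_tone("layoffs mount amid recession fears"))
print(classify_nb(model, "strong hiring and wage growth"))
```

The design difference the guide emphasizes is visible even at this scale: the dictionary's word lists are fixed in advance and never adapt to the corpus, whereas the supervised model's evidence comes entirely from human-coded documents, which is also what makes human validation of its output natural.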