
What the Authors Ask
Sarah Dreier, Sofia Serrano, Emily Gade, and Noah A. Smith test whether recent advances in natural language processing (NLP) improve researchers' ability to detect politically meaningful concepts in messy historical archives—specifically, government rationalizations for internment without trial. The question matters for scholars who want scalable ways to mine political texts for evidence of rights abuses and official justifications.
The Data and The Challenge
The team works with imperfectly digitized archival texts that contain spelling errors, OCR noise, and context-specific language. The target concept—official rationalizations for internment without trial—is subtle and often embedded in bureaucratic prose, making automated detection challenging.
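To make the digitization problem concrete, here is a minimal sketch of the kind of conservative, rule-based cleanup a researcher might apply to OCR'd archival sentences before labeling them. This is an illustrative assumption, not the authors' pipeline; the example sentence, the normalize_ocr function, and the specific cleanup rules are hypothetical.

```python
# A minimal sketch (not the authors' code) of light-touch normalization for
# OCR-noisy archival text prior to annotation or classification.
import re

def normalize_ocr(text: str) -> str:
    """Apply conservative fixes to common OCR artifacts."""
    # Rejoin words split by end-of-line hyphenation, e.g. "intern-\nment".
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Collapse runs of whitespace introduced by column layouts and page breaks.
    text = re.sub(r"\s+", " ", text)
    # Drop characters outside printable ASCII (keeping the pound sign), which
    # often indicate OCR substitution errors in digitized British records.
    text = re.sub(r"[^\x20-\x7E£]", "", text)
    return text.strip()

# Hypothetical example of a noisy OCR'd sentence.
raw = "The Minister con-\n sidered that intern€ment   without trial was necessary."
print(normalize_ocr(raw))
# -> "The Minister considered that internment without trial was necessary."
```

A cleanup pass like this reduces surface noise, but the harder problem the authors target is conceptual: deciding whether a cleaned sentence actually expresses a rationalization for internment.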
How They Tested Models
Key Findings
Practical Guidance for Researchers
The article demonstrates when contextual NLP is worth applying and when human-in-the-loop workflows remain necessary. It also offers concrete advice on model choice, specification, and annotation strategies for political scientists working with context-specific policy discussions or poorly digitized historical records.
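As a hedged illustration of what "model choice" means in practice, the sketch below frames concept detection as supervised sentence classification, with a bag-of-words baseline standing in where a fine-tuned contextual encoder (e.g., a BERT-family model) would be used in a stronger setup. The sentences, labels, and pipeline are illustrative assumptions, not the authors' data or method.

```python
# A minimal sketch (not the authors' code) of concept detection framed as
# supervised sentence classification with hand-labeled training examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical hand-labeled sentences: 1 = contains a rationalization for
# internment without trial, 0 = does not.
sentences = [
    "Internment remains essential to protect the public from terrorist violence.",
    "The detainees were transferred to the facility on Tuesday.",
    "Suspension of normal trial procedure is justified by the security emergency.",
    "The committee adjourned until the following week.",
]
labels = [1, 0, 1, 0]

# Bag-of-words baseline; a fine-tuned contextual encoder would replace the
# TF-IDF featurization step when subtle, context-dependent wording matters.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(sentences, labels)

print(clf.predict(["Detention without trial is a necessary emergency measure."]))
```

In a human-in-the-loop workflow, predictions like these would be reviewed by annotators, with disagreements fed back as new labeled examples rather than accepted as final codes.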
Why This Matters
This work clarifies both the promise and limits of state-of-the-art NLP for classifying politically salient concepts—helping researchers decide when machine labeling can responsibly replace or augment manual coding in studies of repression, civil liberties, and archival political texts.

"Troubles in Text: Using Natural Language Processing to Recognize Government Rationalizations for Rights Abuses," by Sarah K. Dreier, Sofia Serrano, Emily K. Gade, and Noah A. Smith, was published in the Journal of Politics (JOP, University of Chicago Press) in 2025.