Humans Flail at Keyword Search; A Better Computer-Based Approach Emerges

Keyword Selection

Human Bias

Text Analysis

Classifier Errors

"Computer-Assisted Keyword and Document Set Discovery from Unstructured Text" was authored by Gary King, Patrick Lam, and Margaret E. Roberts. It was published by Wiley in the American Journal of Political Science (AJPS) in 2017.

Keyword selection is critical to text analysis research yet often overlooked, and humans do it poorly because of inherent bias and suboptimal methods. This paper presents a computer-assisted framework designed to overcome these limitations.

### The Challenge: Subpar Human Keyword Selection

Researchers frequently underestimate the complexity of choosing effective keywords from large unstructured text datasets. Standard approaches like Google searches fail to capture nuanced requirements for political science applications where precise document retrieval is essential.

### Our Solution: Leveraging Classifier Errors

We introduce a statistical method that trains classifiers on the available text, then systematically analyzes their misclassifications (errors) to identify meaningful search terms. Rather than trying to correct these errors, the approach treats them as a source of information, and it requires no structured data inputs.
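The idea of mining classifier errors for keywords can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the toy corpus, the hypothetical seed keyword `bombing`, the hand-rolled Naive Bayes classifier, and the simple document-frequency ranking are all assumptions chosen to keep the example self-contained.

```python
import math
from collections import Counter

SEED = "bombing"  # hypothetical seed keyword that defines the reference set

def tokens(text):
    # Drop the seed itself so the classifier has to rely on surrounding words.
    return [w for w in text.lower().split() if w != SEED]

def train_naive_bayes(labeled_docs):
    """Fit a multinomial Naive Bayes model with add-one smoothing."""
    vocab = {w for docs in labeled_docs.values() for d in docs for w in tokens(d)}
    n_docs = sum(len(docs) for docs in labeled_docs.values())
    model = {}
    for label, docs in labeled_docs.items():
        counts = Counter(w for d in docs for w in tokens(d))
        denom = sum(counts.values()) + len(vocab)
        log_prior = math.log(len(docs) / n_docs)
        log_lik = {w: math.log((counts[w] + 1) / denom) for w in vocab}
        model[label] = (log_prior, log_lik)
    return model

def predict(model, text):
    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(log_lik.get(w, 0.0) for w in tokens(text))
    return max(model, key=score)

# Toy corpus (assumed for illustration only).
corpus = [
    "boston marathon bombing suspects",
    "bombing at boston marathon finish line",
    "police investigate boston marathon bombing",
    "boston police name bombing suspects",
    "explosion at boston marathon finish line",
    "boston police seek marathon suspects after explosion",
    "chocolate cake recipe with frosting",
    "stock market closes higher today",
    "local team wins baseball game",
    "new phone has better camera",
]

# Reference set: documents matching the seed. Search set: everything else.
reference = [d for d in corpus if SEED in d.split()]
search = [d for d in corpus if SEED not in d.split()]

model = train_naive_bayes({"reference": reference, "search": search})

# The "errors": search-set documents the classifier labels as reference.
target = [d for d in search if predict(model, d) == "reference"]
non_target = [d for d in search if d not in target]

def doc_freq(docs):
    # Number of documents in which each word appears.
    return Counter(w for d in docs for w in set(tokens(d)))

# Rank words by how much more often they occur in the target documents
# than in the rest of the search set; ties are broken alphabetically.
in_target, in_rest = doc_freq(target), doc_freq(non_target)
ranked = sorted(in_target, key=lambda w: (in_rest[w] - in_target[w], w))
suggestions = ranked[:3]
print(suggestions)
```

In this toy run, the two search-set documents about the event are "misclassified" as reference documents, and `explosion` surfaces among the suggestions even though it never appears in the reference set.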

### How It Works

The method:

* Generates Boolean search strings that researchers can easily understand

* Provides suggestions for keywords based on statistical patterns in the unstructured text itself

* Creates 'document sets' optimized for discovery and retrieval tasks, rather than relying solely on pre-defined labels
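To make the link between a Boolean search string and a document set concrete, the following sketch evaluates a small nested query against a handful of documents. The tuple-based query representation, the `matches` and `retrieve` helpers, and the sample documents are hypothetical choices for illustration, not part of the paper.

```python
def matches(query, words):
    """Evaluate a nested Boolean query against a set of words.

    A query is either a single keyword (str) or a tuple whose first
    element is "AND", "OR", or "NOT", followed by sub-queries.
    """
    if isinstance(query, str):
        return query in words
    op, *args = query
    if op == "AND":
        return all(matches(a, words) for a in args)
    if op == "OR":
        return any(matches(a, words) for a in args)
    if op == "NOT":
        return not matches(args[0], words)
    raise ValueError(f"unknown operator: {op}")

def retrieve(query, docs):
    # The document set is every document whose words satisfy the query.
    return [d for d in docs if matches(query, set(d.lower().split()))]

docs = [
    "boston marathon bombing suspects",
    "explosion at boston marathon finish line",
    "boston marathon training tips",
    "explosion of interest in baking",
]

# Equivalent to the search string: boston AND (bombing OR explosion)
query = ("AND", "boston", ("OR", "bombing", "explosion"))
document_set = retrieve(query, docs)
print(document_set)
```

Because the query is an explicit combination of keywords, a researcher can read it, see why each document was retrieved, and adjust it by hand.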

### Applications Demonstrated

The technique proves valuable across various domains:

* Social media analysis where users rapidly innovate language to evade authorities (e.g., Chinese social media posts designed to circumvent censorship)

* General web searches requiring nuanced topic identification

* eDiscovery processes in legal contexts

* Industry and intelligence analyses needing comprehensive document coverage

### Results

Illustrative applications, such as an analysis of English-language texts about the Boston Marathon bombings, demonstrate how this method effectively captures relevant documents by identifying terms that other approaches miss. This computer-assisted approach delivers superior keyword suggestions compared to human intuition or standard automated methods.