Insights from the Field

Humans Flail at Keyword Search; A Better Computer-Based Approach Emerges


Keyword Selection
Human Bias
Text Analysis
Classifier Errors
Methodology
AJPS
3 R files
2 text files
20 datasets
1 PDF file
2 other files
Dataverse
Computer-Assisted Keyword and Document Set Discovery from Unstructured Text was authored by Gary King, Patrick Lam and Margaret E. Roberts. It was published by Wiley in the American Journal of Political Science (AJPS) in 2017.

Keyword selection is critical to text analysis research yet often overlooked, and humans perform it poorly because of inherent bias and suboptimal methods. This paper presents a novel computer-assisted framework designed to overcome these limitations.

### The Challenge: Subpar Human Keyword Selection

Researchers frequently underestimate how difficult it is to choose effective keywords for large collections of unstructured text. Standard approaches such as Google searches do not meet the requirements of political science applications, where precise document retrieval is essential.

### Our Solution: Leveraging Classifier Errors

We introduce a statistical method that trains classifiers on available text, then systematically analyzes their misclassifications (errors) to identify meaningful search terms. The approach treats those mistakes as a source of information rather than attempting to correct them, and it does not require structured data inputs.
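
The idea of mining classifier errors can be illustrated with a minimal sketch. It is not the authors' implementation: the naive Bayes classifier, the toy documents, the function names, and the simple document-share ranking are all assumptions chosen for brevity. A classifier is trained to separate a reference set (documents found with a seed keyword, here "bombing") from a sample of the search set; search-set documents that it "mistakenly" labels as reference become the target set, and words that distinguish the target set are suggested as keywords.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

def train_naive_bayes(docs_by_label):
    """Fit multinomial naive Bayes with add-one smoothing (illustrative choice)."""
    vocab = {w for docs in docs_by_label.values() for d in docs for w in tokenize(d)}
    n_docs = sum(len(docs) for docs in docs_by_label.values())
    model = {}
    for label, docs in docs_by_label.items():
        counts = Counter(w for d in docs for w in tokenize(d))
        denom = sum(counts.values()) + len(vocab)
        model[label] = (
            math.log(len(docs) / n_docs),
            {w: math.log((counts[w] + 1) / denom) for w in vocab},
        )
    return model

def predict(model, text):
    """Return the most probable label; words unseen in training are ignored."""
    scores = {
        label: prior + sum(loglik[w] for w in tokenize(text) if w in loglik)
        for label, (prior, loglik) in model.items()
    }
    return max(scores, key=scores.get)

def suggest_keywords(reference, search, sample):
    """Hypothetical helper: use 'misclassified' search documents to rank keywords."""
    model = train_naive_bayes({"reference": reference, "search": sample})
    # Classifier "errors": search-set documents labeled as reference.
    target = [d for d in search if predict(model, d) == "reference"]
    rest = [d for d in search if d not in target]

    def doc_share(word, docs):
        return sum(word in tokenize(d) for d in docs) / len(docs) if docs else 0.0

    # Simplified ranking: share of target docs with the word minus share of the rest.
    candidates = {w for d in target for w in tokenize(d)}
    ranked = sorted(candidates, key=lambda w: (doc_share(w, rest) - doc_share(w, target), w))
    return target, ranked

# Toy data (invented for illustration). Reference documents contain the seed "bombing".
reference = [
    "boston marathon bombing suspect manhunt",
    "bombing suspect manhunt in watertown",
    "boston marathon bombing victims",
]
sample = [
    "red sox win at fenway",
    "rain forecast this weekend",
    "new restaurant opens this weekend",
]
search = sample + [
    "manhunt in watertown ends with suspect captured",
    "watertown residents told to shelter during manhunt",
    "red sox game postponed this weekend",
    "rain expected at fenway",
]

target, ranked = suggest_keywords(reference, search, sample)
print(target)
print(ranked[:2])
```

On this toy data, the two documents about the manhunt never mention "bombing" but are still pulled into the target set, and "manhunt" and "watertown" surface as candidate keywords that a researcher might not have thought of in advance.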

### How It Works

* Generates Boolean search strings easily understandable by researchers

* Provides suggestions for keywords based on statistical patterns in the unstructured text itself

* Creates 'document sets' optimized for discovery and retrieval tasks, rather than relying solely on pre-defined labels
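
To make the link between Boolean search strings and document sets concrete, the following sketch represents a query as nested tuples, renders it as a readable string, and uses it to select a document set. The tuple representation and the names `matches`, `render`, and `document_set` are assumptions made for illustration, not part of the published method.

```python
def matches(query, tokens):
    """Evaluate a nested-tuple Boolean query against a set of tokens."""
    if isinstance(query, str):
        return query in tokens
    op, *args = query
    if op == "AND":
        return all(matches(a, tokens) for a in args)
    if op == "OR":
        return any(matches(a, tokens) for a in args)
    if op == "NOT":
        return not matches(args[0], tokens)
    raise ValueError(f"unknown operator: {op}")

def render(query):
    """Render the query as a human-readable Boolean search string."""
    if isinstance(query, str):
        return query
    op, *args = query
    if op == "NOT":
        return "NOT " + render(args[0])
    return "(" + f" {op} ".join(render(a) for a in args) + ")"

def document_set(query, docs):
    """Return the documents that satisfy the query (case-insensitive, whitespace tokens)."""
    return [d for d in docs if matches(query, set(d.lower().split()))]

# Toy example (invented documents and keywords).
query = ("AND", ("OR", "manhunt", "watertown"), ("NOT", "sox"))
docs = [
    "Manhunt in Watertown ends",
    "Red Sox honor Watertown police",
    "Rain forecast this weekend",
]

print(render(query))
print(document_set(query, docs))
```

Keeping the query as a small tree rather than a raw string means suggested keywords can be added or removed programmatically while the rendered form stays readable to the researcher.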

### Applications Demonstrated

The technique proves valuable across various domains:

* Social media analysis where users rapidly innovate language to evade authorities (e.g., Chinese social media posts designed to circumvent censorship)

* General web searches requiring nuanced topic identification

* eDiscovery processes in legal contexts

* Industry and intelligence analyses needing comprehensive document coverage

### Results

Illustrative applications, such as an analysis of English-language texts about the Boston Marathon bombings, demonstrate how this method effectively captures relevant documents by identifying terms that other approaches miss. This computer-assisted approach delivers superior keyword suggestions compared to human intuition or standard automated methods.

American Journal of Political Science