Insights from the Field

De-Identification Fails: Differential Privacy Protects Survey Respondents—But Requires Bigger Samples


Tags: differential privacy, reidentification, surveys, sample size, Methodology, AJPS
Replication materials on Dataverse: 13 R files, 2 datasets, 1 text file
Differentially Private Survey Research was authored by Georgina Evans, Gary King, Adam D. Smith, and Abhradeep Thakurta, and published by Wiley in the American Journal of Political Science (AJPS) in 2025.

🔒 Privacy Problem Exposed

De-identification—removing names and direct identifiers—has long been the standard way to share survey data. Recent work shows these procedures do not stop intentional re-identification attacks, creating a real risk for large survey programs in academia, government, and industry. This risk is especially acute in political science because respondents’ political beliefs are among the most sensitive information they provide.

🔎 How Re-identification Was Tested

A practical demonstration confirms the threat: individuals were re-identified from a de-identified survey about a controversial referendum declaring that life begins at conception. Key points about the demonstration:

  • The target dataset was a survey on a politically sensitive referendum.
  • Conventional de-identification (removing direct identifiers) was insufficient to prevent intentional re-identification.
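The paper's specific attack is not reproduced in this summary, but the general technique behind such re-identification is quasi-identifier linkage: joining the "anonymous" survey against auxiliary data (e.g., a public voter file) on attributes that survive de-identification. A minimal Python sketch with invented data and names (`deidentified_survey`, `voter_file`, and the `link` helper are all hypothetical) illustrates the idea:

```python
# Sketch of a quasi-identifier linkage attack. All records here are
# invented; the point is that zip + age + gender alone can single out
# a respondent even after names are removed.

deidentified_survey = [
    {"zip": "02138", "age": 34, "gender": "F", "referendum_vote": "yes"},
    {"zip": "02139", "age": 51, "gender": "M", "referendum_vote": "no"},
]

# Auxiliary data an attacker might already hold (e.g., a voter file).
voter_file = [
    {"name": "Alice Doe", "zip": "02138", "age": 34, "gender": "F"},
    {"name": "Bob Roe", "zip": "02139", "age": 51, "gender": "M"},
]

def link(survey, aux):
    """Re-identify survey rows whose quasi-identifiers match exactly
    one auxiliary record: a unique match recovers the identity."""
    reidentified = []
    for row in survey:
        matches = [p for p in aux
                   if (p["zip"], p["age"], p["gender"]) ==
                      (row["zip"], row["age"], row["gender"])]
        if len(matches) == 1:
            reidentified.append((matches[0]["name"], row["referendum_vote"]))
    return reidentified

print(link(deidentified_survey, voter_file))
```

Because the join uses only attributes that de-identification leaves in place, removing direct identifiers does nothing to block it.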

🛡️ A Practical Fix Built on Differential Privacy

A set of new data-sharing procedures, grounded in the formal notion of differential privacy, is proposed to address the problem. These procedures provide:

  • Mathematical guarantees that individual respondents’ privacy is protected against a wide class of re-identification attacks.
  • Statistical-validity guarantees that allow social scientists to analyze the released, differentially private data while accounting for the privacy-induced noise.
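The paper's release procedures are not spelled out in this summary; as a minimal sketch of the textbook primitive they build on, the Laplace mechanism applied to a survey proportion looks like the following (function names and parameters are illustrative, not the authors' API):

```python
# Minimal sketch of the Laplace mechanism for a survey proportion.
# Illustrative only: the paper's procedures are richer, since they also
# guarantee valid downstream statistical inference.
import random

def laplace_noise(scale, rng):
    # The difference of two independent Exp(1) draws is Laplace(0, 1);
    # multiplying by `scale` gives Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))

def dp_proportion(responses, epsilon, rng=None):
    """epsilon-DP estimate of the share of 1s in a 0/1 response vector.

    One respondent changes the 'yes' count by at most 1 (sensitivity 1),
    so Laplace noise with scale 1/epsilon on the count yields epsilon-DP;
    dividing by n gives the noisy proportion.
    """
    rng = rng or random.Random()
    noisy_count = sum(responses) + laplace_noise(1.0 / epsilon, rng)
    return noisy_count / len(responses)

# Example: 60% 'yes' among 100 respondents, privatized at epsilon = 1.
print(dp_proportion([1] * 60 + [0] * 40, epsilon=1.0, rng=random.Random(0)))
```

Analysts cannot simply treat the released number as the raw statistic: the added noise must be accounted for, which is exactly what the statistical-validity guarantees above address.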

⚖️ Trade-offs and Implications

The primary cost of deploying differential privacy for survey data is larger standard errors in estimates derived from the privatized data. However, this cost has a clear remedy: larger sample sizes reduce the privacy-induced loss of precision. Implications include:

  • A necessary shift in data-sharing practice from ad hoc de-identification to formally private release mechanisms.
  • A planning consideration for survey designers and funders to budget for larger samples when differential privacy is required.
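The trade-off above can be made concrete with a back-of-the-envelope calculation for a Laplace-privatized proportion (an illustrative model, not the paper's estimator): sampling variance shrinks like 1/n, while the privacy-noise variance shrinks like 1/n², so larger samples claw back the lost precision disproportionately fast.

```python
# Approximate standard error of a Laplace-privatized proportion:
# usual sampling variance p(1-p)/n plus the privacy-noise variance
# 2/(epsilon*n)**2 (a Laplace(0, 1/epsilon) count noise scaled by 1/n).
# Illustrative model only.

def total_se(p, n, epsilon):
    sampling_var = p * (1 - p) / n           # shrinks like 1/n
    privacy_var = 2.0 / (epsilon * n) ** 2   # shrinks like 1/n**2
    return (sampling_var + privacy_var) ** 0.5

for n in (500, 1000, 2000):
    print(n, round(total_se(0.5, n, epsilon=0.1), 4))
```

At n = 500 with a strict epsilon, the privacy noise dominates the standard error; by n = 2000 it is already the smaller component, which is the budgeting point for survey designers.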

💡 Why It Matters

Adopting differential privacy preserves respondent confidentiality with provable guarantees while keeping survey data usable for research. Without it, traditional de-identification leaves respondents vulnerable to re-identification—undermining trust in survey research and threatening the viability of studies that collect highly sensitive political information.
