How Machine Learning Reveals Fraud in Argentina's Infamous Decade

Bayesian Methods | Machine Learning | synthetic data | vote counts | Voting and Elections | Pol. An. | 6 Stata files | 4 Datasets | Dataverse

🔍 What Was Tested

A prototype system for diagnosing electoral fraud using only vote counts was developed and evaluated.

🧠 How the Classifier Was Built

Synthetic data were generated to develop and train a fraud-detection prototype.
A naive Bayes classifier served as the learning algorithm.
Digital analysis, that is, analysis of the digits in the vote counts, identified which features are most informative about class distinctions.
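The pipeline described above (simulate labeled elections, extract digit-based features, fit a naive Bayes classifier) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the log-normal generating process, the rule that fabricated tallies end in 0 or 5, the two features, and the Gaussian likelihood. The names `simulate_election`, `digit_features`, `fit_gaussian_nb`, and `predict` are hypothetical.

```python
import math
import random
from statistics import mean, pvariance


def simulate_election(rng, fraudulent, n_districts=50):
    """Return synthetic district vote counts (assumed generating process)."""
    counts = []
    for _ in range(n_districts):
        votes = int(rng.lognormvariate(7.0, 0.8)) + 10
        if fraudulent and rng.random() < 0.5:
            # Assumption: an invented tally tends to end in 0 or 5.
            votes = votes - votes % 10 + rng.choice([0, 5])
        counts.append(votes)
    return counts


def digit_features(counts):
    """Two illustrative digit features for one election."""
    last = [c % 10 for c in counts]
    first = [int(str(c)[0]) for c in counts]
    share_round = sum(d in (0, 5) for d in last) / len(counts)
    return [share_round, mean(first)]


def fit_gaussian_nb(X, y):
    """Estimate a prior plus per-feature mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        cols = list(zip(*rows))
        model[label] = {
            "prior": len(rows) / len(X),
            "mean": [mean(c) for c in cols],
            "var": [pvariance(c) + 1e-6 for c in cols],  # small floor avoids zero variance
        }
    return model


def predict(model, x):
    """Return the class with the highest log-posterior under independence."""
    def log_posterior(params):
        total = math.log(params["prior"])
        for v, m, s2 in zip(x, params["mean"], params["var"]):
            total += -0.5 * math.log(2 * math.pi * s2) - (v - m) ** 2 / (2 * s2)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))


rng = random.Random(0)
labels = [i % 2 == 1 for i in range(400)]  # True marks a simulated fraudulent election
X = [digit_features(simulate_election(rng, lab)) for lab in labels]

# Train on the first 300 synthetic elections, test on the remaining 100.
model = fit_gaussian_nb(X[:300], labels[:300])
hits = [predict(model, x) == lab for x, lab in zip(X[300:], labels[300:])]
accuracy = sum(hits) / len(hits)
print(f"held-out accuracy on synthetic elections: {accuracy:.2f}")
```

Because the simulated manipulation is deliberately crude, the classes separate easily here; the sketch shows the mechanics of training on synthetic labels, not the difficulty of the real problem.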

📊 What Data Were Used to Evaluate It

The classifier was evaluated on authentic district-level vote counts from a novel dataset covering the province of Buenos Aires (Argentina) between 1931 and 1941, a period with a checkered history of fraud.

✅ Key Findings

Elections that historians consider irregular are unambiguously classified as fraudulent by the classifier; elections considered legitimate are classified as clean.
These results corroborate the validity of the synthetic-data training approach.
More broadly, the findings demonstrate the feasibility of generating and using synthetic data to train and test an electoral-fraud detection system.

📌 Why It Matters

Provides a practical, reproducible method to detect fraud from vote counts alone.
Shows synthetic-data training can overcome the scarcity of labeled historical fraud cases and enable systematic testing of detection tools.

Fraudulent Democracy? An Analysis of Argentina's Infamous Decade Using Supervised Machine Learning was authored by Francisco Cantú and Sebastián M. Saiegh. It was published by Cambridge in Pol. An. in 2017.