MIDAS: Deep Learning That Fixes Missing Data Fast

Machine Learning | denoising autoencoder | Methodology | @Pol. An. | 12 R files | 1255 datasets | Dataverse

🔍 What This Paper Introduces

Multiple imputation is a widely used, principled approach to handling missing values, but standard implementations often break down on very large or complex datasets. MIDAS (Multiple Imputation with Denoising Autoencoders) offers an accurate, fast, and scalable alternative by adapting a class of unsupervised neural networks, denoising autoencoders, to the imputation task.

🧠 How MIDAS Works

MIDAS repurposes denoising autoencoders by treating missing entries as an additional form of corruption. During training, the network receives corrupted inputs, in which missing cells are handled like any other corrupted cell, and learns to reconstruct the values that were originally observed. Imputations are then drawn from the trained model, whose parameters minimize reconstruction error on the observed portion of the data.
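The two ingredients described above, corrupting the input and scoring reconstructions only on originally observed cells, can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the MIDAS software: the function names, the zero placeholder for hidden cells, and the `drop_rate` parameter are hypothetical choices.

```python
import numpy as np

def corrupt(x, observed, drop_rate, rng):
    """Hide missing cells and a random share of observed cells.

    Hidden cells are set to 0.0 (an assumed placeholder), so missing values
    are treated exactly like artificially corrupted ones.
    """
    keep = rng.random(x.shape) >= drop_rate  # extra, artificial corruption
    return np.where(observed & keep, x, 0.0)

def masked_mse(x, x_hat, observed):
    """Mean squared reconstruction error over originally observed cells only."""
    diff = np.where(observed, x - x_hat, 0.0)  # missing cells contribute nothing
    return float((diff ** 2).sum() / observed.sum())

x = np.array([[1.0, np.nan], [3.0, 4.0]])
observed = ~np.isnan(x)
x_hat = np.array([[2.0, 100.0], [3.0, 2.0]])
print(masked_mse(x, x_hat, observed))  # → 1.6666666666666667, i.e. (1 + 0 + 4) / 3
```

The wild guess of 100.0 for the missing cell does not affect the loss, which is the point: the network is only ever graded on values that were actually observed.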

📋 Key Features and Procedure

Reformulates multiple imputation using denoising autoencoders.
Treats missing values as corrupted data during training and draws multiple imputations from the reconstruction distribution.
Optimizes a loss that focuses on reconstructing the originally observed values, so that imputations reflect the structure of the observed data.
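The procedure above can be illustrated end to end with a deliberately tiny model. The sketch below is written under explicit assumptions and is not the authors' implementation: a one-hidden-layer autoencoder (tanh hidden units, linear output) trained by full-batch gradient descent on the masked squared-error loss, with multiple imputations produced by re-applying random input corruption at prediction time. The architecture, hyperparameters, zero placeholder, and this particular way of generating distinct draws are all hypothetical; the published MIDAS software may use a different network design and sampling scheme.

```python
import numpy as np

def corrupt(x, observed, drop_rate, rng):
    # Missing cells and randomly dropped observed cells are both set to 0.0
    # (assumed placeholder), so the network cannot tell them apart.
    keep = rng.random(x.shape) >= drop_rate
    return np.where(observed & keep, x, 0.0)

def init_params(p, hidden, rng):
    return {"W1": rng.normal(0.0, 0.1, (p, hidden)), "b1": np.zeros(hidden),
            "W2": rng.normal(0.0, 0.1, (hidden, p)), "b2": np.zeros(p)}

def forward(params, z):
    h = np.tanh(z @ params["W1"] + params["b1"])  # hidden layer
    return h, h @ params["W2"] + params["b2"]     # linear reconstruction

def train(x, hidden=8, drop_rate=0.2, lr=0.05, epochs=300, seed=0):
    """Full-batch gradient descent on squared error over observed cells only."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(x)
    target = np.where(observed, x, 0.0)  # placeholder values, masked out below
    n_obs = observed.sum()
    params = init_params(x.shape[1], hidden, rng)
    losses = []
    for _ in range(epochs):
        z = corrupt(x, observed, drop_rate, rng)  # fresh corruption each epoch
        h, y = forward(params, z)
        resid = np.where(observed, y - target, 0.0)
        losses.append(float((resid ** 2).sum() / n_obs))
        # Backpropagate the masked loss through both layers.
        d_y = 2.0 * resid / n_obs
        d_a = (d_y @ params["W2"].T) * (1.0 - h ** 2)
        grads = {"W2": h.T @ d_y, "b2": d_y.sum(axis=0),
                 "W1": z.T @ d_a, "b1": d_a.sum(axis=0)}
        for k in params:
            params[k] -= lr * grads[k]
    return params, losses

def impute(x, params, m=5, drop_rate=0.2, seed=1):
    """Return m completed datasets; observed cells are never overwritten."""
    rng = np.random.default_rng(seed)
    observed = ~np.isnan(x)
    draws = []
    for _ in range(m):
        # A new random corruption per draw yields a different reconstruction.
        _, y = forward(params, corrupt(x, observed, drop_rate, rng))
        draws.append(np.where(observed, x, y))
    return draws
```

Calling `params, losses = train(x)` followed by `impute(x, params, m=5)` returns five completed copies of `x`, which can then be analyzed separately and combined as in any multiple-imputation workflow. Keeping the observed cells fixed and varying only the filled-in values is what lets the spread across draws reflect uncertainty about the missing entries.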

📈 Tests on Simulated and Real Social Science Data

Systematic evaluations include both simulations and empirical social science datasets. An applied example uses a large-scale electoral survey to demonstrate performance in a real-world setting.

Findings show that MIDAS is accurate across a range of missingness patterns and levels of data complexity.
MIDAS demonstrates computational efficiency and scalability relative to common multiple imputation approaches.

⚙️ Practical Takeaways and Tools

MIDAS provides a practical route to multiply impute large, high-dimensional datasets that challenge traditional methods.
Open-source software is provided to implement MIDAS in applied settings, enabling replication and adoption.

🔎 Why It Matters

MIDAS bridges principled multiple imputation and modern deep learning, offering political scientists and social researchers a scalable tool to handle missing data in large surveys and complex datasets without sacrificing accuracy.


The MIDAS Touch: Accurate and Scalable Missing-Data Imputation With Deep Learning was authored by Ranjit Lall and Thomas Robinson. It was published by Cambridge University Press in Political Analysis (Pol. An.) in 2022.