FIND DATA: By Journal | Sites   ANALYZE DATA: Help with R | SPSS | Stata | Excel   WHAT'S NEW? US Politics | IR | Law & Courts🎵

Split Samples Improve Power and Replication for Causal Studies, With Limits

Split-Sample | Preanalysis-Plan | Replication | Statistical Power | Causal Inference | Methodology | Pol. An. | 3 Stata files | Dataverse

🔧 How the split-sample procedure works

Researchers send their dataset to an independent third party that randomly creates a training sample and a withheld testing sample. All model building, hypothesis selection, and revisions occur using the training sample, allowing feedback from colleagues, editors, and referees. Once the paper is accepted, the pre-specified analysis is applied to the testing sample, and those testing-sample results are the ones published.
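The third-party split described above can be sketched as follows (a minimal illustration with hypothetical helper names; the paper's actual replication materials are Stata files):

```python
# Sketch of the split-sample workflow: an independent third party randomly
# partitions the data, releases the training half, and withholds the testing
# half until the paper is accepted.
import numpy as np

def split_sample(n, train_frac=0.5, seed=0):
    """Randomly partition observation indices into a training set and a
    withheld testing set, as the independent third party would."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    cut = int(n * train_frac)
    return idx[:cut], idx[cut:]  # training indices, withheld testing indices

# Example: 1,000 observations split in half.
train_idx, test_idx = split_sample(1000)
# All exploration, model revision, and referee feedback use train_idx only;
# the pre-specified final analysis is run once on test_idx after acceptance.
print(len(train_idx), len(test_idx))
```

The key design point is that the testing indices never touch the model-building loop, so the final published estimates come from data the researcher has not yet analyzed.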

📊 What the simulations show

  • Under empirically relevant settings, the split-sample method yields greater statistical power than a conventional preanalysis plan (PAP).
  • The power gain arises mainly because relevant hypotheses are less likely to go untested, as they can when the full analysis must be specified before seeing any data.
  • The advantage is strongest in settings where outcomes of interest are uncertain and exploration is common.
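The mechanism behind the bullets above can be reproduced in a toy simulation (illustrative numbers of my own choosing, not the paper's exact design): with several candidate outcomes and only one truly affected, a rigid PAP may pre-register the wrong outcome, while the split-sample researcher identifies the promising outcome on the training half and confirms it on the withheld half.

```python
# Toy simulation: K candidate outcomes, only outcome 0 responds to treatment.
# Compare a PAP that pre-registers one outcome blindly against the
# split-sample procedure (explore on training half, confirm on testing half).
import numpy as np

def t_stat(y, d):
    """Two-sample t statistic for a difference in means."""
    y1, y0 = y[d == 1], y[d == 0]
    se = np.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return (y1.mean() - y0.mean()) / se

rng = np.random.default_rng(1)
n, K, effect, z_crit, sims = 400, 5, 0.35, 1.96, 2000
hits_pap = hits_split = 0
for _ in range(sims):
    d = rng.integers(0, 2, n)               # random treatment assignment
    y = rng.standard_normal((n, K))
    y[:, 0] += effect * d                   # only outcome 0 is affected
    # Conventional PAP: one outcome pre-registered before seeing any data.
    k_pap = rng.integers(K)
    hits_pap += abs(t_stat(y[:, k_pap], d)) > z_crit
    # Split sample: pick the strongest outcome on the training half,
    # then test it once on the withheld half.
    half = n // 2
    k_best = max(range(K), key=lambda k: abs(t_stat(y[:half, k], d[:half])))
    hits_split += abs(t_stat(y[half:, k_best], d[half:])) > z_crit
print(f"power, blind PAP:    {hits_pap / sims:.2f}")
print(f"power, split sample: {hits_split / sims:.2f}")
```

Even though the split-sample test uses only half the observations, it rejects far more often here, because the blind PAP usually registers an outcome with no effect. This matches the stated intuition that the advantage is largest when the relevant outcome is uncertain.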

⚖️ When this approach is most and least appropriate

  • Well-suited for exploratory analyses with substantial uncertainty about outcomes and hypotheses.
  • Not recommended when treatments are very costly and available sample size is severely limited, because withholding a testing sample can make inference underpowered.
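The small-sample caveat above amounts to simple power arithmetic (a back-of-the-envelope sketch with assumed effect sizes, not figures from the paper): halving the analysis sample to withhold a testing set costs little power when n is large, but can be decisive when n is already small.

```python
# Normal-approximation power for a difference in means (half the sample
# treated), comparing the full sample to the withheld testing half.
from math import erf, sqrt

def norm_cdf(x):
    return 0.5 * (1 + erf(x / sqrt(2)))

def power(n, effect=0.5, sigma=1.0, z_crit=1.96):
    """Approximate power to detect a standardized mean difference `effect`
    with n observations, half treated, at the 5% level (upper tail only)."""
    se = 2 * sigma / sqrt(n)
    return norm_cdf(effect / se - z_crit)

for n in (60, 200, 1000):
    print(f"n={n:4d}: full-sample power {power(n):.2f}, "
          f"testing-half power {power(n // 2):.2f}")
```

With 1,000 observations the testing half remains well powered; with 60, withholding half the data pushes power well below conventional targets, which is exactly the costly-treatment scenario where the authors advise against the method.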

🔍 How to interpret the method

  • The procedure can be seen as enabling direct replication: the testing sample functions as an independent confirmation of results developed on the training sample.

🛠️ Practical considerations for implementation

  • Requires an independent third party to perform the random split and hold the testing data until acceptance.
  • Allows iterative improvement and feedback on analyses without compromising the credibility of final published estimates.
  • Feasibility issues and implementation logistics (data transfer, pre-specification of analysis on the training set, and journal workflows) are discussed in detail.

Why it matters

This split-sample protocol offers a pragmatic middle ground between exploratory work and strict preanalysis plans: it preserves opportunities for refinement and feedback while producing published results that come from an independent test, improving credibility and—under many realistic conditions—statistical power.

Article Card
Using Split Samples to Improve Inference on Causal Effects was authored by Marcel Fafchamps and Julien Labonne. It was published by Cambridge University Press in Political Analysis in 2017.
Find on Google Scholar
Find on JSTOR
Find on CUP