🔍 Problem and Motivation
Social scientists often need time-varying joint distributions—for example, to construct poststratification weights—but population data across time are typically sparse, irregular, and noisy. When different variables are observed on different schedules or only margins (not full joint distributions) are available, survey weights are frequently limited to the small subset of auxiliary variables with regularly observed joint data, leaving other useful information unused.
🧰 Model and Approach
A dynamic Bayesian ecological inference model is developed to estimate multivariate categorical distributions from sparse, irregular, and noisy marginal (or partially joint) data. The method combines three core components:
- A Dirichlet sampling model for the observed margins conditional on the unobserved cell proportions.
- A set of equations that encode the logical relationships among different population quantities (for example, that the interior cells of a table must sum to its margins).
- A Dirichlet transition model for period-specific proportions that pools information across time periods.
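The three components above can be sketched as a small forward simulation. This is only an illustrative sketch under stated assumptions, not the estsubpop implementation: the 2x2 table, the starting proportions, the concentration parameters `kappa_transition` and `kappa_obs`, and the observation schedule are all hypothetical, and the code simulates data from such a model rather than performing posterior inference.

```python
import numpy as np

# Hypothetical 2x2 table per period: rows = group (A, B), columns = outcome (yes, no).
# Cells are flattened row-major: [A-yes, A-no, B-yes, B-no].
N_PERIODS = 5
INITIAL_CELLS = np.array([0.30, 0.20, 0.15, 0.35])  # assumed starting proportions

# Logical relationships: each margin is a fixed linear function of the cells.
ROW_MAP = np.array([[1.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 1.0]])
COL_MAP = np.array([[1.0, 0.0, 1.0, 0.0],
                    [0.0, 1.0, 0.0, 1.0]])

# Irregular schedule (assumed): row margins seen in periods 0 and 3,
# column margins seen in periods 1 and 4, nothing seen in period 2.
ROW_PERIODS = (0, 3)
COL_PERIODS = (1, 4)


def simulate(rng, kappa_transition=200.0, kappa_obs=500.0):
    """Simulate latent cell proportions and noisy, irregularly observed margins."""
    cells = np.empty((N_PERIODS, INITIAL_CELLS.size))
    cells[0] = INITIAL_CELLS
    for t in range(1, N_PERIODS):
        # Dirichlet transition: period t is centered on period t-1,
        # with kappa_transition controlling how strongly periods are pooled.
        cells[t] = rng.dirichlet(kappa_transition * cells[t - 1])

    # Unobserved margins are marked with NaN.
    row_obs = np.full((N_PERIODS, 2), np.nan)
    col_obs = np.full((N_PERIODS, 2), np.nan)
    for t in ROW_PERIODS:
        # Dirichlet sampling model: the observed margin is a noisy draw
        # centered on the true margin implied by the latent cells.
        row_obs[t] = rng.dirichlet(kappa_obs * (ROW_MAP @ cells[t]))
    for t in COL_PERIODS:
        col_obs[t] = rng.dirichlet(kappa_obs * (COL_MAP @ cells[t]))
    return cells, row_obs, col_obs


cells, row_obs, col_obs = simulate(np.random.default_rng(0))
print(np.round(cells, 3))
print(np.round(row_obs, 3))
print(np.round(col_obs, 3))
```

Inference runs in the opposite direction: given `row_obs` and `col_obs`, the model estimates the latent `cells` for every period, including periods in which nothing was observed.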
📊 Illustration and Implementation
The method is illustrated by estimating annual U.S. phone-ownership rates by race and region using population data irregularly available between 1930 and 1960. An R package, estsubpop, implements the method to facilitate applied use and replication.
💡 Why It Matters
- Enables dynamic ecological inferences about interior cells when only marginal or partially joint observations exist.
- Pools information across time to improve estimates of period-specific multivariate categorical distributions.
- Allows fuller use of auxiliary information for tasks such as survey poststratification where joint population distributions are intermittently observed.
This approach provides a flexible, principled way to reconstruct time-varying multivariate categorical distributions from incomplete marginal data, expanding the set of auxiliary variables usable in longitudinal survey weighting and other population-inference tasks.
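To make the poststratification use case concrete, the following minimal sketch shows how an estimated joint distribution for one period could be turned into survey weights. The function name `poststratification_weights`, the cell coding, and the numbers are hypothetical; in practice `population_props` would come from the model's period-specific estimates rather than being hard-coded.

```python
import numpy as np


def poststratification_weights(sample_cells, population_props):
    """Weight each respondent by (population share of cell) / (sample share of cell)."""
    sample_cells = np.asarray(sample_cells, dtype=int)
    population_props = np.asarray(population_props, dtype=float)
    counts = np.bincount(sample_cells, minlength=population_props.size)
    # A populated cell with no respondents cannot be reweighted.
    if np.any((population_props > 0) & (counts == 0)):
        raise ValueError("a cell with positive population share has no respondents")
    sample_props = counts / counts.sum()
    ratio = np.zeros_like(population_props)
    filled = counts > 0
    ratio[filled] = population_props[filled] / sample_props[filled]
    return ratio[sample_cells]


# Hypothetical joint distribution over four cells for one period.
population_props = [0.30, 0.20, 0.15, 0.35]
# Hypothetical sample: the cell index of each of eight respondents.
sample_cells = [0, 0, 0, 1, 1, 2, 3, 3]

weights = poststratification_weights(sample_cells, population_props)
weighted_shares = np.bincount(sample_cells, weights=weights) / weights.sum()
print(np.round(weights, 3))
print(np.round(weighted_shares, 3))
```

After weighting, the sample's cell shares match the supplied population distribution, which is why a joint distribution for every period matters when weighting surveys fielded in different years.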






