Political scientists routinely rely on expert judgment to code latent concepts, yet the field has devoted little attention to how experts' characteristics affect their reliability as coders. This study provides a template for such analysis, using coder-level data on six variables from a cross-national panel dataset. We aggregate these data with an ordinal item response theory (IRT) model that explicitly estimates expert reliability.
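As a rough illustration of the kind of model involved (the notation here is our own sketch, not the paper's exact specification), an ordinal IRT model of expert coding treats each expert r's rating of case ct as an ordered outcome whose probabilities depend on the latent trait z_ct, expert-specific thresholds tau_{r,k}, and a discrimination parameter beta_r that captures that expert's reliability:

\[
\Pr(y_{rct} = k) \;=\; \Phi\!\left(\tau_{r,k} - \beta_r z_{ct}\right) \;-\; \Phi\!\left(\tau_{r,k-1} - \beta_r z_{ct}\right),
\]

where \(\Phi\) is the standard normal CDF. Larger values of \(\beta_r\) indicate that an expert's ratings track the latent trait more closely, which is the sense in which the model yields expert-level reliability estimates.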
Regressing these reliability estimates on experts' demographic characteristics and coding behavior, we find little evidence that most traits, including gender, are associated with consistent differences in reliability. Intuitively relevant factors such as contextual knowledge modestly improve performance, but the largely null findings indicate that expert demographics alone are weak predictors of coding quality.
These results reinforce item response theory models as a robust approach to aggregating expert-coded data across diverse political science contexts.