Identifying significant government policies is a long-standing challenge. This paper introduces a method based on positive-unlabeled (PU) learning: experts flag only a few key policy outputs, and an algorithm then finds similar items in a large, otherwise unlabeled dataset.
Instead of costly human evaluations for each policy item, we offer an automated approach that starts with "seed" sets scraped from web data.
We demonstrate the technique on more than 9,000 U.K. government regulations and validate the results in two ways: by comparing them against expert opinions and by showing that they accurately forecast future citations.
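
To make the positive-unlabeled idea concrete, the following is a minimal sketch of one common PU baseline, not the paper's actual pipeline: train a classifier to separate the seed items from everything else, then rank the unlabeled items by their predicted score. The two-dimensional feature vectors, the logistic-regression model, and the names `fit_logistic` and `rank_unlabeled` are all assumptions made for illustration; the features and classifier used in the paper are not described in this summary.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500, l2=0.01):
    """Plain batch gradient descent for L2-regularised logistic regression."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * (gw / n + l2 * wj) for wj, gw in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_unlabeled(seeds, unlabeled):
    """Treat seeds as label 1 and all unlabeled items as label 0,
    then rank the unlabeled items by predicted score (highest first)."""
    X = seeds + unlabeled
    y = [1] * len(seeds) + [0] * len(unlabeled)
    w, b = fit_logistic(X, y)
    scores = [sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) for x in unlabeled]
    order = sorted(range(len(unlabeled)), key=lambda i: scores[i], reverse=True)
    return order, scores


# Hypothetical toy data: each item is a 2-D feature vector.
seeds = [[2.0, 2.0], [2.2, 1.8], [1.8, 2.1]]  # items flagged as significant
unlabeled = [
    [2.1, 1.9], [2.0, 2.1],        # resemble the seeds (hidden positives)
    [-2.0, -2.0], [-1.8, -2.1],    # do not resemble the seeds
    [-2.2, -1.9], [-2.0, -1.7],
]

order, scores = rank_unlabeled(seeds, unlabeled)
print(order)
```

In this toy setting the unlabeled items that resemble the seeds receive the highest scores, even though they were given label 0 during training. That is the property a PU approach relies on: the unlabeled pool is a mixture of positives and negatives, so the classifier's score still orders items by how closely they resemble the known positives.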