Social Science Research Network [Internet]. 2016.
In general, the growth of big data sources has changed the threat landscape of privacy and statistics in at least three major ways. First, when surveys were initially established as the principal source of statistical information, whether one participated in a survey was largely unknown. Now, as government record systems and corporate big data sources that include all or a large portion of a given universe are increasingly used, that privacy protection is eroded. Second, in the past, little outside information was generally available to match with published summaries. Now the ubiquity of auxiliary information enables many more inferences from summary data. Third, in the past, typical privacy attacks relied on linking outside data through well-known public characteristics -- PII or BII. Now, datasets can be linked through behavioral fingerprints. The current state of the practice in privacy lags well behind the state of the art in this area. Most commercial organizations, and most NSOs in other countries, continue to rely (at most) on traditional aggregation and suppression methods to protect privacy -- with no formal analysis of privacy loss or of the utility of the information gathered. The U.S. Census Bureau, because of its size, institutional capacity, and strong reputation for privacy protection, could establish leadership in modernizing privacy practices.
The United States Census Bureau, in collaboration with the Program on Informatics at the Massachusetts Institute of Technology, recently convened a series of workshops to examine computational, social-scientific, statistical, and informatic challenges to building the next generation of official statistics. Each of the three workshops focused on a different set of issues related to big data -- potential sources; data privacy and security; and barriers to statistical inference.
The next generation of official statistics will utilize broad sources of information, potentially linked together, to provide increasing granularity, detail, and timeliness, while reducing cost and burden. The increasing availability of big data requires NSOs to make a number of adjustments.
These organizations will need to adapt their strategies and widen the current focus on data collection to include broader issues of information provisioning and decision support; to modernize their approach to assessing privacy, error, and utility across their entire range of data releases; to include business sources as stakeholders; and to expand from the current centralized model, where all data is brought into the agency and then linked and analyzed internally, to a model that supports computations over distributed and independently held collections of data.
SSRN: Social Science Research Network [Internet]. 2015.
Broad new sources of information, potentially linked together, have the potential to bring increased granularity, detail, and timeliness to the next generation of official statistics -- while reducing cost and survey burden.
Utilizing big data requires creating new relationships with businesses. Many of the primary sources of big data are businesses that use data intensively to guide decisions, so these big data sources are also critical stakeholders. To obtain access to data from the businesses that create and use it, it is critical both to provide a value proposition to the business and to develop a trust relationship.