An Evolutionary Schema for Using “it-is-what-it-is” Data in Official Statistics

The linking of disparate data sets across time, space and sources is probably the foremost current issue facing Central Statistical Agencies (CSAs). A review of the current literature on the challenges facing CSAs shows three issues standing out: 1) using administrative data effectively; 2) big data and what it means for CSAs; and 3) integrating disparate data sets (such as health, education and wealth) to provide measurable facts that can guide policy makers. CSAs are being pushed to confront the same kinds of problems faced by Google, Facebook, and Yahoo, which use graphical/semantic web models for organizing, searching and analysing data. Additionally, time and space (geography) are becoming more important dimensions (domains) for CSAs as they start to explore new data sources and ways to integrate them to study relationships. Central agency methodologists are being pushed to incorporate these new perspectives into their standard theories, practices and policies. Like most methodologists, the authors see surveys and the publication of their results as a process in which estimation is the key tool for achieving the final goal of an accurate statistical output. Randomness and sampling exist to support this goal, and early on it was clear to us that the incoming “it-is-what-it-is” data sources were not randomly selected. Because these sources were not random samples, they were biased and would therefore produce biased estimates. So, we set out to design a strategy to deal with this issue.

This article presents a schema for integrating and linking traditional and non-traditional datasets. Like all survey methodologies, this schema addresses the fundamental issues of representativeness, estimation and total survey error measurement.

eISSN: 2001-7367
Language: English

Frequency: 4 times per year
Journal subjects: Mathematics, Probability and Statistics

Published online: 26 March 2019

Pages: 137-165

Received: 01 August 2017

Accepted: 01 May 2018

DOI: https://doi.org/10.2478/jos-2019-0007

Keywords: Representativeness, timeline databases, statistical registers, estimation, administrative data

© 2019 Jack Lothian et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.
