Volume 31 (2015): Issue 2 (June 2015)
    Special Issue on New Techniques and Technologies for Statistics
Journal Details
License
Format
Journal
eISSN
2001-7367
First Published
01 Oct 2013
Publication timeframe
4 times per year
Languages
English
access type Open Access

Big Data as a Source for Official Statistics

Published Online: 27 Jun 2015
Page range: 249 - 262
Received: 01 Aug 2013
Accepted: 01 Sep 2014
Abstract

More and more data are being produced by an increasing number of electronic devices physically surrounding us and on the internet. The large amount of data and the high frequency at which they are produced have resulted in the introduction of the term ‘Big Data’. Because these data reflect many different aspects of our daily lives and because of their abundance and availability, Big Data sources are very interesting from an official statistics point of view. This article discusses the exploration of both opportunities and challenges for official statistics associated with the application of Big Data. Experiences gained with analyses of large amounts of Dutch traffic loop detection records and Dutch social media messages are described to illustrate the topics characteristic of the statistical analysis and use of Big Data.

