Journal & Issues

Volume 38 (2022): Issue 3 (September 2022)

Volume 38 (2022): Issue 2 (June 2022)

Volume 38 (2022): Issue 1 (March 2022)
Special Issue on Price Indices in Official Statistics

Volume 37 (2021): Issue 4 (December 2021)

Volume 37 (2021): Issue 3 (September 2021)
Special Issue on Population Statistics for the 21st Century

Volume 37 (2021): Issue 2 (June 2021)
Special Issue on New Techniques and Technologies for Statistics

Volume 37 (2021): Issue 1 (March 2021)

Volume 36 (2020): Issue 4 (December 2020)

Volume 36 (2020): Issue 3 (September 2020)
Special Issue on Nonresponse

Volume 36 (2020): Issue 2 (June 2020)

Volume 36 (2020): Issue 1 (March 2020)

Volume 35 (2019): Issue 4 (December 2019)
Special Issue on Measuring LGBT Populations

Volume 35 (2019): Issue 3 (September 2019)

Volume 35 (2019): Issue 2 (June 2019)

Volume 35 (2019): Issue 1 (March 2019)

Volume 34 (2018): Issue 4 (December 2018)

Volume 34 (2018): Issue 3 (September 2018)
Special Section on Responsive and Adaptive Survey Design

Volume 34 (2018): Issue 2 (June 2018)
Special Issue on Establishment Surveys (ICES-V)

Volume 34 (2018): Issue 1 (March 2018)

Volume 33 (2017): Issue 4 (December 2017)

Volume 33 (2017): Issue 3 (September 2017)
Special Issue on Responsive and Adaptive Survey Design

Volume 33 (2017): Issue 2 (June 2017)
Special Issue on Total Survey Error (TSE)

Volume 33 (2017): Issue 1 (March 2017)

Volume 32 (2016): Issue 4 (December 2016)
Special Section on The Role of Official Statistics in Statistical Capacity Building

Volume 32 (2016): Issue 3 (September 2016)

Volume 32 (2016): Issue 2 (June 2016)

Volume 32 (2016): Issue 1 (March 2016)

Volume 31 (2015): Issue 4 (December 2015)

Volume 31 (2015): Issue 3 (September 2015)
Special Issue on Coverage Problems in Administrative Sources

Volume 31 (2015): Issue 2 (June 2015)
Special Issue on New Techniques and Technologies for Statistics

Volume 31 (2015): Issue 1 (March 2015)

Volume 30 (2014): Issue 4 (December 2014)
Special Issue on Establishment Surveys

Volume 30 (2014): Issue 3 (September 2014)

Volume 30 (2014): Issue 2 (June 2014)
Special Issue on Surveying the Hard-to-Reach

Volume 30 (2014): Issue 1 (March 2014)

Volume 29 (2013): Issue 4 (December 2013)

Volume 29 (2013): Issue 3 (September 2013)

Volume 29 (2013): Issue 2 (June 2013)

Volume 29 (2013): Issue 1 (March 2013)

Journal Details
Format: Journal
eISSN: 2001-7367
First Published: 01 Oct 2013
Publication timeframe: 4 times per year
Languages: English

Volume 37 (2021): Issue 1 (March 2021)

10 Articles
Open Access

Building a Sample Frame of SMEs Using Patent, Search Engine, and Website Data

Published Online: 13 Mar 2021
Page range: 1 - 30

Abstract

This research outlines the process of building a sample frame of US SMEs. The method starts with a list of patenting organizations and defines the boundaries of the population and subsequent frame using free to low-cost data sources, including search engines and websites. Generating high-quality data is of key importance throughout the process of building the frame and subsequent data collection; at the same time, there is too much data to curate by hand. Consequently, we turn to machine learning and other computational methods to apply a number of data matching, filtering, and cleaning routines. The results show that it is possible to generate a sample frame of innovative SMEs with reasonable accuracy for use in subsequent research: Our method provides data for 79% of the frame. We discuss implications for future work for researchers and NSIs alike and contend that the challenges associated with big data collections require not only new skillsets but also a new mode of collaboration.

Keywords

  • Sample frame
  • administrative and big data
  • machine learning
  • bias
  • small and medium-sized enterprises
Open Access

Optimal Reconciliation of Seasonally Adjusted Disaggregates Taking Into Account the Difference Between Direct and Indirect Adjustment of the Aggregate

Published Online: 13 Mar 2021
Page range: 31 - 51

Abstract

This article presents a new method to reconcile direct and indirect deseasonalized economic time series. The proposed technique uses a Combining Rule to merge, in an optimal manner, the directly deseasonalized aggregated series with its indirectly deseasonalized counterpart. The last-mentioned series is obtained by aggregating the seasonally adjusted disaggregates that compose the aggregated series. This procedure leads to adjusted disaggregates that verify Denton’s movement preservation principle relative to the originally deseasonalized disaggregates. First, we use as preliminary estimates the directly deseasonalized economic time series obtained with the X-13ARIMA-SEATS program applied to all the disaggregation levels. Second, we contemporaneously reconcile the aforementioned seasonally adjusted disaggregates with their seasonally adjusted aggregate, using Vector Autoregressive models. Then, we evaluate the finite sample performance of our solution via a Monte Carlo experiment that considers six Data Generating Processes that may occur in practice when users apply seasonal adjustment techniques. Finally, we present an empirical application to the Mexican Global Economic Indicator and its components. The results allow us to conclude that the suggested technique is appropriate to indirectly deseasonalize economic time series, mainly because we impose the movement preservation condition on the preliminary estimates produced by a reliable seasonal adjustment procedure.

Keywords

  • Combining rule
  • contemporaneous restrictions
  • Monte Carlo experiment
  • vector autoregressive model
  • X-13ARIMA-SEATS
Open Access

Panel Conditioning in the U.S. Consumer Expenditure Survey

Published Online: 13 Mar 2021
Page range: 53 - 69

Abstract

The U.S. Consumer Expenditure Interview Survey asks many filter questions to identify the items that households purchase. Each reported purchase triggers follow-up questions about the amount spent and other details. We test the hypothesis that respondents learn how the questionnaire is structured and underreport purchases in later waves to reduce the length of the interview. We analyze data from 10,416 four-wave respondents over two years of data collection. We find no evidence of decreasing data quality over time; instead, panel respondents tend to give higher quality responses in later waves. The results also hold for a larger set of two-wave respondents.

Keywords

  • Measurement error
  • panel conditioning
  • consumer expenditure
Open Access

Weighted Dirichlet Process Mixture Models to Accommodate Complex Sample Designs for Linear and Quantile Regression

Published Online: 13 Mar 2021
Page range: 71 - 95

Abstract

Standard randomization-based inference conditions on the data in the population and makes inference with respect to the repeated sampling properties of the sampling indicators. In some settings these estimators can be quite unstable; Bayesian model-based approaches focus on the posterior predictive distribution of population quantities, potentially providing a better balance between bias correction and efficiency. Previous work in this area has focused on estimation of means and linear and generalized linear regression parameters; these methods do not allow for a general estimation of distributional functions such as quantiles or quantile regression parameters. Here we adapt an extended Dirichlet Process Mixture model that allows the DP prior to be a mixture of DP random basis measures that are a function of covariates. These models allow many mixture components when necessary to accommodate the sample design, but can shrink to few components for more efficient estimation when the data allow. We provide an application to the estimation of relationships between serum dioxin levels and age in the US population, either at the mean level (via linear regression) or across the dioxin distribution (via quantile regression) using the National Health and Nutrition Examination Survey.

Keywords

  • Sampling weights
  • Bayesian finite population inference
  • posterior predictive distribution
  • dioxin
  • NHANES
Open Access

Identifying Outliers in Response Quality Assessment by Using Multivariate Control Charts Based on Kernel Density Estimation

Published Online: 13 Mar 2021
Page range: 97 - 119

Abstract

When monitoring industrial processes, a Statistical Process Control tool, such as a multivariate Hotelling T² chart, is frequently used to evaluate multiple quality characteristics. However, research into the use of T² charts for survey fieldwork (essentially a production process in which data sets collected by means of interviews are produced) has been scant to date. In this study, using data from the eighth round of the European Social Survey in Belgium, we present a procedure for simultaneously monitoring six response quality indicators and identifying outliers: interviews with anomalous results. The procedure integrates Kernel Density Estimation (KDE) with a T² chart, so that historical “in-control” data or reference to the assumption of a parametric distribution of the indicators is not required. In total, 75 outliers (4.25%) are iteratively removed, resulting in an in-control data set containing 1,691 interviews. The outliers are mainly characterized by having longer sequences of identical answers, a greater number of extreme answers, and, against expectation, a lower item nonresponse rate. The procedure is validated by means of ten-fold cross-validation and comparison with the minimum covariance determinant algorithm as the criterion. By providing a method of obtaining in-control data, the present findings go some way toward monitoring response quality, identifying problems, and providing rapid feedback during survey fieldwork.

Keywords

  • Kernel density estimation
  • Hotelling chart
  • multivariate control charts
  • response quality
  • ten-fold cross-validation
Open Access

Can Smart City Data be Used to Create New Official Statistics?

Published Online: 13 Mar 2021
Page range: 121 - 147

Abstract

In this article we evaluate the viability of using big data produced by smart city systems for creating new official statistics. We assess sixteen sources of urban transportation and environmental big data that are published as open data or were made available to the project for Dublin, Ireland. These data were systematically explored through a process of data checking and wrangling, building tools to display and analyse the data, and evaluating them with respect to 16 measures of their suitability: access, sustainability and reliability, transparency and interpretability, privacy, fidelity, cleanliness, completeness, spatial granularity, temporal granularity, spatial coverage, coherence, metadata availability, changes over time, standardisation, methodological transparency, and relevance. We assessed how the data could be used to produce key performance indicators and potential new official statistics. Our analysis reveals that, at present, a limited set of smart city data is suitable for creating new official statistics, though others could potentially be made suitable with changes to data management. If these new official statistics are to be realised then National Statistical Institutions need to work closely with those organisations generating the data to try and implement a robust set of procedures and standards that will produce consistent, long-term data sets.

Keywords

  • Big data
  • transport
  • environment
  • data quality
  • key performance indicators
Open Access

An App-Assisted Travel Survey in Official Statistics: Possibilities and Challenges

Published Online: 13 Mar 2021
Page range: 149 - 170

Abstract

Advances in smartphone technology have given individuals access to near-continuous location tracking at a very precise level. As the Travel Diary Study, the backbone of mobility research, has seen response rates decline over the years, researchers are looking to these mobile devices to bridge the gap between self-report recall studies and a person’s underlying travel behavior. This article details an open-source application that collects real-time location data, which respondents may then annotate to provide a detailed travel diary. Results of the field test involving 674 participants are discussed, including technical performance, data quality, and response rate.

Keywords

  • Non-response
  • travel diary
  • sensor data for surveys
  • app design
  • Android background restriction
Open Access

Measuring and Modeling Food Losses

Published Online: 13 Mar 2021
Page range: 171 - 211

Abstract

Within the context of the Sustainable Development Goals, progress towards Target 12.3 can be measured and monitored with the Food Loss Index. A major challenge is the lack of data, which dictated many methodology decisions. Therefore, the objective of this work is to present a possible improvement to the modeling approach used by the Food and Agriculture Organization in estimating the annual percentage of food losses by country and commodity. Our proposal combines robust statistical techniques with strict adherence to the rules of official statistics. In particular, the case study focuses on cereal crops, which currently have the highest (yet incomplete) data coverage and allow for more ambitious modeling choices. Cereal data are available for 66 countries and 14 different cereal commodities from 1991 to 2014. We use the annual food loss as the response variable, expressed as a percentage of production, by country and cereal commodity. The estimation work is twofold: it aims at selecting the most important factors explaining losses worldwide, comparing two Bayesian model selection approaches, and then at predicting losses with a Beta regression model in a fully Bayesian framework.

Keywords

  • Bayesian variable selection
  • Beta mixed model
  • SDG 12.3
Open Access

Survey Mode Effects on Objective and Subjective Questions: Evidence from the Labour Force Survey

Published Online: 13 Mar 2021
Page range: 213 - 237

Abstract

Web questionnaires are increasingly used to complement traditional data collection in mixed-mode surveys. However, the utilization of web data raises concerns about whether web questionnaires lead to mode-specific measurement bias. We argue that the magnitude of measurement bias strongly depends on the content of a variable. Based on the Luxembourgish Labour Force Survey, we investigate differences between web and telephone data in terms of objective (i.e., Employment Status) and subjective (i.e., Wage Adequacy and Job Satisfaction) variables. To assess whether differences in outcome variables are caused by sample composition or mode-specific measurement bias, we apply coarsened exact matching, which approximates randomized experiments by reducing dissimilarities between web and telephone samples. We select matching variables with a combination of automatic variable selection via random forest and a literature-driven selection. The results show that objective variables are not affected by mode-specific measurement bias, but web participants report lower satisfaction levels on subjective variables than telephone participants. Extensive supplementary analyses confirm our results. The present study supports the view that the impact of survey mode depends on the content of a survey and its variables.

Keywords

  • Web survey
  • telephone survey
  • mode effects
  • coarsened exact matching
  • measurement bias
Open Access

Generalised Regression Estimation Given Imperfectly Matched Auxiliary Data

Published Online: 13 Mar 2021
Page range: 239 - 255

Abstract

Generalised regression estimation allows one to make use of available auxiliary information in survey sampling. We develop three types of generalised regression estimator for the case in which the auxiliary data cannot be matched perfectly to the sample units, so that the standard estimator is inapplicable. The inference remains design-based. Consistency of the proposed estimators is either given by construction or else can be tested given the observed sample and links. Mean square errors can be estimated. A simulation study is used to explore the potential of the proposed estimators.

Keywords

  • Record linkage
  • incidence weights
  • reverse incidence weights