This article considers a modular approach to the design of integrated social surveys. The approach consists of grouping variables into ‘modules’, each of which is then allocated to one or more ‘instruments’. Each instrument is then administered to a random sample of population units, and each sample unit responds to all modules of the instrument. This approach offers a way of designing a system of integrated social surveys that balances the need to limit the cost and the need to obtain sufficient information. The allocation of the modules to instruments draws on the methodology of split questionnaire designs. The composition of the instruments, that is, how the modules are allocated to instruments, and the corresponding sample sizes are obtained as a solution to an optimisation problem. This optimisation involves minimisation of respondent burden and data collection cost, while respecting certain design constraints usually encountered in practice. These constraints may include, for example, the level of precision required and dependencies between the variables. We propose using a random search algorithm to find approximate optimal solutions to this problem. The algorithm is proved to fulfil conditions that ensure convergence to the global optimum and can also produce an efficient design for a split questionnaire.
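The random search described above can be illustrated with a minimal sketch (not the article's algorithm): draw module-to-instrument allocations uniformly at random and keep the best one that satisfies the constraints. The toy cost function and constraint below are illustrative assumptions, not taken from the article.

```python
import random

def random_search(n_modules, n_instruments, cost_fn, feasible_fn,
                  iters=10_000, seed=0):
    """Pure random search: sample allocations, keep the cheapest feasible one."""
    rng = random.Random(seed)
    best_alloc, best_cost = None, float("inf")
    for _ in range(iters):
        # alloc[m] = index of the instrument carrying module m
        alloc = tuple(rng.randrange(n_instruments) for _ in range(n_modules))
        if not feasible_fn(alloc):
            continue  # violates a precision or dependency constraint
        c = cost_fn(alloc)
        if c < best_cost:
            best_alloc, best_cost = alloc, c
    return best_alloc, best_cost

# Toy problem: 6 modules, 2 instruments. The cost penalises unbalanced
# instruments; the constraint requires modules 0 and 1 (assumed to hold
# dependent variables) to share an instrument.
alloc, c = random_search(
    6, 2,
    cost_fn=lambda a: abs(a.count(0) - a.count(1)),
    feasible_fn=lambda a: a[0] == a[1],
)
```

Because every feasible allocation is drawn with positive probability, this kind of search visits a global optimum with probability approaching one as the number of iterations grows, which is the flavour of convergence guarantee the article establishes.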
Self-reports in surveys are often influenced by the presented question format and question context. Much less is known about how these effects influence the answers of younger survey respondents. The present study investigated how variations in response format, answer scale frequency, and question order influence self-reports of two age groups: younger (11–13 years old) and older (16–18 years old) adolescents. In addition, the impact of the respondents’ level of familiarity with the question content was taken into account. Results indicated that younger adolescents are more strongly influenced by the presented question format and context than older adolescents. This, however, was dependent on the particular question content, implying that response effects are more pronounced when questions deal with issues that lie outside of the respondents’ field of experience. Implications of these findings in survey research with younger respondents are discussed.
In the UK, the transparency agenda is forcing data stewardship organisations to review their dissemination policies and to consider releasing, as open data, data that are currently available only to a restricted community of researchers under licence. Here we describe the results of a study providing evidence about the risks of such an approach via a simulated attack on two social survey datasets. This is also the first systematic attempt to simulate a jigsaw identification attack (one using a mashup of multiple data sources) on an anonymised dataset. The information that we draw on is collected from multiple online data sources and purchasable commercial data. The results indicate that such an attack against anonymised end user licence (EUL) datasets, if they were converted into open datasets, would be possible; we therefore recommend that penetration tests be factored into any decision to make datasets about people open.
There is evidence that survey interviewers may be tempted to manipulate answers to filter questions in a way that minimizes the number of follow-up questions. This becomes relevant when ego-centered network data are collected. The reported network size has a huge impact on interview duration if multiple questions on each alter are triggered. We analyze interviewer effects on a network-size question in the mixed-mode survey “Panel Study ‘Labour Market and Social Security’” (PASS), where interviewers could skip up to 15 follow-up questions by generating small networks. Applying multilevel models, we find almost no interviewer effects in CATI mode, where interviewers are paid by the hour and frequently supervised. In CAPI, however, where interviewers are paid by case and no close supervision is possible, we find strong interviewer effects on network size. As the area-specific network size is known from telephone mode, where allocation to interviewers is random, interviewer and area effects can be separated. Furthermore, a difference-in-differences analysis reveals the negative effect of introducing the follow-up questions in Wave 3 on CAPI network size. In attempting to explain these interviewer effects, we find neither significant main effects of within-wave experience nor significantly different slopes between interviewers.
This article describes the estimation of quality-adjusted price indexes from ‘big data’ such as scanner and online data when there is no available information on product characteristics for explicit quality adjustment using hedonic regression. The longitudinal information can be exploited to implicitly quality-adjust the price indexes. The fixed-effects (or ‘time-product dummy’) index is shown to be equivalent to a fully interacted time-dummy hedonic index based on all price-determining characteristics of the products, despite those characteristics not being observed. In production, this can be combined with a modified approach to splicing that incorporates the price movement across the full estimation window to reflect new products with one period’s lag without requiring revision. Empirical results for this fixed-effects window-splice (FEWS) index are presented for different data sources: three years of New Zealand consumer electronics scanner data from market-research company GfK; six years of United States supermarket scanner data from market-research company IRI; and 15 months of New Zealand consumer electronics daily online data from MIT’s Billion Prices Project.
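A hedged sketch of the fixed-effects (time-product dummy) index may clarify the idea. This is not the authors' production code: it simply regresses log price on time dummies and product dummies and exponentiates the time coefficients, normalised to 1 in the first period, which is the core of the method described above.

```python
import numpy as np

def tpd_index(times, products, prices):
    """Time-product dummy index: exp of the time coefficients from a
    regression of log price on time and product dummies."""
    t_levels = sorted(set(times))
    p_levels = sorted(set(products))
    n = len(prices)
    # Design matrix: intercept, then T-1 time dummies, then P-1 product
    # dummies (the first period and first product are the bases).
    X = np.zeros((n, 1 + (len(t_levels) - 1) + (len(p_levels) - 1)))
    X[:, 0] = 1.0
    for r, (t, p) in enumerate(zip(times, products)):
        ti, pi = t_levels.index(t), p_levels.index(p)
        if ti > 0:
            X[r, ti] = 1.0
        if pi > 0:
            X[r, len(t_levels) - 1 + pi] = 1.0
    beta, *_ = np.linalg.lstsq(X, np.log(prices), rcond=None)
    # Index relative to the first period.
    return np.exp(np.concatenate(([0.0], beta[1 : len(t_levels)])))

# Two products observed in two periods, both rising 10% in price:
# the quality-adjusted index should be (1.0, 1.1).
idx = tpd_index([1, 1, 2, 2], ["A", "B", "A", "B"], [1.0, 2.0, 1.1, 2.2])
```

The product fixed effects absorb the unobserved, time-invariant characteristics of each product, which is why the index is implicitly quality-adjusted without a hedonic regression on observed characteristics.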
Measuring working time is not only an important objective of the EU Labour Force Survey (LFS), but also a highly demanding task in terms of methodology. Against the background of a recent debate on the comparability of working time estimates in France and Germany, this article presents a comparative assessment of how working time is measured in the Labour Force Survey in the two countries. It focuses on the measurement of hours actually worked, the key working-time concept for short-term economic analysis and the National Accounts. The article systematically analyses the differences between the measurement approaches used in France and Germany in order to identify the methodological effects that hinder comparability. It concludes that the LFS overstates the difference in hours actually worked between France and Germany, and identifies question comprehension, rounding, and editing effects, as well as certain aspects of the sampling design, as crucial factors for reliable measurement, in particular of absences from work during the reference week. We recommend continuing the work started in the European Statistical System towards the development of a model questionnaire, in order to improve the cross-national harmonisation of key variables such as hours actually worked.
Respondent-driven sampling (RDS) is often used to estimate population properties (e.g., sexual risk behavior) in hard-to-reach populations. In RDS, already sampled individuals recruit population members to the sample from their social contacts in an efficient snowball-like sampling procedure. By assuming a Markov model for the recruitment of individuals, asymptotically unbiased estimates of population characteristics can be obtained. Current RDS estimation methodology assumes that the social network is undirected, that is, all edges are reciprocal. However, empirical social networks in general also include a substantial number of nonreciprocal edges. In this article, we develop an estimation method for RDS in populations connected by social networks that include reciprocal and nonreciprocal edges. We derive estimators of the selection probabilities of individuals as a function of the number of outgoing edges of sampled individuals. The proposed estimators are evaluated on artificial and empirical networks and are shown to generally perform better than existing estimators. This is the case in particular when the fraction of directed edges in the network is large.
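The idea of weighting by outgoing edges can be sketched with a simple Hájek-type ratio estimator, in the spirit of (but not identical to) the estimators derived in the article: each sampled individual is weighted by the inverse of their reported out-degree when estimating a population proportion.

```python
def rds_proportion(out_degrees, indicators):
    """Estimate the population share with indicator == 1, weighting each
    recruit by the inverse of their reported number of outgoing edges."""
    weights = [1.0 / d for d in out_degrees]
    weighted_total = sum(w * y for w, y in zip(weights, indicators))
    return weighted_total / sum(weights)

# Example: three recruits with out-degrees 1, 2 and 4; the first and
# third carry the trait of interest.
est = rds_proportion([1, 2, 4], [1, 0, 1])
```

The intuition is that individuals with many outgoing edges are easier to reach through the recruitment chain, so they are down-weighted; using the out-degree rather than the total degree is what accommodates nonreciprocal edges.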
Measurement error can be very difficult to assess and reduce. While great strides have been made in the field of survey methods research in recent years, many ongoing federal surveys were initiated decades ago, before testing methods were fully developed. However, the longer a survey is in use, the more established the time series becomes, and any change to a questionnaire risks a break in that time series. This article documents how a major federal survey – the health insurance module of the Current Population Survey (CPS) – was redesigned over the course of 15 years through a systematic series of small, iterative tests, both qualitative and quantitative. This overview summarizes those tests and results, and illustrates how particular questionnaire design features were identified as problematic, and how improvements were developed and evaluated. While the particular topic is health insurance, the general approach (a coordinated series of small tests), along with the specific tests and methods employed, are not uniquely applicable to health insurance. Furthermore, the particular questionnaire design features of the CPS health module that were found to be most problematic are used in many other major surveys on a range of topic areas.
Misspecification effects (meffs) measure the impact on the sampling variance of an estimator when both the sampling scheme and the model considered are incorrectly specified. We assess the effect of various features of complex sampling schemes on the inferences drawn from models for panel data using meffs. Many longitudinal social survey designs employ multistage sampling, leading to some clustering, which tends to lead to meffs greater than unity. An empirical study using data from the British Household Panel Survey is conducted, and a simulation study is performed. Our results suggest that clustering impacts are stronger for longitudinal studies than for cross-sectional studies, and that meffs for the regression coefficients increase with the number of waves analysed. Hence, estimated standard errors in the analysis of panel data can be misleading if any clustering is ignored.
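Meffs generalise the classical design effect. As a point of reference for why clustering pushes meffs above unity, the standard Kish approximation (not the article's meff formula) for single-stage cluster sampling with equal cluster size b and intra-cluster correlation rho is deff ≈ 1 + (b − 1)·rho:

```python
def design_effect(cluster_size, rho):
    """Kish approximation to the variance inflation from clustering:
    deff = 1 + (b - 1) * rho."""
    return 1.0 + (cluster_size - 1) * rho

# Even mild within-cluster correlation inflates variance noticeably:
# clusters of size 10 with rho = 0.05 inflate the variance by 45%.
deff = design_effect(10, 0.05)
```

In a panel setting, repeated measurement of the same clustered units compounds this inflation across waves, which is consistent with the finding above that meffs grow with the number of waves analysed.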
When analyzing data sampled with unequal inclusion probabilities, correlations between the probability of selection and the sampled data can induce bias if the inclusion probabilities are ignored in the analysis. Weights equal to the inverse of the probability of inclusion are commonly used to correct possible bias. When weights are uncorrelated with the descriptive or model estimators of interest, highly disproportional sample designs resulting in large weights can introduce unnecessary variability, leading to an overall larger mean square error compared to unweighted methods.
We describe an approach we term ‘weight smoothing’ that models the interactions between the weights and the estimators as random effects, reducing the root mean square error (RMSE) by shrinking interactions toward zero when such shrinkage is allowed by the data. This article adapts a flexible Laplace prior distribution for the hierarchical Bayesian model to gain a more robust bias-variance tradeoff than previous approaches using normal priors. Simulation and application suggest that under a linear model setting, weight-smoothing models with Laplace priors yield robust results when weighting is necessary, and provide considerable reduction in RMSE otherwise. In logistic regression models, estimates using weight-smoothing models with Laplace priors are robust, but with less gain in efficiency than in linear regression settings.
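The inverse-probability weighting that weight smoothing moderates can be sketched as a Hájek weighted mean (a minimal illustration, not the hierarchical Bayesian model itself): each observation receives weight w_i = 1/pi_i, so highly disproportional designs produce large weights that can inflate variance when the weights are unrelated to the outcome.

```python
def hajek_mean(y, inclusion_probs):
    """Inverse-probability-weighted (Hajek) mean with w_i = 1 / pi_i."""
    weights = [1.0 / p for p in inclusion_probs]
    return sum(w * yi for w, yi in zip(weights, y)) / sum(weights)

# A unit sampled with probability 0.25 counts twice as much as one
# sampled with probability 0.5.
est = hajek_mean([1.0, 3.0], [0.5, 0.25])
```

Weight smoothing sits between this fully weighted estimator and the unweighted mean: the Laplace prior shrinks the weight-by-estimator interactions toward zero, recovering something close to the unweighted estimator when the data indicate weighting is unnecessary.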