Sciendo RSS Feed for the Journal of Official Statistics

Sampling for the Population Coverage Survey of the New Italian Register Based Census
<abstract> <title style='display:none'>Abstract</title> <p>In 2018, for the first time, the Italian National Institute of Statistics (Istat) implemented the annual Permanent Population Census, which relies on the Population Base Register (PBR) and the Population Coverage Survey (PCS). This article provides a general overview of the PCS sampling design, which makes use of the PBR to correct population counts with the extended dual system estimator (Nirel and Glickman 2009). The sample allocation, proven optimal under a set of precision constraints, is based on preliminary estimates of individual probabilities of over-coverage and under-coverage. It defines the expected sample size in terms of individuals, and it oversamples the sub-populations at risk of under- or over-coverage. Finally, the article introduces a sample selection method that satisfies, to the greatest extent possible, the planned allocation of persons in terms of socio-demographic characteristics. Under acceptable assumptions, the article also shows that the sampling strategy enhances the precision of the estimates.</p> </abstract>
ARTICLE 2021-09-13

Fay-Herriot Model-Based Prediction Alternatives for Estimating Households with Emigrated Members
<abstract> <title style='display:none'>Abstract</title> <p>This article proposes a new methodology for estimating the proportions of households that have experience of international migration at the municipal level in Colombia. The Colombian National Statistical Office usually produces estimates of internal migration based on the results of population censuses, but there is a lack of disaggregated information about the main small areas of origin of the population that emigrates from Colombia.
The proposed methodology uses frequentist and Bayesian approaches based on a Fay-Herriot model and is illustrated with an example using a dependent variable from the Demographic and Health Survey 2015 and covariates available from the population census 2005. The proposed alternative produces proportion estimates that are consistent with sample sizes and with the main internal migration trends in Colombia. Additionally, the estimated coefficients of variation are below 20% for municipalities and for large, demographically relevant capital cities under both the frequentist and the Bayesian approach, so the estimates may be considered reliable. Finally, we illustrate how the proposed alternative leads to important reductions of the estimated coefficients of variation for the areas with very small sample sizes.</p> </abstract>
ARTICLE 2021-09-13

A General Framework for Multiple-Recapture Estimation that Incorporates Linkage Error Correction
<abstract> <title style='display:none'>Abstract</title> <p>The size of a partly observed population is often estimated with the capture-recapture model. An important assumption of this model is that sources can be perfectly linked. This assumption is of relevance if the identification of records is obtained not by some perfect identifier (such as an ID code) but by indirect identifiers (such as name and address). In that case, the perfect linkage assumption is often violated, which in general leads to biased population size estimates. Initial suggestions to solve this problem use record linkage probabilities to correct the capture-recapture model. In this article we provide a general framework, based on the standard log-linear modelling approach, that generalises this work towards the inclusion of additional sources and covariates.
We show that the method performs well in a simulation study.</p> </abstract>
ARTICLE 2021-09-13

Spatio-Temporal Patterns in Portuguese Regional Fertility Rates: A Bayesian Approach for Spatial Clustering of Curves
<abstract> <title style='display:none'>Abstract</title> <p>It is important for demographic analyses and policy-making to obtain accurate models of spatial diffusion, so that policy experiments can reflect endogenous spatial spillovers appropriately. Likewise, it is important to obtain accurate estimates and forecasts of demographic variables such as age-specific fertility rates, by region and over time, as well as the uncertainty associated with such estimation. Here, we consider Bayesian hierarchical models with a separable spatio-temporal dependence structure that can be estimated by borrowing strength from neighbouring regions and all years. Further, we do not consider the adjacency structure as given, but rather as an object of inference. For this purpose, we exploit the local similarity of temporal patterns by developing a spatial clustering model based on Bayesian nonparametric smoothing techniques. The Bayesian inference provides the uncertainty associated with the clustering configurations that is typically lacking in classical analyses of large data sets, in which a unique clustering representation can be insufficient. The proposed model is applied to 16 years of data on age-specific fertility rates observed over 28 regions in Portugal, and provides statistical inference on the number of clusters, and on local scaling and shrinkage levels. The corresponding central clustering configuration is able to capture spatial diffusion patterns that have key demographic interpretations.
Importantly, the exercise aids the identification of peripheral regions with poor demographic prospects and the development of regional policy for such places.</p> </abstract>
ARTICLE 2021-09-13

Modelling Frontier Mortality Using Bayesian Generalised Additive Models
<abstract> <title style='display:none'>Abstract</title> <p>Mortality rates differ across countries and years, and the country with the lowest observed mortality has changed over time. However, the classic Science paper by Oeppen and Vaupel (2002) identified a persistent linear trend over time in maximum national life expectancy. In this article, we seek to exploit similar regularities in age-specific mortality by considering, for any given year, a hypothetical mortality ‘frontier’, which we define as the lower limit of the force of mortality at each age across all countries. Change in this frontier reflects incremental advances across the wide range of social, institutional and scientific dimensions that influence mortality. We jointly estimate frontier mortality as well as mortality rates for individual countries. Generalised additive models are used to estimate a smooth set of baseline frontier mortality rates and mortality improvements, and country-level mortality is modelled as a set of smooth, positive deviations from this baseline, forcing the mortality estimates for individual countries to lie above the frontier. This model is fitted to data for a selection of countries from the Human Mortality Database (2019). The efficacy of the model in forecasting over a ten-year horizon is compared to that of a similar model fitted to each country separately.</p> </abstract>
ARTICLE 2021-09-13

Fertility Projections in a European Context: A Survey of Current Practices among Statistical Agencies
<abstract> <title style='display:none'>Abstract</title> <p>Projection studies have often focused on mortality and, more recently, migration.
Fertility is less studied, although even small changes can have significant repercussions for the size and age structure of future populations. Across Europe, there is no consensus on how fertility is best projected. In this article, we identify the different approaches used to project fertility among statistical agencies in Europe and provide an assessment of those approaches according to the producers themselves.</p> <p>Data were collected using a mixed-methods approach. First, European statistical agencies answered a questionnaire regarding fertility projection practices. Second, an in-depth review of selected countries was performed.</p> <p>Most agencies combine formal models with expert opinion. While many attempt to maximise the use of relevant inputs, there is more variation in the detail of outputs, with some agencies unable to account for changing age patterns. In a context of limited resources, most are satisfied with their approaches, though some are assessing alternative methodologies to improve accuracy and increase transparency.</p> <p>This study highlights the diversity of approaches used in fertility projections across Europe. Such knowledge may be useful to statistical agencies as they consider, test and implement different approaches, perhaps in collaboration with other agencies and the wider scientific community.</p> </abstract>
ARTICLE 2021-09-13

Probabilistic Projection of Subnational Life Expectancy
<abstract> <title style='display:none'>Abstract</title> <p>Projecting mortality for subnational units, or regions, is of great interest to practicing demographers. We seek a probabilistic method for projecting subnational life expectancy that is based on the national Bayesian hierarchical model used by the United Nations, and that at the same time is easy to use. We propose three methods of this kind. Two of them are variants of simple scaling methods.
The third method models life expectancy for a region as equal to national life expectancy plus a region-specific stochastic process, namely a heteroskedastic first-order autoregressive (AR(1)) process with a variance that declines to a constant as life expectancy increases. We apply our models to data from 29 countries. In an out-of-sample comparison, the proposed methods outperformed other comparative methods and were well calibrated for individual regions. The AR(1) method performed best in terms of crossover patterns between regions. Although the methods work well for individual regions, there are some limitations when evaluating within-country variation. We identified four countries for which the AR(1) method either underestimated or overestimated the predictive between-region within-country standard deviation. However, none of the competing methods works better in this regard than the AR(1) method. In addition to providing the full distribution of subnational life expectancy, the methods can be used to obtain probabilistic forecasts of age-specific mortality rates.</p> </abstract>
ARTICLE 2021-09-13

Preface

Assessment of the Census of Pakistan Using Demographic Analysis
<abstract> <title style='display:none'>Abstract</title> <p>In 2017, Pakistan conducted its long-awaited population census, the first since 1998. However, several experts are contesting the validity of the census data at the sub-national level, particularly in the absence of a post-enumeration survey. In this article, we propose to use demographic analysis to assess the results of the 2017 census at the sub-national level, using data from the 1998 census and from all available intercensal surveys, including three rounds of the Demographic and Health Survey.
Applying the cohort-component method of population projection, we project each of the five first-level subnational entities using estimates of fertility, mortality, and international and internal migration derived from the analysis of the existing data. We arrive at results approximately similar to those of the census at the national level: an estimated 210 million (95% CI: 203.4–218.9) compared to 207.8 million counted (a 1.1% difference). However, we find substantial sub-national variations. While there are too many uncertainties in the data used for the reconstruction to be fully confident about these figures, this analysis should prompt the national and international communities to ensure that a post-enumeration survey and demographic analysis are regular features of census operations in Pakistan in particular, and in developing countries with deficient data more generally.</p> </abstract>
ARTICLE 2021-09-13

Latent Class Analysis for Estimating an Unknown Population Size – with Application to Censuses
<abstract> <title style='display:none'>Abstract</title> <p>Estimation of an unknown population size using capture-recapture techniques relies on the key assumption that the capture probabilities are homogeneous across individuals in the population. This is usually accomplished via post-stratification by key covariates believed to influence individual catchability. Another issue that arises in population estimation from data collected from multiple sources is list dependence, where an individual’s catchability on one list is related to that on another list. Earlier models for population estimation relied heavily upon list independence. However, methods are available that can adjust the population estimates to account for dependence among lists. In this article, we propose the use of latent class analysis through log-linear modelling to estimate the population size in the presence of both heterogeneity and list dependence.
The proposed approach is illustrated using data from the 1988 US census dress rehearsal.</p> </abstract>
ARTICLE 2021-09-13

A Simulation Study of Diagnostics for Selection Bias
<abstract> <title style='display:none'>Abstract</title> <p>A non-probability sampling mechanism arising from nonresponse or non-selection is likely to bias estimates of parameters with respect to a target population of interest. This bias poses a unique challenge when selection is ‘non-ignorable’, that is, dependent on the unobserved outcome of interest, since it is then undetectable and thus cannot be ameliorated. We extend a simulation study by Nishimura et al. (2016), adding two recently published statistics: the ‘standardized measure of unadjusted bias’ (SMUB) and the ‘standardized measure of adjusted bias’ (SMAB), which explicitly quantify the extent of bias (in the case of SMUB) or of nonignorable bias (in the case of SMAB) under the assumption that a specified amount of nonignorable selection exists. Our findings suggest that these new sensitivity diagnostics are more correlated with, and more predictive of, the true, unknown extent of selection bias than other diagnostics, even when the underlying assumed level of non-ignorability is incorrect.</p> </abstract>
ARTICLE 2021-09-13

Letter to the Editors

Hybrid Technique for the Multiple Imputation of Survey Data
<abstract><title style='display:none'>Abstract</title><p>Most of the background variables in MICS (Multiple Indicator Cluster Surveys) are categorical with many categories. Like many other survey data sets, the MICS 2014 women’s data set suffers from a large number of missing values. Additionally, complex dependencies may exist among the large number of categorical variables in such surveys. The most commonly used parametric multiple imputation (MI) approaches, based on log-linear models or chained equations (MICE), become problematic in these situations, and often the implemented algorithms fail.
On the other hand, nonparametric MI techniques based on Bayesian latent class models work very well when only categorical variables are considered. This article describes how chained-equations MI for continuous variables can be made dependent on categorical variables which have been imputed beforehand using latent class models. Root mean square errors (RMSEs) and coverage rates of 95% confidence intervals (CIs) for generalized linear models (GLMs) with binary response are estimated in a simulation study, and a comparison is made between the proposed and various existing MI methods. The proposed method outperforms the MICE algorithms in most cases, with less computational time. The results of the simulation study are supported by a real data example.</p></abstract>
ARTICLE 2021-06-22

A Structural Equation Model for Measuring Relative Development of Hungarian Counties in the Years 1994–2016
<abstract><title style='display:none'>Abstract</title><p>The relative development of Hungarian counties is generally described by the GDP per capita indicator, but this figure leaves a knowledge gap regarding the liveability of the regions. Another frequently used approach is indicator systems, but these do not capture the structure of causes and consequences of regional development, and so provide no information on which factors are more likely to be the causes or, conversely, the consequences of differing regional development. To overcome the shortcomings of the above-mentioned methods, we created a structural equation model (SEM) at NUTS 3 level for the years 1994–2016 based on the LISREL estimation procedure. The model can be classified as experimental statistics, but it uses data only from official statistics, namely the regional indicators published by the Hungarian Central Statistical Office.
The model assumes that economic development depends on observable economic indicators and, in turn, determines regional development. In addition, regional development is also explained by non-economic indicators: social, demographic, cultural and infrastructural ones. The variable selection and the classification into causes and consequences were a three-step process, and the factors were classified by analysis of correlations, cross-correlations and Granger causality. The estimation results provided a basis for a deeper analysis of how regional development has changed in Hungary since the regime change, and of how these variables were influenced by the country’s integration into the global value chain.</p></abstract>
ARTICLE 2021-06-22

Improving Time Use Measurement with Personal Big Data Collection – The Experience of the European Big Data Hackathon 2019
<abstract><title style='display:none'>Abstract</title><p>This article assesses the experience with i-Log at the European Big Data Hackathon 2019, a satellite event of the New Techniques and Technologies for Statistics (NTTS) conference organised by Eurostat. i-Log is a system that captures personal big data from smartphones’ internal sensors for use in time use measurement. It allows the collection of heterogeneous types of data, enabling new possibilities for sociological urban field studies. Sensor data, such as those related to the location or movements of the user, can be used to investigate and gain insights into time diary answers and to assess their overall quality. The key idea is that the users’ answers are used to train machine-learning algorithms, allowing the system to learn from the users’ habits and to generate new time diary answers. In turn, these new labels can be used to assess the quality of existing answers, or to fill the gaps when the user does not provide an answer.
The aim of this paper is to introduce the pilot study, the i-Log system and the methodological evidence that emerged during the survey.</p></abstract>
ARTICLE 2021-06-22

A Diagnostic for Seasonality Based Upon Polynomial Roots of ARMA Models
<abstract><title style='display:none'>Abstract</title><p>Methodology for seasonality diagnostics is extremely important for statistical agencies, because such tools are necessary for deciding whether to seasonally adjust a given series and whether such an adjustment is adequate. This methodology must be statistical, in order to furnish quantification of Type I and Type II errors, and also to provide understanding of the requisite assumptions. We connect the concept of seasonality to a mathematical definition regarding the oscillatory character of the moving average (MA) representation coefficients, and define a new seasonality diagnostic based on autoregressive (AR) roots. The diagnostic is able to assess different forms of seasonality: dynamic versus stable, of arbitrary seasonal period, for both raw data and seasonally adjusted data. An extension of the AR diagnostic to an MA diagnostic allows for the detection of over-adjustment. Joint asymptotic results are provided for the diagnostics as they are applied to multiple seasonal frequencies, allowing for a global test of seasonality. We illustrate the method through simulation studies and several empirical examples.</p></abstract>
ARTICLE 2021-06-22

Variance Estimation after Mass Imputation Based on Combined Administrative and Survey Data
<abstract><title style='display:none'>Abstract</title><p>This article discusses methods for evaluating the variance of estimated frequency tables based on mass imputation. We consider a general set-up in which data may be available from both administrative sources and a sample survey. Mass imputation involves predicting the missing values of a target variable for the entire population.
The motivating application for this article is the Dutch virtual population census, for which it has been proposed to use mass imputation to estimate tables involving educational attainment. We present a new analytical design-based variance estimator for a frequency table based on mass imputation. We also discuss a more general bootstrap method that can be used to estimate this variance. Both approaches are compared in a simulation study on artificial data and in an application to real data from the Dutch census of 2011.</p></abstract>
ARTICLE 2021-06-22

Measuring and Communicating the Uncertainty in Official Economic Statistics
<abstract><title style='display:none'>Abstract</title><p>Official economic statistics are uncertain, even if not always interpreted or treated as such. From a historical perspective, this article reviews different categorisations of data uncertainty: specifically, the traditional typology that distinguishes sampling from nonsampling errors, and a newer typology due to Manski (2015). Throughout, the importance of measuring and communicating these uncertainties is emphasised, however hard it can prove to measure some sources of data uncertainty, especially those relevant to administrative and big data sets. Accordingly, this article seeks both to encourage further work on the measurement and communication of data uncertainty in general and to introduce the Comunikos (COMmunicating UNcertainty In Key Official Statistics) project at Eurostat. Comunikos is designed to evaluate alternative ways of measuring and communicating data uncertainty, specifically in contexts relevant to official economic statistics.</p></abstract>
ARTICLE 2021-06-22

A Product Match Adjusted R Squared Method for Defining Products with Transaction Data
<abstract><title style='display:none'>Abstract</title><p>The occurrence of relaunches of consumer goods at the barcode (GTIN) level is a well-known phenomenon in transaction data on consumer purchases.
GTINs of disappearing and reintroduced items have to be linked in order to capture possible price changes.</p><p>This article presents a method that groups GTINs into strata (‘products’) by balancing two measures: the first is an explained variance (R squared) measure of the ‘homogeneity’ of GTINs within products, while the second expresses the degree to which products can be ‘matched’ over time with respect to a comparison period. The resulting product ‘match adjusted R squared’ (MARS) combines explained variance in product prices with product match over time, so that different stratification schemes can be ranked according to the combined measure.</p><p>MARS has been applied to a broad range of product types. Individual GTINs are suitable as products for food and beverages, but not for product types with higher rates of churn, such as clothing, pharmacy products and electronics. In these cases, products are defined as combinations of characteristics, so that GTINs with the same characteristics are grouped into the same product. Future research will focus on further developments of MARS, such as attribute selection when data sets contain large numbers of variables.</p></abstract>
ARTICLE 2021-06-22
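Several of the abstracts above (the Italian Population Coverage Survey, the multiple-recapture framework, and the latent class article) build on capture-recapture estimation of an unknown population size. As a minimal, hypothetical illustration of the two-list (dual-system) idea only, the sketch below uses Chapman's variant of the classic Lincoln-Petersen estimator with made-up counts; it is not the extended dual system estimator or the log-linear models that the articles themselves develop.

```python
def chapman_estimate(n1: int, n2: int, m: int) -> float:
    """Chapman's variant of the Lincoln-Petersen two-list estimator.

    n1: records captured by the first list (e.g., a register)
    n2: records captured by the second list (e.g., a coverage survey)
    m:  records captured by both lists (matched records)
    """
    # (n1 + 1)(n2 + 1) / (m + 1) - 1 avoids division by zero when m = 0
    # and is nearly unbiased for moderate overlap.
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

# Hypothetical counts: 900 register records, 800 survey records, 720 matches.
# The estimate is close to the simple Lincoln-Petersen value n1 * n2 / m = 1000.
print(round(chapman_estimate(900, 800, 720)))  # prints 1000
```

Note that this simple form presumes perfect linkage between the two lists; the multiple-recapture article above is precisely about correcting the estimator when that assumption fails.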