Volume 37 (2021): Issue 3 (September 2021)
Special Issue on Population Statistics for the 21st Century
Journal Details
First Published
01 Oct 2013
Publication timeframe
4 times per year
Access type: Open Access

Latent Class Analysis for Estimating an Unknown Population Size – with Application to Censuses

Published Online: 13 Sep 2021
Page range: 673 - 697
Received: 01 Apr 2019
Accepted: 01 Jul 2020

Estimation of an unknown population size using capture-recapture techniques relies on the key assumption that capture probabilities are homogeneous across individuals in the population. In practice, this is usually approximated via post-stratification by key covariates believed to influence individual catchability. Another issue that arises when estimating a population from data collected from multiple sources is list dependence, where an individual's catchability on one list is related to their catchability on another list. Earlier models for population estimation relied heavily on list independence. However, methods are available that adjust the population estimates to account for dependence among lists. In this article, we propose the use of latent class analysis through log-linear modelling to estimate the population size in the presence of both heterogeneity and list dependence. The proposed approach is illustrated using data from the 1988 US census dress rehearsal.
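As a point of reference for the assumptions named in the abstract, the classical two-list (dual-system) estimator with post-stratification can be sketched as below. This is a minimal illustration only, not the latent class method proposed in the article: it assumes list independence and homogeneous capture probabilities within each stratum, and the function names, stratum labels, and counts are all hypothetical.

```python
def dual_system_estimate(n_both, n_first_only, n_second_only):
    """Two-list (Lincoln-Petersen) estimate of population size.

    Assumes the two lists are independent and that capture probabilities
    are homogeneous within the group the counts come from.
    """
    n_first = n_both + n_first_only    # everyone found on list 1
    n_second = n_both + n_second_only  # everyone found on list 2
    if n_both == 0:
        raise ValueError("no overlap between the lists; estimate is undefined")
    return n_first * n_second / n_both


def post_stratified_estimate(strata):
    """Sum the dual-system estimates computed separately in each stratum."""
    return sum(dual_system_estimate(*counts) for counts in strata.values())


# Hypothetical counts per stratum: (on both lists, list 1 only, list 2 only).
strata = {
    "stratum_a": (80, 20, 10),
    "stratum_b": (30, 30, 20),
}

stratified_total = post_stratified_estimate(strata)

# Pooling the same counts ignores the differing catchability across strata.
pooled = [sum(col) for col in zip(*strata.values())]
pooled_total = dual_system_estimate(*pooled)

print(f"post-stratified estimate: {stratified_total:.1f}")
print(f"pooled estimate:          {pooled_total:.1f}")
```

With these made-up counts the pooled estimate is lower than the post-stratified one, which shows how ignoring heterogeneity in catchability can change the estimate. The sketch does nothing about list dependence, the second issue the abstract raises.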


