Volume 37 (2021): Issue 4 (December 2021)
Journal Details
First Published: 01 Oct 2013
Publication timeframe: 4 times per year
Access type: Open Access

Combining Cluster Sampling and Link-Tracing Sampling to Estimate Totals and Means of Hidden Populations in Presence of Heterogeneous Probabilities of Links

Published Online: 26 Dec 2021
Page range: 865 - 905
Received: 01 Jun 2020
Accepted: 01 Jun 2021

We propose Horvitz-Thompson-like and Hájek-like estimators of the total and mean of a response variable associated with the elements of a hard-to-reach population, such as drug users or sex workers. A portion of the population is assumed to be covered by a frame of venues where members of the population tend to gather. An initial cluster sample of elements is selected from the frame, with the venues as clusters, and the sampled elements are asked to name their contacts who belong to the population. The sample is then enlarged by adding the named elements who are not in the initial sample. Instead of design-based inclusion probabilities, the proposed estimators use model-based inclusion probabilities, which are derived from a Rasch model and estimated by maximum likelihood. The inclusion probabilities are assumed to be heterogeneous, that is, they depend on the sampled people. Variance estimates are obtained by bootstrap and are used to construct confidence intervals. The performance of the proposed estimators and confidence intervals is evaluated in two numerical studies, one of them based on real data, and the results show that their performance is acceptable.
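
The two estimator forms named in the abstract can be illustrated with a minimal Python sketch. It shows only the generic textbook shapes: a Horvitz-Thompson-type total weights each sampled response by the reciprocal of its inclusion probability, and a Hájek-type mean divides that total by the sum of the same weights. The function names and the numbers are hypothetical, and the inclusion probabilities are simply supplied as inputs. In the paper they would be estimated from a Rasch model by maximum likelihood, a step this sketch does not attempt, so it should not be read as the authors' exact estimators.

```python
from typing import Sequence


def ht_like_total(y: Sequence[float], pi: Sequence[float]) -> float:
    """Horvitz-Thompson-type total: sum of y_i / pi_i over sampled elements.

    `pi` holds the (assumed already estimated) inclusion probabilities.
    """
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    if any(not (0.0 < p <= 1.0) for p in pi):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    return sum(yi / p for yi, p in zip(y, pi))


def hajek_like_mean(y: Sequence[float], pi: Sequence[float]) -> float:
    """Hajek-type mean: weighted total divided by the sum of the weights.

    The denominator, sum of 1 / pi_i, acts as an estimate of population size.
    """
    total = ht_like_total(y, pi)
    size_estimate = sum(1.0 / p for p in pi)
    return total / size_estimate


# Illustrative (made-up) responses and inclusion probabilities.
y_sample = [2.0, 4.0, 6.0]
pi_sample = [0.5, 0.25, 0.5]

total_estimate = ht_like_total(y_sample, pi_sample)
mean_estimate = hajek_like_mean(y_sample, pi_sample)
```

With these made-up inputs the weights are 2, 4 and 2, so the total estimate is 32 and the mean estimate is 32 / 8 = 4. Elements with smaller inclusion probabilities receive larger weights, which is why heterogeneous probabilities matter for both estimators.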


