Journal Details
License
Format: Journal
eISSN: 2001-7367
First Published: 01 Oct 2013
Publication timeframe: 4 times per year
Languages: English
Access type: Open Access

Analysing Sensitive Data from Dynamically-Generated Overlapping Contingency Tables

Published Online: 15 Jun 2020
Volume & Issue: Volume 36 (2020) - Issue 2 (June 2020)
Page range: 275 - 296
Received: 01 Jun 2019
Accepted: 01 Feb 2020
Abstract

Contingency tables provide a convenient format to publish summary data from confidential survey and administrative records that capture a wide range of social and economic information. By their nature, contingency tables enable aggregation of potentially sensitive data, limiting disclosure of identifying information. Furthermore, censoring or perturbation can be used to desensitise low cell counts when they arise. However, access to detailed cross-classified tables for research is often restricted by data custodians when too many censored or perturbed cells are required to preserve privacy. In this article, we describe a framework for selecting and combining log-linear models when accessible data is restricted to overlapping marginal contingency tables. The approach is demonstrated through application to housing transition data from the Australian Census Longitudinal Dataset provided by the Australian Bureau of Statistics.
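
As a rough illustration of what working from overlapping marginal tables can look like, the sketch below is not the framework described in the article. It shows a standard textbook special case only: the log-linear model in which A and C are conditionally independent given B, fitted from an A-by-B table and a B-by-C table that share the variable B. For this decomposable model the fitted counts have the closed form m_abc = n_ab * n_bc / n_b. The function name, variable names, and counts are hypothetical, and the sketch assumes the two tables agree exactly on the shared margin, which perturbed tables generally will not.

```python
def fit_from_overlapping_margins(n_ab, n_bc, tol=1e-9):
    """Fitted counts for the log-linear model [AB][BC].

    n_ab[a][b] and n_bc[b][c] are two marginal tables overlapping on B.
    Under conditional independence of A and C given B, the maximum
    likelihood fit is m[a][b][c] = n_ab[a][b] * n_bc[b][c] / n_b[b].
    """
    levels_a, levels_b = len(n_ab), len(n_bc)
    levels_c = len(n_bc[0])

    # Shared margin B, computed separately from each table.
    n_b_from_ab = [sum(n_ab[a][b] for a in range(levels_a)) for b in range(levels_b)]
    n_b_from_bc = [sum(n_bc[b]) for b in range(levels_b)]

    # Assumption of this sketch: the tables are consistent on B.
    if any(abs(x - y) > tol for x, y in zip(n_b_from_ab, n_b_from_bc)):
        raise ValueError("marginal tables disagree on the shared variable B")

    return [
        [
            [
                n_ab[a][b] * n_bc[b][c] / n_b_from_ab[b] if n_b_from_ab[b] else 0.0
                for c in range(levels_c)
            ]
            for b in range(levels_b)
        ]
        for a in range(levels_a)
    ]


# Hypothetical counts, made up for illustration only.
n_ab = [[10, 20], [30, 40]]   # rows: A, columns: B
n_bc = [[15, 25], [24, 36]]   # rows: B, columns: C
fitted = fit_from_overlapping_margins(n_ab, n_bc)
print(fitted[0][0][0])  # 10 * 15 / 40 = 3.75
```

The fitted three-way table reproduces both input tables when summed over the omitted variable, which is why only the two overlapping margins are needed for this particular model. Models with more complex interaction structure generally need iterative fitting instead.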
