1. bookVolume 2022 (2022): Issue 1 (January 2022)
Journal Details
License
Format
Journal
First Published
16 Apr 2015
Publication timeframe
4 times per year
Languages
English
access type Open Access

Differentially private partition selection

Published Online: 20 Nov 2021
Page range: 339 - 352
Received: 31 May 2021
Accepted: 16 Sep 2021
Journal Details
License
Format
Journal
First Published
16 Apr 2015
Publication timeframe
4 times per year
Languages
English
Abstract

Many data analysis operations can be expressed as a GROUP BY query on an unbounded set of partitions, followed by a per-partition aggregation. To make such a query differentially private, adding noise to each aggregation is not enough: we also need to make sure that the set of partitions released is also differentially private.

This problem is not new, and it was recently formally introduced as differentially private set union [14]. In this work, we continue this area of study, and focus on the common setting where each user is associated with a single partition. In this setting, we propose a simple, optimal differentially private mechanism that maximizes the number of released partitions. We discuss implementation considerations, as well as the possible extension of this approach to the setting where each user contributes to a fixed, small number of partitions.

Keywords

[1] G. Acs, C. Castelluccia, and R. Chen. Differentially private histogram publishing through lossy compression. In 2012 IEEE 12th International Conference on Data Mining, pages 1–10. IEEE, 2012. Search in Google Scholar

[2] B. Balle and Y.-X. Wang. Improving the gaussian mechanism for differential privacy: Analytical calibration and optimal denoising. In International Conference on Machine Learning, pages 394–403. PMLR, 2018. Search in Google Scholar

[3] J. Bater, X. He, W. Ehrich, A. Machanavajjhala, and J. Rogers. Shrinkwrap: Differentially-private query processing in private data federations. arXiv preprint arXiv:1810.01816, 2018. Search in Google Scholar

[4] G. Cormode, C. Procopiuc, D. Srivastava, and T. T. Tran. Differentially private summaries for sparse data. In Proceedings of the 15th International Conference on Database Theory, pages 299–311, 2012. Search in Google Scholar

[5] G. Cormode, M. Procopiuc, D. Srivastava, and T. T. Tran. Differentially private publication of sparse data. arXiv preprint arXiv:1103.0825, 2011. Search in Google Scholar

[6] B. Ding, M. Winslett, J. Han, and Z. Li. Differentially private data cubes: optimizing noise sources and consistency. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pages 217–228, 2011. Search in Google Scholar

[7] C. Dwork, F. McSherry, K. Nissim, and A. Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265–284. Springer, 2006. Search in Google Scholar

[8] C. Dwork, M. Naor, O. Reingold, G. N. Rothblum, and S. Vadhan. On the complexity of differentially private data release: efficient algorithms and hardness results. In Proceedings of the forty-first annual ACM symposium on Theory of computing, pages 381–390, 2009. Search in Google Scholar

[9] C. Dwork and A. Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2014. Search in Google Scholar

[10] P. Flajolet, É. Fusy, O. Gandouet, and F. Meunier. Hyper-loglog: the analysis of a near-optimal cardinality estimation algorithm. In Discrete Mathematics and Theoretical Computer Science, pages 137–156. Discrete Mathematics and Theoretical Computer Science, 2007. Search in Google Scholar

[11] Q. Geng, W. Ding, R. Guo, and S. Kumar. Tight analysis of privacy and utility tradeoff in approximate differential privacy. In International Conference on Artificial Intelligence and Statistics, pages 89–99, 2020. Search in Google Scholar

[12] Q. Geng and P. Viswanath. Optimal noise adding mechanisms for approximate differential privacy. IEEE Transactions on Information Theory, 62(2):952–969, Feb 2016. Search in Google Scholar

[13] A. Ghosh, T. Roughgarden, and M. Sundararajan. Universally utility-maximizing privacy mechanisms. SIAM Journal on Computing, 41(6):1673–1693, 2012. Search in Google Scholar

[14] S. Gopi, P. Gulhane, J. Kulkarni, J. H. Shen, M. Shokouhi, and S. Yekhanin. Differentially private set union. arXiv preprint arXiv:2002.09745, 2020. Search in Google Scholar

[15] M. Hay, V. Rastogi, G. Miklau, and D. Suciu. Boosting the accuracy of differentially-private histograms through consistency. arXiv preprint arXiv:0904.0942, 2009. Search in Google Scholar

[16] N. Holohan, D. J. Leith, and O. Mason. Differential privacy in metric spaces: Numerical, categorical and functional data under the one roof. Information Sciences, 305:256–268, 2015. Search in Google Scholar

[17] S. Inusah and T. J. Kozubowski. A discrete analogue of the laplace distribution. Journal of statistical planning and inference, 136(3):1090–1102, 2006. Search in Google Scholar

[18] N. Johnson, J. P. Near, and D. Song. Towards practical differential privacy for sql queries. Proceedings of the VLDB Endowment, 11(5):526–539, 2018. Search in Google Scholar

[19] H. Kaplan, Y. Mansour, and U. Stemmer. The sparse vector technique, revisited. arXiv preprint arXiv:2010.00917, 2020. Search in Google Scholar

[20] D. Kifer and A. Machanavajjhala. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pages 193–204, 2011. Search in Google Scholar

[21] A. Korolova, K. Kenthapadi, N. Mishra, and A. Ntoulas. Releasing search queries and clicks privately. In Proceedings of the 18th international conference on World wide web, pages 171–180, 2009. Search in Google Scholar

[22] I. Kotsogiannis, Y. Tao, X. He, M. Fanaeepour, A. Machanavajjhala, M. Hay, and G. Miklau. Privatesql: a differentially private sql query engine. Proceedings of the VLDB Endowment, 12(11):1371–1384, 2019. Search in Google Scholar

[23] J. Lee and C. W. Clifton. Top-k frequent itemsets via differentially private fp-trees. In Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 931–940, 2014. Search in Google Scholar

[24] H. Li, L. Xiong, and X. Jiang. Differentially private synthesization of multi-dimensional data using copula functions. In Advances in database technology: proceedings. International conference on extending database technology, volume 2014, page 475. NIH Public Access, 2014. Search in Google Scholar

[25] M. Lyu, D. Su, and N. Li. Understanding the sparse vector technique for differential privacy. arXiv preprint arXiv:1603.01699, 2016. Search in Google Scholar

[26] R. J. Wilson, C. Y. Zhang, W. Lam, D. Desfontaines, D. Simmons-Marengo, and B. Gipson. Differentially private SQL with bounded user contribution. arXiv preprint arXiv:1909.01917, 2019. Search in Google Scholar

[27] X. Xiao, G. Wang, and J. Gehrke. Differential privacy via wavelet transforms. IEEE Transactions on knowledge and data engineering, 23(8):1200–1214, 2010. Search in Google Scholar

[28] Y. Xiao, L. Xiong, L. Fan, and S. Goryczka. Dpcube: differentially private histogram release through multidimensional partitioning. arXiv preprint arXiv:1202.5358, 2012. Search in Google Scholar

[29] J. Xu, Z. Zhang, X. Xiao, Y. Yang, G. Yu, and M. Winslett. Differentially private histogram publication. The VLDB Journal, 22(6):797–822, 2013. Search in Google Scholar

[30] J. Zhang, G. Cormode, C. M. Procopiuc, D. Srivastava, and X. Xiao. Privbayes: Private data release via bayesian networks. ACM Transactions on Database Systems (TODS), 42(4):1–41, 2017. Search in Google Scholar

Recommended articles from Trend MD

Plan your remote conference with Sciendo