1. bookVolume 2020 (2020): Issue 2 (April 2020)
Journal Details
License
Format
Journal
First Published
16 Apr 2015
Publication timeframe
4 times per year
Languages
English
access type Open Access

Differentially Private SQL with Bounded User Contribution

Published Online: 08 May 2020
Page range: 230 - 250
Received: 31 Aug 2019
Accepted: 16 Dec 2019
Journal Details
License
Format
Journal
First Published
16 Apr 2015
Publication timeframe
4 times per year
Languages
English

Differential privacy (DP) provides formal guarantees that the output of a database query does not reveal too much information about any individual present in the database. While many differentially private algorithms have been proposed in the scientific literature, there are only a few end-to-end implementations of differentially private query engines. Crucially, existing systems assume that each individual is associated with at most one database record, which is unrealistic in practice. We propose a generic and scalable method to perform differentially private aggregations on databases, even when individuals can each be associated with arbitrarily many rows. We express this method as an operator in relational algebra, and implement it in an SQL engine. To validate this system, we test the utility of typical queries on industry benchmarks, and verify its correctness with a stochastic test framework we developed. We highlight the promises and pitfalls learned when deploying such a system in practice, and we publish its core components as open-source software.

Keywords

[1] Kareem Amin, Alex Kulesza, Andres Munoz, and Sergei Vassilvtiskii. Bounding user contributions: A bias-variance trade-off in differential privacy. In Proceedings of the 36th International Conference on Machine Learning, PMLR 97, pages 263–271, 2019.Search in Google Scholar

[2] Johes Bater, Xi He, William Ehrich, Ashwin Machanavajjhala, and Jennie Rogers. Shrinkwrap: Differentially-private query processing in private data federations. arXiv preprint arXiv:1810.01816, 2018.Search in Google Scholar

[3] Michael Ben-Or and Avinatan Hassidim. The Bayesian learner is optimal for noisy binary search (and pretty good for quantum as well). In 2008 49th Annual IEEE Symposium on Foundations of Computer Science, pages 221–230. IEEE, 2008.Search in Google Scholar

[4] Benjamin Bichsel, Timon Gehr, Dana Drachsler-Cohen, Petar Tsankov, and Martin Vechev. DP-finder: Finding differential privacy violations by sampling and optimization. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 508–524. ACM, 2018.Search in Google Scholar

[5] Vincent Bindschaedler, Reza Shokri, and Carl A Gunter. Plausible deniability for privacy-preserving data synthesis. Proceedings of the VLDB Endowment, 10(5):481–492, 2017.Search in Google Scholar

[6] Mark Bun and Thomas Steinke. Concentrated differential privacy: Simplifications, extensions, and lower bounds. In Theory of Cryptography Conference, pages 635–658. Springer, 2016.Search in Google Scholar

[7] Transaction Processing Performance Council. TPC-H benchmark specification. http://www.tpc.org/tpch/, 2008.Search in Google Scholar

[8] Damien Desfontaines and Balázs Pejó. Sok: Differential privacies. arXiv preprint arXiv:1906.01337, 2019.Search in Google Scholar

[9] Zeyu Ding, Yuxin Wang, Guanhong Wang, Danfeng Zhang, and Daniel Kifer. Detecting violations of differential privacy. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, CCS ’18, pages 475–489, New York, NY, USA, 2018. ACM.Search in Google Scholar

[10] Cynthia Dwork. An ad omnia approach to defining and achieving private data analysis. In International Workshop on Privacy, Security, and Trust in KDD, pages 1–13. Springer, 2007.Search in Google Scholar

[11] Cynthia Dwork. The differential privacy frontier. In Theory of Cryptography Conference, pages 496–502. Springer, 2009.Search in Google Scholar

[12] Cynthia Dwork, Frank McSherry, Kobbi Nissim, and Adam Smith. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference, pages 265–284. Springer, 2006.Search in Google Scholar

[13] Cynthia Dwork, Moni Naor, Toniann Pitassi, Guy N Roth-blum, and Sergey Yekhanin. Pan-private streaming algorithms. In ICS, pages 66–80, 2010.Search in Google Scholar

[14] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3–4):211–407, 2014.Search in Google Scholar

[15] Paul Francis, Sebastian Probst Eide, and Reinhard Munz. Diffix: High-utility database anonymization. In Annual Privacy Forum, pages 141–158. Springer, 2017.Search in Google Scholar

[16] Quan Geng and Pramod Viswanath. The optimal mechanism in differential privacy. arXiv preprint arXiv:1212.1186, 2012.Search in Google Scholar

[17] Michaela Gotz, Ashwin Machanavajjhala, Guozhang Wang, Xiaokui Xiao, and Johannes Gehrke. Publishing search logs—a comparative study of privacy guarantees. IEEE Transactions on Knowledge and Data Engineering, 24(3):520–532, 2011.Search in Google Scholar

[18] J. H. Halton. Algorithm 247: Radical-inverse quasi-random point sequence. Commun. ACM, 7(12):701–702, December 1964.Search in Google Scholar

[19] Justin Hsu, Marco Gaboardi, Andreas Haeberlen, Sanjeev Khanna, Arjun Narayan, Benjamin C Pierce, and Aaron Roth. Differential privacy: An economic method for choosing epsilon. In 2014 IEEE 27th Computer Security Foundations Symposium, pages 398–410. IEEE, 2014.Search in Google Scholar

[20] Noah Johnson and Joseph P Near. Dataflow analysis & differential privacy for SQL queries. https://github.com/uber/sql-differential-privacy. Accessed: 2019-09-04.Search in Google Scholar

[21] Noah Johnson, Joseph P Near, and Dawn Song. Towards practical differential privacy for SQL queries. Proceedings of the VLDB Endowment, 11(5):526–539, 2018.Search in Google Scholar

[22] Peter Kairouz, Sewoong Oh, and Pramod Viswanath. The composition theorem for differential privacy. IEEE Transactions on Information Theory, 63(6):4037–4049, 2017.Search in Google Scholar

[23] Richard M Karp and Robert Kleinberg. Noisy binary search and its applications. In Proceedings of the eighteenth annual ACM-SIAM symposium on Discrete algorithms, pages 881–890. Society for Industrial and Applied Mathematics, 2007.Search in Google Scholar

[24] Daniel Kifer and Ashwin Machanavajjhala. No free lunch in data privacy. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pages 193–204. ACM, 2011.Search in Google Scholar

[25] Aleksandra Korolova, Krishnaram Kenthapadi, Nina Mishra, and Alexandros Ntoulas. Releasing search queries and clicks privately. In Proceedings of the 18th international conference on World wide web, pages 171–180. ACM, 2009.Search in Google Scholar

[26] Ios Kotsogiannis, Yuchao Tao, Xi He, Maryam Fanaeepour, Ashwin Machanavajjhala, Michael Hay, and Gerome Miklau. Privatesql: a differentially private sql query engine. Proceedings of the VLDB Endowment, 12(11):1371–1384, 2019.Search in Google Scholar

[27] Ios Kotsogiannis, Yuchao Tao, Ashwin Machanavajjhala, Gerome Miklau, and Michael Hay. Architecting a differentially private SQL engine. In Conference on Innovative Data Systems Research, 2019.Search in Google Scholar

[28] Sara Krehbiel. Choosing epsilon for privacy as a service. Proceedings on Privacy Enhancing Technologies, 2019(1):192–205, 2019.Search in Google Scholar

[29] Jaewoo Lee and Chris Clifton. How much is enough? choosing ɛ for differential privacy. In International Conference on Information Security, pages 325–340. Springer, 2011.Search in Google Scholar

[30] Chao Li, Michael Hay, Gerome Miklau, and Yue Wang. A data-and workload-aware algorithm for range queries under differential privacy. Proceedings of the VLDB Endowment, 7(5):341–352, 2014.Search in Google Scholar

[31] Ninghui Li, Min Lyu, Dong Su, and Weining Yang. Differential privacy: From theory to practice. Synthesis Lectures on Information Security, Privacy, & Trust, 8(4):1–138, 2016.Search in Google Scholar

[32] Frank D McSherry. Synthethic data via differential privacy. https://github.com/frankmcsherry/blog/blob/master/assets/Synth-SIGMOD.pdf. Accessed: 2019-05-28.Search in Google Scholar

[33] Frank D McSherry. Uber’s differential privacy.. probably isn’t. https://github.com/frankmcsherry/blog/blob/master/posts/2018-02-25.md. Accessed: 2019-03-22.Search in Google Scholar

[34] Frank D McSherry. Privacy integrated queries: an extensible platform for privacy-preserving data analysis. In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, pages 19–30. ACM, 2009.Search in Google Scholar

[35] Sebastian Meiser and Esfandiar Mohammadi. Tight on budget?: Tight bounds for r-fold approximate differential privacy. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 247–264. ACM, 2018.Search in Google Scholar

[36] Ilya Mironov. On significance of the least significant bits for differential privacy. In Proceedings of the 2012 ACM conference on Computer and communications security, pages 650–661. ACM, 2012.Search in Google Scholar

[37] Ilya Mironov. Rényi differential privacy. In 2017 IEEE 30th Computer Security Foundations Symposium (CSF), pages 263–275. IEEE, 2017.Search in Google Scholar

[38] Maurizio Naldi and Giuseppe D’Acquisto. Differential privacy: an estimation theory-based method for choosing epsilon. arXiv preprint arXiv:1510.00917, 2015.Search in Google Scholar

[39] Arjun Narayan and Andreas Haeberlen. DJoin: differentially private join queries over distributed databases. In Presented as part of the 10th {USENIX} Symposium on Operating Systems Design and Implementation ({OSDI} 12), pages 149–162, 2012.Search in Google Scholar

[40] Kobbi Nissim, Sofya Raskhodnikova, and Adam Smith. Smooth sensitivity and sampling in private data analysis. In Proceedings of the thirty-ninth annual ACM symposium on Theory of computing, pages 75–84. ACM, 2007.Search in Google Scholar

[41] Kobbi Nissim, Thomas Steinke, Alexandra Wood, Micah Altman, Aaron Bembenek, Mark Bun, Marco Gaboardi, David R O’Brien, and Salil Vadhan. Differential privacy: A primer for a non-technical audience. In Privacy Law Scholars Conf, 2017.Search in Google Scholar

[42] Larry Wasserman. All of statistics: a concise course in statistical inference. Springer Science & Business Media, 2013.Search in Google Scholar

Recommended articles from Trend MD

Plan your remote conference with Sciendo