Journal details
License
Format
Journal
eISSN
2299-0984
First published
16 Apr 2015
Publication frequency
4 times per year
Languages
English
Access type: Open access

d3p - A Python Package for Differentially-Private Probabilistic Programming

Published online: 03 Mar 2022
Volume & Issue: Volume 2022 (2022) - Issue 2 (April 2022)
Pages: 407 - 425
Received: 31 Aug 2021
Accepted: 16 Dec 2021
Abstract

We present d3p, a software package designed to help deploy runtime-efficient, widely applicable Bayesian inference under differential privacy guarantees. d3p achieves general applicability to a wide range of probabilistic modelling problems by implementing the differentially private variational inference algorithm, which allows users to fit any parametric probabilistic model with a differentiable density function. d3p adopts the probabilistic programming paradigm as a powerful way for the user to define such models flexibly. We demonstrate the use of our software on a hierarchical logistic regression example, showing the expressiveness of the modelling approach as well as the ease of running the parameter inference. We also perform an empirical evaluation of the runtime of the private inference on a complex model and find a roughly 10-fold speed-up compared to an implementation using TensorFlow Privacy.
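To make the core idea behind differentially private variational inference more concrete, the following is a minimal sketch of the gradient perturbation step that such algorithms rely on: each per-example gradient is clipped to a bounded L2 norm, the clipped gradients are summed, and Gaussian noise scaled to the clipping bound is added. This is an illustration only and not d3p's API; the function names `clip_to_norm` and `private_mean_gradient`, the parameters `max_norm` and `noise_multiplier`, and the example gradients are all hypothetical.

```python
import math
import random


def clip_to_norm(grad, max_norm):
    """Scale a gradient vector so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip each per-example gradient, sum, add Gaussian noise, then average."""
    dim = len(per_example_grads[0])
    clipped = [clip_to_norm(g, max_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise standard deviation is proportional to the clipping bound,
    # which limits how much any single example can change the sum.
    noise_std = noise_multiplier * max_norm
    noisy = [s + rng.gauss(0.0, noise_std) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


# Hypothetical per-example gradients for a two-parameter model.
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]

# Assumption: a seeded, non-cryptographic generator is used here purely for
# reproducibility of the illustration; it is not suitable for real privacy
# guarantees.
rng = random.Random(0)
update = private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng)
```

The resulting `update` would then be passed to an ordinary optimizer step on the variational parameters. The privacy guarantee that follows from a given `noise_multiplier`, batch size, and number of iterations has to be computed separately with a privacy accountant, which this sketch does not cover.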

