Volume 2022 (2022): Issue 2 (April 2022)
Journal Details
Format: Journal
eISSN: 2299-0984
First Published: 16 Apr 2015
Publication timeframe: 4 times per year
Languages: English
Access type: Open Access

Comprehensive Analysis of Privacy Leakage in Vertical Federated Learning During Prediction

Published Online: 03 Mar 2022
Page range: 263 - 281
Received: 31 Aug 2021
Accepted: 16 Dec 2021
Abstract

Vertical federated learning (VFL), a variant of federated learning, has recently attracted increasing attention. An active party that holds the true labels jointly trains a model with other parties (referred to as passive parties) in order to exploit more features and achieve higher model accuracy. During the prediction phase, all parties collaboratively compute the confidence scores for each target record, and the results are returned to the active party. However, a recent study by Luo et al. [28] showed that the active party can use these confidence scores to reconstruct passive-party features, causing severe privacy leakage.
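To make the prediction-phase leakage concrete, here is a minimal sketch in Python with NumPy. It assumes a two-party multinomial logistic regression model computed in plaintext and an active party that knows all model weights (a white-box setting); the helper names `vfl_predict` and `solve_passive_features` are hypothetical. Because the log-ratio of two softmax scores equals the difference of the corresponding logits, each returned score vector yields C - 1 linear equations in the passive features, so when C - 1 is at least the number of passive features, least squares can recover them. This only illustrates the kind of leakage pointed out by Luo et al. [28]; it is not their implementation.

```python
import numpy as np


def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def vfl_predict(x_active, x_passive, W_active, W_passive, b):
    # Each party computes a partial logit vector on its own features; only
    # the aggregated confidence scores are returned to the active party.
    z_active = x_active @ W_active
    z_passive = x_passive @ W_passive
    return softmax(z_active + z_passive + b)


def solve_passive_features(scores, x_active, W_active, W_passive, b):
    # log(p_c) - log(p_0) = z_c - z_0 for every class c >= 1, which yields
    # (C - 1) linear equations A @ x_passive = rhs in the unknown features.
    log_ratios = np.log(scores[1:]) - np.log(scores[0])
    known = x_active @ W_active + b                 # logit part the active party knows
    rhs = log_ratios - (known[1:] - known[0])
    A = (W_passive[:, 1:] - W_passive[:, [0]]).T    # shape (C - 1, d_passive)
    x_hat, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return x_hat


# Toy example (hypothetical sizes): 3 active features, 2 passive features, 4 classes.
rng = np.random.default_rng(0)
d_a, d_p, C = 3, 2, 4
W_a = rng.normal(size=(d_a, C))
W_p = rng.normal(size=(d_p, C))
b = rng.normal(size=C)
x_a = rng.normal(size=d_a)
x_p = rng.normal(size=d_p)

scores = vfl_predict(x_a, x_p, W_a, W_p, b)
x_hat = solve_passive_features(scores, x_a, W_a, W_p, b)
print("true passive features:  ", x_p)
print("reconstructed features: ", x_hat)
```

In this sketch, hiding the partial logits during aggregation (for example, under encryption) would not change the outcome, because the reconstruction uses only the final scores that the active party receives.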

In this paper, we conduct a comprehensive analysis of privacy leakage in VFL frameworks during the prediction phase. Our study improves on previous work [28] in two respects. First, we design a general gradient-based reconstruction attack framework that can be flexibly applied to simple logistic regression models as well as to multi-layer neural networks. Second, in addition to performing the attack under the white-box setting, we make the first attempt to conduct the attack under the black-box setting. Extensive experiments on a number of real-world datasets show that our proposed attack is effective under different settings and, in the best case, reduces the attack error by a factor of two to three compared to previous work [28]. We further analyze a list of potential mitigation approaches and compare their privacy-utility performance. Experimental results demonstrate that privacy leakage from the confidence scores is a substantial risk in VFL frameworks during the prediction phase, one that cannot be solved simply by crypto-based confidentiality approaches. On the other hand, processing the confidence scores with information compression and randomization approaches can provide stronger privacy protection.
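The gradient-based attack idea can be sketched as follows, under loudly stated assumptions: a white-box setting in which the active party knows every weight, a toy one-hidden-layer tanh network whose first layer is split row-wise between the two parties, a squared-error score-matching loss, and plain gradient descent with hand-written backpropagation. The active party treats the unknown passive features as variables and adjusts them until the model output matches the confidence scores it received. All names and hyperparameters are hypothetical and not taken from the paper; `round_scores` and `noisy_scores` are simple stand-ins for the information-compression and randomization approaches, not the specific mitigations the paper evaluates.

```python
import numpy as np


def softmax(z):
    z = z - z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()


def forward(x_a, x_p, params):
    # First-layer weights are split row-wise, which is equivalent to each
    # party computing its own partial pre-activation before aggregation.
    W1_a, W1_p, b1, W2, b2 = params
    h = np.tanh(x_a @ W1_a + x_p @ W1_p + b1)
    p = softmax(h @ W2 + b2)
    return h, p


def loss_and_grad(x_a, x_p_hat, scores, params):
    # Squared distance between the model output on the current guess and the
    # observed scores, plus its gradient w.r.t. the guessed passive features.
    W1_a, W1_p, b1, W2, b2 = params
    h, p = forward(x_a, x_p_hat, params)
    g = p - scores
    loss = 0.5 * float(g @ g)
    dz = p * (g - g @ p)                 # softmax Jacobian-vector product
    dh = dz @ W2.T
    da = dh * (1.0 - h ** 2)             # derivative of tanh
    return loss, da @ W1_p.T


def reconstruct(x_a, scores, params, d_p, lr=0.05, steps=4000, seed=0):
    # Start from a random guess and run plain gradient descent on the loss.
    x_hat = np.random.default_rng(seed).normal(size=d_p)
    history = []
    for _ in range(steps):
        loss, grad = loss_and_grad(x_a, x_hat, scores, params)
        history.append(loss)
        x_hat -= lr * grad
    return x_hat, history


def round_scores(scores, decimals=1):
    # Compression-style stand-in: release coarsened scores only.
    return np.round(scores, decimals)


def noisy_scores(scores, sigma, rng):
    # Randomization-style stand-in: add Gaussian noise, then re-normalize.
    s = np.clip(scores + rng.normal(scale=sigma, size=scores.shape), 1e-6, None)
    return s / s.sum()


# Toy example (hypothetical sizes): 3 active, 2 passive features, 8 hidden units, 4 classes.
rng = np.random.default_rng(1)
d_a, d_p, hidden, C = 3, 2, 8, 4
params = (rng.normal(scale=0.5, size=(d_a, hidden)),
          rng.normal(scale=0.5, size=(d_p, hidden)),
          rng.normal(scale=0.5, size=hidden),
          rng.normal(scale=0.5, size=(hidden, C)),
          rng.normal(scale=0.5, size=C))
x_a, x_p = rng.normal(size=d_a), rng.normal(size=d_p)
_, scores = forward(x_a, x_p, params)

x_hat, hist = reconstruct(x_a, scores, params, d_p)
x_hat_def, _ = reconstruct(x_a, round_scores(scores), params, d_p)
print("error without defense:", np.linalg.norm(x_hat - x_p))
print("error with rounding:  ", np.linalg.norm(x_hat_def - x_p))
```

Gradient descent on this non-convex loss may stop at a local minimum, so the sketch makes no recovery guarantee; comparing the two printed errors over a few random seeds gives only a rough feel for how coarsening the scores affects the attack.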

References

[1] Martín Abadi, Andy Chu, Ian J. Goodfellow, H. Brendan McMahan, Ilya Mironov, Kunal Talwar, and Li Zhang. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 308–318. ACM, 2016. doi:10.1145/2976749.2978318

[2] Nick Angelou, Ayoub Benaissa, Bogdan Cebere, William Clark, Adam James Hall, Michael A. Hoeh, Daniel Liu, Pavlos Papadopoulos, Robin Roehm, Robert Sandmann, Phillipp Schoppmann, and Tom Titcombe. Asymmetric private set intersection with applications to contact tracing and private vertical federated machine learning. CoRR, abs/2011.09350, 2020.

[3] Sean Augenstein, H. Brendan McMahan, Daniel Ramage, Swaroop Ramaswamy, Peter Kairouz, Mingqing Chen, Rajiv Mathews, and Blaise Agüera y Arcas. Generative models for effective ML on private, decentralized datasets. In 8th International Conference on Learning Representations. OpenReview.net, 2020.

[4] Léon Bottou. Large-scale machine learning with stochastic gradient descent. In 19th International Conference on Computational Statistics, pages 177–186, 2010. doi:10.1007/978-3-7908-2604-3_16

[5] Nicholas Carlini, Chang Liu, Úlfar Erlingsson, Jernej Kos, and Dawn Song. The secret sharer: Evaluating and testing unintended memorization in neural networks. In 28th USENIX Security Symposium, pages 267–284, 2019.

[6] Tianyi Chen, Xiao Jin, Yuejiao Sun, and Wotao Yin. VAFL: A method of vertical asynchronous federated learning. CoRR, abs/2007.06081, 2020.

[7] Kewei Cheng, Tao Fan, Yilun Jin, Yang Liu, Tianjian Chen, Dimitrios Papadopoulos, and Qiang Yang. SecureBoost: A lossless federated learning framework. IEEE Intelligent Systems, 2021. doi:10.1109/MIS.2021.3082561

[8] Emiliano De Cristofaro and Gene Tsudik. Practical private set intersection protocols with linear complexity. In 14th International Conference on Financial Cryptography and Data Security, volume 6052 of Lecture Notes in Computer Science, pages 143–159. Springer, 2010. doi:10.1007/978-3-642-14577-3_13

[9] Ivan Damgård and Mads Jurik. A generalisation, a simplification and some applications of Paillier’s probabilistic public-key system. In Kwangjo Kim, editor, Public Key Cryptography, 4th International Workshop on Practice and Theory in Public Key Cryptography, volume 1992 of Lecture Notes in Computer Science, pages 119–136. Springer, 2001. doi:10.1007/3-540-44586-2_9

[10] Dheeru Dua and Casey Graff. UCI machine learning repository, 2017. http://archive.ics.uci.edu/ml.

[11] Cynthia Dwork and Aaron Roth. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science, 9(3-4):211–407, 2014. doi:10.1561/0400000042

[12] Siwei Feng and Han Yu. Multi-participant multi-class vertical federated learning. CoRR, abs/2001.11154, 2020.

[13] Michael J. Freedman, Kobbi Nissim, and Benny Pinkas. Efficient private matching and set intersection. In International Conference on the Theory and Applications of Cryptographic Techniques, volume 3027 of Lecture Notes in Computer Science, pages 1–19. Springer, 2004. doi:10.1007/978-3-540-24676-3_1

[14] Ananda L. Freire, Guilherme A. Barreto, Marcus Veloso, and Antonio T. Varela. Short-term memory mechanisms in neural network learning of robot navigation tasks: A case study. In 6th Latin American Robotics Symposium, pages 1–6. IEEE, 2009. doi:10.1109/LARS.2009.5418323

[15] Jonas Geiping, Hartmut Bauermeister, Hannah Dröge, and Michael Moeller. Inverting gradients - How easy is it to break privacy in federated learning? In Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020. Curran Associates Inc., 2020.

[16] Andrew Hard, Kanishka Rao, Rajiv Mathews, Françoise Beaufays, Sean Augenstein, Hubert Eichner, Chloé Kiddon, and Daniel Ramage. Federated learning for mobile keyboard prediction. CoRR, abs/1811.03604, 2018.

[17] Stephen Hardy, Wilko Henecka, Hamish Ivey-Law, Richard Nock, Giorgio Patrini, Guillaume Smith, and Brian Thorne. Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. CoRR, abs/1711.10677, 2017.

[18] Briland Hitaj, Giuseppe Ateniese, and Fernando Pérez-Cruz. Deep models under the GAN: Information leakage from collaborative deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 603–618. ACM, 2017. doi:10.1145/3133956.3134012

[19] Yaochen Hu, Di Niu, Jianming Yang, and Shengping Zhou. FDML: A collaborative machine learning framework for distributed features. In Ankur Teredesai, Vipin Kumar, Ying Li, Rómer Rosales, Evimaria Terzi, and George Karypis, editors, Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 2232–2240, 2019.

[20] Yan Huang, David Evans, and Jonathan Katz. Private set intersection: Are garbled circuits better than custom protocols? In 19th Annual Network and Distributed System Security Symposium. The Internet Society, 2012.

[21] Bargav Jayaraman and David Evans. Evaluating differentially private machine learning in practice. In 28th USENIX Security Symposium, pages 1895–1912. USENIX Association, 2019.

[22] Xue Jiang, Xuebing Zhou, and Jens Grossklags. Privacy-preserving high-dimensional data collection with federated generative autoencoder. Proceedings on Privacy Enhancing Technologies, 2022(1):481–500, 2022. doi:10.2478/popets-2022-0024

[23] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. In 3rd International Conference on Learning Representations, ICLR 2015, Conference Track Proceedings, 2015.

[24] Wenqi Li, Fausto Milletarì, Daguang Xu, Nicola Rieke, Jonny Hancox, Wentao Zhu, Maximilian Baust, Yan Cheng, Sébastien Ourselin, M. Jorge Cardoso, and Andrew Feng. Privacy-preserving federated brain tumour segmentation. In 10th International Workshop on Machine Learning in Medical Imaging, volume 11861 of Lecture Notes in Computer Science, pages 133–141. Springer, 2019. doi:10.1007/978-3-030-32692-0_16

[25] Yang Liu, Yingting Liu, Zhijie Liu, Yuxuan Liang, Chuishi Meng, Junbo Zhang, and Yu Zheng. Federated forest. IEEE Transactions on Big Data, 2020. doi:10.1109/TBDATA.2020.2992755

[26] Linpeng Lu and Ning Ding. Multi-party private set intersection in vertical federated learning. In 19th IEEE International Conference on Trust, Security and Privacy in Computing and Communications, pages 707–714. IEEE, 2020.

[27] Songtao Lu, Yawen Zhang, and Yunlong Wang. Decentralized federated learning for electronic health records. In 54th Annual Conference on Information Sciences and Systems, pages 1–5. IEEE, 2020.

[28] Xinjian Luo, Yuncheng Wu, Xiaokui Xiao, and Beng Chin Ooi. Feature inference attack on model predictions in vertical federated learning. In 37th IEEE International Conference on Data Engineering, pages 181–192. IEEE, 2021.

[29] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Agüera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, volume 54 of Proceedings of Machine Learning Research, pages 1273–1282, 2017.

[30] Luca Melis, Congzheng Song, Emiliano De Cristofaro, and Vitaly Shmatikov. Exploiting unintended feature leakage in collaborative learning. In 2019 IEEE Symposium on Security and Privacy, pages 691–706. IEEE, 2019. doi:10.1109/SP.2019.00029

[31] Sérgio Moro, Paulo Cortez, and Paulo Rita. A data-driven approach to predict the success of bank telemarketing. Decision Support Systems, 62:22–31, 2014. doi:10.1016/j.dss.2014.03.001

[32] Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive privacy analysis of deep learning: Passive and active white-box inference attacks against centralized and federated learning. In 2019 IEEE Symposium on Security and Privacy, pages 739–753. IEEE, 2019. doi:10.1109/SP.2019.00065

[33] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, et al. Scikit-learn: Machine learning in Python. The Journal of Machine Learning Research, 12:2825–2830, 2011.

[34] Le Trieu Phong, Yoshinori Aono, Takuya Hayashi, Lihua Wang, and Shiho Moriai. Privacy-preserving deep learning via additively homomorphic encryption. IEEE Transactions on Information Forensics and Security, 13(5):1333–1345, 2017. doi:10.1109/TIFS.2017.2787987

[35] Swaroop Ramaswamy, Rajiv Mathews, Kanishka Rao, and Françoise Beaufays. Federated learning for emoji prediction in a mobile keyboard. CoRR, abs/1906.04329, 2019.

[36] Daniele Romanini, Adam James Hall, Pavlos Papadopoulos, Tom Titcombe, Abbas Ismail, Tudor Cebere, Robert Sandmann, Robin Roehm, and Michael A. Hoeh. PyVertical: A vertical federated learning framework for multi-headed splitNN. CoRR, abs/2104.00489, 2021.

[37] Stacey Truex, Ling Liu, Mehmet Emre Gursoy, Lei Yu, and Wenqi Wei. Demystifying membership inference attacks in machine learning as a service. IEEE Transactions on Services Computing, 2019.

[38] Jaideep Vaidya and Chris Clifton. Privacy preserving association rule mining in vertically partitioned data. In Proceedings of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 639–644. ACM, 2002. doi:10.1145/775047.775142

[39] Jaideep Vaidya and Chris Clifton. Privacy-preserving k-means clustering over vertically partitioned data. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 206–215. ACM, 2003. doi:10.1145/956750.956776

[40] Jaideep Vaidya and Chris Clifton. Privacy preserving naive Bayes classifier for vertically partitioned data. In Proceedings of the 2004 SIAM International Conference on Data Mining, pages 522–526. SIAM, 2004. doi:10.1137/1.9781611972740.59

[41] Jaideep Vaidya, Chris Clifton, Murat Kantarcioglu, and Scott Patterson. Privacy-preserving decision trees over vertically partitioned data. ACM Transactions on Knowledge Discovery from Data, 2(3):1–27, 2008. doi:10.1145/1409620.1409624

[42] Chang Wang, Jian Liang, Mingkai Huang, Bing Bai, Kun Bai, and Hao Li. Hybrid differentially private federated learning on vertically partitioned data. CoRR, abs/2009.02763, 2020.

[43] Qian Wang, Minxin Du, Xiuying Chen, Yanjiao Chen, Pan Zhou, Xiaofeng Chen, and Xinyi Huang. Privacy-preserving collaborative model learning: The case of word vector training. IEEE Transactions on Knowledge and Data Engineering, 30(12):2381–2393, 2018.

[44] Yichuan Wang, Yuying Tian, Xinyue Yin, and Xinhong Hei. A trusted recommendation scheme for privacy protection based on federated learning. CCF Transactions on Networking, 3(3-4):218–228, 2020. doi:10.1007/s42045-020-00045-8

[45] Haiqin Weng, Juntao Zhang, Feng Xue, Tao Wei, Shouling Ji, and Zhiyuan Zong. Privacy leakage of real-world vertical federated learning. CoRR, abs/2011.09290, 2020.

[46] Yuncheng Wu, Shaofeng Cai, Xiaokui Xiao, Gang Chen, and Beng Chin Ooi. Privacy preserving vertical federated learning for tree-based models. Proceedings of the VLDB Endowment, 13(11):2090–2103, 2020. doi:10.14778/3407790.3407811

[47] Bangzhou Xin, Wei Yang, Yangyang Geng, Sheng Chen, Shaowei Wang, and Liusheng Huang. Private FL-GAN: Differential privacy synthetic data generation based on federated learning. In 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 2927–2931. IEEE, 2020.

[48] Kai Yang, Tao Fan, Tianjian Chen, Yuanming Shi, and Qiang Yang. A quasi-Newton method based vertical federated learning framework for logistic regression. CoRR, abs/1912.00513, 2019.

[49] Liu Yang, Ben Tan, Vincent W. Zheng, Kai Chen, and Qiang Yang. Federated recommendation systems. In Federated Learning - Privacy and Incentive, volume 12500 of Lecture Notes in Computer Science, pages 225–239. Springer, 2020. doi:10.1007/978-3-030-63076-8_16

[50] Qiang Yang, Yang Liu, Tianjian Chen, and Yongxin Tong. Federated machine learning: Concept and applications. ACM Transactions on Intelligent Systems and Technology, 10(2):12:1–12:19, 2019.

[51] Shengwen Yang, Bing Ren, Xuhui Zhou, and Liping Liu. Parallel distributed logistic regression for vertical federated learning without third-party coordinator. CoRR, abs/1911.09824, 2019.

[52] Ziqi Yang, Bin Shao, Bohan Xuan, Ee-Chien Chang, and Fan Zhang. Defending model inversion and membership inference attacks via prediction purification. CoRR, abs/2005.03915, 2020.

[53] Andrew Chi-Chih Yao. Protocols for secure computations (extended abstract). In 23rd Annual Symposium on Foundations of Computer Science, pages 160–164. IEEE Computer Society, 1982.

[54] Hwanjo Yu, Jaideep Vaidya, and Xiaoqian Jiang. Privacy-preserving SVM classification on vertically partitioned data. In 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining, volume 3918 of Lecture Notes in Computer Science, pages 647–656. Springer, 2006. doi:10.1007/11731139_74

[55] Bo Zhao, Konda Reddy Mopuri, and Hakan Bilen. iDLG: Improved deep leakage from gradients. CoRR, abs/2001.02610, 2020.

[56] Ligeng Zhu, Zhijian Liu, and Song Han. Deep leakage from gradients. In Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems, pages 14747–14756, 2019.