Volume 2021 (2021): Issue 4 (October 2021)
Journal Details
Format: Journal
First Published: 16 Apr 2015
Publication timeframe: 4 times per year
Languages: English
Access type: Open Access

Less is More: A privacy-respecting Android malware classifier using federated learning

Published Online: 23 Jul 2021
Page range: 96 - 116
Received: 28 Feb 2021
Accepted: 16 Jun 2021
Abstract

In this paper we present LiM (‘Less is More’), a malware classification framework that leverages Federated Learning to detect and classify malicious apps in a privacy-respecting manner. Information about newly installed apps is kept locally on users’ devices, so that the provider cannot infer which apps were installed by users. At the same time, input from all users is taken into account in the federated learning process, and all users benefit from better classification performance. A key challenge of this setting is that users do not have access to the ground truth (i.e., they cannot correctly identify whether an app is malicious). To tackle this, LiM uses a safe semi-supervised ensemble that maximizes classification accuracy with respect to a baseline classifier trained by the service provider (i.e., the cloud). We implement LiM and show that the cloud server achieves an F1 score of 95%, while clients have perfect recall with only 1 false positive in more than 100 apps, using a dataset of 25K clean apps and 25K malicious apps, 200 users, and 50 rounds of federation. Furthermore, we conduct a security analysis and demonstrate that LiM is robust against both poisoning attacks by adversaries who control half of the clients and inference attacks performed by an honest-but-curious cloud server. Further experiments with MaMaDroid’s dataset confirm resistance against poisoning attacks and a performance improvement due to the federation.
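To make the idea of a federation round concrete, the following is a minimal sketch of a generic weighted parameter average, in the style of federated averaging. It is not LiM's actual aggregation rule, which the abstract does not specify. The function name `federated_average`, the toy weight vectors, and the per-client sample counts are all hypothetical and chosen only for illustration. The point it shows is that the server combines model parameters and never sees which apps a client installed.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample count.

    Generic illustration only; this is an assumption, not the aggregation
    rule used by LiM.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client weight vector")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(client_weights[0])
    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        if len(weights) != dim:
            raise ValueError("all weight vectors must have the same length")
        # Each client contributes in proportion to its local data size.
        for i, w in enumerate(weights):
            averaged[i] += w * size / total
    return averaged


# Hypothetical round: three clients send locally trained parameters only;
# the raw app data stays on their devices.
clients = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sizes = [10, 10, 20]
global_weights = federated_average(clients, sizes)
```

In a real deployment the aggregated parameters would be sent back to clients for the next round; repeating this loop corresponds to the "rounds of federation" mentioned above.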

