Volume 2019 (2019): Issue 3 (July 2019)
Journal Details
First Published: 16 Apr 2015
Publication timeframe: 4 times per year
Access type: Open Access

MAPS: Scaling Privacy Compliance Analysis to a Million Apps

Published Online: 12 Jul 2019
Page range: 66 - 86

The app economy is largely reliant on data collection as its primary revenue model. To comply with legal requirements, app developers are often obligated to notify users of their privacy practices in privacy policies. However, prior research has suggested that many developers are not accurately disclosing their apps’ privacy practices. Evaluating discrepancies between apps’ code and privacy policies enables the identification of potential compliance issues. In this study, we introduce the Mobile App Privacy System (MAPS) for conducting an extensive privacy census of Android apps. We designed a pipeline for retrieving and analyzing large app populations based on code analysis and machine learning techniques. In its first application, we conduct a privacy evaluation for a set of 1,035,853 Android apps from the Google Play Store. We find broad evidence of potential non-compliance. Many apps do not have a privacy policy to begin with. Policies that do exist are often silent on the practices performed by apps. For example, 12.1% of apps have at least one location-related potential compliance issue. We hope that our extensive analysis will motivate app stores, government regulators, and app developers to more effectively review apps for potential compliance issues.
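The comparison described in the abstract, practices observed in an app's code versus practices disclosed in its policy, can be illustrated with a minimal sketch. This is not the MAPS implementation: the `AppReport` type, the practice labels, and the rule that an app without a policy has every detected practice flagged are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class AppReport:
    """Hypothetical per-app record; field names are illustrative only."""
    app_id: str
    practices_in_code: Set[str]              # e.g. output of a code analysis
    practices_in_policy: Optional[Set[str]]  # None means no policy was found


def potential_issues(report: AppReport) -> Set[str]:
    """Return practices found in code but not disclosed in the policy.

    Assumption: an app with no policy discloses nothing, so every
    practice detected in its code counts as a potential issue.
    """
    disclosed = report.practices_in_policy or set()
    return report.practices_in_code - disclosed


def share_with_issue(reports: List[AppReport], prefix: str) -> float:
    """Fraction of apps with at least one potential issue whose
    practice label starts with `prefix` (e.g. "location")."""
    if not reports:
        return 0.0
    flagged = sum(
        any(p.startswith(prefix) for p in potential_issues(r))
        for r in reports
    )
    return flagged / len(reports)


# Toy data, invented purely to exercise the functions.
reports = [
    AppReport("a", {"location_gps", "contact_email"}, {"contact_email"}),
    AppReport("b", {"location_gps"}, {"location_gps"}),
    AppReport("c", {"identifier_ad_id"}, None),
    AppReport("d", set(), None),
]
```

The set difference keeps the notion of a "potential" issue explicit: it records a mismatch between two analyses, not a legal finding, which matches the hedged wording the abstract uses.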

