Volume 2017 (2017): Issue 1 (January 2017)
Journal Details
Format: Journal
First Published: 16 Apr 2015
Publication timeframe: 4 times per year
Languages: English
Copyright: © 2020 Sciendo

Towards Seamless Tracking-Free Web: Improved Detection of Trackers via One-class Learning

Published Online: 22 Dec 2016
Page range: 79 - 99
Received: 31 May 2016
Accepted: 02 Sep 2016

Numerous tools have been developed to aggressively block the execution of popular JavaScript programs in Web browsers. Such blocking also affects the functionality of webpages and impairs the user experience. As a consequence, many privacy-preserving tools developed to limit online tracking, which is often carried out via JavaScript programs, may suffer from poor performance and limited uptake. A mechanism that can separate the JavaScript programs necessary for the proper functioning of a website from tracking JavaScript programs would thus be useful. Using a manually labelled dataset of 2,612 JavaScript programs, we show that current privacy-preserving tools fail to strike the right balance between blocking tracking JavaScript programs and allowing functional JavaScript code. To the best of our knowledge, this is the first study to assess how well current web privacy-preserving tools distinguish tracking from functional JavaScript programs.


