Volume 2022 (2022): Issue 1 (January 2022)
Journal Details
License
Format: Journal
First Published: 16 Apr 2015
Publication timeframe: 4 times per year
Languages: English
Access type: Open Access

OmniCrawl: Comprehensive Measurement of Web Tracking With Real Desktop and Mobile Browsers

Published Online: 20 Nov 2021
Page range: 227 - 252
Received: 31 May 2021
Accepted: 16 Sep 2021
Abstract

Over half of all visits to websites now take place in a mobile browser, yet the majority of web privacy studies take the vantage point of desktop browsers, use emulated mobile browsers, or focus on a single mobile browser. In this paper, we present a comprehensive web-tracking measurement study on mobile browsers and privacy-focused mobile browsers. Our study leverages a new web measurement infrastructure, OmniCrawl, which we develop to drive browsers on desktop computers and smartphones located on two continents. We capture web tracking measurements using 42 different non-emulated browsers simultaneously. We find that the third-party advertising and tracking ecosystem of mobile browsers is more similar to that of desktop browsers than previous findings suggested. We study privacy-focused browsers and find that their protections differ significantly and are generally weaker on lower-ranked sites. Our findings also show that common methodological choices made by web measurement studies, such as the use of emulated mobile browsers and Selenium, can lead to website behavior that deviates from what actual users experience.
