Sciendo RSS Feed for Proceedings on Privacy Enhancing Technologies
ML-CB: Machine Learning Canvas Block<abstract><title style='display:none'>Abstract</title><p>With the aim of increasing online privacy, we present a novel, machine-learning-based approach to blocking one of the three main ways website visitors are tracked online—canvas fingerprinting. Because the act of canvas fingerprinting uses, at its core, a JavaScript program, and because many of these programs are reused across the web, we are able to fit several machine learning models around a semantic representation of a potentially offending program, achieving accurate and robust classifiers. Our supervised learning approach is trained on a dataset we created by scraping roughly half a million websites using a custom Google Chrome extension that stores information related to the canvas. Classification leverages our key insight that the images drawn by canvas fingerprinting programs have a facially distinct appearance, allowing us to manually classify files based on the images drawn; we take this approach one step further and train our classifiers not on the malleable images themselves, but on the more-difficult-to-change, underlying source code generating the images. As a result, ML-CB allows for more accurate tracker blocking.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00Three Years Later: A Study of MAC Address Randomization In Mobile Devices And When It Succeeds<abstract><title style='display:none'>Abstract</title><p>Mobile device manufacturers and operating system developers increasingly deploy MAC address randomization to protect user privacy and prevent adversaries from tracking persistent hardware identifiers.
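As background for the randomization behavior studied here: randomized MAC addresses are drawn from the locally-administered address space, so a simple first-pass check (our illustration, not the paper's methodology) is to inspect the locally-administered bit of the first octet:

```python
def is_locally_administered(mac: str) -> bool:
    """Return True if the MAC address has the locally-administered bit set.

    Randomized MAC addresses are locally administered: the second-least-
    significant bit of the first octet is 1, marking the address as not
    being a burned-in, globally unique identifier.
    """
    first_octet = int(mac.split(":")[0], 16)
    return bool(first_octet & 0x02)

# Locally administered (randomized-style) first octets end in 2, 6, A, or E.
print(is_locally_administered("da:a1:19:00:00:01"))  # locally administered
print(is_locally_administered("f8:e4:3b:00:00:01"))  # globally unique OUI
```

This test identifies an address as randomized-style; it cannot, by itself, detect the implementation flaws (e.g. predictable sequence numbers) that the study probes.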
Early MAC address randomization implementations suffered from logic bugs and information leakages that defeated the privacy benefits realized by using temporary, random addresses, allowing devices and users to be tracked in the wild. Recent work either assumes these implementation flaws continue to exist in modern MAC address randomization implementations, or considers only dated software or small numbers of devices.</p><p>In this work, we revisit MAC address randomization by performing a cross-sectional study of 160 models of mobile phones, including modern devices released subsequent to previous studies. We tested each of these phones in a lab setting to determine whether it uses randomization, under what conditions it randomizes its MAC address, and whether it mitigates known tracking vulnerabilities.</p><p>Our results show that, although very new phones with updated operating systems generally provide a high degree of privacy to their users, there are still many phones in wide use today that do not effectively prevent tracking.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00Who Can Find My Devices? Security and Privacy of Apple’s Crowd-Sourced Bluetooth Location Tracking System<abstract><title style='display:none'>Abstract</title><p>Overnight, Apple has turned its hundreds-of-million-device ecosystem into the world’s largest crowd-sourced location tracking network called offline finding (OF). OF leverages online finder devices to detect the presence of missing offline devices using Bluetooth and report an approximate location back to the owner via the Internet. While OF is not the first system of its kind, it is the first to commit to strong privacy goals. In particular, OF aims to ensure finder anonymity, prevent tracking of owner devices, and ensure the confidentiality of location reports. This paper presents the first comprehensive security and privacy analysis of OF. To this end, we recover the specifications of the closed-source OF protocols by means of reverse engineering.
We experimentally show that unauthorized access to the location reports allows for accurate device tracking and retrieving a user’s top locations with an error on the order of 10 meters in urban areas. While we find that OF’s design achieves its privacy goals, we discover two distinct design and implementation flaws that can lead to a location correlation attack and unauthorized access to the location history of the past seven days, which could deanonymize users. Apple has partially addressed the issues following our responsible disclosure. Finally, we make our research artifacts publicly available.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00Genome Reconstruction Attacks Against Genomic Data-Sharing Beacons<abstract><title style='display:none'>Abstract</title><p>Sharing genome data in a privacy-preserving way remains a major bottleneck for the scientific progress promised by the big data era in genomics. A community-driven protocol named <italic>genomic data-sharing beacon protocol</italic> has been widely adopted for sharing genomic data. The system aims to provide a secure, easy-to-implement, and standardized interface for data sharing by only allowing yes/no queries on the presence of specific alleles in the dataset. However, the beacon protocol was recently shown to be vulnerable to membership inference attacks. In this paper, we show that privacy threats against genomic data sharing beacons are not limited to membership inference. We identify and analyze a novel vulnerability of genomic data-sharing beacons: genome reconstruction. We show that it is possible to successfully reconstruct a substantial part of the genome of a victim when the attacker knows the victim has been added to the beacon in a recent update. In particular, we show how an attacker can use the inherent correlations in the genome and clustering techniques to run such an attack in an efficient and accurate way.
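For readers unfamiliar with the interface this attack exploits, a minimal sketch (ours, not the authors' implementation) of the beacon's yes/no query model looks like this:

```python
# Toy beacon: each participant's genome is a dict mapping a variant
# position to the alternate allele that participant carries. As in the
# beacon protocol, the beacon answers only yes/no presence queries and
# never reveals which participant carries the allele.
def beacon_query(genomes, position, allele):
    return any(g.get(position) == allele for g in genomes)

genomes = [
    {101: "A", 205: "T"},   # participant 1
    {101: "G", 307: "C"},   # participant 2
]
print(beacon_query(genomes, 101, "G"))  # someone carries G at position 101
print(beacon_query(genomes, 205, "C"))  # no one carries C at position 205
```

The reconstruction attack works because, after a victim is added, the set of queries flipping from "no" to "yes" leaks which alleles the new member carries.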
We also show that even if multiple individuals are added to the beacon during the same update, it is possible to identify the victim’s genome with high confidence using traits that are easily accessible by the attacker (e.g., eye color or hair type). Moreover, we show how a reconstructed genome using a beacon that is not associated with a sensitive phenotype can be used for membership inference attacks against beacons with sensitive phenotypes (e.g., HIV+). The outcome of this work will guide beacon operators on when and how to update the content of the beacon and help them (along with the beacon participants) make informed decisions.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00privGAN: Protecting GANs from membership inference attacks at low cost to utility<abstract><title style='display:none'>Abstract</title><p>Generative Adversarial Networks (GANs) have made the release of synthetic images a viable approach to sharing data without releasing the original dataset. It has been shown that such synthetic data can be used for a variety of downstream tasks such as training classifiers that would otherwise require the original dataset to be shared. However, recent work has shown that the GAN models and their synthetically generated data can be used to infer the training set membership by an adversary who has access to the entire dataset and some auxiliary information. Current approaches to mitigate this problem (such as DPGAN [1]) lead to dramatically poorer generated sample quality than the original non-private GANs. Here we develop a new GAN architecture (privGAN), where the generator is trained not only to cheat the discriminator but also to defend against membership inference attacks. The new mechanism is shown to empirically provide protection against this mode of attack while leading to negligible loss in downstream performance.
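To make the combined objective concrete, here is a heavily simplified, hypothetical sketch of a privGAN-style generator loss; the score inputs and the trade-off weight `lambda_priv` are our illustrative assumptions, not the paper's exact formulation:

```python
import math

# Illustrative only: a generator trained against two adversaries.
# d_score is the real/fake discriminator's belief that a generated
# sample is real (the generator wants this high); privacy_score is the
# probability that a membership-style privacy adversary *misidentifies*
# the sample's source (the generator also wants this high).
# lambda_priv is a hypothetical utility/privacy trade-off weight.
def generator_loss(d_score, privacy_score, lambda_priv=1.0):
    adversarial = -math.log(d_score)        # standard GAN generator term
    privacy = -math.log(privacy_score)      # reward fooling the privacy adversary
    return adversarial + lambda_priv * privacy
```

Minimizing the second term discourages the generator from memorizing training samples, since memorized samples are exactly the ones a membership adversary can pin to their source.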
In addition, our algorithm has been shown to explicitly prevent memorization of the training set, which explains why our protection is so effective. The main contributions of this paper are: i) we propose a novel GAN architecture that can generate synthetic data in a privacy-preserving manner with minimal hyperparameter tuning and architecture selection, ii) we provide a theoretical understanding of the optimal solution of the privGAN loss function, iii) we empirically demonstrate the effectiveness of our model against several white- and black-box attacks on several benchmark datasets, iv) we empirically demonstrate on three common benchmark datasets that synthetic images generated by privGAN lead to negligible loss in downstream performance when compared against non-private GANs. While we have focused on benchmarking privGAN exclusively on image datasets, the architecture of privGAN is not exclusive to image datasets and can be easily extended to other types of datasets. Repository link: <ext-link ext-link-type="uri" xmlns:xlink="" xlink:href=""></ext-link>.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00Digital Inequality Through the Lens of Self-Disclosure<abstract><title style='display:none'>Abstract</title><p>Recent work has brought to light disparities in privacy-related concerns based on socioeconomic status, race and ethnicity. This paper examines relationships between U.S.-based Twitter users’ socio-demographic characteristics and their privacy behaviors. Income, gender, age, race/ethnicity, education level and occupation are correlated with stated and observed privacy preferences of 110 active Twitter users. Contrary to our expectations, analyses suggest that neither socioeconomic status (<italic>SES</italic>) nor demographics is a significant predictor of the use of account security features. We do find that gender and education predict rate of self-disclosure, or voluntary sharing of personal information.
We explore variability in the types of information disclosed amongst socio-demographic groups. Exploratory findings indicate that 1) participants shared less personal information than they recall having shared in exit surveys; 2) there is no strong correlation between people’s stated attitudes and their observed behaviors.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00Data Portability between Online Services: An Empirical Analysis on the Effectiveness of GDPR Art. 20<abstract><title style='display:none'>Abstract</title><p>Data portability regulation has promised that individuals will be easily able to transfer their personal data between online service providers. Yet, after more than two years of an active privacy regulation regime in the European Union, this promise is far from being fulfilled. Given the lack of a functioning infrastructure for direct data portability between multiple providers, we investigate in our study how easily an individual could currently make use of an indirect data transfer between providers. We define such porting as a two-step transfer: first, requesting a data export from one provider, and second, importing the obtained data to another provider. To answer this question, we examine the data export practices of 182 online services, including the top one hundred most-visited websites in Germany according to the Alexa ranking, as well as their data import capabilities. Our main results show that high-ranking services, which primarily represent incumbents of key online markets, provide significantly larger data export scope and increased import possibilities than their lower-ranking competitors. Moreover, they establish more thorough authentication of individuals before export.
These first empirical results challenge the theoretical literature on data portability, according to which incumbents would be expected to comply only with the minimal possible export scope so as not to lose exclusive consumer data to market competitors free of charge. We attribute the practices of incumbents observed in our study to the absence of an infrastructure realizing direct data portability.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00Editors’ IntroductionARTICLE2021-04-27T00:00:00.000+00:00Sequencing Flow Cells and the Security of the Molecular-Digital Interface<abstract><title style='display:none'>Abstract</title><p>DNA sequencing is the molecular-to-digital conversion of DNA molecules, which are made up of a linear sequence of bases (A,C,G,T), into digital information. Central to this conversion are specialized fluidic devices, called sequencing flow cells, that distribute DNA onto a surface where the molecules can be read. As more computing becomes integrated with physical systems, we set out to explore how sequencing flow cell architecture can affect the security and privacy of the sequencing process and downstream data analysis. In the course of our investigation, we found that the unusual nature of molecular processing and flow cell design contributes to two security and privacy issues. First, DNA molecules are ‘sticky’ and stable for long periods of time. In a manner analogous to data recovery from discarded hard drives, we hypothesized that residual DNA attached to used flow cells could be collected and re-sequenced to recover a significant portion of the previously sequenced data. In experiments we were able to recover over 23.4% of a previously sequenced genome sample and perfectly decode image files encoded in DNA, suggesting that flow cells may be at risk of data recovery attacks.
Second, we hypothesized that methods used to sequence separate DNA samples simultaneously to increase sequencing throughput (multiplex sequencing), which incidentally leak small amounts of data between samples, could cause data corruption and allow samples to adversarially manipulate sequencing data. We find that a maliciously crafted synthetic DNA sample can be used to alter targeted genetic variants in other samples using this vulnerability. Such a sample could be used to corrupt sequencing data or even be spiked into tissue samples whenever untrusted samples are sequenced together. Taken together, these results suggest that, like many computing boundaries, the molecular-to-digital interface raises potential issues that should be considered in future sequencing and molecular sensing systems, especially as they become more ubiquitous.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00The CNAME of the Game: Large-scale Analysis of DNS-based Tracking Evasion<abstract><title style='display:none'>Abstract</title><p>Online tracking is a whack-a-mole game between trackers who build and monetize behavioral user profiles through intrusive data collection, and anti-tracking mechanisms that are deployed as browser extensions, DNS resolvers, or built into the browser. As a response to pervasive and opaque online tracking, more and more users adopt anti-tracking measures to preserve their privacy. Consequently, as the information that trackers can gather on users is being curbed, some trackers are looking for ways to evade these protections. In this paper we report on a large-scale longitudinal evaluation of an anti-tracking evasion scheme that leverages CNAME records to include tracker resources in a same-site context, which effectively bypasses anti-tracking measures that rely on fixed hostname-based block lists. Using historical HTTP Archive data we find that this tracking scheme is rapidly gaining traction, especially among high-traffic websites.
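The evasion scheme can be illustrated with a toy detection heuristic (all hostnames and the blocklist entry below are hypothetical): a first-party-looking subdomain whose CNAME record resolves into a known tracker's registrable domain:

```python
# Hypothetical sketch of CNAME-cloaking detection. The subdomain appears
# first-party, so hostname-based block lists miss it, but its CNAME
# target belongs to a tracker.
TRACKER_DOMAINS = {"tracker.example"}  # placeholder blocklist entry

def registrable_domain(hostname: str) -> str:
    # Naive eTLD+1: last two labels. A real implementation should use
    # the Public Suffix List (e.g. to handle co.uk correctly).
    return ".".join(hostname.rstrip(".").split(".")[-2:])

def is_cloaked_tracker(subdomain: str, cname_target: str) -> bool:
    # Flag when the CNAME crosses from the site's domain into a tracker's.
    return (registrable_domain(subdomain) != registrable_domain(cname_target)
            and registrable_domain(cname_target) in TRACKER_DOMAINS)

print(is_cloaked_tracker("metrics.shop.example", "shop.cdn.tracker.example"))
```

Because the browser sees only the first-party hostname, cookies set in this context are treated as same-site, which is what makes the technique effective against fixed block lists.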
Furthermore, we report on several privacy and security issues inherent to the technical setup of CNAME-based tracking that we detected through a combination of automated and manual analyses. We find that some trackers are using the technique against the Safari browser, which is known to include strict anti-tracking configurations. Our findings show that websites using CNAME trackers must take extra precautions to avoid leaking sensitive information to third parties.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00Exploring mental models of the right to informational self-determination of office workers in Germany<abstract><title style='display:none'>Abstract</title><p>Applied privacy research has so far focused mainly on consumer relations in private life. Privacy in the context of employment relationships is less well studied, although it is subject to the same legal privacy framework in Europe. The European General Data Protection Regulation (GDPR) has strengthened employees’ right to privacy by obliging employers to provide transparency and intervention mechanisms. For such mechanisms to be effective, employees must have a sound understanding of their functions and value. We explored possible boundaries by conducting a semi-structured interview study with 27 office workers in Germany and elicited mental models of the right to informational self-determination, which is the European proxy for the right to privacy. We provide insights into (1) perceptions of different categories of data, (2) familiarity with the legal framework regarding expectations for privacy controls, and (3) awareness of data processing, data flow, safeguards, and threat models. We found that the legal terms often used in privacy policies to describe categories of data are misleading. We further identified three groups of mental models that differ in their privacy control requirements and willingness to accept restrictions on their privacy rights.
We also found little awareness of actual data flow, processing, and safeguard implementation. Participants’ mindsets were shaped by their faith in organizational and technical measures to protect privacy. Employers and developers may benefit from our contributions by understanding the types of privacy controls desired by office workers and the challenges to be considered when conceptualizing and designing usable privacy protections in the workplace.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00Defining Privacy: How Users Interpret Technical Terms in Privacy Policies<abstract><title style='display:none'>Abstract</title><p>Recent privacy regulations such as GDPR and CCPA have emphasized the need for transparent, understandable privacy policies. This work investigates the role technical terms play in policy transparency. We identify potentially misunderstood technical terms that appear in privacy policies through a survey of current privacy policies and a pilot user study. We then run a user study on Amazon Mechanical Turk to evaluate whether users can accurately define these technical terms, to identify commonly held misconceptions, and to investigate how the use of technical terms affects users’ comfort with privacy policies. We find that technical terms are broadly misunderstood and that particular misconceptions are common. We also find that the use of technical terms affects users’ comfort with various privacy policies and their reported likelihood of accepting those policies.
We conclude that current use of technical terms in privacy policies poses a challenge to policy transparency and user privacy, and that companies should take steps to mitigate this effect.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00Growing synthetic data through differentially-private vine copulas<abstract><title style='display:none'>Abstract</title><p>In this work, we propose a novel approach for the synthesis of data based on copulas, which are interpretable and robust models, extensively used in the actuarial domain. More precisely, our method COPULA-SHIRLEY is based on the differentially-private training of vine copulas, a family of copulas that can model and generate data of arbitrary dimension. The framework of COPULA-SHIRLEY is simple yet flexible, as it can be applied to many types of data while preserving utility, as demonstrated by experiments conducted on real datasets. We also evaluate the protection level of our data synthesis method through a membership inference attack recently proposed in the literature.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00A First Look at Private Communications in Video Games using Visual Features<abstract><title style='display:none'>Abstract</title><p>Internet privacy is threatened by expanding use of automated mass surveillance and censorship techniques. In this paper, we investigate the feasibility of using video games and virtual environments to evade automated detection, namely by manipulating elements in the game environment to compose and share text with other users. This technique exploits the fact that text spotting in the wild is a challenging problem in computer vision. To test our hypothesis, we compile a novel dataset of text generated in popular video games and analyze it using state-of-the-art text spotting tools. Detection rates are negligible in most cases.
Retraining these classifiers specifically for game environments leads to dramatic improvements in some cases (ranging from 6% to 65% in most instances) but overall effectiveness is limited: the costs and benefits of retraining vary significantly for different games, this strategy does not generalize, and, interestingly, users can still evade detection using novel configurations and arbitrarily shaped text. Communicating in this way yields very low bitrates (0.3–1.1 bits/s), suitable for very short messages and applications such as microblogging and bootstrapping off-game communications (dialing). This technique does not require technical sophistication and runs easily on existing game infrastructure without modification. We also discuss potential strategies to address efficiency, bandwidth, and security constraints of video game environments. To the best of our knowledge, this is the first such exploration of video games and virtual environments from a computer vision perspective.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00Fast Privacy-Preserving Punch Cards<abstract><title style='display:none'>Abstract</title><p>Loyalty programs in the form of punch cards that can be redeemed for benefits have long been a ubiquitous element of the consumer landscape. However, their increasingly popular digital equivalents, while providing more convenience and better bookkeeping, pose a considerable privacy risk. This paper introduces a privacy-preserving punch card protocol that allows firms to digitize their loyalty programs without forcing customers to submit to corporate surveillance.
We also present a number of extensions that allow our scheme to provide other privacy-preserving customer loyalty features.</p><p>Compared to the best prior work, we achieve a 14× reduction in the computation and an 11× reduction in the communication required to perform a “hole punch,” a 55× reduction in the communication required to redeem a punch card, and a 128× reduction in the computation time required to redeem a card. Much of our performance improvement can be attributed to removing the reliance on pairings or range proofs present in prior work, which has only addressed this problem in the context of more general loyalty systems. By tailoring our scheme to punch cards and related loyalty systems, we demonstrate that we can reduce communication and computation costs by orders of magnitude.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00Foundations of Ring Sampling<abstract><title style='display:none'>Abstract</title><p>A ring signature scheme allows the signer to sign on behalf of an ad hoc set of users, called a ring. The verifier can be convinced that a ring member signs, but cannot point to the exact signer. Ring signatures have become increasingly important today with their deployment in anonymous cryptocurrencies. Conventionally, it is implicitly assumed that all ring members are equally likely to be the signer. This assumption is generally false in reality, leading to various practical and devastating deanonymizing attacks in Monero, one of the largest anonymous cryptocurrencies. These attacks highlight an unsatisfactory state of affairs: how a ring should be chosen is poorly understood.</p><p>We propose an analytical model of ring samplers towards a deeper understanding of them through systematic studies. Our model helps to describe how anonymous a ring sampler is with respect to a given signer distribution as an information-theoretic measure. We show that this measure is robust – it only varies slightly when the signer distribution varies slightly.
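One natural information-theoretic measure of this kind is the Shannon entropy of the signer distribution restricted to the ring; the sketch below is our illustration and may differ from the paper's exact measure:

```python
import math

def ring_anonymity_bits(signer_prob, ring):
    """Shannon entropy (in bits) of the signer distribution restricted
    to a ring: one natural way to quantify how anonymous a ring sampler
    leaves the true signer. signer_prob maps each ring member to its
    prior probability of being the signer."""
    total = sum(signer_prob[m] for m in ring)
    entropy = 0.0
    for m in ring:
        p = signer_prob[m] / total   # renormalize over the ring
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy

# A ring of 4 equally likely members yields log2(4) = 2 bits of anonymity;
# a skewed prior (as in real blockchain data) yields strictly less.
uniform = {m: 0.25 for m in "abcd"}
print(ring_anonymity_bits(uniform, "abcd"))
```

This makes the uniform-signer assumption precise: whenever the empirical signer distribution is far from uniform, a uniform ring sampler overstates the anonymity actually provided.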
We then analyze three natural samplers – uniform, mimicking, and partitioning – under our model with respect to a family of signer distributions modeled after empirical Bitcoin data. We hope that our work paves the way towards researching ring samplers from a theoretical point of view.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00Unlinkable Updatable Hiding Databases and Privacy-Preserving Loyalty Programs<abstract><title style='display:none'>Abstract</title><p>Loyalty programs allow vendors to profile buyers based on their purchase histories, which can reveal privacy-sensitive information. Existing privacy-friendly loyalty programs force buyers to choose whether their purchases are linkable. Moreover, vendors receive more purchase data than required for the sake of profiling. We propose a privacy-preserving loyalty program where purchases are always unlinkable, yet a vendor can profile a buyer based on her purchase history, which remains hidden from the vendor. Our protocol is based on a new building block, an unlinkable updatable hiding database (HD), which we define and construct. HD allows the vendor to initialize and update databases stored by buyers that contain their purchase histories and their accumulated loyalty points. Updates are unlinkable and, at each update, the database is hidden from the vendor. Buyers can neither modify the database nor use old versions of it. Our construction for HD is practical for large databases.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00Faster homomorphic comparison operations for BGV and BFV<abstract><title style='display:none'>Abstract</title><p>Fully homomorphic encryption (FHE) makes it possible to compute any function on encrypted values. However, in practice, there is no universal FHE scheme that is efficient in all possible use cases. In this work, we show that FHE schemes suitable for arithmetic circuits (e.g.
BGV or BFV) perform comparably to FHE schemes for non-arithmetic circuits (TFHE) in basic comparison tasks such as less-than, maximum and minimum operations. Our implementation of the less-than function in the HElib library is up to 3 times faster than the prior work based on BGV/BFV. It can compare a pair of 64-bit integers in 11 milliseconds, sort 64 32-bit integers in 19 seconds and find the minimum of 64 32-bit integers in 9.5 seconds on an average laptop without multi-threading.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00FoggySight: A Scheme for Facial Lookup Privacy<abstract><title style='display:none'>Abstract</title><p>Advances in deep learning algorithms have enabled better-than-human performance on face recognition tasks. In parallel, private companies have been scraping social media and other public websites that tie photos to identities and have built up large databases of labeled face images. Searches in these databases are now being offered as a service to law enforcement and others and carry a multitude of privacy risks for social media users. In this work, we tackle the problem of providing privacy from such face recognition systems. We propose and evaluate FoggySight, a solution that applies lessons learned from the adversarial examples literature to modify facial photos in a privacy-preserving manner before they are uploaded to social media. FoggySight’s core feature is a community protection strategy where users acting as protectors of privacy for others upload decoy photos generated by adversarial machine learning algorithms.
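As a rough illustration of the adversarial-examples idea behind such decoys (not FoggySight's actual algorithm), an FGSM-style step on a toy linear embedding model looks like this; the features, weights, and epsilon are all hypothetical:

```python
# Minimal sketch: a decoy photo is nudged, within an epsilon budget, in
# the direction that moves a toy linear model's identity score. For a
# linear score sum(w_i * x_i), the gradient w.r.t. x is simply w, so the
# fast-gradient-sign step adds epsilon * sign(w_i) per coordinate.
def fgsm_perturb(x, w, epsilon):
    sign = lambda v: (v > 0) - (v < 0)
    return [xi + epsilon * sign(wi) for xi, wi in zip(x, w)]

x = [0.2, -0.5, 0.1]     # toy pixel features of the decoy photo
w = [1.0, -2.0, 0.0]     # toy linear model weights
x_adv = fgsm_perturb(x, w, epsilon=0.05)
print(x_adv)
```

Real decoys are crafted against deep face embedders rather than a linear model, but the principle is the same: small, budgeted perturbations steer the photo's embedding toward a protected identity.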
We explore different settings for this scheme and find that it does enable protection of facial privacy – including against a facial recognition service with unknown internals.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00Awareness, Adoption, and Misconceptions of Web Privacy Tools<abstract><title style='display:none'>Abstract</title><p>Privacy and security tools can help users protect themselves online. Unfortunately, people are often unaware of such tools, and have potentially harmful misconceptions about the protections provided by the tools they know about. Effectively encouraging the adoption of privacy tools requires insights into people’s tool awareness and understanding. Towards that end, we conducted a demographically-stratified survey of 500 US participants to measure their use of and perceptions about five web browsing-related tools: private browsing, VPNs, Tor Browser, ad blockers, and antivirus software. We asked about participants’ perceptions of the protections provided by these tools across twelve realistic scenarios. Our thematic analysis of participants’ responses revealed diverse forms of misconceptions. Some types of misconceptions were common across tools and scenarios, while others were associated with particular combinations of tools and scenarios. For example, some participants suggested that the privacy protections offered by private browsing, VPNs, and Tor Browser would also protect them from security threats – a misconception that might expose them to preventable risks. We anticipate that our findings will help researchers, tool designers, and privacy advocates educate the public about privacy- and security-enhancing technologies.</p></abstract>ARTICLE2021-04-27T00:00:00.000+00:00