rss_2.0Proceedings on Privacy Enhancing Technologies FeedSciendo RSS Feed for Proceedings on Privacy Enhancing Technologies on Privacy Enhancing Technologies 's Cover’ Introduction Studying User Preferences and Perception of Covid Vaccination Certificates<abstract> <title style='display:none'>Abstract</title> <p>Digital tools play an important role in fighting the current global COVID-19 pandemic. We conducted a representative online study in Germany on a sample of 599 participants to evaluate the user perception of vaccination certificates. We investigated five different variants of vaccination certificates based on deployed and planned designs in a between-group design, including paper-based and app-based variants. Our main results show that the willingness to use and adopt vaccination certificates is generally high. Overall, paper-based vaccination certificates were favored over app-based solutions. The willingness to use digital apps decreased significantly by a higher disposition to privacy and increased by higher worries about the pandemic and acceptance of the coronavirus vaccination. Vaccination certificates resemble an interesting use case for studying privacy perceptions for health-related data. We hope that our work will educate the currently ongoing design of vaccination certificates, give us deeper insights into the privacy of health-related data and apps, and prepare us for future potential applications of vaccination certificates and health apps in general.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00(∈, δ)-Indistinguishable Mixing for Cryptocurrencies<abstract> <title style='display:none'>Abstract</title> <p>We propose a new theoretical approach for building anonymous mixing mechanisms for cryptocurrencies. Rather than requiring a fully uniform permutation during mixing, we relax the requirement, insisting only that neighboring permutations are similarly likely. This is defined formally by borrowing from the definition of differential privacy. This relaxed privacy definition allows us to greatly reduce the amount of interaction and computation in the mixing protocol. Our construction achieves <italic>O</italic>(<italic>n·</italic>polylog(<italic>n</italic>)) computation time for mixing <italic>n</italic> addresses, whereas all other mixing schemes require <italic>O</italic>(<italic>n</italic><sup>2</sup>) total computation across all parties. Additionally, we support a smooth tolerance of fail-stop adversaries and do not require any trusted setup. We analyze the security of our generic protocol under the UC framework, and under a stand-alone, game-based definition. We finally describe an instantiation using ring signatures and confidential transactions.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00AriaNN: Low-Interaction Privacy-Preserving Deep Learning via Function Secret Sharing<abstract> <title style='display:none'>Abstract</title> <p>We propose A<sc>ria</sc>NN, a low-interaction privacy-preserving framework for private neural network training and inference on sensitive data.</p> <p>Our semi-honest 2-party computation protocol (with a trusted dealer) leverages function secret sharing, a recent lightweight cryptographic protocol that allows us to achieve an efficient online phase. We design optimized primitives for the building blocks of neural networks such as ReLU, MaxPool and BatchNorm. For instance, we perform private comparison for ReLU operations with a single message of the size of the input during the online phase, and with preprocessing keys close to 4<italic>×</italic> smaller than previous work. Last, we propose an extension to support <italic>n</italic>-party private federated learning.</p> <p>We implement our framework as an extensible system on top of PyTorch that leverages CPU and GPU hardware acceleration for cryptographic and machine learning operations. We evaluate our end-to-end system for private inference between distant servers on standard neural networks such as AlexNet, VGG16 or ResNet18, and for private training on smaller networks like LeNet. We show that computation rather than communication is the main bottleneck and that using GPUs together with reduced key size is a promising solution to overcome this barrier.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00OmniCrawl: Comprehensive Measurement of Web Tracking With Real Desktop and Mobile Browsers<abstract> <title style='display:none'>Abstract</title> <p>Over half of all visits to websites now take place in a mobile browser, yet the majority of web privacy studies take the vantage point of desktop browsers, use emulated mobile browsers, or focus on just a single mobile browser instead. In this paper, we present a comprehensive web-tracking measurement study on mobile browsers and privacy-focused mobile browsers. Our study leverages a new web measurement infrastructure, OmniCrawl, which we develop to drive browsers on desktop computers and smartphones located on two continents. We capture web tracking measurements using 42 different non-emulated browsers simultaneously. We find that the third-party advertising and tracking ecosystem of mobile browsers is more similar to that of desktop browsers than previous findings suggested. We study <italic>privacy-focused</italic> browsers and find their protections differ significantly and in general are less for lower-ranked sites. Our findings also show that common methodological choices made by web measurement studies, such as the use of emulated mobile browsers and Selenium, can lead to website behavior that deviates from what actual users experience.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00Polymath: Low-Latency MPC via Secure Polynomial Evaluations and Its Applications<abstract> <title style='display:none'>Abstract</title> <p>While the practicality of secure multi-party computation (MPC) has been extensively analyzed and improved over the past decade, we are hitting the limits of efficiency with the traditional approaches of representing the computed functionalities as generic arithmetic or Boolean circuits. This work follows the design principle of identifying and constructing fast and provably-secure MPC protocols to evaluate useful high-level algebraic abstractions; thus, improving the efficiency of all applications relying on them. We present Polymath, a constant-round secure computation protocol suite for the secure evaluation of (multi-variate) polynomials of scalars and matrices, functionalities essential to numerous data-processing applications. Using precise natural precomputation and high-degree of parallelism prevalent in the modern computing environments, Polymath can make latency of secure polynomial evaluations of scalars and matrices independent of polynomial degree and matrix dimensions.</p> <p>We implement our protocols over the HoneyBadgerMPC library and apply it to two prominent secure computation tasks: privacy-preserving evaluation of decision trees and privacy-preserving evaluation of Markov processes. For the decision tree evaluation problem, we demonstrate the feasibility of evaluating high-depth decision tree models in a general <italic>n</italic>-party setting. For the Markov process application, we demonstrate that Poly-math can compute large powers of transition matrices with better online time and less communication.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00From “Onion Not Found” to Guard Discovery<abstract> <title style='display:none'>Abstract</title> <p>We present a novel web-based attack that identifies a Tor user’s guard in a matter of seconds. Our attack is low-cost, fast, and stealthy. It requires only a moderate amount of resources and can be deployed by website owners, third-party script providers, and malicious exits—if the website traffic is unencrypted. The attack works by injecting resources from non-existing onion service addresses into a webpage. Upon visiting the attack webpage with Tor Browser, the victim’s Tor client creates many circuits to look up the non-existing addresses. This allows middle relays controlled by the adversary to detect the distinctive traffic pattern of the “404 Not Found” lookups and identify the victim’s guard. We evaluate our attack with extensive simulations and live Tor network measurements, taking a range of victim machine, network, and geolocation configurations into account. We find that an adversary running a small number of HSDirs and providing 5 % of Tor’s relay bandwidth needs 12.06 seconds to identify the guards of 50 % of the victims, while it takes 22.01 seconds to discover 90 % of the victims’ guards. Finally, we evaluate a set of countermeasures against our attack including a defense that we develop based on a token bucket and the recently proposed Vanguards-lite defense in Tor.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00MLEFlow: Learning from History to Improve Load Balancing in Tor<abstract> <title style='display:none'>Abstract</title> <p>Tor has millions of daily users seeking privacy while browsing the Internet. It has thousands of relays to route users’ packets while anonymizing their sources and destinations. Users choose relays to forward their traffic according to probability distributions published by the <italic>Tor authorities</italic>. The authorities generate these probability distributions based on estimates of the capacities of the relays. They compute these estimates based on the bandwidths of probes sent to the relays. These estimates are necessary for better load balancing. Unfortunately, current methods fall short of providing accurate estimates leaving the network underutilized and its capacities unfairly distributed between the users’ paths. We present <italic>MLEFlow</italic>, a maximum likelihood approach for estimating relay capacities for optimal load balancing in Tor. We show that <italic>MLEFlow</italic> generalizes a version of Tor capacity estimation, <italic>TorFlow</italic>-<italic>P</italic>, by making better use of measurement history. We prove that the mean of our estimate converges to a small interval around the actual capacities, while the variance converges to zero. We present two versions of <italic>MLEFlow</italic>: <italic>MLEFlow</italic>-<italic>CF</italic>, a closed-form approximation of the MLE and <italic>MLEFlow</italic>-<italic>Q</italic>, a discretization and iterative approximation of the MLE which can account for noisy observations. We demonstrate the practical benefits of <italic>MLEFlow</italic> by simulating it using a flow-based Python simulator of a full Tor network and packet-based Shadow simulation of a scaled down version. In our simulations <italic>MLEFlow</italic> provides significantly more accurate estimates, which result in improved user performance, with median download speeds increasing by 30%.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00If This Then That : Exploring users’ concerns with IFTTT applets<abstract> <title style='display:none'>Abstract</title> <p>End users are increasingly using trigger-action platforms like <italic>If-This-Then-That</italic> (IFTTT) to create applets to connect smart-home devices and services. However, there are inherent implicit risks in using such applets—even non-malicious ones—as sensitive information may leak through their use in certain contexts (<italic>e.g.,</italic> where the device is located, who can observe the resultant action). This work aims to understand to what extent end users can assess this implicit risk. More importantly we explore whether usage context makes a difference in end-users’ perception of such risks. Our work complements prior work that has identified the impact of usage context on expert evaluation of risks in IFTTT by focusing the impact of usage context on end-users’ risk perception. Through a Mechanical Turk survey of 386 participants on 49 smart-home IFTTT applets, we found that participants have a nuanced view of contextual factors and that different values for contextual factors impact end-users’ risk perception differently. Further, our findings show that nudging the participants to think about different usage contexts led them to think deeper about the associated risks and raise their concern scores.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00Personal information inference from voice recordings: User awareness and privacy concerns<abstract> <title style='display:none'>Abstract</title> <p>Through voice characteristics and manner of expression, even seemingly benign voice recordings can reveal sensitive attributes about a recorded speaker (e. g., geographical origin, health status, personality). We conducted a nationally representative survey in the UK (n = 683, 18–69 years) to investigate people’s awareness about the inferential power of voice and speech analysis. Our results show that – while awareness levels vary between different categories of inferred information – there is generally low awareness across all participant demographics, even among participants with professional experience in computer science, data mining, and IT security. For instance, only 18.7% of participants are at least somewhat aware that physical and mental health information can be inferred from voice recordings. Many participants have rarely (28.4%) or never (42.5%) even thought about the possibility of personal information being inferred from speech data. After a short educational video on the topic, participants express only moderate privacy concern. However, based on an analysis of open text responses, unconcerned reactions seem to be largely explained by knowledge gaps about possible data misuses. Watching the educational video lowered participants’ intention to use voice-enabled devices. In discussing the regulatory implications of our findings, we challenge the notion of “informed consent” to data processing. We also argue that inferences about individuals need to be legally recognized as personal data and protected accordingly.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00Disparate Vulnerability to Membership Inference Attacks<abstract> <title style='display:none'>Abstract</title> <p>A membership inference attack (MIA) against a machine-learning model enables an attacker to determine whether a given data record was part of the model’s training data or not. In this paper, we provide an in-depth study of the phenomenon of <italic>disparate vulnerability</italic> against MIAs: unequal success rate of MIAs against different population subgroups. We first establish necessary and sufficient conditions for MIAs to be prevented, both on average and for population subgroups, using a notion of distributional generalization. Second, we derive connections of disparate vulnerability to algorithmic fairness and to differential privacy. We show that fairness can only prevent disparate vulnerability against limited classes of adversaries. Differential privacy bounds disparate vulnerability but can significantly reduce the accuracy of the model. We show that estimating disparate vulnerability by naïvely applying existing attacks can lead to overestimation. We then establish which attacks are suitable for estimating disparate vulnerability, and provide a statistical framework for doing so reliably. We conduct experiments on synthetic and real-world data finding significant evidence of disparate vulnerability in realistic settings.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00Circuit-PSI With Linear Complexity via Relaxed Batch OPPRF<abstract> <title style='display:none'>Abstract</title> <p>In 2-party Circuit-based Private Set Intersection (Circuit-PSI), <italic>P</italic><sub>0</sub> and <italic>P</italic><sub>1</sub> hold sets S<sub>0</sub> and S<sub>1</sub> respectively and wish to securely compute a function <italic>f</italic> over the set S<sub>0</sub> ∩ S<sub>1</sub> (e.g., cardinality, sum over associated attributes, or threshold intersection). Following a long line of work, Pinkas <italic>et al.</italic> (PSTY, Eurocrypt 2019) showed how to construct a concretely efficient Circuit-PSI protocol with linear communication complexity. However, their protocol requires super-linear computation.</p> <p>In this work, we construct concretely efficient Circuit-PSI protocols with linear computational and communication cost. Further, our protocols are more performant than the state-of-the-art, PSTY – we are ≈ 2.3<italic>×</italic> more communication efficient and are up to 2.8<italic>×</italic> faster. We obtain our improvements through a new primitive called <italic>Relaxed Batch Oblivious Programmable Pseudorandom Functions</italic> (RB-OPPRF) that can be seen as a strict generalization of Batch OPPRFs that were used in PSTY. This primitive could be of independent interest.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00Differentially private partition selection<abstract> <title style='display:none'>Abstract</title> <p>Many data analysis operations can be expressed as a GROUP BY query on an unbounded set of partitions, followed by a per-partition aggregation. To make such a query differentially private, adding noise to each aggregation is not enough: we also need to make sure that the set of partitions released is also differentially private.</p> <p>This problem is not new, and it was recently formally introduced as <italic>differentially private set union</italic> [14]. In this work, we continue this area of study, and focus on the common setting where each user is associated with a single partition. In this setting, we propose a simple, <italic>optimal</italic> differentially private mechanism that maximizes the number of released partitions. We discuss implementation considerations, as well as the possible extension of this approach to the setting where each user contributes to a fixed, small number of partitions.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00Toward Uncensorable, Anonymous and Private Access Over Satoshi Blockchains<abstract> <title style='display:none'>Abstract</title> <p>Providing unrestricted access to sensitive content such as news and software is difficult in the presence of adaptive and resourceful surveillance and censoring adversaries. In this paper we leverage the distributed and resilient nature of commercial Satoshi blockchains to develop the first provably secure, censorship resistant, cost-efficient storage system with anonymous and private access, built on top of commercial cryptocurrency transactions. We introduce max-rate transactions, a practical construct to persist data of arbitrary size entirely in a Satoshi blockchain. We leverage max-rate transactions to develop UWeb, a blockchain-based storage system that charges publishers to self-sustain its decentralized infrastructure. UWeb organizes blockchain-stored content for easy retrieval, and enables clients to store and access content with provable anonymity, privacy and censorship resistance properties.</p> <p>We present results from UWeb experiments with writing 268.21 MB of data into the live Litecoin blockchain, including 4.5 months of live-feed BBC articles, and 41 censorship resistant tools. The max-rate writing throughput (183 KB/s) and blockchain utilization (88%) exceed those of state-of-the-art solutions by 2-3 orders of magnitude and broke Litecoin’s record of the daily average block size. Our simulations with up to 3,000 concurrent UWeb writers confirm that UWeb does not impact the confirmation delays of financial transactions.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00Privacy-preserving FairSwap: Fairness and privacy interplay<abstract> <title style='display:none'>Abstract</title> <p>Fair exchange protocols are among the most important cryptographic primitives in electronic commerce. A basic fair exchange protocol requires that two parties who want to exchange their digital items either receive what they have been promised, or lose nothing. Privacy of fair exchange requires that no one else (other than the two parties) learns anything about the items. Fairness and privacy have been considered as two distinct properties of an exchange protocol. In this paper, we show that subtle ways of leaking the exchange item to the third parties affect fairness in fair exchange protocols when the item is confidential. Our focus is on Fair-Swap, a recently proposed fair exchange protocol that uses a smart contract for dispute resolution, has proven security in UC (Universal Composability) framework, and provides privacy when both parties are honest. We demonstrate, however, that FairSwap’s dispute resolution protocol leaks information to the public and this leakage provides opportunities for the dishonest parties to influence the protocol’s fairness guarantee. We then propose an efficient privacy-enhanced version of Fair-Swap, prove its security and give an implementation and performance evaluation of our proposed system. Our privacy enhancement uses circuit randomization, and we prove its security and privacy in an extension of universal composability model for non-monolithic adversaries that would be of independent interest.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00Privacy-Preserving High-dimensional Data Collection with Federated Generative Autoencoder<abstract> <title style='display:none'>Abstract</title> <p>Business intelligence and AI services often involve the collection of copious amounts of multidimensional personal data. Since these data usually contain sensitive information of individuals, the direct collection can lead to privacy violations. Local differential privacy (LDP) is currently considered a state-ofthe-art solution for privacy-preserving data collection. However, existing LDP algorithms are not applicable to high-dimensional data; not only because of the increase in computation and communication cost, but also poor data utility.</p> <p>In this paper, we aim at addressing the <italic>curse-of-dimensionality</italic> problem in LDP-based high-dimensional data collection. Based on the idea of machine learning and data synthesis, we propose DP-F<sc>ed</sc>-W<sc>ae</sc>, an efficient privacy-preserving framework for collecting high-dimensional categorical data. With the combination of a generative autoencoder, federated learning, and differential privacy, our framework is capable of privately learning the statistical distributions of local data and generating high utility synthetic data on the server side without revealing users’ private information. We have evaluated the framework in terms of data utility and privacy protection on a number of real-world datasets containing 68–124 classification attributes. We show that our framework outperforms the LDP-based baseline algorithms in capturing joint distributions and correlations of attributes and generating high-utility synthetic data. With a local privacy guarantee ∈ = 8, the machine learning models trained with the synthetic data generated by the baseline algorithm cause an accuracy loss of 10% ~ 30%, whereas the accuracy loss is significantly reduced to less than 3% and at best even less than 1% with our framework. Extensive experimental results demonstrate the capability and efficiency of our framework in synthesizing high-dimensional data while striking a satisfactory utility-privacy balance.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00Towards Improving Code Stylometry Analysis in Underground Forums<abstract> <title style='display:none'>Abstract</title> <p>Code Stylometry has emerged as a powerful mechanism to identify programmers. While there have been significant advances in the field, existing mechanisms underperform in challenging domains. One such domain is studying the provenance of code shared in underground forums, where code posts tend to have small or incomplete source code fragments. This paper proposes a method designed to deal with the idiosyncrasies of code snippets shared in these forums. Our system fuses a forum-specific learning pipeline with Conformal Prediction to generate predictions with precise confidence levels as a novelty. We see that identifying unreliable code snippets is paramount to generate high-accuracy predictions, and this is a task where traditional learning settings fail. Overall, our method performs as twice as well as the state-of-the-art in a constrained setting with a large number of authors (i.e., 100). When dealing with a smaller number of authors (i.e., 20), it performs at high accuracy (89%). We also evaluate our work on an open-world assumption and see that our method is more effective at retaining samples.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00Zen and the art of model adaptation: Low-utility-cost attack mitigations in collaborative machine learning<abstract> <title style='display:none'>Abstract</title> <p>In this study, we aim to bridge the gap between the theoretical understanding of attacks against collaborative machine learning workflows and their practical ramifications by considering the effects of model architecture, learning setting and hyperparameters on the resilience against attacks. We refer to such mitigations as <italic>model adaptation</italic>. Through extensive experimentation on both, benchmark and real-life datasets, we establish a more practical threat model for collaborative learning scenarios. In particular, we evaluate the impact of model adaptation by implementing a range of attacks belonging to the broader categories of model inversion and membership inference. Our experiments yield two noteworthy outcomes: they demonstrate the difficulty of actually conducting successful attacks under realistic settings when model adaptation is employed and they highlight the challenge inherent in successfully combining model adaptation and formal privacy-preserving techniques to retain the optimal balance between model utility and attack resilience.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00Ulixes: Facial Recognition Privacy with Adversarial Machine Learning<abstract> <title style='display:none'>Abstract</title> <p>Facial recognition tools are becoming exceptionally accurate in identifying people from images. However, this comes at the cost of privacy for users of online services with photo management (e.g. social media platforms). Particularly troubling is the ability to leverage unsupervised learning to recognize faces even when the user has not labeled their images. In this paper we propose Ulixes, a strategy to generate visually non-invasive facial noise masks that yield adversarial examples, preventing the formation of identifiable user clusters in the embedding space of facial encoders. This is applicable even when a user is unmasked and labeled images are available online. We demonstrate the effectiveness of Ulixes by showing that various classification and clustering methods cannot reliably label the adversarial examples we generate. We also study the effects of Ulixes in various black-box settings and compare it to the current state of the art in adversarial machine learning. Finally, we challenge the effectiveness of Ulixes against adversarially trained models and show that it is robust to countermeasures.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00Making the Most of Parallel Composition in Differential Privacy<abstract> <title style='display:none'>Abstract</title> <p>We show that the ‘optimal’ use of the parallel composition theorem corresponds to finding the size of the largest subset of queries that ‘overlap’ on the data domain, a quantity we call the <italic>maximum overlap</italic> of the queries. It has previously been shown that a certain instance of this problem, formulated in terms of determining the sensitivity of the queries, is NP-hard, but also that it is possible to use graph-theoretic algorithms, such as finding the maximum clique, to approximate query sensitivity. In this paper, we consider a significant generalization of the aforementioned instance which encompasses both a wider range of differentially private mechanisms and a broader class of queries. We show that for a particular class of predicate queries, determining if they are disjoint can be done in time polynomial in the number of attributes. For this class, we show that the maximum overlap problem remains NP-hard as a function of the number of queries. However, we show that efficient approximate solutions exist by relating maximum overlap to the clique and chromatic numbers of a certain graph determined by the queries. The link to chromatic number allows us to use more efficient approximate algorithms, which cannot be done for the clique number as it may underestimate the privacy budget. Our approach is defined in the general setting of <italic>f</italic>-differential privacy, which subsumes standard pure differential privacy and Gaussian differential privacy. We prove the parallel composition theorem for <italic>f</italic>-differential privacy. We evaluate our approach on synthetic and real-world data sets of queries. We show that the approach can scale to large domain sizes (up to 10<sup>20000</sup>), and that its application can reduce the noise added to query answers by up to 60%.</p> </abstract>ARTICLE2021-11-20T00:00:00.000+00:00en-us-1