Sciendo RSS Feed for Proceedings on Privacy Enhancing Technologies

“We, three brothers have always known everything of each other”: A Cross-cultural Study of Sharing Digital Devices and Online Accounts
<abstract> <title style='display:none'>Abstract</title> <p>Although many technologies assume that a device or an account would be used by a single user, prior research has found that this assumption may not hold true in everyday life. Most studies conducted to date have focused on sharing a device or account with the members of a household. However, existing literature offers little understanding of the contexts of sharing devices and accounts, which may extend to a wide range of personal, social, and professional settings. Further, people’s sharing behavior could be impacted by their social background. To this end, our paper presents a qualitative study with 59 participants from three different countries: Bangladesh, Turkey, and the USA, where we investigated the sharing of digital devices (e.g., computer, mobile phone) and online accounts, in particular financial and identity accounts (e.g., email, social networking), in various contexts and with different entities, not limited to the members of a household. Our study reveals users’ perceptions of risks while sharing a device or account, and their access control strategies to protect privacy and security. Based on our analysis, we shed light on the interplay between users’ sharing behavior and their demographics, social background, and cultural values.
Taken together, our findings have broad implications that advance the PETS community’s situated understanding of sharing devices and accounts.</p> </abstract>

ARTICLE 2021-07-23: Residue-Free Computing
<abstract> <title style='display:none'>Abstract</title> <p>Computer applications often leave traces or <italic>residues</italic> that enable forensic examiners to gain a detailed understanding of the actions a user performed on a computer. Such digital breadcrumbs are left by a large variety of applications, potentially (and indeed likely) unbeknownst to their users. This paper presents the concept of <italic>residue-free computing</italic>, in which a user can operate any existing application installed on their computer in a mode that prevents trace data from being recorded to disk, thus frustrating the forensic process and enabling more privacy-preserving computing. In essence, residue-free computing provides an “incognito mode” for <italic>any</italic> application. We introduce our implementation of residue-free computing, R<sc>esidue</sc>F<sc>ree</sc>, and motivate R<sc>esidue</sc>F<sc>ree</sc> by inventorying the potentially sensitive and privacy-invasive residue left by popular applications. We demonstrate that R<sc>esidue</sc>F<sc>ree</sc> allows users to operate these applications without leaving trace data, while incurring modest performance overheads.</p> </abstract>

ARTICLE 2021-07-23: Domain name encryption is not enough: privacy leakage via IP-based website fingerprinting
<abstract> <title style='display:none'>Abstract</title> <p>Although the security benefits of domain name encryption technologies such as DNS over TLS (DoT), DNS over HTTPS (DoH), and Encrypted Client Hello (ECH) are clear, their positive impact on user privacy is weakened by the still-exposed IP address information.
However, content delivery networks, DNS-based load balancing, co-hosting of different websites on the same server, and IP address churn all contribute towards making domain–IP mappings unstable and prevent straightforward IP-based browsing tracking.</p> <p>In this paper, we show that this instability is not a roadblock (assuming a universal DoT/DoH and ECH deployment), by introducing an IP-based website fingerprinting technique that allows a network-level observer to identify <italic>at scale</italic> the website a user visits. Our technique exploits the complex structure of most websites, which load resources from several domains besides their primary one. Using the generated fingerprints of more than 200K websites studied, we could successfully identify 84% of them when observing solely destination IP addresses. The accuracy rate increases to 92% for popular websites, and 95% for popular <italic>and</italic> sensitive websites. We also evaluate the robustness of the generated fingerprints over time, and demonstrate that they are still effective at successfully identifying about 70% of the tested websites after two months. We conclude by discussing strategies for website owners and hosting providers towards hindering IP-based website fingerprinting and maximizing the privacy benefits offered by DoT/DoH and ECH.</p> </abstract>

ARTICLE 2021-07-23: Mercurial Signatures for Variable-Length Messages
<abstract> <title style='display:none'>Abstract</title> <p>Mercurial signatures are a useful building block for privacy-preserving schemes, such as anonymous credentials, delegatable anonymous credentials, and related applications. They allow a signature <italic>σ</italic> on a message <italic>m</italic> under a public key pk to be transformed into a signature <italic>σ</italic>′ on an <italic>equivalent</italic> message <italic>m</italic>′ under an equivalent public key pk′ for an appropriate notion of equivalence.
For example, pk and pk′ may be unlinkable pseudonyms of the same user, and <italic>m</italic> and <italic>m</italic>′ may be unlinkable pseudonyms of a user to whom some capability is delegated. The only previously known construction of mercurial signatures suffers from a severe limitation: in order to sign messages of length <italic>ℓ</italic>, the signer’s public key must also be of length <italic>ℓ</italic>. In this paper, we eliminate this restriction and provide an interactive signing protocol that admits messages of any length. We prove our scheme existentially unforgeable under chosen open message attacks (EUF-CoMA) under a variant of the asymmetric bilinear decisional Diffie-Hellman assumption (ABDDH).</p> </abstract>

ARTICLE 2021-07-23: Fortified Multi-Party Computation: Taking Advantage of Simple Secure Hardware Modules
<abstract> <title style='display:none'>Abstract</title> <p>In practice, there are numerous settings where mutually distrusting parties need to perform distributed computations on their private inputs. For instance, participants in a first-price sealed-bid online auction do not want their bids to be disclosed. This problem can be addressed using secure multi-party computation (MPC), where parties can evaluate a publicly known function on their private inputs by executing a specific protocol that only reveals the correct output, but nothing else about the private inputs. Such distributed computations performed over the Internet are susceptible to remote hacks that may take place during the computation. As a consequence, sensitive data such as private bids may leak. No existing MPC protocol provides any protection against the consequences of such remote hacks. We present the first MPC protocols that protect the remotely hacked parties’ inputs and outputs from leaking.
More specifically, unless the remote hack takes place before the party receives its input or all parties are corrupted, a hacker is unable to learn the parties’ inputs and outputs, and is also unable to modify them. We achieve these strong (privacy) guarantees by utilizing the fact that in practice parties may not be susceptible to remote attacks at every point in time, but only while they are online, i.e., able to receive messages. To this end, we model communication via explicit channels. In particular, we introduce channels with an <italic>airgap switch</italic> (disconnect-able by the party in control of the switch), and unidirectional <italic>data diodes</italic>. These channels and their isolation properties, together with a few similarly simple and plausibly remotely unhackable hardware modules, serve as the main ingredients for attaining such strong security guarantees. In order to formalize these strong guarantees, we propose the <italic>UC with Fortified Security</italic> (UC#) framework, a variant of the Universal Composability (UC) framework.</p> </abstract>

ARTICLE 2021-07-23: Privacy-Preserving Approximate k-Nearest-Neighbors Search that Hides Access, Query and Volume Patterns
<abstract> <title style='display:none'>Abstract</title> <p>We study the problem of privacy-preserving approximate kNN search in an outsourced environment — the client sends the encrypted data to an untrusted server and later can perform secure approximate kNN search and updates. We design a security model and propose a generic construction based on locality-sensitive hashing, symmetric encryption, and an oblivious map. The construction provides very strong security guarantees, not only hiding the information about the data, but also the access, query, and volume patterns.
We implement two concrete schemes based on an oblivious AVL tree and an oblivious BSkiplist, and evaluate and compare their efficiency.</p> </abstract>

ARTICLE 2021-07-23: Differentially Private Naïve Bayes Classifier Using Smooth Sensitivity
<abstract> <title style='display:none'>Abstract</title> <p>There is increasing awareness of the need to protect individual privacy in the training data used to develop machine learning models. Differential privacy is a strong concept for protecting individual privacy. Naïve Bayes is a popular machine learning algorithm, used as a baseline for many tasks. In this work, we provide a differentially private Naïve Bayes classifier that adds noise proportional to the <italic>smooth sensitivity</italic> of its parameters. We compare our results to the approach of Vaidya, Shafiq, Basu, and Hong [1], which scales noise to the global sensitivity of the parameters. Our experimental results on real-world datasets show that smooth sensitivity significantly improves accuracy while still guaranteeing <italic>ɛ</italic>-differential privacy.</p> </abstract>

ARTICLE 2021-07-23: Less is More: A privacy-respecting Android malware classifier using federated learning
<abstract> <title style='display:none'>Abstract</title> <p>In this paper we present LiM (‘Less is More’), a malware classification framework that leverages federated learning to detect and classify malicious apps in a privacy-respecting manner. Information about newly installed apps is kept locally on users’ devices, so that the provider cannot infer which apps were installed by users. At the same time, input from all users is taken into account in the federated learning process and they all benefit from better classification performance. A key challenge of this setting is that users do not have access to the ground truth (i.e., they cannot correctly identify whether an app is malicious).
To tackle this, LiM uses a safe semi-supervised ensemble that maximizes classification accuracy with respect to a baseline classifier trained by the service provider (i.e., the cloud). We implement LiM and show that the cloud server achieves an F1 score of 95%, while clients have perfect recall with only 1 false positive in <italic>&gt;</italic> 100 apps, using a dataset of 25K clean apps and 25K malicious apps, 200 users, and 50 rounds of federation. Furthermore, we conduct a security analysis and demonstrate that LiM is robust against both poisoning attacks by adversaries who control half of the clients and inference attacks performed by an honest-but-curious cloud server. Further experiments with MaMaDroid’s dataset confirm resistance against poisoning attacks and a performance improvement due to the federation.</p> </abstract>

ARTICLE 2021-07-23: Blocking Without Breaking: Identification and Mitigation of Non-Essential IoT Traffic
<abstract> <title style='display:none'>Abstract</title> <p>Despite the prevalence of Internet of Things (IoT) devices, there is little information about the purpose and risks of the Internet traffic these devices generate, and consumers have limited options for controlling those risks. A key open question is whether one can mitigate these risks by automatically blocking some of the Internet connections from IoT devices, without rendering the devices inoperable.</p> <p>In this paper, we address this question by developing a rigorous methodology that relies on automated IoT-device experimentation to reveal which network connections (and the information they expose) are essential, and which are not.
We further develop strategies to <italic>automatically</italic> classify network traffic destinations as either required (<italic>i.e.</italic>, their traffic is <italic>essential</italic> for devices to work properly) or not, hence allowing firewall rules to block traffic sent to non-required destinations without breaking the functionality of the device. We find that 16 of the 31 devices we tested have at least one blockable non-required destination, with the maximum number of blockable destinations for a device being 11. We further analyze the destinations of network traffic and find that all third parties observed in our experiments are blockable, while first and support parties are neither uniformly required nor non-required. Finally, we demonstrate the limitations of existing blocklists on IoT traffic, propose a set of guidelines for automatically limiting non-essential IoT traffic, and develop a prototype system that implements these guidelines.</p> </abstract>

ARTICLE 2021-07-23: Editors’ Introduction

ARTICLE 2021-07-23: Efficient Privacy-preserving Clustering
<abstract> <title style='display:none'>Abstract</title> <p>Clustering is a popular unsupervised machine learning technique that groups similar input elements into clusters. It is used in many areas ranging from business analysis to health care. In many of these applications, sensitive information is clustered that should not be leaked. Moreover, nowadays it is often required to combine data from multiple sources to increase the quality of the analysis as well as to outsource complex computation to powerful cloud servers. This calls for efficient privacy-preserving clustering. In this work, we systematically analyze the state of the art in privacy-preserving clustering. We implement and benchmark today’s four most efficient fully private clustering protocols by Cheon et al. (SAC’19), Meng et al. (ArXiv’19), Mohassel et al. (PETS’20), and Bozdemir et al.
(ASIACCS’21) with respect to communication, computation, and clustering quality. We compare them, assess their limitations for practical use in real-world applications, and conclude with open challenges.</p> </abstract>

ARTICLE 2021-07-23: Private Stream Aggregation with Labels in the Standard Model
<abstract> <title style='display:none'>Abstract</title> <p>A private stream aggregation (PSA) scheme is a protocol of <italic>n</italic> clients and one aggregator. At every time step, the clients send an encrypted value to the (untrusted) aggregator, who is able to compute the sum of all client values, but cannot learn the values of individual clients. One possible application of PSA is privacy-preserving smart metering, where a power supplier can learn the total power consumption, but not the consumption of individual households. We construct a simple PSA scheme that supports labels and which we prove to be secure in the standard model. Labels are useful to restrict the access of the aggregator, because they prevent the aggregator from combining ciphertexts with different labels (or from different time steps) and thus avoid leaking information about values of individual clients. The scheme is based on key-homomorphic pseudorandom functions (PRFs) as the only primitive, supports a large message space, scales well to a large number of users, and has small ciphertexts. We provide an implementation of the scheme with a lattice-based key-homomorphic PRF (secure in the ROM) and measure the performance of the implementation. Furthermore, we discuss practical issues such as how to avoid a trusted party during the setup and how to cope with clients joining or leaving the system.</p> </abstract>

ARTICLE 2021-07-23: Privacy Preference Signals: Past, Present and Future
<abstract> <title style='display:none'>Abstract</title> <p>Privacy preference signals are digital representations of how users want their personal data to be processed.
Such signals must be adopted by both the sender (users) and intended recipients (data processors). Adoption represents a coordination problem that remains unsolved despite efforts dating back to the 1990s. Browsers implemented standards like the Platform for Privacy Preferences (P3P) and Do Not Track (DNT), but vendors profiting from personal data faced few incentives to receive and respect the expressed wishes of data subjects. In the wake of recent privacy laws, a coalition of AdTech firms published the Transparency and Consent Framework (TCF), which defines an opt-in consent signal. This paper integrates post-GDPR developments into the wider history of privacy preference signals. Our main contribution is a high-frequency longitudinal study describing how the TCF signal gained dominance as of February 2021. We explore which factors correlate with adoption at the website level. Both the number of third parties on a website and the presence of Google Ads are associated with higher adoption of TCF. Further, we show that vendors acted as early adopters of TCF 2.0 and provide two case studies describing how Consent Management Providers shifted existing customers to TCF 2.0. We sketch ways forward for a pro-privacy signal.</p> </abstract>

ARTICLE 2021-07-23: Supervised Authorship Segmentation of Open Source Code Projects
<abstract> <title style='display:none'>Abstract</title> <p>Source code authorship attribution can be used for many types of intelligence on binaries and executables, including forensics, but introduces a threat to the privacy of anonymous programmers. Previous work has shown how to attribute individually authored code files and code segments. In this work, we examine authorship segmentation, in which we determine authorship of arbitrary parts of a program. While previous work has performed segmentation at the textual level, we attempt to attribute subtrees of the abstract syntax tree (AST).
We focus on two primary problems: identifying the primary author of an arbitrary AST subtree and identifying on which edges of the AST primary authorship changes. We demonstrate that the former is a difficult problem but the latter is much easier. We also demonstrate methods by which we can leverage the easier problem to improve accuracy for the harder problem. We show that while identifying the author of subtrees is difficult overall, this is primarily due to the abundance of small subtrees: in the validation set we can attribute subtrees of at least 25 nodes with accuracy over 80% and at least 33 nodes with accuracy over 90%, while in the test set we can attribute subtrees of at least 33 nodes with accuracy of 70%. While our baseline accuracy for single AST nodes is 20.21% for the validation set and 35.66% for the test set, we present techniques by which we can increase this accuracy to 42.01% and 49.21%, respectively. We further present observations about collaborative code found on GitHub that may drive further research.</p> </abstract>

ARTICLE 2021-07-23: You May Also Like... Privacy: Recommendation Systems Meet PIR
<abstract> <title style='display:none'>Abstract</title> <p>We describe the design, analysis, implementation, and evaluation of P<sc>irsona</sc>, a digital content delivery system that realizes collaborative-filtering recommendations atop private information retrieval (PIR). This combination of seemingly antithetical primitives makes possible—for the first time—the construction of practically efficient e-commerce and digital media delivery systems that can provide personalized content recommendations based on their users’ historical consumption patterns while simultaneously keeping said consumption patterns private.
In designing P<sc>irsona</sc>, we have opted for the most performant primitives available (at the expense of rather strong non-collusion assumptions); namely, we use the recent computationally 1-private PIR protocol of Hafiz and Henry (PETS 2019.4) together with a carefully optimized 4PC Boolean matrix factorization.</p> </abstract>

ARTICLE 2021-07-23: SwapCT: Swap Confidential Transactions for Privacy-Preserving Multi-Token Exchanges
<abstract> <title style='display:none'>Abstract</title> <p>Decentralized token exchanges allow for secure trading of tokens without a trusted third party. However, decentralization is mostly achieved at the expense of transaction privacy. For a fair exchange, transactions must remain private to hide the participants and volumes while maintaining the possibility for noninteractive execution of trades. In this paper we present a swap confidential transaction system (SwapCT) which is related to ring confidential transactions (e.g. used in Monero) but supports multiple token types to trade among and enables secure, partial transactions for noninteractive swaps. We prove that SwapCT is secure in a strict, formal model and present its efficient performance in a prototype implementation with logarithmic signature sizes for large anonymity sets. For our construction we design an aggregatable signature scheme which might be of independent interest. Our SwapCT system thereby enables a secure and private exchange for tokens without a trusted third party.</p> </abstract>

ARTICLE 2021-07-23: ZKSENSE: A Friction-less Privacy-Preserving Human Attestation Mechanism for Mobile Devices
<abstract> <title style='display:none'>Abstract</title> <p>Recent studies show that 20.4% of the internet traffic originates from automated agents. To identify and block such ill-intentioned traffic, mechanisms that <italic>verify the humanness of the user</italic> are widely deployed, with CAPTCHAs being the most popular.
Traditional CAPTCHAs require extra user effort (e.g., solving mathematical puzzles), which can severely downgrade the end-user’s experience, especially on mobile, and provide sporadic humanness verification of questionable accuracy. More recent solutions like Google’s reCAPTCHA v3 leverage user data, thus raising significant privacy concerns. To address these issues, we present zkSENSE: the first zero-knowledge proof-based humanness attestation system for mobile devices. zkSENSE moves the human attestation to the edge: onto the user’s very own device, where humanness of the user is assessed in a privacy-preserving and seamless manner. zkSENSE achieves this by classifying motion sensor outputs of the mobile device, based on a model trained by using both publicly available sensor data and data collected from a small group of volunteers. To ensure the integrity of the process, the classification result is enclosed in a zero-knowledge proof of humanness that can be safely shared with a remote server. We implement zkSENSE as an Android service to demonstrate its effectiveness and practicality. In our evaluation, we show that zkSENSE successfully verifies the humanness of a user across a variety of attacking scenarios and demonstrate 92% accuracy. On a two-year-old Samsung S9, zkSENSE’s attestation takes around 3 seconds (when visual CAPTCHAs need 9.8 seconds) and consumes a negligible amount of battery.</p> </abstract>

ARTICLE 2021-07-23: Unifying Privacy Policy Detection
<abstract> <title style='display:none'>Abstract</title> <p>Privacy policies have become a focal point of privacy research. With their goal to reflect the privacy practices of a website, service, or app, they are often the starting point for researchers who analyze the accuracy of claimed data practices, user understanding of practices, or control mechanisms for users.
Due to vast differences in structure, presentation, and content, it is often challenging to extract privacy policies from online resources like websites for analysis. In the past, researchers have relied on scrapers tailored to the specific analysis or task, which complicates comparing results across different studies.</p> <p>To unify future research in this field, we developed a toolchain to process website privacy policies and prepare them for research purposes. The core part of this toolchain is a detector module for English and German, using natural language processing and machine learning to automatically determine whether given texts are privacy or cookie policies. We leverage multiple existing data sets to refine our approach, evaluate it on a recently published longitudinal corpus, and show that this corpus contains a number of misclassified documents. We believe that unifying data preparation for the analysis of privacy policies can help make different studies more comparable and is a step towards more thorough analyses. In addition, we provide insights into common pitfalls that may lead to invalid analyses.</p> </abstract>

ARTICLE 2021-07-23: Gage MPC: Bypassing Residual Function Leakage for Non-Interactive MPC
<abstract> <title style='display:none'>Abstract</title> <p>Existing models for non-interactive MPC cannot provide full privacy for inputs, because they inherently leak the residual function (i.e., the output of the function on the honest parties’ inputs together with all possible values of the adversarial inputs). For example, in any non-interactive sealed-bid auction, the last bidder can figure out what the highest previous bid was. We present a new MPC model which avoids this privacy leak. To achieve this, we utilize a blockchain in a novel way, incorporating smart contracts and arbitrary parties that can be incentivized to perform computation (“bounty hunters,” akin to miners).
Security is maintained under a monetary assumption about the parties: an honest party can temporarily supply a recoverable collateral of value higher than the computational cost an adversary can expend. We thus construct non-interactive MPC protocols with strong security guarantees (full security, no residual leakage) in the short term. Over time, as the adversary can invest more and more computational resources, the security guarantee decays. Thus, our model, which we call Gage MPC, is suitable for secure computation with limited-time secrecy, such as auctions. A key ingredient in our protocols is a primitive we call “Gage Time Capsules” (GaTC): a time capsule that allows a party to commit to a value that others are able to reveal, but only at a designated computational cost. A GaTC allows a party to commit to a value together with a monetary collateral. If the original party properly opens the GaTC, it can recover the collateral. Otherwise, the collateral is used to incentivize bounty hunters to open the GaTC. This primitive is used to ensure completion of Gage MPC protocols on the desired inputs. As a requisite tool (of independent interest), we present a generalization of garbled circuits that is more robust: it can tolerate exposure of extra input labels. This is in contrast to Yao’s garbled circuits, whose secrecy breaks down if even a single extra label is exposed. Finally, we present a proof-of-concept implementation of a special case of our construction, yielding an auction functionality over an Ethereum-like blockchain.</p> </abstract>

ARTICLE 2021-07-23: SoK: Privacy-Preserving Computation Techniques for Deep Learning
<abstract> <title style='display:none'>Abstract</title> <p>Deep Learning (DL) is a powerful solution for complex problems in many disciplines such as finance, medical research, or social sciences.
Due to the high computational cost of DL algorithms, data scientists often rely upon Machine Learning as a Service (MLaaS) to outsource the computation onto third-party servers. However, outsourcing the computation raises privacy concerns when dealing with sensitive information, e.g., health or financial records. Also, privacy regulations like the European GDPR limit the collection, distribution, and use of such sensitive data. Recent advances in privacy-preserving computation techniques (i.e., Homomorphic Encryption and Secure Multiparty Computation) have enabled DL training and inference over protected data. However, these techniques are still immature and difficult to deploy in practical scenarios. In this work, we review the evolution of the adaptation of privacy-preserving computation techniques to DL, to understand the gap between research proposals and practical applications. We highlight the relative advantages and disadvantages, considering aspects such as efficiency shortcomings, reproducibility issues due to the lack of standard tools and programming interfaces, and lack of integration with DL frameworks commonly used by the data science community.</p> </abstract>