Volume 2022 (2022): Issue 2 (April 2022)
Journal Details
Format: Journal
eISSN: 2299-0984
First Published: 16 Apr 2015
Publication timeframe: 4 issues per year
Languages: English
Access type: Open Access

Visualizing Privacy-Utility Trade-Offs in Differentially Private Data Releases

Published Online: 03 Mar 2022
Volume & Issue: Volume 2022 (2022) - Issue 2 (April 2022)
Page range: 601 - 618
Received: 31 Aug 2021
Accepted: 16 Dec 2021
Abstract

Organizations often collect private data and release aggregate statistics for the public’s benefit. If no steps toward preserving privacy are taken, adversaries may use released statistics to deduce unauthorized information about the individuals described in the private dataset. Differentially private algorithms address this challenge by slightly perturbing underlying statistics with noise, thereby mathematically limiting the amount of information that may be deduced from each data release. Properly calibrating these algorithms, and in turn the disclosure risk for people described in the dataset, requires a data curator to choose a value for a privacy budget parameter, ε. However, there is little formal guidance for choosing ε, a task that requires reasoning about the probabilistic privacy-utility trade-off. Furthermore, choosing ε in the context of statistical inference requires reasoning about accuracy trade-offs in the presence of both measurement error and differential privacy (DP) noise.
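To make the role of ε concrete, the following is a minimal sketch of one standard differentially private algorithm, the Laplace mechanism. The abstract does not say which mechanisms ViP supports, so the choice of mechanism, the function names, the sensitivity of 1, and the example count of 1000 are all assumptions made for illustration. The sketch shows that the noise scale is sensitivity / ε, so a smaller ε (stronger privacy) gives a wider error bound.

```python
import math
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample as the difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_release(true_value, epsilon, sensitivity=1.0, rng=None):
    """Return true_value plus Laplace noise of scale sensitivity / epsilon."""
    if rng is None:
        rng = random.Random()
    return true_value + laplace_noise(sensitivity / epsilon, rng)


def error_bound(epsilon, sensitivity=1.0, beta=0.05):
    """Half-width t such that P(|noise| <= t) = 1 - beta for Laplace noise."""
    return (sensitivity / epsilon) * math.log(1.0 / beta)


if __name__ == "__main__":
    rng = random.Random(0)
    true_count = 1000  # hypothetical statistic, not taken from the paper
    for eps in (0.1, 0.5, 1.0):
        noisy = dp_release(true_count, eps, rng=rng)
        print(f"epsilon={eps}: release={noisy:.1f}, 95% error bound=+/-{error_bound(eps):.1f}")
```

The bound follows from the Laplace tail, P(|noise| > t) = exp(-t·ε / sensitivity), so it shrinks in proportion to 1/ε.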

We present Visualizing Privacy (ViP), an interactive interface that visualizes relationships between ε, accuracy, and disclosure risk to support setting and splitting ε among queries. As a user adjusts ε, ViP dynamically updates visualizations depicting expected accuracy and risk. ViP also has an inference setting, allowing a user to reason about the impact of DP noise on statistical inferences. Finally, we present results of a study where 16 research practitioners with little to no DP background completed a set of tasks related to setting ε using both ViP and a control. We find that ViP helps participants answer questions more accurately when judging where a DP-noised release is likely to fall and when comparing DP-noised and non-private confidence intervals.
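The kind of reasoning involved in splitting ε among queries can be sketched as follows. This is not ViP's implementation: it assumes Laplace noise, basic sequential composition (the per-query ε values sum to the total budget), and hypothetical query names and weights. For each query it computes the probability that the noised release falls within a chosen tolerance of the true value.

```python
import math


def split_budget(total_epsilon, weights):
    """Divide a total budget among queries in proportion to the given weights."""
    total_weight = sum(weights.values())
    return {name: total_epsilon * w / total_weight for name, w in weights.items()}


def prob_within(epsilon, tolerance, sensitivity=1.0):
    """P(|Laplace noise| <= tolerance) when the noise scale is sensitivity / epsilon."""
    return 1.0 - math.exp(-tolerance * epsilon / sensitivity)


if __name__ == "__main__":
    # Hypothetical queries: "count_a" gets three times the budget of "count_b".
    budget = split_budget(1.0, {"count_a": 3, "count_b": 1})
    for name, eps in budget.items():
        p = prob_within(eps, tolerance=5.0)
        print(f"{name}: epsilon={eps:.2f}, P(release within +/-5 of truth)={p:.3f}")
```

Giving a query a larger share of the budget raises the chance that its release lands close to the true value, at the cost of accuracy for the other queries.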

