Volume 32 (2022): Issue 2 (June 2022)
Towards Self-Healing Systems through Diagnostics, Fault-Tolerance and Design (Special section, pp. 171-269), Marcin Witczak and Ralf Stetter (Eds.)
Journal details
Format: Journal
eISSN: 2083-8492
First published: 05 Apr 2007
Publication frequency: 4 times per year
Languages: English
Access type: Open access

Revisiting Strategies for Fitting Logistic Regression for Positive and Unlabeled Data

Published online: 04 Jul 2022
Volume & Issue: Volume 32 (2022), Issue 2 (June 2022). Towards Self-Healing Systems through Diagnostics, Fault-Tolerance and Design (Special section, pp. 171-269), Marcin Witczak and Ralf Stetter (Eds.)
Pages: 299-309
Received: 05 Nov 2021
Accepted: 10 Feb 2022
Abstract

Positive unlabeled (PU) learning is an important problem motivated by the occurrence of this type of partial observability in many applications. The present paper reconsiders recent advances in parametric modeling of PU data based on empirical likelihood maximization and argues that they can be significantly improved. The proposed approach is based on the fact that the likelihood for the logistic fit and an unknown labeling frequency can be expressed as the sum of a convex and a concave function, which is explicitly given. This allows methods such as the concave-convex procedure (CCCP) or its variant, the disciplined convex-concave procedure (DCCP), to be applied. We show by analyzing real data sets that, by using the DCCP to solve the optimization problem, we obtain significant improvements in the posterior probability and the label frequency estimation over the best available competitors.
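
The convex-plus-concave structure mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the common model P(s=1|x) = c·σ(xᵀβ), fixes the label frequency c instead of estimating it, and uses the plain CCCP rather than the DCCP. Under these assumptions, −log(1 − c·σ(z)) = log(1 + e^z) − log(1 + (1 − c)e^z), a difference of two convex functions of z. The names `pu_nll` and `cccp_fit` and the synthetic data are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy.optimize import minimize


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)


def sigmoid(z):
    # Stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def pu_nll(beta, X, s, c):
    """Negative PU log-likelihood, assuming P(s=1|x) = c * sigmoid(x @ beta)."""
    z = X @ beta
    log_p1 = np.log(c) - softplus(-z)                   # log(c * sigmoid(z))
    log_p0 = softplus(z + np.log1p(-c)) - softplus(z)   # log(1 - c * sigmoid(z))
    return -np.sum(np.where(s == 1, log_p1, log_p0))


def cccp_fit(X, s, c, n_iter=50, tol=1e-8):
    """CCCP for fixed c: linearize the concave part, then solve a convex problem."""
    pos, unl = (s == 1), (s == 0)
    a = np.log1p(-c)
    beta = np.zeros(X.shape[1])
    history = [pu_nll(beta, X, s, c)]
    for _ in range(n_iter):
        # Gradient of g(beta) = sum over unlabeled of softplus(z + a);
        # the concave part of the objective is -g.
        w = X[unl].T @ sigmoid(X[unl] @ beta + a)

        def surrogate(b):
            # Convex part minus the linearization of g (constants dropped).
            z = X @ b
            val = np.sum(softplus(-z[pos])) + np.sum(softplus(z[unl])) - w @ b
            grad = -X[pos].T @ sigmoid(-z[pos]) + X[unl].T @ sigmoid(z[unl]) - w
            return val, grad

        beta = minimize(surrogate, beta, jac=True, method="L-BFGS-B").x
        history.append(pu_nll(beta, X, s, c))
        if history[-2] - history[-1] < tol:
            break
    return beta, history


# Synthetic example (assumed data, for illustration only).
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
beta_true = np.array([0.5, 1.0, -1.0])
y = rng.random(n) < sigmoid(X @ beta_true)        # true class, unobserved in PU data
c = 0.6                                           # label frequency, assumed known here
s = (y & (rng.random(n) < c)).astype(int)         # observed label indicator
beta_hat, history = cccp_fit(X, s, c)
```

Each CCCP step minimizes a convex upper bound that touches the objective at the current iterate, so the values stored in `history` should not increase. Treating c as an additional unknown, as the paper does, would require a joint decomposition that this sketch does not attempt.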
