Open Access

Double-stage discretization approaches for biomarker-based bladder cancer survival modeling


eISSN:
2038-0909
Language:
English
Publication timeframe:
Volume Open
Journal Subjects:
Mathematics, Numerical and Computational Mathematics, Applied Mathematics