Open Access

Cross-Project Defect Prediction with Metrics Selection and Balancing Approach

[1] T. Menzies, B. Turhan, A. Bener, G. Gay, B. Cukic, and Y. Jiang, “Implications of ceiling effects in defect predictors,” in Proceedings of the 4th International Workshop on Predictor Models in Software Engineering – PROMISE ’08, 2008, pp. 47–54. https://doi.org/10.1145/1370788.1370801

[2] M. Nevendra and P. Singh, “Software defect prediction using deep learning,” Acta Polytech. Hungarica, vol. 18, no. 10, pp. 173–189, 2021. https://doi.org/10.12700/aph.18.10.2021.10.9

[3] M. Shepperd, D. Bowes, and T. Hall, “Researcher bias: The use of machine learning in software defect prediction,” IEEE Trans. Softw. Eng., vol. 40, no. 6, pp. 603–616, Jun. 2014. https://doi.org/10.1109/TSE.2014.2322358

[4] T. Wang, W. Li, H. Shi, and Z. Liu, “Software defect prediction based on classifiers ensemble,” J. Inf. Comput. Sci., vol. 8, no. 16, pp. 4241–4254, 2011.

[5] J. Nam, S. J. Pan, and S. Kim, “Transfer defect learning,” in 2013 35th International Conference on Software Engineering (ICSE), San Francisco, CA, USA, May 2013, pp. 382–391. https://doi.org/10.1109/ICSE.2013.6606584

[6] X. Xia, D. Lo, S. J. Pan, N. Nagappan, and X. Wang, “HYDRA: Massively compositional model for cross-project defect prediction,” IEEE Trans. Softw. Eng., vol. 42, no. 10, pp. 977–998, Oct. 2016. https://doi.org/10.1109/TSE.2016.2543218

[7] Z. He, F. Shu, Y. Yang, M. Li, and Q. Wang, “An investigation on the feasibility of cross-project defect prediction,” Autom. Softw. Eng., vol. 19, no. 2, pp. 167–199, Jul. 2012. https://doi.org/10.1007/s10515-011-0090-3

[8] H. Chen, X. Jing, Z. Li, D. Wu, Y. Peng, and Z. Huang, “An empirical study on heterogeneous defect prediction approaches,” IEEE Trans. Softw. Eng., vol. 47, no. 12, pp. 2803–2822, Jan. 2020. https://doi.org/10.1109/TSE.2020.2968520

[9] H. Turabieh, M. Mafarja, and X. Li, “Iterated feature selection algorithms with layered recurrent neural network for software fault prediction,” Expert Syst. Appl., vol. 122, pp. 27–42, May 2019. https://doi.org/10.1016/j.eswa.2018.12.033

[10] T. Hall, S. Beecham, D. Bowes, D. Gray, and S. Counsell, “A systematic literature review on fault prediction performance in software engineering,” IEEE Trans. Softw. Eng., vol. 38, no. 6, pp. 1276–1304, Oct. 2012. https://doi.org/10.1109/TSE.2011.103

[11] S. Hosseini, B. Turhan, and D. Gunarathna, “A systematic literature review and meta-analysis on cross project defect prediction,” IEEE Trans. Softw. Eng., vol. 45, no. 2, pp. 111–147, Nov. 2019. https://doi.org/10.1109/TSE.2017.2770124

[12] F. J. Provost, “Machine learning from imbalanced data sets 101,” in Proc. AAAI’2000 Workshop on Imbalanced Data Sets, 2000, pp. 1–3.

[13] D. Ryu, J. I. Jang, and J. Baik, “A transfer cost-sensitive boosting approach for cross-project defect prediction,” Softw. Qual. J., vol. 25, no. 1, pp. 235–272, Sep. 2017. https://doi.org/10.1007/s11219-015-9287-1

[14] S. Wang and X. Yao, “Using class imbalance learning for software defect prediction,” IEEE Trans. Reliab., vol. 62, no. 2, pp. 434–443, Apr. 2013. https://doi.org/10.1109/TR.2013.2259203

[15] N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “SMOTE: Synthetic minority over-sampling technique,” J. Artif. Intell. Res., vol. 16, pp. 321–357, Jun. 2002. https://doi.org/10.1613/jair.953

[16] H. He, Y. Bai, E. A. Garcia, and S. Li, “ADASYN: Adaptive synthetic sampling approach for imbalanced learning,” in Proc. Int. Jt. Conf. Neural Networks, Hong Kong, Jun. 2008, pp. 1322–1328. https://doi.org/10.1109/IJCNN.2008.4633969

[17] A. Agrawal and T. Menzies, “Is ‘better data’ better than ‘better data miners’?: On the benefits of tuning SMOTE for defect prediction,” in 2018 IEEE/ACM 40th Int. Conf. Softw. Eng., May 2018, pp. 1050–1061. https://doi.org/10.1145/3180155.3180197

[18] C. Tantithamthavorn and A. E. Hassan, “The impact of class rebalancing techniques on the performance and interpretation of defect prediction models,” IEEE Trans. Softw. Eng., 2018. [Online]. Available: https://chakkrit.com/assets/papers/tantithamthavorn2018imbalance.pdf

[19] S. S. Rathore and S. Kumar, “Linear and non-linear heterogeneous ensemble methods to predict the number of faults in software systems,” Knowledge-Based Syst., vol. 119, pp. 232–256, Mar. 2017. https://doi.org/10.1016/j.knosys.2016.12.017

[20] S. S. Rathore and S. Kumar, “An approach for the prediction of number of software faults based on the dynamic selection of learning techniques,” IEEE Trans. Reliab., vol. 68, no. 1, pp. 216–236, Aug. 2018. https://doi.org/10.1109/TR.2018.2864206

[21] T. G. Dietterich, “Ensemble methods in machine learning,” in Multiple Classifier Systems, MCS 2000. Lecture Notes in Computer Science, vol. 1857. Springer, Berlin, Heidelberg, Jan. 2000, pp. 1–15. https://doi.org/10.1007/3-540-45014-9_1

[22] Q. Song, Z. Jia, M. Shepperd, S. Ying, and J. Liu, “A general software defect-proneness prediction framework,” IEEE Trans. Softw. Eng., vol. 37, no. 3, pp. 356–370, 2011. https://doi.org/10.1109/TSE.2010.90

[23] X. Rong, F. Li, and Z. Cui, “A model for software defect prediction using support vector machine based on CBA,” Int. J. Intell. Syst. Technol. Appl., vol. 15, no. 1, pp. 19–34, Apr. 2016. https://doi.org/10.1504/IJISTA.2016.076102

[24] K. Dejaeger, T. Verbraken, and B. Baesens, “Toward comprehensible software fault prediction models using Bayesian network classifiers,” IEEE Trans. Softw. Eng., vol. 39, no. 2, pp. 237–257, Apr. 2013. https://doi.org/10.1109/TSE.2012.20

[25] T. Zimmermann, N. Nagappan, H. Gall, E. Giger, and B. Murphy, “Cross-project defect prediction: A large scale experiment on data vs. domain vs. process,” in Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering, Aug. 2009, pp. 91–100. https://doi.org/10.1145/1595696.1595713

[26] B. Turhan, T. Menzies, A. B. Bener, and J. Di Stefano, “On the relative value of cross-company and within-company data for defect prediction,” Empir. Softw. Eng., vol. 14, pp. 540–578, Jan. 2009. https://doi.org/10.1007/s10664-008-9103-7

[27] B. Turhan, A. T. Misirli, and A. Bener, “Empirical evaluation of the effects of mixed project data on learning defect predictors,” Inf. Softw. Technol., vol. 55, no. 6, pp. 1101–1118, Jun. 2013. https://doi.org/10.1016/j.infsof.2012.10.003

[28] P. He, Y. Ma, and B. Li, “TDSelector: A training data selection method for cross-project defect prediction,” 2016. [Online]. Available: https://arxiv.org/ftp/arxiv/papers/1612/1612.09065.pdf

[29] S. Herbold, A. Trautsch, and J. Grabowski, “A comparative study to benchmark cross-project defect prediction approaches,” IEEE Trans. Softw. Eng., vol. 44, no. 9, pp. 811–833, Sep. 2018. https://doi.org/10.1109/TSE.2017.2724538

[30] J. Nam, W. Fu, S. Kim, T. Menzies, and L. Tan, “Heterogeneous defect prediction,” IEEE Trans. Softw. Eng., vol. 44, no. 9, pp. 874–896, Sep. 2018. https://doi.org/10.1109/TSE.2017.2720603

[31] P. He, B. Li, X. Liu, J. Chen, and Y. Ma, “An empirical study on software defect prediction with a simplified metric set,” Inf. Softw. Technol., vol. 59, pp. 170–190, Mar. 2015. https://doi.org/10.1016/j.infsof.2014.11.006

[32] Y. Zhang, D. Lo, X. Xia, and J. Sun, “An empirical study of classifier combination for cross-project defect prediction,” in 2015 IEEE 39th Annual Computer Software and Applications Conference, vol. 2, Taichung, Taiwan, Jul. 2015, pp. 264–269. https://doi.org/10.1109/COMPSAC.2015.58

[33] C. Ni, W. S. Liu, X. Chen, Q. Gu, D. X. Chen, and Q. G. Huang, “A cluster based feature selection method for cross-project software defect prediction,” J. Comput. Sci. Technol., vol. 32, no. 6, pp. 1090–1107, Dec. 2017. https://doi.org/10.1007/s11390-017-1785-0

[34] B. Turhan, “On the dataset shift problem in software engineering prediction models,” Empir. Softw. Eng., vol. 17, no. 1–2, pp. 62–74, Oct. 2012. https://doi.org/10.1007/s10664-011-9182-8

[35] Q. Song, Y. Guo, and M. Shepperd, “A comprehensive investigation of the role of imbalanced learning for software defect prediction,” IEEE Trans. Softw. Eng., vol. 45, no. 12, Dec. 2018. https://doi.org/10.1109/TSE.2018.2836442

[36] T. Fukushima, Y. Kamei, S. McIntosh, K. Yamashita, and N. Ubayashi, “An empirical study of just-in-time defect prediction using cross-project models,” in Proceedings of the 11th Working Conference on Mining Software Repositories – MSR 2014, May 2014, pp. 172–181. https://doi.org/10.1145/2597073.2597075

[37] X.-Y. Jing, F. Wu, X. Dong, and B. Xu, “An improved SDA based defect prediction framework for both within-project and cross-project class-imbalance problems,” IEEE Trans. Softw. Eng., vol. 43, no. 4, pp. 321–339, Apr. 2017. https://doi.org/10.1109/TSE.2016.2597849

[38] A. K. Shukla, P. Singh, and M. Vardhan, “A hybrid gene selection method for microarray recognition,” Biocybern. Biomed. Eng., vol. 38, no. 4, pp. 975–991, 2018. https://doi.org/10.1016/j.bbe.2018.08.004

[39] C. Ding and H. Peng, “Minimum redundancy feature selection from microarray gene expression data,” in Proceedings of the 2003 IEEE Bioinformatics Conference (CSB 2003), Stanford, CA, USA, Aug. 2003, pp. 523–528. https://doi.org/10.1109/CSB.2003.1227396

[40] N. Settouti, M. E. A. Bechar, and M. A. Chikh, “Statistical comparisons of the top 10 algorithms in data mining for classification task,” Int. J. Interact. Multimed. Artif. Intell., vol. 4, no. 1, pp. 46–51, Sep. 2016. https://doi.org/10.9781/ijimai.2016.419

[41] R. Malhotra, “A systematic review of machine learning techniques for software fault prediction,” Appl. Soft Comput. J., vol. 27, pp. 504–518, Feb. 2015. https://doi.org/10.1016/j.asoc.2014.11.023

[42] Z. Zhang and X. Xie, “Research on AdaBoost.M1 with Random Forest,” in Proc. 2010 Int. Conf. Comput. Eng. Technol. (ICCET 2010), vol. 1, Chengdu, Apr. 2010, pp. V1-647–V1-652. https://doi.org/10.1109/ICCET.2010.5485910

[43] C. Tantithamthavorn, S. McIntosh, A. E. Hassan, and K. Matsumoto, “The impact of automated parameter optimization on defect prediction models,” IEEE Trans. Softw. Eng., vol. 45, no. 7, pp. 683–711, Jul. 2019. https://doi.org/10.1109/TSE.2018.2794977

[44] M. Jureczko and L. Madeyski, “Towards identifying software project clusters with regard to defect prediction,” in Proc. 6th Int. Conf. Predict. Model. Softw. Eng. (PROMISE ’10), Sep. 2010, Art. no. 9, pp. 1–10. https://doi.org/10.1145/1868328.1868342

[45] G. Boetticher, T. Menzies, and T. Ostrand, “Tera-PROMISE: Welcome to one of the largest repositories of SE research data.” [Online]. Available: http://openscience.us/repo/. Accessed on: Nov. 30, 2017.

[46] Z. He, F. Peters, T. Menzies, and Y. Yang, “Learning from open-source projects: An empirical study on defect prediction,” in Int. Symp. Empir. Softw. Eng. Meas., Baltimore, MD, USA, Oct. 2013, pp. 45–54. https://doi.org/10.1109/ESEM.2013.20

[47] Y. Liu, T. M. Khoshgoftaar, and N. Seliya, “Evolutionary optimization of software quality modeling with multiple repositories,” IEEE Trans. Softw. Eng., vol. 36, no. 6, pp. 852–864, May 2010. https://doi.org/10.1109/TSE.2010.51

[48] G. Canfora, A. De Lucia, M. Di Penta, R. Oliveto, A. Panichella, and S. Panichella, “Multi-objective cross-project defect prediction,” in Proceedings – IEEE 6th International Conference on Software Testing, Verification and Validation, ICST 2013, Luxembourg, Mar. 2013, pp. 252–261. https://doi.org/10.1109/ICST.2013.38

[49] A. Panichella, R. Oliveto, and A. De Lucia, “Cross-project defect prediction models: L’Union fait la force,” in 2014 Software Evolution Week – IEEE Conference on Software Maintenance, Reengineering, and Reverse Engineering (CSMR-WCRE), Antwerpen, Belgium, Feb. 2014, pp. 164–173. https://doi.org/10.1109/CSMR-WCRE.2014.6747166

[50] J. Demšar, “Statistical comparisons of classifiers over multiple data sets,” J. Mach. Learn. Res., vol. 7, pp. 1–30, 2006.

[51] E. G. Jelihovschi, J. C. Faria, and I. B. Allaman, “The ScottKnott clustering algorithm,” Univ. Estadual St. Cruz-UESC, Ilheus, Bahia, Bras., 2014.

[52] Q. Yu, J. Qian, S. Jiang, Z. Wu, and G. Zhang, “An empirical study on the effectiveness of feature selection for cross-project defect prediction,” IEEE Access, vol. 7, pp. 35710–35718, Jan. 2019. https://doi.org/10.1109/ACCESS.2019.2895614

[53] M. Friedman, “The use of ranks to avoid the assumption of normality implicit in the analysis of variance,” J. Am. Stat. Assoc., vol. 32, no. 200, pp. 675–701, 1937. https://doi.org/10.1080/01621459.1937.10503522

eISSN:
2255-8691
Language:
English