Volume 17 (2014): Issue 1 (July 2014)
Journal Details
License
Format
Journal
eISSN
1027-5207
First Published
11 Dec 2014
Publication timeframe
2 times per year
Languages
English
Access type: Open Access

Predicting Dropout Student: An Application of Data Mining Methods in an Online Education Program

Published Online: 11 Dec 2014
Volume & Issue: Volume 17 (2014) - Issue 1 (July 2014)
Page range: 118 - 133
Abstract

This study examined the prediction of dropouts through data mining approaches in an online program. The subjects of the study were selected from a total of 189 students who registered for the online Information Technologies Certificate Program in 2007-2009. The data were collected through online questionnaires (Demographic Survey, Online Technologies Self-Efficacy Scale, Readiness for Online Learning Questionnaire, Locus of Control Scale, and Prior Knowledge Questionnaire). The collected data included 10 variables: gender, age, educational level, previous online experience, occupation, self-efficacy, readiness, prior knowledge, locus of control, and dropout status as the class label (dropout/not). To classify dropout students, four data mining approaches were applied, based on k-Nearest Neighbour (k-NN), Decision Tree (DT), Naive Bayes (NB) and Neural Network (NN). These methods were trained and tested using 10-fold cross-validation. The detection sensitivities of the 3-NN, DT, NN and NB classifiers were 87%, 79.7%, 76.8% and 73.9%, respectively. In addition, a Genetic Algorithm (GA) based feature selection method identified online technologies self-efficacy, online learning readiness, and previous online experience as the most important factors in predicting dropouts.
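
The evaluation procedure named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it runs a plain 3-NN classifier under 10-fold cross-validation and reports sensitivity, i.e. the share of actual dropouts that the classifier flags correctly. The data are synthetic, the helper names (`knn_predict`, `cross_validated_sensitivity`, `make_synthetic_students`) are hypothetical, and the labelling rule is an assumption chosen only to make the example run; none of it reproduces the study's data or results.

```python
import random


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x (squared Euclidean distance)."""
    nearest = sorted(
        zip(train_X, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], x)),
    )[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def cross_validated_sensitivity(X, y, k=3, folds=10, seed=0):
    """Pool predictions over the held-out folds and return TP / (TP + FN)."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    tp = fn = 0
    for f in range(folds):
        test_idx = order[f::folds]  # every index lands in exactly one fold
        held_out = set(test_idx)
        train_X = [X[i] for i in order if i not in held_out]
        train_y = [y[i] for i in order if i not in held_out]
        for i in test_idx:
            if y[i] == 1:  # only actual dropouts count toward sensitivity
                if knn_predict(train_X, train_y, X[i], k) == 1:
                    tp += 1
                else:
                    fn += 1
    return tp / (tp + fn)


def make_synthetic_students(n=189, seed=1):
    """Hypothetical data: three scores in [0, 1]; a low average score is labelled dropout (1)."""
    rng = random.Random(seed)
    X = [[rng.random() for _ in range(3)] for _ in range(n)]
    y = [1 if sum(row) / 3 < 0.45 else 0 for row in X]
    return X, y


X, y = make_synthetic_students()
sensitivity = cross_validated_sensitivity(X, y)
print(f"3-NN sensitivity under 10-fold cross-validation: {sensitivity:.3f}")
```

Pooling true positives and false negatives across all ten folds gives one sensitivity figure for the whole sample; averaging per-fold sensitivities is an equally common choice and can give a slightly different number.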

