Volume 20 (2021): Issue 2 (December 2021)
Journal Details
First Published: 16 Apr 2016
Publication timeframe: 2 times per year
Access type: Open Access

A Data Mining Approach to Predict Non-Contact Injuries in Young Soccer Players

Published Online: 28 Nov 2021
Volume & Issue: Volume 20 (2021) - Issue 2 (December 2021)
Page range: 147 - 163

Predicting and avoiding injuries is a challenging task. Using data mining techniques, this paper aims to identify relationships between modifiable and non-modifiable risk factors, with the final goal of predicting non-contact injuries. Twenty-three young soccer players were monitored over an entire season, during which fifty-seven non-contact injuries were recorded. Anthropometric data were collected, and the maturity offset was calculated for each player. To quantify the players' internal training/match load and recovery status, we applied the session-RPE method and the total quality recovery (TQR) scale daily. Cumulative workloads and the acute:chronic workload ratio (ACWR) were calculated. To explore the relationship between the various risk factors and the onset of non-contact injuries, we performed a classification tree analysis. In receiver operating characteristic (ROC) curve analysis, the classification tree model showed acceptable discrimination (AUC = 0.76). The data mining algorithm identified a low state of recovery, a rapid increase in training load, cumulative workload, and maturity offset as the most important injury risk factors.
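
The workload variables named in the abstract can be illustrated with a minimal sketch. It assumes the conventional session-RPE definition (perceived exertion multiplied by session duration in minutes) and computes the ACWR from rolling averages over 7 acute and 28 chronic days. The abstract does not state the window lengths or the averaging scheme, so those defaults, the function names, and the sample data are assumptions for illustration and not the authors' implementation.

```python
from typing import List, Optional


def session_load(rpe: float, duration_min: float) -> float:
    """Session-RPE load in arbitrary units (AU): RPE times minutes."""
    return rpe * duration_min


def rolling_mean(values: List[float], end: int, window: int) -> Optional[float]:
    """Mean of the `window` values ending at index `end`.

    Returns None when fewer than `window` days are available.
    """
    start = end - window + 1
    if start < 0:
        return None
    return sum(values[start:end + 1]) / window


def acwr(daily_loads: List[float], day: int,
         acute_days: int = 7, chronic_days: int = 28) -> Optional[float]:
    """Acute:chronic workload ratio on `day` (0-based index).

    The 7/28-day windows are assumed defaults, not values from the paper.
    """
    acute = rolling_mean(daily_loads, day, acute_days)
    chronic = rolling_mean(daily_loads, day, chronic_days)
    if acute is None or chronic is None or chronic == 0:
        return None
    return acute / chronic


# Hypothetical data: three weeks at 300 AU/day, then one week at 600 AU/day.
loads = [300.0] * 21 + [600.0] * 7

print(session_load(rpe=6, duration_min=90))  # 540
print(acwr(loads, day=27))                   # 1.6
```

In this made-up series the last week doubles the daily load, so the acute average (600 AU) is well above the chronic average (375 AU) and the ratio rises to 1.6. This is the kind of rapid load increase that the abstract lists among the injury risk factors.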


