A Rebalancing Framework for Classification of Imbalanced Medical Appointment No-show Data

Purpose

This paper aims to improve classification performance on imbalanced data by applying different sampling techniques available in machine learning.

Design/methodology/approach

The medical appointment no-show dataset is imbalanced, and when classification algorithms are applied to it directly, they are biased towards the majority class and ignore the minority class. To avoid this issue, multiple sampling techniques, namely Random Over Sampling (ROS), Random Under Sampling (RUS), Synthetic Minority Oversampling TEchnique (SMOTE), ADAptive SYNthetic Sampling (ADASYN), Edited Nearest Neighbor (ENN), and Condensed Nearest Neighbor (CNN), are applied to balance the dataset. A Decision Tree classifier is trained with each of the listed sampling techniques, and the best-performing technique is identified.
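To make the two simplest techniques in this list concrete, the following is a minimal sketch of random over- and undersampling using only the Python standard library. It is not the authors' implementation: the function names, the toy data, and the fixed seed are assumptions made for illustration, and the remaining techniques (SMOTE, ADASYN, ENN, CNN) are not shown.

```python
import random
from collections import Counter


def random_over_sample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def random_under_sample(rows, labels, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = min(counts.values())
    out_rows, out_labels = [], []
    for cls in counts:
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for r in rng.sample(pool, target):
            out_rows.append(r)
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: 8 "show" (0) and 2 "no-show" (1) appointments.
rows = [[i, i % 3] for i in range(10)]
labels = [0] * 8 + [1] * 2

ros_rows, ros_labels = random_over_sample(rows, labels)
rus_rows, rus_labels = random_under_sample(rows, labels)
print(Counter(ros_labels))  # class counts after oversampling
print(Counter(rus_labels))  # class counts after undersampling
```

In practice, resampling would normally be applied to the training split only, so that the evaluation data keeps the original class distribution.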

Findings

This study compares the performance metrics of several widely used sampling methods. ENN yields higher Recall than the other techniques. CNN and ADASYN have performed equally well on the imbalanced data.
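Recall is the share of actual minority-class cases that the classifier finds, TP / (TP + FN). The short sketch below, built on hypothetical labels rather than the paper's data, shows why it is a more informative metric than accuracy on imbalanced data: a classifier that always predicts the majority class scores well on accuracy but has zero Recall.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the chosen positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn) if (tp + fn) else 0.0


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical labels: 8 "show" (0) and 2 "no-show" (1) appointments.
y_true = [0] * 8 + [1] * 2

# A classifier that ignores the minority class entirely.
majority_only = [0] * 10
# A classifier that finds one of the two no-shows, with one false alarm.
balanced_guess = [0] * 7 + [1] + [1, 0]

print(accuracy(y_true, majority_only), recall(y_true, majority_only))
print(accuracy(y_true, balanced_guess), recall(y_true, balanced_guess))
```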

Research limitations

The testing was carried out on a limited dataset, and the framework needs to be tested on a larger dataset.

Practical implications

This framework will be useful in real-world scenarios where the data is imbalanced, as rebalancing ultimately improves classification performance.

Originality/value

This paper applies the rebalancing framework to a medical appointment no-show dataset to predict no-shows and to remove the bias towards the majority class.

eISSN: 2543-683X
Language: English

Publication frequency: 4 issues per year
Journal subjects: Computer Science, Information Technology, Project Management, Databases and Data Mining


Article Category: Research Paper

Published online: 27 Jan 2021

Page range: 178-192

Received: 29 Apr 2020

Accepted: 21 Dec 2020

DOI: https://doi.org/10.2478/jdis-2021-0011

Keywords: Imbalanced data, Sampling methods, Machine learning, Classification

© 2021 Ulagapriya Krishnan et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
