Volume 22 (2022): Issue 2 (June 2022)
Journal details
Format
Journal
eISSN
1314-4081
First published
13 Mar 2012
Publication frequency
4 times per year
Languages
English
Access type: Open access

Enhancing Weak Nodes in Decision Tree Algorithm Using Data Augmentation

Published online: 23 Jun 2022
Volume & Issue: Volume 22 (2022) - Issue 2 (June 2022)
Pages: 50 - 65
Received: 14 Mar 2021
Accepted: 21 Apr 2022
Abstract

Decision trees are among the most popular classifiers in machine learning, artificial intelligence, and pattern recognition because they are accurate and easy to interpret. During tree construction, a node containing too few observations (a weak node) can still be split, and the resulting split is unreliable and has no statistical value. Existing machine-learning methods address this issue, for example pruning, which removes the non-meaningful parts of the tree. This paper handles weak nodes differently: we introduce a new algorithm, Enhancing Weak Nodes in Decision Tree (EWNDT), which reinforces weak nodes by augmenting their data with observations from other, similar tree nodes. We call this data augmentation a virtual merging because the added data is used only temporarily, to recalculate the best splitting attribute and the best threshold in the weak node. We use two approaches to define the similarity between two nodes. The experimental results are verified on benchmark datasets from the UCI machine-learning repository and indicate that the EWNDT algorithm performs well.
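
The virtual-merging idea can be illustrated with a minimal Python sketch. It is not the EWNDT implementation from the paper. It assumes a Gini-based binary split on numeric attributes, a hypothetical size threshold `MIN_SAMPLES` for calling a node weak, and a donor node that has already been chosen as similar (the abstract does not specify the two similarity measures, so none is modelled here). All function names are illustrative.

```python
from collections import Counter

MIN_SAMPLES = 5  # hypothetical threshold below which a node counts as "weak"


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (attribute index, threshold, weighted Gini) of the best binary split."""
    best = (None, None, float("inf"))
    n = len(rows)
    for attr in range(len(rows[0])):
        values = sorted({r[attr] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[attr] <= thr]
            right = [y for r, y in zip(rows, labels) if r[attr] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (attr, thr, score)
    return best


def split_weak_node(node_rows, node_labels, donor_rows, donor_labels):
    """Choose a split for a node, pooling donor data only if the node is weak."""
    if len(node_rows) >= MIN_SAMPLES:
        return best_split(node_rows, node_labels)
    # Virtual merge: the pooled data is used only to pick the attribute and
    # threshold; the node's own observations are left unchanged.
    return best_split(node_rows + donor_rows, node_labels + donor_labels)
```

With only two observations, a weak node can be split perfectly on almost any attribute, so the chosen split is arbitrary. Pooling a few donor observations lets the split selection rest on more evidence, which is the motivation the abstract gives for reinforcing weak nodes rather than pruning them.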

