Number of Instances for Reliable Feature Ranking in a Given Problem

Background: In practical use of machine learning models, users may add new features to an existing classification model to reflect their (changed) empirical understanding of a field. New features can increase the model's classification accuracy or improve its interpretability. Objectives: We introduce a guideline for determining the sample size needed to reliably estimate the impact of a new feature. Methods/Approach: Our approach is based on the feature evaluation measure ReliefF and the bootstrap-based estimation of confidence intervals for feature ranks. Results: We test our approach on real-world qualitative business-to-business sales forecasting data and two UCI data sets, one with missing values. The results show that new features with a high or a low rank can be detected using a relatively small number of instances, but features ranked near the border of useful features need larger samples to determine their impact. Conclusions: A combination of the feature evaluation measure ReliefF and the bootstrap-based estimation of confidence intervals can be used to reliably estimate the impact of a new feature in a given problem.
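The general idea of bootstrap confidence intervals for feature ranks can be illustrated with a minimal sketch. It is not the author's implementation. It assumes a simplified basic Relief (one nearest hit and one nearest miss, numeric features, no missing values) in place of full ReliefF, and a plain percentile bootstrap. All function names (`relief_scores`, `ranks`, `bootstrap_rank_ci`, `make_toy_data`) and the toy data are hypothetical.

```python
import random

def relief_scores(X, y, ids):
    """Basic Relief weights (assumption: a simplification, not full ReliefF).
    ids holds each row's original index so bootstrap copies of the same
    instance are not used as its own nearest hit."""
    n, d = len(X), len(X[0])
    # Feature ranges, used to normalise per-feature differences to [0, 1].
    spans = []
    for j in range(d):
        col = [row[j] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(a, b, j):
        return abs(a[j] - b[j]) / spans[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    w = [0.0] * d
    for i in range(n):
        hits = [k for k in range(n) if ids[k] != ids[i] and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue  # cannot score this instance in a degenerate sample
        h = min(hits, key=lambda k: dist(X[i], X[k]))
        m = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            # Reward separation from the nearest miss, penalise it from the hit.
            w[j] += (diff(X[i], X[m], j) - diff(X[i], X[h], j)) / n
    return w

def ranks(scores):
    """Rank 1 is the highest-scoring feature."""
    order = sorted(range(len(scores)), key=lambda j: -scores[j])
    r = [0] * len(scores)
    for pos, j in enumerate(order, start=1):
        r[j] = pos
    return r

def bootstrap_rank_ci(X, y, n_boot=100, alpha=0.05, seed=0):
    """Percentile bootstrap interval (lower, upper) for each feature's rank."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    samples = [[] for _ in range(d)]
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        r = ranks(relief_scores([X[i] for i in idx], [y[i] for i in idx], idx))
        for j in range(d):
            samples[j].append(r[j])
    cis = []
    for s in samples:
        s.sort()
        lo = s[int((alpha / 2) * (n_boot - 1))]
        hi = s[int(round((1 - alpha / 2) * (n_boot - 1)))]
        cis.append((lo, hi))
    return cis

def make_toy_data(n=60, seed=1):
    """Hypothetical data: feature 0 tracks the class, features 1 and 2 are noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([label + rng.gauss(0, 0.1), rng.random(), rng.random()])
        y.append(label)
    return X, y

X, y = make_toy_data()
for j, (lo, hi) in enumerate(bootstrap_rank_ci(X, y)):
    print(f"feature {j}: rank interval [{lo}, {hi}]")
```

In a sketch like this, a narrow interval suggests that the current number of instances is enough to place the feature, while a wide interval that straddles the boundary of useful features suggests that more instances are needed. Repeating the procedure on growing subsamples would show how the interval width changes with sample size.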

eISSN: 1847-9375
Language: English

Publication frequency: 2 issues per year
Journal subjects: Economics, Business Administration, Management, Organization and Corporate Governance, Principles of Corporate Governance, Other, Mathematics and Statistics for Economists, Mathematics

Published online: 28 July 2018

Page range: 35-44

Received: 31 Jan. 2018

Accepted: 21 Apr. 2018

DOI: https://doi.org/10.2478/bsrj-2018-0017

Keywords: machine learning, feature ranking, feature evaluation

© 2018 Marko Bohanec, published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
