Published online: 28 Jul 2018
Pages: 35-44
Received: 31 Jan 2018
Accepted: 21 Apr 2018
DOI: https://doi.org/10.2478/bsrj-2018-0017
Keywords
© 2018 Marko Bohanec, published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
Background: In practical use of machine learning models, users may add new features to an existing classification model, reflecting their (changed) empirical understanding of a field. New features can potentially increase the classification accuracy of the model or improve its interpretability. Objectives: We introduce a guideline for determining the sample size needed to reliably estimate the impact of a new feature. Methods/Approach: Our approach is based on the feature evaluation measure ReliefF and the bootstrap-based estimation of confidence intervals for feature ranks. Results: We test our approach on real-world qualitative business-to-business sales forecasting data and two UCI data sets, one of which contains missing values. The results show that new features with a high or a low rank can be detected using a relatively small number of instances, whereas features ranked near the border of useful features require larger samples to determine their impact. Conclusions: A combination of the feature evaluation measure ReliefF and the bootstrap-based estimation of confidence intervals can be used to reliably estimate the impact of a new feature in a given problem.
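To make the described combination of ReliefF scoring and bootstrap-based rank confidence intervals concrete, the following is a minimal sketch, not the authors' implementation: it assumes the skrebate package's ReliefF estimator and a hypothetical data set in which the newly added feature sits at a known column index.

```python
# Sketch: bootstrap confidence interval for the ReliefF rank of a new feature.
# Assumptions (not from the paper): the skrebate ReliefF implementation,
# numpy arrays X (instances x features) and y (class labels), and that the
# new feature is identified by its column index.
import numpy as np
from skrebate import ReliefF

def bootstrap_rank_ci(X, y, new_feature_idx, n_boot=200, n_neighbors=10,
                      percentiles=(2.5, 97.5), random_state=0):
    """Median ReliefF rank of one feature and a bootstrap percentile interval.

    Rank 1 corresponds to the most relevant feature (highest ReliefF score).
    """
    rng = np.random.default_rng(random_state)
    n = X.shape[0]
    ranks = np.empty(n_boot, dtype=int)
    for b in range(n_boot):
        # Resample instances with replacement (the bootstrap sample).
        idx = rng.integers(0, n, size=n)
        relief = ReliefF(n_neighbors=n_neighbors)
        relief.fit(X[idx], y[idx])
        scores = relief.feature_importances_
        # Rank of the new feature = 1 + number of features scoring higher.
        ranks[b] = 1 + np.sum(scores > scores[new_feature_idx])
    low, high = np.percentile(ranks, percentiles)
    return np.median(ranks), (low, high)

# Hypothetical usage, assuming the new feature is the last column of X:
# median_rank, (lo, hi) = bootstrap_rank_ci(X, y, new_feature_idx=X.shape[1] - 1)
```

A narrow interval that lies clearly above or below the cut-off between useful and useless features suggests the current sample already determines the new feature's impact; a wide interval straddling that border indicates that more instances are needed, in line with the results summarized above.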