A Two-Level Approach based on Integration of Bagging and Voting for Outlier Detection
Article Category: Research Paper
Published Online: May 20, 2020
Pages: 111 - 135
Received: Dec 13, 2019
Accepted: Apr 29, 2020
DOI: https://doi.org/10.2478/jdis-2020-0014
Keywords
© 2020 Alican Dogan et al., published by Sciendo
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.
Purpose
The main aim of this study is to build a robust approach that detects outliers in datasets accurately. To this end, a novel approach is introduced to determine the likelihood that an object differs extremely from the general behavior of the entire dataset.
Design/methodology/approach
This paper proposes a novel two-level approach based on the integration of bagging and voting techniques for anomaly detection problems. The proposed approach, named Bagged and Voted Local Outlier Detection (BV-LOF), benefits from the Local Outlier Factor (LOF) as the base algorithm and improves its detection rate by using ensemble methods.
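The two-level idea can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the abstract does not say how the bags are drawn or how votes are combined, so the random feature subsets, the LOF threshold of 1.5, the set of neighborhood sizes, and the names `lof_scores` and `bv_lof_scores` are all assumptions made for illustration. The LOF routine is a simplified brute-force version that uses exactly k neighbours and ignores distance ties.

```python
import math
import random


def lof_scores(points, k):
    """Simplified brute-force LOF: exactly k neighbours, ties ignored."""
    n = len(points)
    # For every point, the k nearest other points as (distance, index) pairs.
    knn = []
    for i in range(n):
        others = sorted(
            (math.dist(points[i], points[j]), j) for j in range(n) if j != i
        )
        knn.append(others[:k])
    k_dist = [nb[-1][0] for nb in knn]  # distance to the k-th neighbour
    # Local reachability density; the small floor guards against duplicates.
    lrd = []
    for i in range(n):
        reach = sum(max(k_dist[j], d) for d, j in knn[i])
        lrd.append(k / max(reach, 1e-12))
    # LOF: mean ratio of the neighbours' density to the point's own density.
    return [sum(lrd[j] for _, j in knn[i]) / (k * lrd[i]) for i in range(n)]


def bv_lof_scores(points, k_values=(3, 4, 5), n_bags=5, threshold=1.5, seed=0):
    """Hypothetical two-level ensemble: bagging outside, voting over k inside."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    totals = [0.0] * n
    for _ in range(n_bags):
        # Level 1 (bagging): a random feature subset (assumed bagging scheme).
        size = rng.randint(max(1, dim // 2), dim)
        feats = rng.sample(range(dim), size)
        view = [tuple(p[f] for f in feats) for p in points]
        # Level 2 (voting): each neighbourhood size k casts one vote per point.
        votes = [0] * n
        for k in k_values:
            for i, score in enumerate(lof_scores(view, k)):
                if score > threshold:
                    votes[i] += 1
        for i in range(n):
            totals[i] += votes[i] / len(k_values)
    # Fraction of votes received, averaged over all bags (0.0 to 1.0).
    return [t / n_bags for t in totals]


# Usage: a 5 x 5 grid of inliers plus one distant point.
data = [(float(x), float(y)) for x in range(5) for y in range(5)] + [(20.0, 20.0)]
scores = bv_lof_scores(data)
```

Each returned value is the share of votes a point received, so values near 1.0 mark objects that were flagged under most feature subsets and neighborhood sizes. Averaging vote fractions is only one possible aggregation; other combination rules would fit the same two-level structure.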
Findings
Several experiments were performed on ten benchmark outlier detection datasets to demonstrate the effectiveness of the BV-LOF method. According to the results, BV-LOF on average significantly outperformed LOF on 9 of the 10 datasets.
Research limitations
In the BV-LOF approach, the base algorithm is applied to each subset data multiple times with different neighborhood sizes (
Practical implications
The proposed method can be applied to datasets from different domains (e.g., health, finance, manufacturing) without requiring any prior information. Since the BV-LOF method includes two-level ensemble operations, it may require more computation time than single-level ensemble methods; however, this drawback can be overcome by parallelization and by using a suitable data structure such as an R*-tree or KD-tree.
Originality/value
The proposed approach (BV-LOF) investigates multiple neighborhood sizes (