The main aim of this study is to build a robust novel approach that can detect outliers in datasets accurately. To serve this purpose, a novel approach is introduced to determine the likelihood that an object is extremely different from the general behavior of the entire dataset.
This paper proposes a novel two-level approach based on the integration of bagging and voting techniques for anomaly detection problems. The proposed approach, named Bagged and Voted Local Outlier Detection (BVLOF), benefits from the Local Outlier Factor (LOF) as the base algorithm and improves its detection rate by using ensemble methods.
Several experiments have been performed on ten benchmark outlier detection datasets to demonstrate the effectiveness of the BVLOF method. According to the results, the BVLOF approach significantly outperformed LOF on 9 of the 10 datasets on average.
In the BVLOF approach, the base algorithm is applied to each data subset multiple times with different neighborhood sizes ($k$).
The proposed method can be applied to datasets from different domains (e.g., health, finance, manufacturing) without requiring any prior information. Since the BVLOF method includes two-level ensemble operations, it may require more computational time than single-level ensemble methods; however, this drawback can be overcome by parallelization and by using a proper data structure such as an R*-tree or KD-tree.
The proposed approach (BVLOF) investigates multiple neighborhood sizes ($k$) instead of relying on a single value.
Keywords
 Outlier detection
 Local outlier factor
 Ensemble learning
 Bagging
 Voting
In recent years, many applications have required data mining methods that can find outliers and protect system reliability. Outliers are observations that differ significantly from most of the samples in the dataset; they can be perceived as if they were generated by a different process. Generally, the number of outliers is much smaller than that of the normally behaving data (typically lower than 10%). Data instances containing extreme feature values out of the accepted range may negatively affect data analyses such as regression, or may provide useful information about the data, such as signs of terrorist attacks. Outlier analysis is especially well suited to fraud detection.
If any data point is extremely different from the whole data in a process, then it is marked as an outlier (Qi & Chen, 2018). Outliers may emerge from various causes such as malicious attacks, environmental factors, human errors, abnormal conditions, measurement errors, and hardware malfunction. Outlier detection is related to, but slightly different from, novelty detection, which is concerned with the identification of new or unknown data not covered by the training data. Outlier detection fits the model to normally behaving data; abnormally behaving data can then be detected because it does not fit the conditioned model. Local Outlier Factor (LOF) and Isolation Forest (IF) are among the soft computing methods dealing with this problem (Domingues et al., 2018).
However, it is not always straightforward to perform anomaly detection successfully for many methods (Cao et al., 2017). Sampling approaches are generally not used for this task since they are affected by oversampling minority values or undersampling majority values. Detecting an outlier in any context is a challenging task due to the lack of a known underlying model and the strong dependence of the existing methods on input parameters. In order to overcome these problems, in this study, we propose a new ensemble-based outlier detection approach, named Bagged and Voted Local Outlier Detection (BVLOF). Our approach uses the LOF algorithm as a base detector on various subsets of data and aggregates the obtained results by a combining mechanism.
The main contributions of this paper are threefold: (i) It proposes a novel approach (BVLOF), which integrates two different styles of ensemble learning (bagging and voting) using the Local Outlier Factor (LOF) algorithm in order to detect outliers; in this way, it aims to improve the general performance of LOF in detecting anomalies. (ii) This is the first study that runs the LOF algorithm with different input parameter ($k$) values inside a two-level ensemble framework. (iii) It validates the proposed approach on ten benchmark outlier detection datasets, on which BVLOF outperforms LOF on average.
In the experimental studies, ten benchmark outlier datasets were used, and the results were compared with those obtained by the LOF method. The Area Under the Receiver Operating Characteristic Curve (AUC) measure was used to evaluate the outlier detection performance of the proposed method on all datasets. According to the experimental results, the proposed two-level approach (BVLOF) demonstrated better performance than LOF.
The rest of this paper is organized as follows. Section 2 describes outlier detection studies that have been conducted by researchers so far. Section 3 explains the proposed method (BVLOF) in detail with its benefits and time complexity. Section 4 presents the datasets used in the experiments and discusses the experimental results obtained by the comparison of two methods: LOF and BVLOF. Final comments on the study and possible future works are presented in Section 5.
In the majority of cases, the number of instances that belong to the class of outliers is less than 10% of the total number of instances in the whole training set (Wu et al., 2018). Their identification is one of the most challenging tasks in data analysis; hence, in general, the probability of detecting an anomaly in a dataset is extremely low for many methods. The studies presented in the literature have shown that many traditional classification, clustering, and regression algorithms are not capable of dealing with outliers, because learning algorithms are likely to misinterpret data containing outliers. For this reason, the approach proposed in this paper can also be applied as a separate data preprocessing step before classification, regression, and clustering tasks.
In the literature, many different outlier detection algorithms have been proposed to detect the presence of outliers effectively. A group of algorithms uses the reconstruction error to find contaminated instances (Balamurali & Melkumyan, 2018), in which the data that have greater reconstruction errors are labeled as outliers. These algorithms assume that strong statistical correlations exist among the attributes of normally behaving data. These reconstruction-based techniques can capture the correlations by reducing the reconstruction errors of data having the expected behavior. For this reason, outliers have comparatively larger reconstruction errors than normal data.
Different assumptions have been made by the existing outlier detection approaches. These assumptions vary and make the approaches differ from each other. Density-based and distance-based techniques form the basis of most of the more complex algorithms (Lopes et al., 2018).
As shown in Figure 1, the learning scenarios proposed by current outlier detection methods can be grouped into three categories: supervised, semi-supervised, and unsupervised. In supervised outlier detection, class-labeled data tagged as inlier and outlier is provided for training, so distinct classes of data are available as in binary or multi-class classification (Huang et al., 2012). Semi-supervised outlier detection can utilize labeled instances of only the inlier class, or can use both labeled data (tagged as inlier and outlier) and unlabeled data (which may or may not contain outliers). In unsupervised outlier detection, the algorithm tries to learn from unlabeled data only (Pasillas-Díaz & Ratté, 2017), so it has no information about class labels. In this study, we propose a novel outlier detection approach that focuses on unsupervised learning, so only unlabeled data are used.
Unsupervised outlier detection approaches also have subcategories: clustering-based, density-based, and relative density-based methods.
First, clustering-based techniques follow the strategy that each instance either belongs to a cluster or does not (in which case it is called an outlier). In these algorithms, the data is split into clusters, and membership in a cluster determines whether an instance is an anomaly. A binary decision is taken by the model; therefore, it does not provide an extensive comprehension of the identified outliers.
Secondly, density-based techniques identify instances found in low-density areas and name them anomalies. An anomaly score is given to each instance based on the nearest neighbor (NN) distance strategy. Different density-based outlier detection methods have been proposed by researchers so far; Local Outlier Factor (LOF) is one of the most popular density-based methods that use the NN methodology. For this reason, in this study, we preferred the LOF method for its advantages: it is unsupervised, easy to understand and implement, able to deal with high-dimensional data, and requires no assumption about the underlying data.
Thirdly, in relative density-based techniques, anomalies are defined as observations that have a lower relative density than their neighbors (Bandaragoda et al., 2017). This technique was proposed because a global density function fails to identify local outliers in dense areas whose density is low relative to that of their neighbors. The ratio of the density of an instance to that of its neighborhood is taken as the measure of relative density; in this way, instances with low relative density are labeled as outliers.
Several variants of the LOF algorithm have been implemented in order to overcome disadvantages that researchers observed in the original LOF. For example, Qin et al. (2019) proposed Weighted LOF (WLOF) to deal with the impact of data duplication. Gan and Zhou (2018) proposed the DLOF approach, which uses the DBSCAN algorithm to optimize the LOF algorithm by dynamically adjusting its neighborhood parameter.
Ensemble learning is a well-established mechanism in outlier detection tasks. Ensemble-based models apply an outlier detection algorithm, or a series of different algorithms, more than once on various settings of the same dataset and combine the outputs to obtain the final anomaly score (Chakraborty, Narayanan, & Ghosh, 2019; Zhang et al., 2019). In the literature, several studies have been performed using an ensemble strategy for this purpose (Chen et al., 2018; Hu et al., 2019; Li, Fang, & Yan, 2019; Wang & Mao, 2019). The previous studies have proven that ensemble learning yields advantages in many different applications such as process monitoring (Wang & Mao, 2019), video anomaly detection (Hu et al., 2019), maintenance prediction (Li, Fang, & Yan, 2019), and image-based outlier detection (Chen et al., 2018). This is because they try to minimize errors of judgment made by different models or by differently selected subsets of data (Kaneko, 2018).
Lazarevic and Kumar (2005) proposed a feature bagging mechanism to create different datasets with randomly selected attributes before applying LOF on those datasets. However, their approach varies only the feature subsets, while the neighborhood size used by LOF remains fixed.
Differently from the previous studies, our approach consists of two consecutive phases: bagging and voting. In the first phase, differently chosen subsets of attributes are used to benefit from various correlations of input features. In the second phase, the effects of diverse neighborhood sizes ($k$) are combined by majority voting.
Let $D$ be a dataset and let $d(p, o)$ denote the distance between two data points $p, o \in D$.
The Local Outlier Factor (LOF) method is aimed at finding extreme data points with the help of their local deviation with respect to the density of their neighborhoods (Breunig et al., 2000). It relies on the following definitions.
Given a dataset $D$, a data point $p \in D$, and a positive integer $k$, the $k$-distance of $p$, denoted $k\text{-}dist(p)$, is the distance $d(p, o)$ to a data point $o \in D$ such that:
for at least $k$ data points $o' \in D \setminus \{p\}$ it holds that $d(p, o') \le d(p, o)$, and
for at most $k - 1$ data points $o' \in D \setminus \{p\}$ it holds that $d(p, o') < d(p, o)$.
Given a data point $p$, the $k$-distance neighborhood of $p$ is $N_k(p) = \{ q \in D \setminus \{p\} \mid d(p, q) \le k\text{-}dist(p) \}$.
For a positive number $k$, the reachability distance of $p$ with respect to $o$ is $\mathit{reach\text{-}dist}_k(p, o) = \max\{ k\text{-}dist(o),\; d(p, o) \}$, and the local reachability density of $p$ is $\mathit{lrd}_k(p) = \left( \sum_{o \in N_k(p)} \mathit{reach\text{-}dist}_k(p, o) \,/\, |N_k(p)| \right)^{-1}$.
Given a dataset $D$ and a data point $p$, the local outlier factor of $p$ is $\mathit{LOF}_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathit{lrd}_k(o)}{\mathit{lrd}_k(p)}$.
The LOF is a score assigned to each data point, which gives information about how likely the data point is to be an outlier; values close to 1 indicate inliers, while values considerably greater than 1 indicate outliers.
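The definitions above can be turned into a compact NumPy sketch. This is a simplified illustration, not the authors' implementation: it takes $N_k(p)$ to be exactly the $k$ nearest neighbors (ignoring distance ties), and the function name and signature are hypothetical.

```python
import numpy as np

def lof_scores(X, k):
    """Illustrative LOF computation; scores well above 1 suggest outliers."""
    n = len(X)
    # pairwise Euclidean distances between all points
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)             # exclude each point from its own neighborhood
    knn = np.argsort(dist, axis=1)[:, :k]      # indices of the k nearest neighbors
    k_dist = dist[np.arange(n), knn[:, -1]]    # k-distance(p): distance to the k-th neighbor
    # reach-dist_k(p, o) = max(k-distance(o), d(p, o)) for each neighbor o of p
    reach = np.maximum(k_dist[knn], dist[np.arange(n)[:, None], knn])
    lrd = 1.0 / reach.mean(axis=1)             # local reachability density lrd_k(p)
    return lrd[knn].mean(axis=1) / lrd         # LOF_k(p): mean lrd of neighbors / lrd(p)
```

On a tight cluster with one far-away point, the isolated point receives a much larger score than the cluster members, whose scores stay near 1.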
In this paper, we propose a novel two-level outlier detection approach, named Bagged and Voted Local Outlier Detection (BVLOF). As shown in Figure 2, the BVLOF approach has two main phases, which are bagging and voting. Bagging, which is short for bootstrap aggregation, finds the outlier scores on a series of randomly selected feature subsets; voting then combines the outputs obtained with different neighborhood sizes into a final decision.
The BVLOF approach provides a solution to improve the performance of the LOF algorithm. The intuition behind this approach is that the combination of diverse feature subsets and multiple variants of the same algorithm can complement each other, and the collective decision improves the overall detection rate. The integration of partial features provides many advantages over the full features, which can be summarized as follows:
Improving accuracy: The integration of partial features increases the possibility of selecting the right subset to improve the accuracy of the model. Since it is initially unknown which subset of features is more adequate for outlier detection, creating different subsets with different sizes from the full features and combining their decisions is an effective strategy. Previous studies (Aggarwal, 2017; Lazarevic & Kumar, 2005) have also shown that the performance of outlier detection can be significantly improved by using multiple feature subsets.
Dealing with noisy data: Feature subset selection plays a significant role in improving the performance of outlier detection, especially in the case of noisy data. When analyzing the full features, the data becomes sparse, and the actual outliers become masked by the noise effects of high dimensions (Aggarwal, 2017). On the other hand, some deviations in a small feature subset may be significant enough to be indicative of abnormal behavior.
Increasing robustness: In distance computation, the presence of irrelevant features can significantly degrade the performance of outlier detection (Lazarevic & Kumar, 2005). This is because all dimensions are weighted equally, even the irrelevant and unimportant dimensions. Since the outliers are often embedded in locally relevant subspaces, a set of feature subspaces can be sampled in order to improve robustness (Leng & Huang, 2011).
Taking advantage of ensemble learning: The advantages of an ensemble-based combined score of a given data point are very significant in the context of outlier detection (Aggarwal, 2017). The outlier scores obtained from different feature subspaces may be very different; therefore, it is usually difficult to fully trust the score of a single subspace, and so the combination of scores is crucial. In this sense, this approach is similar to taking advantage of the power of many weak learners to create a single strong learner in ensemble-based classification problems.
Figure 3 illustrates the feature bagging operation performed in the first phase of the BVLOF approach. Let $D$ be a dataset with $d$ features. In each ensemble iteration, a feature subset of randomly determined size is selected from the $d$ features, and the base algorithm operates on the corresponding reduced view of the data.
In the first step of the BVLOF approach, the LOF algorithm is applied to each data subset multiple times with different neighborhood sizes ($k$).
Figure 4 shows the majority voting operation of the BVLOF approach. The LOF algorithm produces an output for each neighborhood size ($k$); these outputs are then combined by majority voting into a single decision vector.
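As a toy illustration of this voting step (the vote matrix below is made up, not taken from the paper), each row holds one LOF run's binary decisions and each column is voted on independently:

```python
import numpy as np

# Hypothetical outputs of three LOF runs with different k values:
# rows = runs, columns = instances, True = flagged as outlier.
votes = np.array([
    [True,  False, True,  False],
    [True,  False, False, False],
    [True,  True,  False, False],
])
final = votes.sum(axis=0) > votes.shape[0] / 2   # strict majority per instance
print(final.tolist())                            # → [True, False, False, False]
```

Only the first instance is flagged by a majority of the runs, so only it survives the vote.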
The pseudocode of the BVLOF approach is presented in Algorithm 1. For each ensemble iteration, the size of the feature subset is randomly determined, a feature subset of that size is generated, LOF is applied on the reduced data with each neighborhood size $k$, and the resulting output vectors are combined by two successive majority votes.
In this section, the proposed approach (BVLOF) is illustrated with an example. Assume that three feature subsets are generated and that LOF is run with several neighborhood sizes ($k$) on each of them.


Algorithm 1. Pseudocode of the BVLOF approach.

Input: dataset $D$ with $d$ features, ensemble size $T$, neighborhood sizes $k = 1, \ldots, K$
Output: a single output vector $O$ of outlier decisions

for $t = 1$ to $T$ do
  Randomly determine subset size $s_t$
  Generate feature subset $F_t$ by selecting $s_t$ features of $D$ at random
  for $k = 1$ to $K$ do
    Apply LOF$_{(k)}$ on $D[F_t]$
  end for
  Obtain output vectors $O_{t,1}, \ldots, O_{t,K}$
  // find highest total vote over the $K$ neighborhood sizes
  Obtain single output vector $O_t$ by majority voting
end for
// find highest total vote over the $T$ feature subsets
Obtain a single output vector $O$ by majority voting
An example to illustrate the bagging and voting steps of the BVLOF approach.
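The two phases above can be sketched with Scikit-Learn's `LocalOutlierFactor`. This is a minimal illustration rather than the authors' code: the function name, the subset-size rule, and the parameter defaults are assumptions.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

def bvlof_predict(X, n_bags=10, k_values=(5, 10, 15, 20, 25),
                  contamination=0.1, seed=0):
    """Two-level sketch: feature bagging, then two rounds of majority voting."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    bag_votes = []
    for _ in range(n_bags):
        # Level 1 (bagging): draw a random feature subset
        size = rng.integers(max(1, d // 2), d) if d > 1 else 1
        feats = rng.choice(d, size=size, replace=False)
        # Level 2 (voting over k): run LOF with each neighborhood size
        preds = [LocalOutlierFactor(n_neighbors=k, contamination=contamination)
                 .fit_predict(X[:, feats]) == -1          # True = outlier
                 for k in k_values]
        bag_votes.append(np.mean(preds, axis=0) > 0.5)    # majority over k
    return np.mean(bag_votes, axis=0) > 0.5               # majority over bags
```

Planting a few extreme points in a Gaussian cluster, the double vote flags them consistently while suppressing spurious flags that appear only for isolated (subset, $k$) combinations.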
The main advantages of the proposed approach (BVLOF) over other outlier detection methods can be summarized as follows:
Detection of meaningful local outliers: It provides detection of local abnormally behaving data points with respect to the density of their neighboring data instances, instead of the global model.
Running without any parameter for the neighborhood: It does not require the selection of a single neighborhood size $k$, since LOF is run with multiple $k$ values and their outputs are combined by voting.
Applicability: Since it considers local densities of instances, it is proper for arbitrarily shaped data with groups of varying sizes, characteristics, and densities. Moreover, it has the ability to learn the underlying structure of data without any prior knowledge, i.e. without any assumption about the distributions of data points. In addition, it is independent of the domain of application, so it can be easily generalized to different problems such as network intrusion detection, fraud discovery in telecommunication, failure detection in manufacturing, email spam filtering, and medical data analysis. Furthermore, the BVLOF algorithm is also suitable for high-dimensional data since it applies a feature selection scheme in its first stage.
Providing ensemble-based outlier detection: The outlier detection capability of the LOF method is improved using two-level ensemble approaches, namely bagging and voting.
Parallelization: Although the BVLOF algorithm runs more slowly than LOF, since it operates on multiple data subsets and with multiple $k$ values, these runs are independent of each other and can be executed in parallel on different threads or processors.
Improved capability: The integration of the various models generated by using different feature subsets and different $k$ values improves the overall detection capability.
Easy implementation: It can be easily implemented since it does not require any change in the general structure of the LOF algorithm.
Interpretability: The output of the algorithm, which is in the form of an observation score, can be easily interpreted by the user.
Besides the advantages of the BVLOF approach, there are some drawbacks that can be summarized as follows:
Performance depending on the $k$ value range: Since the majority voting treats all LOF runs with each $k$ value equally, an inappropriately chosen $k$ value range may degrade the overall performance.
Computational cost: The BVLOF method is slower than LOF because it requires multiple LOF runs on different subsets and with different $k$ values.
Problem-specific outliers: The meaning of an outlier may change according to the context of the data, and the model does not provide any prior information about the significance of an outlier with respect to the usage criteria.
In this section, the proposed two-level model is analyzed with respect to time complexity. The time complexity of the BVLOF approach mainly depends on the time complexity of the base algorithm (LOF) and the ensemble size (in other words, the number of iterations). More specifically, it depends on the following parameters: the number of instances, the number of features, the number of feature subsets, and the number of neighborhood sizes ($k$ values) evaluated.
The time complexity of LOF is near-quadratic with respect to the number of instances. However, different variations of LOF have been proposed in the literature to reduce its time complexity, such as IFLOF (Cheng, Zou, & Dong, 2019), N2DLOF (Su et al., 2017), Top-n LOF (TOLF) (Yan, Cao, & Rundensteiner, 2017), BLOF (Bounded LOF) (Tang & Ngan, 2016), SimLOF (Sun et al., 2015), GridLOF (Wang & Huang, 2007), iLOF (Incremental LOF) (Pokrajac, Lazarevic, & Latecki, 2007), and MiLOF (Memory-Efficient Incremental LOF) (Salehi et al., 2016). Therefore, instead of the traditional LOF algorithm, one of its variations can be used in the BVLOF approach. The most computationally expensive step in BVLOF is the k nearest neighbor search. To reduce the time complexity of BVLOF, it is also possible to use data structures such as the R*-tree, KD-tree, Cover tree, and M-tree, which enable efficient nearest neighbor search.
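For instance, Scikit-Learn's LOF implementation lets the neighbor search be backed by such a tree index via its `algorithm` parameter (the values shown are the library's own options; the data here is synthetic):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

X = np.random.default_rng(0).normal(size=(500, 3))

# 'kd_tree' or 'ball_tree' replace the brute-force O(n^2) neighbor search;
# 'auto' lets the library choose an index based on the data.
lof = LocalOutlierFactor(n_neighbors=20, algorithm="kd_tree")
labels = lof.fit_predict(X)          # -1 marks an outlier, 1 an inlier
```

Tree-backed search pays off mainly in low to moderate dimensions; in very high dimensions these indexes degrade toward brute force.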
Even though the running time of the BVLOF method is higher than that of the LOF algorithm due to the two-level ensemble operations such as divisions and integrations, this duration can be decreased by parallelization. If each subset is processed simultaneously on different threads or processors, the negative effect of the proposed approach can be eliminated, and essentially only a single LOF run and the voting operations determine the length of the process.
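A sketch of this parallelization with `joblib` (which ships with Scikit-Learn); the helper names and the (subset, $k$) job grid are illustrative, not from the paper:

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.neighbors import LocalOutlierFactor

def run_one(X, feats, k):
    """One independent LOF run: a single feature subset with a single k value."""
    return LocalOutlierFactor(n_neighbors=k).fit_predict(X[:, feats]) == -1

def parallel_lof(X, subsets, k_values, n_jobs=-1):
    """Dispatch every (feature subset, k) pair to its own worker."""
    jobs = [(f, k) for f in subsets for k in k_values]
    return Parallel(n_jobs=n_jobs)(delayed(run_one)(X, f, k) for f, k in jobs)
```

Because each (subset, $k$) run touches only its own copy of the data, the runs embarrassingly parallelize; the voting steps that follow are cheap by comparison.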
In this study, the BVLOF method was implemented using the Scikit-Learn open-source library with the Python programming language. We preferred Scikit-Learn since it provides many advantages such as ease of use, consistent and reliable APIs, a wide range of alternative algorithms, automatic hyperparameter tuning, integrated parallelization, and good documentation/tutorials. The LOF algorithm was selected as the base outlier detection method. In all experiments, the default parameters of the algorithms in the Scikit-Learn library were used, except the neighborhood size parameter ($k$), which was varied in the experiments.
In order to validate the LOF and BVLOF results, we used an external validation technique. An external evaluation measure, which has been extensively studied for clustering and outlier detection tasks, uses class labels known as ground truth. Since our study concentrates on datasets with class labels, it uses an external validity measure. Only the data, without class labels, is used as input to the algorithm; however, the measure evaluates the goodness of the results with respect to the class labels tagged as inlier and outlier. The measure evaluates the extent to which the structure discovered by an outlier detection algorithm matches the externally given class labels. Since the external evaluation measure utilizes the true class labels in the comparison, it is a robust indicator of the true error rate of an outlier detection algorithm. In this way, the external measure performs well in predicting outlier detection errors.
When the data is imbalanced, the overall accuracy measure does not give sufficient information about the performance of the methods. Even though precision and recall scores can also be used, one of the most widely used validation measures in research studies is the AUC (Area Under the Receiver Operating Characteristic Curve) score. Thus, in the experimental studies, we used the AUC validation measure to compare the performances of LOF and BVLOF. The ROC curve is drawn by putting the True Positive Rate (TPR) on the y-axis and the False Positive Rate (FPR) on the x-axis.
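With Scikit-Learn this evaluation reduces to a few lines; the synthetic data here is only for illustration, and LOF's `negative_outlier_factor_` attribute is negated so that higher scores mean more anomalous:

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (95, 3)),    # inliers
               rng.normal(7.0, 0.5, (5, 3))])    # planted outliers
y = np.r_[np.zeros(95), np.ones(5)]              # ground truth: 1 = outlier

lof = LocalOutlierFactor(n_neighbors=20)
lof.fit(X)
scores = -lof.negative_outlier_factor_           # higher score = more anomalous
auc = roc_auc_score(y, scores)                   # threshold-free quality measure
```

Because AUC is computed from the score ranking rather than a fixed decision threshold, it remains informative even when outliers make up only a few percent of the data.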
The proposed method BVLOF has been tested with 10 outlier detection datasets ranging from 129 to 286,048 instances and from 6 to 41 attributes. The datasets are obtained from different domains. They are publicly available in well-known data repositories such as the UCI Machine Learning Repository and Kaggle. They are frequently used for validating outlier (anomaly) detection algorithms in the literature (Reif, Goldstein, & Stahl, 2008; Yao et al., 2018; Zhou et al., 2013). Actually, these datasets are multi-class datasets having class labels. Here, the minority class (or classes) is considered as outliers compared to the other larger classes. Hence, the instances in the smallest class(es) are marked as outliers, while all other instances are inliers. The overview of the datasets is given in Table 2. Generally, the density of the extreme samples is less than 10% of the entire dataset, as expected. The CoverType dataset is much larger than the rest with 286,048 records.
Basic characteristics of the datasets.
ID  Dataset  # Instance  # Feature  % Outliers 

1  CoverType  286,048  10  0.9 
2  Glass  214  9  4.2 
3  KDDCup  60,632  41  0.4 
4  Lymphography  148  18  4.1 
5  PageBlocks  5,473  10  10.2 
6  PenDigits  9,868  16  0.2 
7  Shuttle  1,013  9  1.2 
8  Stamps  340  9  9.1 
9  Thyroid  3,772  6  2.5 
10  Wine  129  13  7.7 
In the experimental studies, the LOF algorithm was applied 100 times with different neighborhood sizes, $k$ ranging from 1 to 100.
Figure 5 presents the comparison of the performance of the LOF and BVLOF methods in terms of the maximum AUC values obtained in all trials. According to the results, the BVLOF approach wins against LOF on 7 of the 10 datasets. Thus, it is clearly seen that BVLOF generally outperforms LOF on these datasets. For instance, BVLOF produced a notable increment (~4%) in AUC for the Shuttle dataset. BVLOF achieved the best performance with 83.92% AUC for the PageBlocks dataset. However, it was not able to get better results for the Glass, Lymphography, and Wine datasets. In those datasets, a single well-chosen $k$ value allowed LOF to reach a higher maximum AUC.
Figure 6 shows the comparative results of LOF and BVLOF in terms of average AUC scores. Apparently, the BVLOF approach performs better on all datasets except the Wine data. Thus, compared to LOF, the win ratio for BVLOF is 9/10 (90%); hence, on average, BVLOF detects outliers significantly more accurately than conventional LOF. For instance, the outlier detection rates of LOF and BVLOF for the Thyroid dataset are 82.68% and 90.93%, respectively. When the PageBlocks dataset is used, the difference in outlier detection accuracy between the BVLOF and LOF algorithms remains high, 87.96% versus 81.71%. Improvements also exist for the other datasets, except the Wine data. These improvements indicate that a two-level ensemble approach can be more useful for the outlier detection task.
When the average results of all datasets given in Table 3 are examined, it is clearly seen that BVLOF outperforms LOF significantly. BVLOF (83.97%) is remarkably more accurate than LOF (80.47%) on average. Similarly, in terms of maximum performances, BVLOF (90%) shows a small improvement over LOF (88.42%).
The average of experimental results of all datasets.
LOF (%)  BVLOF (%)  

Maximum Outlier Detection Performance  88.42  90 
Average Outlier Detection Performance  80.47  83.97 
In order to reveal the influence of the parameters on the algorithms, the BVLOF and LOF methods are compared in an experiment where both the parameter $k$ and the dataset are varied.
It is also possible to say from the results shown in Figure 7 that the point at which the AUC measure reaches its maximum varies according to the dataset used. Namely, both the LOF and BVLOF methods are clearly sensitive to the value of the input parameters. This indicates that the parameter $k$ should be selected carefully for each dataset.
In the tests with different $k$ values, the performance of both methods varied considerably from one dataset to another.
Another interesting fact is that, for most datasets, the best-performing $k$ value of BVLOF is smaller than that of LOF (22.6 versus 37.5 on average).
Table 4 shows the $k$ values at which each method reaches its maximum AUC for each dataset.
The points at which the AUC value reaches the maximum for each dataset.
Datasets  LOF Best  BVLOF Best 

Cover  60  8 
Glass  12  1 
KDDCup  56  16 
Lymphography  54  85 
PageBlocks  19  4 
PenDigits  88  3 
Shuttle  7  4 
Stamps  3  2 
Thyroid  63  4 
Wine  13  99 
Average  37.5  22.6 
In the BVLOF approach, the outliers are searched for in lower-dimensional subspaces thanks to the feature subset selection, and so this solution attempts to reduce variance by encouraging diversity. The integration of partial features is often better than the full features, since the combination function at the end recognizes the differential behavior of different subspace samples for a given data point. Multiple feature subspace selection plays an important role in improving the performance of an outlier detector and can help to deal with noisy data, to increase robustness, and to take advantage of ensemble learning. Data processing in a lower-dimensional space naturally influences the bias-variance trade-off of the base algorithm (LOF), and so the ensemble framework benefits from the existing variability in each detector. Being based on a two-level ensemble approach that promotes diversity, BVLOF offers an improvement in outlier detection rate, at the cost of a higher running time than single-level ensemble approaches.
In this study, a novel two-level ensemble outlier detection mechanism is proposed. The method consists of two consecutive stages and, at each stage, a different ensemble approach is applied: first bagging and then voting. Therefore, it is called Bagged and Voted Local Outlier Detection (BVLOF). In the first stage, the dataset is split into subsets through different selections of features, without changing the number of instances. As feature bagging has been proven by previous studies to produce more robust results, this methodology is used in the first stage. However, it is noticed that the choice of the input parameter $k$ strongly affects the results of LOF; therefore, in the second stage, the outputs obtained with different $k$ values are combined by majority voting.
As future work, different kinds of base outlier detectors from different categories (clustering-based, distribution-based, etc.) can be applied in the proposed method. In addition, majority voting could be converted into a weighted form by checking the number of outliers returned from each $k$ value, so that more reliable detectors receive higher weights.

