With the rapid growth of the Internet, massive data are generated daily. With evolving hacking tools and methods, robust security measures are crucial. Businesses have suffered significant financial losses due to security attacks and network intrusions [1, 2]. To address these challenges, network anomaly detection (NAD) and intrusion detection systems (IDSs) safeguard network systems from unauthorized access and infiltration activities. An IDS is a device that monitors and detects attacks or intrusions in a computer or network [3]. It provides alerts to administrators for corrective actions. Traditional detection systems were manual and could not keep up with the increasing data volume and speed. Misuse detection techniques were introduced but were limited to known attacks. Anomaly detection (AD) techniques have replaced traditional signature-based schemes, working with unlabeled data to detect unknown attacks. This eliminates the need for costly labeled traffic generation and allows for identifying new attack versions [4].
AD techniques capture deviations from normal behavior, detecting unknown attacks without prior knowledge [5]. However, they can suffer from a high false positive rate (FPR) due to overfitting: a false positive occurs when normal activity is mistakenly classified as abnormal because its pattern is not recognized. Data mining techniques, particularly unsupervised learning (UL), overcome these limitations by using unlabeled data and detecting previously unseen attacks [3]. UL algorithms used in NAD also improve the quality of data representations by eliminating unnecessary or redundant variables before the categorization process, resulting in more informative and accurate representations of the original data. While unsupervised NAD techniques have received less attention, their combination with supervised learning has shown promising results in research. Unsupervised and hybrid techniques have recently gained popularity for effective AD systems.
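For reference, the FPR mentioned above is the share of normal records that a detector flags as anomalous, FP / (FP + TN). The short sketch below computes it; the label vectors are hypothetical and only serve as an illustration.

```python
def false_positive_rate(y_true, y_pred):
    """FPR = FP / (FP + TN), with 1 = anomalous and 0 = normal."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical labels: 8 normal records followed by 2 attacks.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
fpr = false_positive_rate(y_true, y_pred)  # 2 false alarms out of 8 normal records
print(fpr)  # 0.25
```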
This paper emphasizes UL's significance in enhancing the security of NAD systems. Notably, UL offers several benefits, including:
UL resembles the way a human learns to reason from acquired knowledge, which brings it closer to genuine AI.
UL is effective for obtaining meaningful information from data.
UL uses data that are not classified or categorized, which makes it more practical to deploy in real environments.
In the real world, input data are not always available in a structured format that conforms to the output data, so UL is needed to derive patterns from unstructured data.
AD schemes have gained increased attention due to the rise of zero-day attacks [5]. Researchers have extensively explored supervised and unsupervised approaches to effectively identify intrusions and anomalies since the early days of networking and the Internet [6]. Classification and clustering methods, such as support vector machine (SVM), K-nearest neighbor (KNN), and k-means, are commonly employed for modeling normal network behavior. Clustering and outlier detection play a significant role in UL schemes for NAD [5, 6, 7, 8]. Promising techniques for high-dimensional datasets include density and grid-based clustering [9, 10]. Hybrid approaches like k-means with decision trees [11] and genetic algorithms (GAs) [12] are effective in detecting network attacks. Self-organizing maps (SOMs) aid in data visualization and dimensionality reduction [13]. Recent developments in artificial neural networks (ANNs), particularly deep learning (DL) architectures, have sparked a revolution [14, 15, 16].
Therefore, this paper offers an in-depth analysis of NAD using unsupervised approaches, highlighting recent developments while identifying key research limitations. The evaluation of 13 UL methods across benchmark intrusion detection datasets aims to assess their effectiveness and provide valuable insights for future research and development in NAD systems. Figure 1 depicts the results of a bibliometric analysis conducted on AD literature. Subfigure (a) illustrates the publication trend of bibliometric papers from 2010 to 2023. Subfigure (b) highlights the data-retrieval process, focusing on publications from reputable publishers such as Springer, IEEE, and Elsevier, which collectively provided most of the analyzed data.
This section reviews and evaluates the performance of UL methods for detecting network anomalies and intrusions. It highlights significant contributions from selected survey papers published between 2000 and 2023. The surveys are categorized into three time intervals for evaluation.
Various studies have explored unsupervised AD algorithms for IDS. In 2000, one study compared cluster-based estimation, KNN, and one-class SVM using the KDD Cup 99 dataset [17]. In another analysis in 2003, the authors evaluated AD approaches on the DARPA98 dataset, showing promising results with local outlier factor (LOF) and unsupervised SVM [18]. In 2007, the authors reviewed different clustering schemes and found that K-harmonic means performed well in terms of speed, while population-based stochastic techniques were better suited to cluster quality [19]. Additionally, in a 2007 study, seven distinct IDS techniques for HTTP anomalies were tested and compared under simulated and real-world settings. It was found that token-based approaches that use heuristics exhibited better discrimination and generalization [20].
In 2009, the authors presented a literature study reviewing machine learning (ML) methods for IDS, including single, hybrid, and ensemble classifiers [21]. In 2012, the authors reviewed and compared different clustering techniques based on parameters such as data type, cluster type, complexity, dataset, and measures, along with their advantages and disadvantages [22]. In another study in 2014, the authors presented an overview and comparison of four novelty detection techniques. The experimental outcomes indicated that the KNN novelty detection scheme performed better than the other methods [23]. Moreover, in 2015, the authors provided a comprehensive review of the 19 most popular clustering algorithms, with a detailed analysis of traditional and modern approaches, to emphasize the importance of clustering as an essential data-analysis technique [24].
In 2016, a comprehensive assessment of 19 unsupervised AD techniques was conducted on 10 datasets, showcasing their practical significance across diverse applications [1]. Another study in 2016 compared four AD algorithms, highlighting the superior performance of one-class SVM in smart city wireless sensor networks [2]. In 2018, a survey focused on comparing feature techniques and bridging the gap between IDS and forensic investigation [3]. A 2019 study evaluated unsupervised AD algorithms against unified attack models and addressed research questions related to performance. A detailed review in 2019 explored UL methods in various networking domains, discussing recent advances, challenges, and future directions [4]. A 2020 study gave a general introduction to the ML algorithms applied to IDS and ran experiments on the KDD-CUP dataset to evaluate their performance [5]. In 2021, the authors presented a comprehensive survey on IDS with special attention to feature-selection methods, different types of datasets, challenges, and possible future directions [6].
A survey conducted in 2021 categorizes anomaly detection algorithms into seven distinct categories based on their working mechanisms [7]. These categories include statistical models, density-based methods, distance-based techniques, clustering approaches, isolation methods, ensemble methods, and subspace techniques. For each category, the survey provides insights into the time complexity of the algorithms and outlines their general advantages and disadvantages. Furthermore, a detailed and systematic survey of clustering methods was provided in 2022, covering state-of-the-art clustering methods, their performance, open issues, and various application areas [8]. A comprehensive overview of UL techniques for AD was conducted in 2022, emphasizing their utility in scenarios with scarce labeled data [9]. It highlights the challenges associated with traditional outlier detection methods and underscores the potential of unsupervised approaches in handling raw data effectively. Another survey conducted in 2022 delved into unsupervised anomaly localization (AL) using DL for industrial image inspection [10]. With its in-depth technical insights, the survey is a useful resource for researchers venturing into industrial AL and seeking to extend its applications across diverse domains.
In a 2023 survey, outlier detection strategies from seven families were analyzed, revealing that consensus-based strategies derived from DL outperformed standalone UL methods [11]. Another comprehensive study in 2023 explored anomaly detection techniques within sensor networks and the internet of things (IoT) [12]. It unveiled a significant focus on detecting malicious activities, spanning various ML and DL approaches, with a notable surge in interest between 2017 and 2022. The study showcased a significant adoption of DL and ML techniques, notably MLPs and graph-based methods, across various anomaly detection applications, reflecting the evolving landscape of IoT security measures and the need to address emerging threats comprehensively. A review conducted in 2023 focused on different clustering categories within semisupervised and UL approaches [13]. Furthermore, another survey conducted in 2023 delved into automatic log file analysis for early incident detection, particularly focusing on self-learning anomaly detection techniques [14]. These methods, predominantly leveraging DL neural networks, demonstrated superior performance over traditional approaches and addressed issues with unstable data formats. A comparison of existing anomaly detection surveys with our study is presented in Table 1.
Table 1. A relative comparison of this paper with existing surveys/review papers.
[15] | Focus on literature review and performance evaluation of UL methods. | IDS | ≈ | × | × | × | ✓ | × |
[16] | It provides a comparative analysis of various unsupervised NAD approaches and investigates standard metrics suitable for many connections. | AD in networks | ≈ | ✓ | × | × | ✓ | × |
[17] | Presents a detailed review of different clustering methods and their strengths and weaknesses. | _ | ≈ | ✓ | × | × | × | × |
[18] | Presents a framework for evaluating anomaly detection methods for HTTP. | HTTP AD | ≈ | ≈ | × | × | ✓ | × |
[19] | An overview of ML techniques for intrusion detection between 2000 and 2007 is presented. | IDS | ✓ | × | × | ≈ | × | × |
[20] | It provides an overview and comparison of various clustering techniques, pros, and cons. | AD | ≈ | ≈ | × | × | × | × |
[21] | Performance evaluation of four novelty detection algorithms is carried out. | Novelty detection | ✓ | ✓ | × | × | ✓ | ✓ |
[22] | Presents a detailed review of popular traditional and modern clustering approaches. | _ | ≈ | ≈ | × | × | × | × |
[1] | Nineteen widely employed unsupervised techniques are evaluated to provide an understanding of their performance in various domains. | AD | ≈ | ✓ | × | × | ✓ | × |
[2] | The proposed work focuses on a comparative evaluation of four UL algorithms. | Smart city wireless networks | ≈ | ✓ | × | × | ✓ | × |
[3] | A survey on ML techniques focusing on unsupervised and hybrid IDS is presented. | IDS | ✓ | ✓ | × | ✓ | × | ✓ |
[4] | The proposed work aims to provide an experimental comparison of unsupervised methods employed for NAD against five intrusion detection datasets. | Anomaly-based IDS | ≈ | ✓ | × | ✓ | ✓ | × |
[23] | A comprehensive survey of UL techniques in networking is presented. | Anomaly-based in networking | ≈ | ✓ | ✓ | × | × | ✓ |
[5] | Review and compare ML algorithms for IDS. | IDS | ✓ | ≈ | × | × | ✓ | × |
[6] | Presents a detailed review of state-of-the-art IDS methods, datasets, performance measures, and research challenges. | IDS | ✓ | ≈ | × | ✓ | ✓ | ✓ |
[8] | An in-depth review of state-of-the-art clustering methods is presented. | - | ≈ | ✓ | ✓ | × | × | ✓ |
[9] | A comprehensive overview of UL techniques for AD is presented, emphasizing their utility in scenarios with scarce labeled data. | AD in industrial applications | ✓ | ✓ | ✓ | × | × | ✓ |
[11] | Review and compare outlier detection techniques from seven different families. | Outlier detection | ≈ | ≈ | × | × | ✓ | × |
[12] | AD techniques are explored comprehensively to address the detection of emerging threats. | IoT and sensor data | ✓ | ≈ | ✓ | × | × | ✓ |
[13] | In-depth review of various unsupervised and semisupervised clustering methods. | - | ✓ | ✓ | ≈ | × | ✓ | × |
[14] | Focus on log file analysis for early incident detection, particularly emphasizing self-learning AD techniques. | AD | ✓ | ≈ | ✓ | × | × | × |
Our survey | The primary focus is on studying UL NAD methods while considering recent advances. | NAD | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
DL, deep learning; IDS, intrusion detection systems; IoT, internet of things; ML, machine learning; NAD, network anomaly detection; UL, unsupervised learning.
From Table 1, existing literature offers surveys on ML techniques for NAD and other domains but lacks a comprehensive assessment of UL techniques specifically for NAD systems. This paper fills that gap by comparing UL techniques based on relevant parameters, as mentioned in Table 1. It also includes recent advances in unsupervised NAD and empirical evaluations on benchmark datasets to analyze performance across conventional and modern datasets.
The motivation for this paper stems from the need to address the shortcomings observed in existing research on UL approaches in NAD systems. The paper aims to fill various gaps identified in previous studies by conducting a comprehensive survey. These gaps include the need for more thorough comparative analyses that consider the characteristics of different algorithm classes, the limited generalizability of findings due to analyses on limited datasets, and insufficient emphasis on recent network datasets in evaluations. Moreover, previous reviews require more coverage of recent advances in NAD systems and performance evaluations on benchmark datasets such as NSL-KDD, UNSW-NB15, and CIC-IDS2017. To bridge these gaps, this paper evaluates 13 UL algorithms on popular benchmark datasets, providing a benchmark for anomaly-based IDS. Furthermore, the analysis conducted in this paper highlights the strengths and weaknesses of UL algorithms across varied dataset environments, offering valuable insights for further research and development in the field of NAD systems.
The key contributions of this paper are as follows:
It discusses the state-of-the-art unsupervised AD methods, reviews their use for detecting network attacks, and presents contemporary coverage of recent trends and innovations in the field.
It experimentally compares 13 selected unsupervised AD methods on three popular benchmark datasets and analyzes the results.
The experimental findings provide a basis for evaluating the performance of algorithms from different families of AD techniques on network datasets and for forecasting the efficacy of a chosen algorithm under varying parameter settings. They can assist in selecting a suitable UL algorithm for a given task.
Finally, it presents future work, highlighting potential pitfalls and research gaps in the literature.
The remainder of this paper is organized as follows. Section 2 introduces 13 selected UL-based methods from a pool of 6 classes of AD and presents an unsupervised NAD model. Section 3 is the core section: it discusses the literature in detail by classifying UL approaches into distinct classes and presents recent works to highlight the latest developments and innovations. Section 4 experimentally evaluates the 13 selected UL algorithms on 3 benchmark datasets. Section 5 details research challenges and future directions. Finally, Section 6 concludes the paper. The roadmap of the paper is depicted in Figure 2.
This section discusses the various UL methods used for AD. A generic framework of the NAD model employing K-means clustering is also presented. The focus is an in-depth discussion of six different families of AD and their types.
The literature review evaluates several UL algorithms based on their performance in NAD systems. A taxonomy of 13 unsupervised algorithms is proposed from the pool of 6 families of AD techniques, as presented in Figure 3, based on the following criteria:
The algorithms have been applied in practice, primarily in intrusion detection/NAD.
The collection covers the following classes of AD techniques so that an effective prediction model can be developed: clustering, dimensionality reduction, statistical, neighbor-based, density-based, and classification-based.
Unsupervised dimensionality reduction algorithms are included among the 13 selected methods to retain non-redundant, relevant features and to support the effective deployment of UL methods on real-world, high-dimensional data.
The proposed taxonomy of UL methods from the six families of AD techniques is illustrated in Figure 3. This taxonomy aids in developing AD models for detecting network anomalies and intrusions. The AD techniques in the taxonomy are discussed below and empirically evaluated in Section 4, providing insights into their performance across diverse network datasets.
This section presents the NAD model based on UL, which mainly employs K-means clustering for grouping data instances.
Figure 4 displays the framework of the NAD model that utilizes UL. The framework provides a detailed methodology for the NAD model and consists of several phases, each serving a specific purpose to enhance the effectiveness of NAD systems. These phases include preprocessing, feature selection, grouping based on K-means clustering, and detection using a Random Forest classifier. The preprocessing phase ensures data integrity by converting categorical variables into numerical format for improved processing. The feature selection phase employs correlation-based feature selection (CFS) to optimize dataset dimensions while preserving the intrinsic meaning of features. The grouping phase involves K-means clustering, which uses similarity measures to create a comprehensive normal behavioral profile. UL based on clustering enhances exploratory data analysis, exposing underlying structures. In the detection phase, the framework employs a Random Forest classifier, a powerful decision-making tool, to accurately identify data instances. The framework's contribution extends beyond precise categorization; it reveals meaningful patterns within cohesive subgroups, adding depth to understanding network intrusions. A brief description of the unsupervised model is discussed below.
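The four phases described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the framework of Figure 4 itself: the protocol values, the data, and all function names are hypothetical, the correlation filter is a simplified stand-in for CFS (which scikit-learn does not provide), and the way cluster assignments are turned into training labels for the Random Forest is an assumption made only for this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

PROTOCOLS = ["tcp", "udp", "icmp"]  # hypothetical categorical values


def preprocess(numeric, protocol):
    """Preprocessing: one-hot encode the categorical column and append it."""
    onehot = np.array([[float(p == name) for name in PROTOCOLS] for p in protocol])
    return np.hstack([numeric, onehot])


def correlation_filter(X, threshold=0.95):
    """Simplified stand-in for CFS: skip a feature that is highly correlated
    with a feature that has already been kept."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, k] < threshold for k in keep):
            keep.append(j)
    return keep


# Synthetic traffic (assumption): 200 normal records and 20 anomalous ones.
rng = np.random.default_rng(0)
numeric = np.vstack([rng.normal(0.0, 1.0, (200, 4)), rng.normal(8.0, 1.0, (20, 4))])
redundant = 2.0 * numeric[:, :1] + rng.normal(0.0, 0.01, (220, 1))  # near-copy of feature 0
numeric = np.hstack([numeric, redundant])
protocol = rng.choice(PROTOCOLS, size=220)

X = preprocess(numeric, protocol)
keep = correlation_filter(X)          # feature selection
X_sel = X[:, keep]

# Grouping: K-means splits the data; the larger cluster is assumed to be normal.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_sel)
normal_cluster = np.bincount(labels).argmax()
pseudo_labels = (labels != normal_cluster).astype(int)  # 1 = anomalous

# Detection: a Random Forest learns the cluster-derived labels.
forest = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_sel, pseudo_labels)

probe = preprocess(
    np.array([[0.0, 0.0, 0.0, 0.0, 0.0], [8.0, 8.0, 8.0, 8.0, 16.0]]),
    ["tcp", "udp"],
)
pred = forest.predict(probe[:, keep])  # 0 = normal, 1 = anomalous
```

Treating the larger cluster as the normal profile only makes sense when anomalies are rare; on real traffic the number of clusters and the labeling rule would need to be chosen with care.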
The various stages are:
In this section, various literature works employed for network intrusion detection using UL are discussed in detail. The preliminary studies categorize various UL approaches into six classes, providing a comprehensive framework for organizing and analyzing this field's diverse methods.
This study aims to classify the existing literature on UL approaches into different categories to improve understanding and facilitate effective comparisons. The identified classes include clustering, SOM, dimensionality reduction, classification-based techniques, hybrid methods, and DL, as depicted in Figure 5. By organizing the diverse range of approaches into these distinct classes, this classification framework facilitates meaningful comparisons and analysis, aiding researchers and practitioners in selecting the most suitable approaches for their specific tasks and advancing the field of UL.
An unsupervised approach based on density and grid-based clustering was proposed in 2005 [46]. It effectively partitioned a high-dimensional dataset into a collection of clusters, and points not associated with any cluster were flagged as abnormal. The presented strategy achieved comparable outcomes; still, the FPR was high [46]. Another advanced method based on fuzzy rough C-means clustering was introduced in further research [47, 48]. K-means clustering is a well-known methodology for finding anomalies, outperforming prior unsupervised algorithms in accuracy [49]. Later, in 2012, a new variant was introduced that merged K-means clustering with the C4.5 decision tree approach to generate more effective outcomes than previous methods [50]. Density-based clustering methods excel at identifying clusters of arbitrary shapes, while grid-based clustering schemes offer fast processing.
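The idea of flagging points that belong to no cluster can be illustrated with a deliberately simplified grid-density rule. The sketch below is not the algorithm of [46]; it only bins points into fixed-size cells and treats points in sparsely populated cells as abnormal. The cell size, the minimum count, and the synthetic data are assumptions.

```python
import numpy as np


def grid_density_outliers(X, cell_size=1.0, min_points=3):
    """Flag points that fall into grid cells holding fewer than min_points points."""
    cells = np.floor(X / cell_size).astype(int)  # grid cell index of each point
    _, inverse, counts = np.unique(
        cells, axis=0, return_inverse=True, return_counts=True
    )
    return counts[inverse.ravel()] < min_points  # True = sparse cell, abnormal


rng = np.random.default_rng(0)
dense = rng.normal(0.0, 0.3, size=(100, 2))      # synthetic "normal" traffic
isolated = np.array([[5.0, 5.0], [-6.0, 4.0]])   # two isolated points
flags = grid_density_outliers(np.vstack([dense, isolated]))
```

A fixed grid is fast because each point is visited once, but the result is sensitive to the cell size, which is one reason adaptive grids are used in practice.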
Leung and Leckie [46] proposed the merging of adaptive intervals approach to spatial data (MAFIA). This novel mechanism combines density-based and grid-based clustering to discover clusters in a high-dimensional input space. Gujral et al. [51] employed spectral clustering to identify scattered anomalies and form representative clusters. A sparse symmetric matrix was utilized with a KNN to address scalability in IDS. Ni et al. [52] proposed a two-fold framework for NAD in high-dimensional data. The framework uses a clustering technique to approximate cluster centers and the mutual information coefficient to remove redundant and irrelevant features. Compared with other unsupervised approaches, the experimental results showed improved accuracy. To represent normal data as clusters, Bhuyan et al. [53] presented an effective outlier-based technique employing a variation of k-means clustering. Anomalies are identified as the instances whose outlier scores exceed a threshold.
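A threshold on cluster-based outlier scores can be sketched as below. This is plain k-means with the distance to the nearest center as the score, not the specific k-means variation of [53]; the synthetic data and the 99th-percentile threshold are assumptions.

```python
import numpy as np


def nearest_distance(X, centers):
    """Distance to, and index of, the closest center for every row of X."""
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return dists.min(axis=1), dists.argmin(axis=1)


def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns the k cluster centers."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = nearest_distance(X, centers)[1]
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers


# Synthetic normal traffic made of two behavior groups (assumption).
rng = np.random.default_rng(1)
normal = np.vstack([rng.normal(0.0, 0.5, (100, 2)), rng.normal(5.0, 0.5, (100, 2))])
centers = kmeans(normal, k=2)

# Outlier score = distance to the nearest center; threshold taken from training data.
train_scores, _ = nearest_distance(normal, centers)
threshold = np.percentile(train_scores, 99)

probe = np.array([[0.1, -0.2], [10.0, -3.0]])
flags = nearest_distance(probe, centers)[0] > threshold  # True = anomaly
```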
Syarif et al. [54] compared misuse detection and AD algorithms using four classifiers (Naïve Bayes, rule induction, NN, and decision tree) for misuse detection and five clustering algorithms (k-means, k-medoids, EM clustering, and distance-based outlier detection) for AD. Evaluation on the DARPA98 and NSL-KDD datasets showed that detecting unknown attacks remains challenging for misuse detection. Rule induction achieved 99.58% accuracy with a 0.40% FPR for known attacks, while the decision tree classifier performed poorly with 63.97% accuracy and a 17.90% FPR. In contrast, the distance-based outlier detection algorithm achieved 80.15% accuracy in detecting unknown intrusions in the AD module. Bhuyan et al. [55] proposed Tree-based Subspace Clustering (TreeCLUS), a UL approach for detecting network anomalies with a minimized FPR. It utilizes cluster stability analysis to identify effective clusters and a multi-objective scheme for cluster labeling.
In another work, Prasad et al. [56] proposed an unsupervised feature selection technique for IDS. This technique selects the most informative features by calculating feature scores using entropy and computing the feature histogram. The study also introduced a novel cluster center initialization approach to address the limitations of conventional clustering techniques. The approach computed semi-identical instances, created sets, and avoided outliers as initial cluster centers. Micro-clusters were formed based on spatial distance, and similar micro-clusters were merged into macro-clusters. The presented cluster centralization approach outperformed basic clustering methods, requiring fewer iterations to obtain final clusters and improving the accuracy of the detection model.
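The entropy-and-histogram feature scoring mentioned above can be illustrated with a small sketch. It computes the Shannon entropy of each feature's histogram; how [56] turns such scores into a selection rule is not reproduced here, and the bin count and the data are assumptions.

```python
import numpy as np


def histogram_entropy(values, bins=10):
    """Shannon entropy (in bits) of a feature's histogram."""
    counts, _ = np.histogram(values, bins=bins)
    p = counts[counts > 0] / counts.sum()   # empty bins contribute nothing
    return float(-(p * np.log2(p)).sum())


# Synthetic features (assumption): one spread-out feature, one constant feature.
rng = np.random.default_rng(0)
X = np.column_stack([rng.uniform(0.0, 1.0, 1000), np.full(1000, 1.0)])
scores = [histogram_entropy(X[:, j]) for j in range(X.shape[1])]
# The constant feature carries no information and scores 0 bits;
# the uniform feature approaches log2(10) bits with 10 bins.
```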
UNADA [57] is a scheme built on a distributed computing framework and designed to improve unsupervised NAD. It leverages a density-based grid algorithm deployed on a large cluster of servers to process high traffic volumes efficiently. DBSCAN's limitations in handling high-dimensional data are addressed by introducing subspace clustering and evidence accumulation techniques for parallel processing across multiple servers. Despite these advantages, UNADA's time complexity hampers its suitability for real-time and scalable anomaly detection. In another work [58], a real-time UL anomaly detection scheme for streaming data is proposed, in which hierarchical temporal memory (HTM) detects spatial and temporal anomalies in noisy environments. The strategy adapts well to changing data streams while minimizing false positives. However, there is potential for further accuracy improvement by adopting ensemble-based approaches. Additionally, exploring real-time anomaly detection techniques for multivariate data streams is crucial to address current challenges.
SOMs are frequently applied as unsupervised tools for identifying malicious and abnormal network behavior. The characteristic of SOMs is that they can automatically organize and infer patterns from a range of inputs and then assess whether a piece of new information conforms to the deduced pattern or not, hence identifying anomalous inputs [6, 59, 60]. Landress [61] proposed a fusion model for UL in intrusion detection to address false positives. The model combines K-means clustering, J48 decision tree, and SOM using the KDDCUP99 dataset. Clustering is performed with k-means, grouping attacks into normal, Neptune, smurf, and ipsweep categories. J48 decision tree is then used for feature selection, reducing false positives by selecting optimal features. SOM is applied to visualize the data and categorize abnormal behavior.
Huang and Huang [62] employed the growing hierarchical SOM (GHSOM) to study anomalous traffic behavior by analyzing node density and samples. The network traffic data are examined through a visual representation of attack patterns and their hierarchical relationships. In the detection phase, GHSOM is combined with SVM, which provides classification rules to separate network traffic into normal and abnormal behaviors.
Principal component analysis (PCA) is commonly used for dimensionality reduction in IDS [63, 64], but independent component analysis (ICA) is proposed as an alternative for non-Gaussian data [65]. Pattewar and Sonawane [66] present two feature extraction approaches based on PCA and kernel PCA (KPCA). In the first approach, PCA is used to extract features. In the second, KPCA and Bayesian methods are employed to eliminate noisy features and retain the critical ones, which results in better accuracy and computational efficiency than PCA.
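As a minimal sketch of PCA-based dimensionality reduction, the code below projects records onto their leading principal components using a singular value decomposition. It covers only linear PCA, not KPCA or the Bayesian step of [66]; the synthetic data and the choice of two components are assumptions.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the mean, the top principal axes, and their explained-variance ratios."""
    mean = X.mean(axis=0)
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = (s ** 2) / (s ** 2).sum()
    return mean, vt[:n_components], explained[:n_components]


def pca_transform(X, mean, components):
    """Project records onto the retained principal axes."""
    return (X - mean) @ components.T


# Synthetic records (assumption): 6 observed features driven by 2 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(0.0, 1.0, (500, 2))
mixing = rng.normal(0.0, 1.0, (2, 6))
X = latent @ mixing + rng.normal(0.0, 0.05, (500, 6))

mean, components, explained = pca_fit(X, n_components=2)
Z = pca_transform(X, mean, components)   # reduced representation, shape (500, 2)
```

Because the six features are generated from two factors, two components retain almost all of the variance; on real traffic the number of components is usually chosen from the explained-variance ratios.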
Kuang et al. [67] proposed a hybrid model that combines KPCA, GA, and SVM. In this scheme, KPCA is applied to extract the core features and the NRBF kernel is employed to reduce the training time of SVM. At the same time, GA optimizes the parameters of SVM to avoid overfitting. The experimental findings reveal that the proposed approach shows better generalization performance and a higher accuracy rate than the SVM classifier. Elkhadir et al. [68] performed a comparative analysis of PCA and KPCA on the KDDCUP99 dataset. Their experiments reveal that when the number of nearest neighbors (NN) is between one and four, KPCA with a quadratic kernel offers a higher detection rate (DR) than PCA, notably for identifying DOS and PROBE attacks.
Unlike attack records, normal connection records are often readily available in real-world applications. As a result, one-class SVM trained on a dataset consisting of normal connection records can effectively differentiate between normal and varied attacks. Amer et al. [69] report that one-class SVM is susceptible to data outliers and suggest two modifications: robust one-class SVM and ETA one-class SVM. The central concept for robust one-class SVM is to minimize the mean square error (MSE) for dealing with outliers by utilizing the class mean as the averaged information. In contrast, ETA one-class SVM employs an outlier suppression mechanism. The proposed approach indicated that the enhanced one-class SVM performed better on two of four datasets than the existing UL AD methods.
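The basic setting described above, a one-class SVM fitted on normal records only, can be sketched with scikit-learn's `OneClassSVM`. This is the standard formulation, not the robust or ETA variants of [69]; the synthetic data and the values of `nu` and `gamma` are assumptions.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# Synthetic normal connection records (assumption): 4 features around the origin.
rng = np.random.default_rng(0)
normal_train = rng.normal(0.0, 1.0, size=(500, 4))

# nu bounds the fraction of training records treated as outliers.
model = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(normal_train)

probe = np.array([[0.0, 0.0, 0.0, 0.0], [6.0, 6.0, 6.0, 6.0]])
pred = model.predict(probe)   # +1 = looks normal, -1 = flagged as attack
```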
Nguyen et al. [70] further improve the performance of one-class SVM by introducing the nested one-class SVM. A radial basis function (RBF), or Gaussian, kernel is commonly used to kernelize one-class SVM. However, selecting the Gaussian kernel hyperparameter based on available attack data is impractical. The authors addressed the problem of kernel parameter optimization by proposing data-driven hyperparameter optimization. In the experiments, the presented methodology achieved high detection accuracy and a lower false alarm rate (FAR) than the original one-class SVM.
Verkerken [71] studied the generalization ability of UL algorithms in NAD. Four UL methods were evaluated: PCA, isolation forest (IF), autoencoder (AE), and one-class SVM. The models were trained on one dataset and evaluated on another to assess their inter-dataset generalization. In the experiments, one-class SVM achieved the best AUROC scores, followed by the AE and PCA. However, performance dropped significantly for all models when they were evaluated on a different dataset, indicating low generalization strength. The study suggests that further research is needed to improve the adaptability and generalization of UL models for real-time detection in practical applications.
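The AUROC used in such comparisons is the probability that a randomly chosen anomaly receives a higher anomaly score than a randomly chosen normal record. A minimal sketch of that definition follows; the labels and scores are hypothetical.

```python
import numpy as np


def auroc(y_true, scores):
    """Pairwise AUROC: share of (anomaly, normal) pairs ranked correctly, ties count half."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]   # anomaly scores
    neg = scores[y_true == 0]   # normal scores
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


# Hypothetical scores from a detector: 3 of the 4 pairs are ordered correctly.
value = auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(value)  # 0.75
```

In an inter-dataset evaluation, the same function would be applied to scores produced on a dataset other than the one the detector was fitted on.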
Zoppi et al. [26] conducted an extensive study investigating the relationship between attack families, anomaly classes, and detection methods. Their experiments on various UL algorithms, including ODIN, K-means, LOF, FastABOD, HBOS, and one-class SVM, showed that FastABOD is effective for point anomalies, K-means performs well for collective anomalies, and HBOS is lightweight and efficient when computing resources are limited. Additionally, the analysis of HBOS in Ref. [72] found that dynamic bins are particularly useful for identifying R2L and U2R attacks. Nearest-neighbor (NN)-based approaches are commonly used and exhibit strong performance due to their unsupervised nature and outlier detection criteria. However, they may face challenges with high-dimensional data and incur high computational cost on large datasets. In contrast, SVMs maximize accuracy on the training data while preserving the capacity to correctly classify subsequent test data, even on high-dimensional datasets, with one-class SVMs focusing on finding the decision boundary with maximum distance from the origin [73]. Therefore, combining NN-based approaches with SVM enhances classification performance at reasonable computational cost on high-dimensional data [73].
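The lightweight nature of HBOS comes from scoring each feature independently with a histogram. The sketch below uses fixed-width (static) bins, not the dynamic bins discussed in [72]; the bin count, the floor value `eps`, and the synthetic data are assumptions.

```python
import numpy as np


def hbos_fit(X, bins=10):
    """Build one fixed-width histogram per feature, scaled so the tallest bin is 1."""
    hists = []
    for j in range(X.shape[1]):
        counts, edges = np.histogram(X[:, j], bins=bins)
        hists.append((counts / counts.max(), edges))
    return hists


def hbos_score(X, hists, eps=1e-3):
    """Sum over features of log(1 / bin height); higher means more anomalous."""
    scores = np.zeros(len(X))
    for j, (density, edges) in enumerate(hists):
        col = X[:, j]
        outside = (col < edges[0]) | (col > edges[-1])   # never seen in training
        idx = np.searchsorted(edges, col, side="right") - 1
        idx = np.clip(idx, 0, len(density) - 1)
        height = np.where(outside, eps, np.maximum(density[idx], eps))
        scores += np.log(1.0 / height)
    return scores


# Synthetic normal traffic (assumption) and two probes.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, (1000, 3))
hists = hbos_fit(train)
scores = hbos_score(np.array([[0.0, 0.0, 0.0], [10.0, 10.0, 10.0]]), hists)
```

Fitting and scoring are linear in the number of records and features, which is consistent with the observation that the method suits limited computing resources; the price is that dependencies between features are ignored.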
Falcão et al. [4] compared various UL algorithms for NAD. Classification algorithms were found to exhibit the highest accuracy among the methods considered. Angle-based algorithms showed robustness in parameter selection. However, detecting anomalies with nonrepeatable and unstable behavior remained challenging, and higher detection scores were observed in datasets with rare anomaly events. In the literature, many proposed NAD schemes rely on traditional distance metrics such as Euclidean or Mahalanobis distance, which may not adequately capture the heterogeneous nature of network features.
To overcome this limitation, Aliakbarisani et al. [74] proposed a linear metric learning approach incorporating side information and UL techniques to extract similarity or dissimilarity constraints. Their approach formulates an optimization problem using the constraint trace ratio and utilizes Laplacian Eigenmap for preserving local neighborhood information. Experimental results on Kyoto 2006+ and NSL-KDD datasets demonstrated improved accuracy and reduced FPRs compared to Euclidean-based distance metrics.
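To make the contrast between distance metrics concrete, the sketch below evaluates a linear metric of the form d_M(x, y) = sqrt((x - y)^T M (x - y)). With M set to the identity it is the Euclidean distance, and with M set to the inverse covariance it is the Mahalanobis distance; linear metric learning methods instead learn M from constraints. The optimization used in [74] is not reproduced, and the two features and their scales are hypothetical.

```python
import numpy as np


def linear_metric(x, y, M):
    """d_M(x, y) = sqrt((x - y)^T M (x - y)); M = I gives the Euclidean distance."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(d @ M @ d))


# Hypothetical features: bytes transferred (std 1000) and duration in seconds (std 1).
mean = np.array([0.0, 0.0])
cov = np.diag([1000.0 ** 2, 1.0 ** 2])
a = np.array([1000.0, 0.0])   # one standard deviation away in bytes
b = np.array([0.0, 5.0])      # five standard deviations away in duration

euclid = [linear_metric(p, mean, np.eye(2)) for p in (a, b)]
mahal = [linear_metric(p, mean, np.linalg.inv(cov)) for p in (a, b)]
# Euclidean distance ranks a as farther from the mean because bytes dominate the scale;
# the Mahalanobis distance ranks b as farther because it accounts for each feature's spread.
```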
Preprocessing steps are essential in NAD. One common technique is using LOF to remove outliers from the training set. In Ref. [75], the LOF value of each new observation is estimated against the preprocessed normal dataset, and observations with a high LOF value are reported as network anomalies. Additionally, Ding et al. [76] proposed hybridizing the similarity matrix scheme and LOF as preprocessing steps, treating data objects with lower similarity than others as outliers. This approach enhances outlier detection adaptability, especially for datasets with irregular shapes.
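The idea of using LOF to clean a training set can be sketched in a few lines. This is a simplified illustration, not the procedure of Refs. [75] or [76]. It uses exactly k neighbors per point (ignoring distance ties) and assumes no duplicate points. The helper names and the cut-off of 1.5 are hypothetical choices.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def lof_scores(points, k):
    """Simplified LOF: values near 1 are inliers, values well above 1 are outliers."""
    n = len(points)
    neighbours, k_distance = [], []
    for i in range(n):
        ranked = sorted((euclidean(points[i], points[j]), j)
                        for j in range(n) if j != i)
        neighbours.append([j for _, j in ranked[:k]])
        k_distance.append(ranked[k - 1][0])  # distance to the k-th neighbour
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], euclidean(points[i], points[j]))
                 for j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: average density of the neighbours relative to the point's own density.
    return [sum(lrd[j] for j in neighbours[i]) / (k * lrd[i]) for i in range(n)]

def drop_outliers(points, k=2, threshold=1.5):
    """Keep only training points whose LOF does not exceed the (assumed) threshold."""
    scores = lof_scores(points, k)
    return [p for p, s in zip(points, scores) if s <= threshold]

# Toy training set: four clustered points and one distant point.
train = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
print(lof_scores(train, k=2))
print(drop_outliers(train))
```

In practice the threshold and k would be tuned per dataset; the values here only serve the toy example.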
With the rapid growth of the IoT, attackers are exploiting vulnerabilities in insecure devices like IP cameras to launch devastating distributed denial of service (DDoS) attacks [77]. Traditional defense methods that rely on predefined features extracted from predefined flows are often ineffective in promptly detecting and blocking malicious flows. To address this, Hwang et al. [100] proposed a new scheme for automatically profiling traffic patterns directly from raw traffic. By examining the initial packets and utilizing a convolutional neural network (CNN), the method generates features from raw traffic, enabling auto-learning. An unsupervised AE is then used to create profiles of benign traffic, effectively filtering out abnormal traffic. Experimental results were promising, with accuracy close to 100% and low false negative (0.02%) and false positive (0.83%) rates.
Kabir and Luo [78] evaluated the performance of traditional and DL-based unsupervised methods for network flow anomaly detection. The study compared SOM, K-means, deep autoencoding GMM (DAGMM), and adversarially learned anomaly detection (ALAD) algorithms. Experimental results on the KDDCUP99 dataset showed that DAGMM achieved the lowest FPR (0.9%), while SOM had the highest accuracy (99.9%). SOM and K-means were found to be easy to implement with stable performance. ALAD performed well in detecting minority attacks because it generates adversarial samples.
In another study, Truong-Huu et al. [79] investigated the deployment of generative adversarial networks (GANs) for NAD. They utilized the UL scheme, which does not require prior label information during training. The concepts of traffic sessions and traffic aggregation techniques were introduced to extract useful traffic features and learn traffic behaviors. The experiments conducted on UNSW-NB15, CIC-IDS2017, and synthetic datasets demonstrated that GAN outperformed existing anomaly detection approaches.
Sovilj et al. [80] addressed the lack of descriptive analysis in intrusion detection models by introducing an attentional model analysis for DL models. This analysis provides explanations alongside predictions, helping network operators understand potential threats. The authors conducted a comparative analysis of deep neural learning architectures for IDS on sequential data streams. Experimental results on synthetic and CIC-IDS2017 datasets showed that the RNN variant outperformed the nonsequential deep AE for anomaly detection. However, the performance of the attentional model could be further improved by incorporating attentional components to detect anomalous input data segments.
Carrera et al. [81] addressed the critical need to detect zero-day cyber-attacks, which exploit undisclosed vulnerabilities and pose substantial technical and economic risks to enterprises and technology-dependent systems. Benchmarking three approaches, such as deep autoencoding with GMM and IF, the work demonstrates enhanced accuracy with IF. The SHAP methodology is then used to identify the features most important for attack classification, validated through experiments on KDD99, NSL-KDD, and CIC-IDS2017.
Sáez-de-Cámara et al. [82] addressed IoT security challenges by proposing a federated learning (FL) architecture for unsupervised network intrusion detection. FL mitigates network overhead and data isolation issues, and the architecture is enhanced by an unsupervised device clustering algorithm. Evaluation on a diverse testbed showed successful device grouping, although clustering quality may diminish with insufficient local training data. Notably, FL-based models exhibited faster convergence and low FPRs, outperforming ML-based alternatives like Kitsune, which is promising for real-world deployment. However, challenges remain in adapting the approach to high-mobility IoT settings and in handling compromised devices in adversarial scenarios, warranting further exploration.
The comparison of UL approaches employed for network anomaly/intrusion detection is summarized in Table 2, which provides an overview of the performance and effectiveness of the different methods.
Table 2. Relative comparison of UL approaches for network anomaly/intrusion detection.
[83] | Unsupervised approach for outlier detection using subspace clustering and evidence accumulation. | DBSCAN | METROSEC | Continuous | 95 | 95 | 85 | 85 | ✓ |
[84] | Tree-based subspace clustering approach for high-dimensional datasets; includes cluster stability analysis. | Tree-based subspace clustering (TCLUS) | KDDCUP99, TUIDS | Mixed | 99 | 96 | 86 | 66 | |
[85] | Multiclustering scheme incorporating subspace clustering and evidence accumulation to overcome knowledge-based approach limitations. | DBSCAN | KDDCUP99, METROSEC | N/A | - | - | - | - | ✓ |
[86] | Particle swarm optimization clustering strategy based on map-reduce methodology for parallelization in large-scale networks. | PSO clustering | KDDCUP99 | Mixed | Max AUC: 96.3 | ||||
[87] | Novel strategy for automatic tuning and optimization of detection model parameters in a real network environment. | Clustering and one-class SVM | KDDCUP99 and Kyoto University | Continuous | - | - | - | - | ✓ |
[88] | K-means clustering to generate training subsets; neuro-fuzzy and radial-basis SVM for classification. | K-means, SVM, neuro-fuzzy neural network | KDDCUP99 | N/A | 98.8 | 97.31 | 97.5 | 97.5 | |
[89] | Innate-immune strategy via UL for categorizing normal and abnormal profiles. | DBSCAN | KDDCUP99 | N/A | FPR: 0.008, TNR: 0.991, ACC: 77.1, Recall: 0.589 | ✓ | |||
[49] | Novel approach using cluster centers and NNs to transform the dataset into a one-dimensional feature set. | Clustering, KNN | KDDCUP99 | N/A | 99.68 | 87.61 | 3.85 | 57.02 | |
[90] | Nature-inspired meta-heuristic approach to optimize the optimum path forest clustering algorithm. | Optimum path forest clustering | ISCX, KDDCUP99, NSL-KDD | N/A | Purity measure: ISCX: 96.3, KDDCUP99: 71.66, NSL-KDD: 99.8 | ||||
[91] | UL technique for real-time detection of fast and complex zero-day attacks. | DBSCAN | DARPA, ISCX | N/A | ACC: 98.39%, Recall: 100%, Precision: 98.12%, FP: 3.61 | ✓ | |||
[92] | Modified optimum path forest algorithm to enhance IDS performance, particularly in detecting less frequent attacks. | K-means, modified optimum path forest | NSL-KDD | N/A | 96.89 | 85.92 | 77.98 | 81.13 | |
[93] | Hierarchical agglomerative clustering applied to SOM network to lower computational cost and sensitivity. | Hierarchical agglomerative clustering, SOM | NSL-KDD | Mixed | DR: 96.66%, ACC: 83.46, Precision: 75, FPR: 0.279, FNR: 0.033 |
ACC, accuracy; AUC, area under curve; DBSCAN, density-based spatial clustering of applications with noise; DoS, denial of service; DR, detection rate; FAR, false alarm rate; FNR, false negative rate; FPR, false positive rate; IDS, intrusion detection systems; KNN, K-nearest neighbor; NN, nearest neighbor; SOM, self-organizing map; SVM, support vector machine; TPR, true positive rate; UL, unsupervised learning.
Table 2 summarizes a range of network anomaly and intrusion detection strategies, from subspace clustering to tree-based clustering and optimization techniques, each evaluated on different datasets. Performance varies significantly across these approaches, with some achieving high accuracy and DRs in identifying anomalous activity within network traffic. A notable observation, however, is that real-time detection capability receives limited emphasis. Real-time detection is imperative for promptly identifying and mitigating security threats, yet not all approaches prioritize it adequately.
This section reviews the most recent developments in anomaly-based network intrusion detection systems (NIDS), which play a critical role in protecting networks against ever-evolving cyber threats. Table 3 summarizes recent research efforts to improve network security, documenting the contributions of various studies and the methodologies, algorithms, and techniques employed in modern IDSs.
Table 3. Recent developments in unsupervised network intrusion/anomaly detection.
Amoli et al. (2015) [91] | Real-time intrusion detection using adaptive thresholds. | DBSCAN | DARPA, ISCX | DoS, DDoS, POD, SMURF, Mail-bomb, botnet, port scanning-port sweep | Precision: 98.12, ACC: 98.39, FPR: 3.61 | Future work: Detecting complex attacks, distinguishing flash crowds and DDoS for better clarity. |
Zhang et al. (2016) [42] | Utilizing One-class SVM for network intrusion identification. | One-class SVM | NSL-KDD | DoS, Probe, U2R, R2L | Precision: 99.3, Recall: 91.61, F-value: 95.18 | Low DR for minority-class attacks like U2R and R2L. |
Landress (2016) [61] | Employing K-means clustering, J48, decision tree, and SOM to reduce false positives. | K-means, J48, decision tree, self-organizing map | KDDCUP99 | DoS, Probe, U2R, R2L | ACC: 98.92 | Limitation: Increased computational complexity with SOM; explore efficient hybrid techniques for real-time processing. |
He et al. (2017) [94] | Exploiting SDN for effective anomaly detection. | DBSCAN | KDDCUP99 | DoS, Probe, U2R, R2L | ACC: 94.5 | Future work: Focus on real-time packet clustering for timely detection. |
Ariafar and Kiani (2017) [95] | Optimizing clusters using GA, K-means, and decision trees. | K-means, decision tree, GA | NSL-KDD | DoS, Probe, U2R, R2L | DR: 99.1, FAR: 1.8 | Evaluate on modern-day datasets like CIC-IDS2017 and CSE-CICIDS2018. |
Bigdeli et al. (2018) [96] | Incremental cluster updates with spectral and density-based clustering. | Spectral and density-based clustering | KDDCUP99, NSL-KDD, Darpa98, IUSTsip, DataSetMe | DoS, Probe, U2R, R2L | DR: 94%, FAR: 4% | Introduce concept drift while merging clusters. |
Almi'Ani et al. (2018) [97] | Hierarchical agglomerative clustering using k-means for reduced training time. | Hierarchical agglomerative clustering | NSL-KDD | DoS, Probe, U2R, R2L | DR: 96.66, ACC: 83.46, Precision: 75, FPR: 0.279, FNR: 0.033 | Low accuracy due to low sensitivity toward normal behavior; investigate fuzzy C-means clustering for accuracy enhancement. |
Zhou et al. (2019) [98] | Hybrid technique using KPCA and ELM. | KPCA, ELM | KDDCUP99 | DoS, Probe, U2R, R2L | DR: DoS: 98.96, Probe: 98.54, R2L: 94.72, U2R: 36.54, Acc: 98.18, FAR: 2.38 | The DR of U2R is quite low. To optimize the results of ELM, evolutionary algorithms need to be applied. |
Choi et al. (2019) [99] | Extracting key features using AE. | AE | NSL-KDD | DoS, Probe, U2R, R2L | ACC: 91.70 | Improve U2R DR; apply evolutionary algorithms to optimize ELM results. |
Aliakbarisani et al. (2019) [74] | Formulates a constraint trace ratio optimization problem for Laplacian Eigenmap strategy. | Constraint trace optimization, PCA, LDA | NSL-KDD, Kyoto 2006+ | DoS, Probe, U2R, R2L | ACC: 97.84, F-score: 0.878, FPR: 0.001 | Train and test with different datasets; explore ensemble strategies based on UL models for enhanced performance. |
Paulauskas and Baskys (2019) [72] | Employing HBOS mechanism for identifying rare attacks like U2R and R2L. | HBOS | NSL-KDD | DoS, Probe, U2R, R2L | F-measure: 87 | Investigate online learning metrics for newly discovered attacks. |
Hwang et al. (2020) [100] | CNN and AE for extracting raw features from network traffic. | CNN, AE | USTC-TFC 2016, Mirai-RGU, Mirai-CCU | SYN flood, UDP flood, ACK flood, and HTTP flood, Mirai C&C | ACC: 99.77, Precision: 99.93, Recall: 99.17, F1 measure: 99.55, FNR: 0.02, FPR: 0.83 | Improve performance rate of majority classes. |
Zavrak and Iskefiyeli (2020) [101] | Deploys flow-based IDS with One-class SVM as anomaly detector. | AE, VAE, One-Class SVM | CIC-IDS2017 | DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heart-bleed | AUC: VAE: 75.96, AE: 73.98, One-Class SVM: 66.36 | Consider flow-based attributes collected at specified time intervals to improve DR and reduce FAR. |
Truong-Huu et al. (2020) [79] | Uses GAN strategy to extract useful features and proposes a traffic aggregation technique. | GAN | UNSW-NB15, CIC-IDS2017 | Backdoors, Exploits, Worms, Shellcode, generic, DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heart-bleed | UNSW-NB15: Precision: 0.84, Recall: 0.85, F1-score: 0.85, AUPRC: 0.8831; CIC-IDS2017: Prec: 0.8260, Recall: 0.8268, F1-score: 0.8264, AUROC: 0.9529, AUPRC: 0.8271 | Investigate multi-class classification approach for identifying different types of attacks. |
Prasad et al. (2020) [56] | Proposes a novel cluster center initialization approach to overcome shortcomings of conventional clustering techniques. | K-means clustering, cluster center initialization | CIC-IDS2017 | DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heart-bleed | DR: 88, FR: 88.5, Precision: 88, F-measure: 0.531, ACC: 88.6 | Address limitations like manual pre-processing and time/space complexity in MANET deployment. |
Megantara and Ahmad (2021) [102] | Utilizes feature importance and data-reduction techniques to improve the prediction performance of NAD. | Decision tree, LOF | NSL-KDD, UNSW-NB15 | NSL-KDD: DoS, Probe, U2R, R2L; UNSW-NB15: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, Worms | NSL-KDD: Acc: 99.73; UNSW-NB15: 91.86 | The size of the LOF cluster can be optimized and the threshold value for handling outliers can be improved. |
Liao et al. (2021) [103] | Presents an ensemble of UL schemes based on a novel weighting scheme. | AE, GANs | UNSW-NB15, CIC-IDS2017 | Backdoors, Exploits, Worms, Shellcode, generic, DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heart-bleed | UNSW-NB15: Precision: 97.9, Recall: 92.4, F1-score: 94.9; CIC-IDS2017: Precision: 83.8, Recall: 84, F1-score: 83.5 | Address suboptimal results for specific attack families, explore optimization for better performance. |
Verkerken et al. (2022) [71] | Proposes an inter-dataset evaluation approach for ensemble UL algorithms. | PCA, IF, AE, One-class SVM | CIC-IDS2017, CSE-CICIDS2018 | DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heartbleed | ACC: PCA: 90.9, IF: 91.1, AE: 99.9, One-class SVM: 98.9; AUROC: PCA: 93.73, IF: 95.8, AE: 97.75, One-class SVM: 94.20 | Consider employing supervised learning approaches for generalization strength validation. |
Singh and Jang-Jaccard (2022) [104] | Proposes a unified AE architecture based on CNN and LSTM to examine spatiotemporal correlations in traffic data. | MSCNN, LSTM-based AE | NSL-KDD, UNSW-NB15, CICDDoS2019 | Normal, Attack (Binary class) | ACC: 97.10, Precision: 95.9, Recall: 96.4, F-score: 96.0 (average) | Fine-tune hyperparameters for enhanced model performance using optimization techniques. |
Wang et al. (2022) [105] | Proposes an ensemble of UL algorithms to address challenges in processing overhead and poor detection performance for unseen threats. | AE, IF | CSE-CICIDS2018, MQTT-IOT-IDS2020 | Sparta SSH brute force, DoS attacks-SlowHTTPTest, DoS-Hulk, DDoS attack-LOIC-UDP, DDoS attacks-LOIC-HTTP, Brute force-XSS, Brute force-web, FTP-brute force | Average ACC: 96.43, Average Recall: 95.95, Average F-score: 96.02 | Improve DR of specific attacks like “DoS-Hulk” in the proposed work. |
De C. Bertoli et al. (2022) [106] | Presents an NIDS for generalized detection performance in heterogeneous networks. | Deep AE, FL, Energy Flow Classifier | UNSW-NB15, CSE-CICIDS2018, BoT-IoT, ToN-IoT | Normal, Attack (Binary class) | BoT-IoT: ACC: 93, Recall: 93, Precision: 99, F-score: 95 ToN-IoT: ACC: 85, Recall: 85, Precision: 87, F-score: 77, UNSWNB15: ACC: 97, Recall: 99, Precision: 57, F-score: 73 CSE-CICIDS2018: ACC: 98, Recall: 88, Precision: 92, F-score: 90 | Deploy the proposed approach in distributed NIDS for robustness evaluation. |
Eren et al. (2023) [107] | Proposes a novel tensor decomposition method to enhance the detection of unseen attacks in network anomaly and intrusion detection, leveraging tensor factorization for improved generalization and identification of evolving threats. | Unsupervised DL, Tensor factorization algorithm | Neris-botnet, Spam e-mail | - | ROC-AUC, PR-AUC Average ROC-AUC: 0.9661, Average PR-AUC: 0.9152 | Using a tensor decomposition algorithm with latent factors enables enhanced initialization of the tensor factorization algorithm for improved performance. |
Lan et al. (2023) [108] | Presents a novel framework for unsupervised intrusion detection that transfers knowledge from known attacks to detect new ones using a hierarchical attention-based triplet network and unsupervised domain adaptation. | Attention network, unsupervised domain adaptation | ISCX-2012, UNSW-NB15, CIC-IDS2017, CTU-13 | HTTP DoS, Brute Force SSH, PortScan, Brute Force, DoS slow loris, Web attack XSS, Neris, Rbot, Fuzzers, Generic, Shellcode, Exploits | Average ACC: 96.38, Average DR: 94.16 | Utilize tensor decomposition algorithm with latent factors for enhanced initialization and performance. |
Boppana and Bagade (2023) [109] | Synergy of GANs and AE for unsupervised intrusion detection in MQTT networks. | AE, GAN | MQTT-IoT-IDS2020 | Normal, Attack (Binary class) | F1-score: 97 | Evaluate generalizability to diverse network architectures in future work. |
ACC, accuracy; AE, autoencoders; AUC, area under curve; CNN, convolutional neural network; DBSCAN, density-based spatial clustering of applications with noise; DDoS, distributed denial of service; DL, deep learning; DoS, denial of service; DR, detection rate; ELM, extreme learning machine; FAR, false alarm rate; FL, federated learning; FNR, false negative rate; FPR, false positive rate; GA, genetic algorithm; GAN, generative adversarial network; HBOS, histogram-based outlier score; IDS, intrusion detection systems; IF, isolation forest; IoT, internet of things; KPCA, kernel principal component analysis; LOF, local outlier factor; LSTM, long short-term memory; MSCNN, multi-scale convolution neural network; NAD, network anomaly detection; NIDS, network intrusion detection system; PCA, principal component analysis; PR-AUC, area under the precision-recall curve; ROC, receiver operating characteristic; ROC-AUC, area under the ROC curve; SOM, self-organizing map; SVM, support vector machine; TPR, true positive rate; UL, unsupervised learning; VAE, variational autoencoders.
Table 3 provides an overview of recent developments in unsupervised network intrusion/anomaly detection. The techniques and algorithms covered include DBSCAN, one-class SVM, K-means clustering, AE, GANs, and tensor factorization, among others. The studies used different datasets and evaluated performance with validation metrics such as accuracy, precision, recall, and F-score. Notable advancements include hybrid techniques to reduce FPRs, evolutionary algorithms to optimize detection results, DL models such as CNN and long short-term memory (LSTM) for spatiotemporal correlation analysis, and ensemble approaches to improve detection performance.
Researchers have also incorporated feature importance and data-reduction techniques to enhance model efficiency, and the development of inter-dataset evaluation methods has facilitated robust performance assessment across diverse datasets. However, open challenges remain, including low DRs for minority-class attacks, the lack of real-time packet clustering, and the computational complexity of certain approaches. Ongoing research also explores multi-class classification techniques to differentiate between attack types and to enhance the generalizability of models across different datasets and network architectures.
In this section, an experimental evaluation of 13 selected UL algorithms from a pool of 6 families is conducted to analyze the performance of UL-based AD methods on network datasets.
Researchers have created synthetic datasets that serve as benchmarks for examining new ML-based intrusion detection models and approaches. Synthetic datasets are generated in a limited, controlled environment and include benign and malicious traffic. These datasets have distinct features, usually presented in a flowchart-like structure. Some benchmark datasets include DARPA [110] (1998), KDD'99 [111] (1998), DEFCON [112] (2000), LBNL [113] (2004), Kyoto [114] (2006), NSL-KDD [115] (2009), CDX [116] (2009), ISCX 2012 [117] (2012), and CAIDA [118] (2013–17). Recently, datasets like UNSW-NB15 [119] (2015), CIDDS-001 [120] (2017), CIDDS-002 [120] (2017), CICIDS 2017 [121] (2017), CSE-CICIDS2018 [122] (2018), CICDDoS2019 [123] (2019), BoT-IoT [124] (2019), and IoT-23 [125] (2020) have emerged as pivotal contributors. However, it is essential to discern their specific characteristics, particularly when comparing the most recent ones. While recent datasets like CICDDoS2019, BoT-IoT, and IoT-23 focus on specific DDoS and IoT attacks, CIC-IDS2017 and CSE-CICIDS2018 cover a broader range of attacks.
To develop and evaluate intrusion detection frameworks, critical dataset characteristics include a diversity of attacks, anonymity, available protocols, complete network traffic capture, complete network interaction capture, defined network configuration, feature set, labeled data samples, heterogeneity, and metadata. CIC-IDS-2017 and CSE-CIC-IDS-2018 align with these characteristics and use profiles to generate well-ordered datasets. These datasets offer comprehensive insights into attacks, application models, network devices, and protocols. Although both datasets align with essential characteristics, CSE-CICIDS2018 is larger in size and presents challenges in processing due to the sheer volume of data instances in each file.
Amidst these considerations, the rationale for selecting the CIC-IDS-2017 dataset for evaluating the UL algorithms becomes apparent. Although KDD99 and NSL-KDD are commonly used, their outdated content raises concerns. NSL-KDD, while addressing some KDD99 issues, may not perfectly represent real networks. Nevertheless, its reasonable number of records facilitates comprehensive experiments. Moreover, UNSW-NB15 is popular and widely employed due to its inclusion of diverse novel attacks. In this paper, we selected NSL-KDD, UNSW-NB15, and CIC-IDS2017 to evaluate UL algorithms based on their popularity, extensive usage, and consistent results. Table 4 provides a comparative analysis of the three selected benchmark datasets with various other datasets.
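Table 4. Comparative analysis of network datasets.
Dataset | Data source | Year | Authenticity (1) | Type (2) | Publicly available (3) | No. of instances (4) | Labeled (5) | Attack type (6) | Features (7) |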
DARPA (1998) | MIT Lincoln Laboratory | 1998 | Emulated | Benchmark | ✓ | 7,000,000 | ✓ | DoS, Probe, U2R, R2L | 41 |
KDD CUP 99 | University of California | 1999 | Emulated | Benchmark | ✓ | 4,900,000 | ✓ | DoS, Probe, U2R, R2L | 41 |
DEFCON | N/A | 2000 | Real | Benchmark | N/A | N/A | ✓ | N/A | Flag traces |
LBNL | Lawrence Berkeley National Laboratory | 2004 | N/A | Benchmark | ✓ | >100 hr | × | Malicious traces | Internet traces |
Kyoto | Song et al. (2006) [114] | 2006 | Real | Benchmark | ✓ | N/A | ✓ | Normal and attack sessions | 24 |
CAIDA | Jonker et al. (2017) [118] | 2008–2017 | Real | Benchmark | ✓ | Huge | × | DDoS | 20 |
CDX | United States Military Academy | 2009 | Real | Real-life | N/A | 5771 | ✓ | Buffer overflow | 5 |
NSL-KDD | Tavallaee et al. (2009) [126] | 2009 | Emulated | Benchmark | ✓ | 148,517 | ✓ | DoS, Probe, U2R, R2L | 41 |
ISCX 2012 | Shiravi et al. (2012) [117] | 2012 | Real | Real-life | ✓ | 2,450,324 | ✓ | DoS, DDoS, Bruteforce, Infiltration | IP flows |
UNSW-NB15 | Moustafa and Slay (2015) [119] | 2015 | Emulated | Benchmark | ✓ | 2,540,044 | ✓ | Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, Worms | 49 |
CIDDS-001 | Ring et al. (2017) [120] | 2017 | Emulated | Benchmark | ✓ | 31,959,267 | ✓ | Ping scanning, Port scanning, Brute force, and DoS | 14 |
CIDDS-002 | Ring et al. (2017) [120] | 2017 | Emulated | Benchmark | ✓ | 16,161,183 | ✓ | Ping scanning, Port scanning, Brute force, and DoS | 14 |
CIC-IDS2017 | Sharafaldin et al. (2018) [121] | 2017 | Emulated | Benchmark | ✓ | 2,830,743 | ✓ | DDoS, DoS, Botnet, BruteForce, Infiltration, WebAttack, Port scan | 80 |
CSE-CICIDS2018 | Bharati and Tamane (2020) [122] | 2018 | Emulated | Benchmark | ✓ | 16,232,943 | ✓ | DDoS, DoS, Botnet, BruteForce, Infiltration, WebAttack, Port scan | 80 |
CICDDoS 2019 | Sharafaldin et al. (2019) [123] | 2019 | Emulated | Benchmark | ✓ | 32,925 | ✓ | DDoS_DNS, DDoS_LDAP, DDoS_MSSQL, DDoS_NetBIOS, DDoS_NTP, DDoS_SNMP, DDoS_SSDP, DDoS_SYN, DDoS_TFTP, DDoS_UDP, DDoS_UDP-Lag, DDoS_WebDDoS | 76 |
BoT-IoT | Koroniotis et al. (2019) [124] | 2019 | Real | Real-life | ✓ | 73,360,900 | ✓ | DoS, DDoS, Reconnaissance, Theft | 29 |
IoT-23 | N/A | 2020 | Real | Real-life | ✓ | N/A | ✓ | Mirai, Torii, Hide and Seek, Muhstik, Hakai, IRCBot, Hajime, Trojan, Kenjiro, Okiru, Gagfyt | 21 |
DDoS, distributed denial of service; DoS, denial of service; IDS, intrusion detection systems; IoT, internet of things; IRCBot, internet relay chat botnet.
Note: 1: Authenticity; 2: Type; 3: Publicly available; 4: No. of instances; 5: Labeled; 6: Attack type; 7: Features.
The following considerations underpin the data-set-selection process:
NSL-KDD: Upgraded variant of KDD99. Designed to reduce classifier bias. Unbiased classification and consistent results. No duplicate records, ensuring balance across attack groups.
UNSW-NB15: Developed in response to evolving network attack behavior. Contemporary substitute for NSL-KDD. Proximity in generation processes and functionality. Addresses shortcomings of KDD-99, enhancing applicability in modern NIDS.
CIC-IDS2017: Meticulously crafted with a focus on essential dataset characteristics. Well-ordered format using profiles. In-depth understanding of attacks, application models, network devices, and protocols. Alignment with critical features, making it a comprehensive resource for evaluation.
Table 4 presents a comprehensive overview of various datasets used for intrusion detection, delineating crucial aspects such as data source, year, emulation status, benchmark or real-life nature, dataset size, attack classes covered, and associated publications. NSL-KDD and UNSW-NB15 datasets, both emulated benchmarks, serve as valuable resources for assessing IDS. NSL-KDD, established in 2009, comprises over 148,000 instances and covers a spectrum of attack classes, including DoS, Probe, U2R, and R2L. Meanwhile, UNSW-NB15, introduced in 2015, boasts a larger dataset size exceeding 2.5 million instances and encompasses a broader array of attack types, rendering it suitable for evaluating modern intrusion detection techniques.
However, among these datasets, CIC-IDS 2017 emerges as a standout choice. Despite being emulated, it offers an extensive and realistic portrayal of contemporary cybersecurity threats. With over 2.8 million instances, it spans various sophisticated attacks, including DDoS, DoS, Botnet, BruteForce, Infiltration, Web Attack, and Port Scan. This diversity is crucial for evaluating the efficacy of IDS in real-world scenarios. The contrast with older datasets like DARPA (1998) and KDD CUP 99 shows how far the field has advanced: while the DARPA and KDD datasets were pioneering benchmarks, they lack the scalability, diversity, and realism offered by more recent datasets like CIC-IDS 2017. Compared with datasets like ISCX 2012 and CAIDA, CIC-IDS 2017 provides more extensive coverage of attack types and a larger dataset size, making it better suited for evaluating the robustness and generalizability of IDS.
Newer datasets, such as CSE-CICIDS2018, maintain relevance in contemporary intrusion detection research. Notably, CSE-CICIDS2018 is larger than its predecessors, which makes data processing challenging owing to the sheer volume of data instances in each file. On the other hand, recent datasets like CICDDoS2019, BoT-IoT, and IoT-23 focus on specific types of attacks prevalent in modern cyber threats, such as DDoS and IoT-based attacks. In contrast, CIC-IDS2017 offers comprehensive coverage, encompassing attack scenarios that range from network reconnaissance to web-based attacks.
Performance metrics are used to compare the effectiveness of NAD systems and to evaluate the experimental findings of various approaches. They are obtained from the confusion matrix, which cross-tabulates predicted and actual labels for normal and abnormal data. TP (true positive) and TN (true negative) denote correctly classified attack and normal records, respectively, whereas FP (false positive) denotes normal records incorrectly classified as attacks and FN (false negative) denotes attacks incorrectly classified as normal [127, 128, 129]. The following metrics are selected based on those frequently applied in survey studies.
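The metric formulas are not listed at this point in the text, so the sketch below assumes the standard confusion-matrix definitions of accuracy, precision, recall (DR/TPR), FPR, and F-score. The function name `detection_metrics` and the example counts are illustrative only and do not come from the experiments reported here.

```python
def detection_metrics(tp, tn, fp, fn):
    """Standard confusion-matrix metrics; attacks are the positive class."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0   # detection rate / TPR
    fpr = fp / (fp + tn) if (fp + tn) else 0.0      # false alarm rate
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "fpr": fpr, "f1": f1}

# Hypothetical counts: 100 attacks (90 detected) and 900 normal records (20 false alarms).
for name, value in detection_metrics(tp=90, tn=880, fp=20, fn=10).items():
    print(f"{name}: {value:.4f}")
```

Because attack traffic is usually the minority class, accuracy alone can look high even when many attacks are missed, which is why recall and FPR are reported alongside it.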
Table 5. Hardware and software specifications.
Operating system | Windows 11 |
System type | 64-bit operating system, x64-based processor |
Processor | 11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40 GHz 2.42 GHz |
RAM | 8 GB |
Python | 3.7 |
Datasets | NSL-KDD, UNSW-NB15, CICIDS2017 |
The experiments were conducted using the hardware and software specifications given in Table 5. The three datasets are preprocessed by transforming nominal variables to numerical ones where necessary, which increases the number of usable features without altering the datasets' semantics. Parameter tuning is used to determine the best configuration for each algorithm: several parameter combinations are run, and their results are compared to select the best-performing combination.
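The two preprocessing steps described above can be sketched as follows. The text does not state which encoding or search strategy was used, so one-hot encoding and an exhaustive grid search are assumptions here. The helper names, the parameter names `k` and `threshold`, and the scoring function are hypothetical placeholders for a real evaluation run.

```python
from itertools import product

def one_hot(records, column):
    """Replace a nominal column with one 0/1 indicator column per category."""
    categories = sorted({r[column] for r in records})
    encoded = []
    for r in records:
        row = {key: value for key, value in r.items() if key != column}
        for c in categories:
            row[column + "=" + c] = 1 if r[column] == c else 0
        encoded.append(row)
    return encoded

def grid_search(param_grid, score_fn):
    """Evaluate every parameter combination and keep the best-scoring one."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# A nominal feature such as protocol_type becomes numeric indicator columns.
records = [{"duration": 0, "protocol_type": "tcp"},
           {"duration": 2, "protocol_type": "udp"}]
print(one_hot(records, "protocol_type"))

# Placeholder score; in a real run this would train a detector and return, e.g., its F-score.
def toy_score(params):
    return -abs(params["k"] - 3) - abs(params["threshold"] - 0.5)

print(grid_search({"k": [1, 3, 5], "threshold": [0.1, 0.5]}, toy_score))
```

One-hot encoding avoids imposing an artificial order on categories, at the cost of more columns; an ordinal mapping would be a smaller alternative when dimensionality is a concern.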
This study extensively evaluates 13 UL methods on three benchmark intrusion detection datasets: NSL-KDD, UNSW-NB15, and CIC-IDS2017. The goal is to assess the performance of these algorithms and establish their effectiveness compared to existing research. These UL algorithms have been implemented for AD, as referenced in [130, 131, 132, 133, 134], and tested on both traditional and modern datasets to ensure a thorough analysis. Additionally, unsupervised dimensionality-reduction techniques from the six families of AD techniques are employed to tackle the challenge of high-dimensional real-world data. This allows the representation of data in reduced dimensions.
A decision tree classifier is utilized for result prediction to gauge the performance of dimensionality reduction approaches. For detailed insights into the results, Tables 6–8 present the performance of the 13 selected UL methods for NAD on the three benchmark datasets. These findings provide valuable evidence for the efficacy of these algorithms in AD.
The results in Tables 6–8 show that the selected UL methods perform well, particularly on the NSL-KDD dataset. The lack of redundant connections and the even distribution of data across attack categories and normal connections in NSL-KDD contribute significantly to these results. As shown in Figures 6–8, ODIN outperforms KNN in detecting anomalies by using an “indegree score” computed on the KNN graph. Among clustering-based techniques, DBSCAN was more effective than K-means and EM. Nearest-neighbor-based approaches usually outperformed clustering methods, but clustering-based approaches were more computationally efficient for processing large datasets in near-real-time environments.
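The indegree idea behind ODIN can be sketched as follows: build the KNN graph, count how many other points list each point among their k nearest neighbors, and flag points whose indegree is at or below a threshold. This is a minimal brute-force illustration, not the implementation evaluated in the study; the function names, the toy points, and the values of `k` and `threshold` are assumptions.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_graph(points, k):
    """For each point, list the indices of its k nearest neighbors (brute force)."""
    graph = []
    for i, p in enumerate(points):
        ranked = sorted(
            (euclidean(p, q), j) for j, q in enumerate(points) if j != i
        )
        graph.append([j for _, j in ranked[:k]])
    return graph


def odin_indegree(points, k):
    """Indegree of each point in the KNN graph: how often it is someone's neighbor."""
    indegree = [0] * len(points)
    for neighbors in knn_graph(points, k):
        for j in neighbors:
            indegree[j] += 1
    return indegree


def odin_outliers(points, k, threshold):
    """Indices of points whose indegree is at or below the threshold."""
    return [i for i, d in enumerate(odin_indegree(points, k)) if d <= threshold]


# Four points on a line plus one distant point.
points = [(0, 0), (1, 0), (2, 0), (3, 0), (10, 10)]
indegree = odin_indegree(points, k=2)
outliers = odin_outliers(points, k=2, threshold=0)
print(indegree, outliers)  # [1, 3, 4, 2, 0] [4]
```

The distant point still has two nearest neighbors of its own, but no other point selects it, so its indegree is zero. That asymmetry is what separates the indegree score from a plain KNN distance score.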
Table 6. Results of 13 selected UL methods for NAD on the NSL-KDD dataset.
KNN | Neighbor-based | 98.4 | 98.0 | 97.9 | 98.5 |
ODIN | Neighbor-based | 99.4 | 99.0 | 98.6 | 98.8 |
LOF | Density-based | 97.0 | 96.8 | 95.3 | 95.6 |
COF | Density-based | 94.0 | 94.5 | 96.1 | 95.1 |
K-means | Clustering-based | 94.0 | 92.46 | 92.3 | 93.0 |
DBSCAN | Clustering-based | 94.5 | 94.0 | 92.9 | 94.0 |
EM | Clustering-based | 95.0 | 93.0 | 92.9 | 94.0 |
PCA | Dimensionality-reduction | 97.8 | 97.0 | 96.7 | 97.0 |
KPCA | Dimensionality-reduction | 99.2 | 99.0 | 98.9 | 98.6 |
ICA | Dimensionality-reduction | 98.0 | 97.5 | 97.8 | 98.1 |
HBOS | Statistical-based | 94.4 | 93.9 | 95.8 | 96.0 |
One-class SVM | Classification-based | 99.4 | 99.2 | 99.2 | 99.5 |
IF | Classification-based | 99.9 | 99.8 | 99.6 | 99.7 |
COF, connectivity-based outlier factor; DBSCAN, density-based spatial clustering of applications with noise; EM, expectation maximization; HBOS, histogram-based outlier score; ICA, independent component analysis; IF, isolation forest; KNN, K-nearest neighbor; KPCA, kernel principal component analysis; LOF, local outlier factor; NAD, network anomaly detection; ODIN, outlier detection using indegree number; PCA, principal component analysis; SVM, support vector machine; UL, unsupervised learning.
Table 7. Results of 13 selected UL methods for NAD on the UNSW-NB15 dataset.
KNN | Distance-based | 98.2 | 97.8 | 97.6 | 97.7 |
ODIN | Distance-based | 99.2 | 98.9 | 98.5 | 98.6 |
LOF | Density-based | 96.9 | 95.8 | 95.1 | 95.0 |
COF | Density-based | 93.8 | 94.2 | 95.9 | 94.7 |
K-means | Clustering-based | 93.7 | 92.0 | 92.1 | 92.9 |
DBSCAN | Clustering-based | 94.2 | 93.8 | 92.7 | 93.5 |
EM | Clustering-based | 95.5 | 92.6 | 92.3 | 93.4 |
PCA | Dimensionality-reduction | 97.2 | 96.9 | 97.3 | 97.1 |
KPCA | Dimensionality-reduction | 99.1 | 98.5 | 98.5 | 98.7 |
ICA | Dimensionality-reduction | 97.7 | 97.4 | 97.1 | 97.3 |
HBOS | Statistical-based | 92.3 | 91.9 | 94.6 | 94.8 |
One-class SVM | Classification-based | 99.2 | 99.0 | 99.2 | 99.4 |
IF | Classification-based | 99.7 | 99.5 | 99.3 | 99.6 |
COF, connectivity-based outlier factor; DBSCAN, density-based spatial clustering of applications with noise; EM, expectation maximization; HBOS, histogram-based outlier score; ICA, independent component analysis; IF, isolation forest; KNN, K-nearest neighbor; KPCA, kernel principal component analysis; LOF, local outlier factor; NAD, network anomaly detection; ODIN, outlier detection using indegree number; PCA, principal component analysis; SVM, support vector machine; UL, unsupervised learning.
Table 8. Results of 13 selected UL methods for NAD on the CIC-IDS2017 dataset.
KNN | Distance-based | 96.4 | 96.7 | 96.0 | 97.0 |
ODIN | Distance-based | 97.6 | 97.4 | 97.5 | 98.0 |
LOF | Density-based | 96.0 | 95.3 | 93.6 | 94.8 |
COF | Density-based | 93.3 | 93.2 | 95.0 | 93.7 |
K-means | Clustering-based | 90.2 | 87.4 | 88.6 | 89.5 |
DBSCAN | Clustering-based | 93.7 | 92.7 | 90.6 | 92.3 |
EM | Clustering-based | 94.3 | 93.2 | 91.2 | 92.4 |
PCA | Dimensionality-reduction | 96.3 | 95.4 | 94.9 | 95.1 |
KPCA | Dimensionality-reduction | 97.3 | 97.2 | 96.9 | 97.0 |
ICA | Dimensionality-reduction | 96.3 | 96.2 | 96.0 | 96.3 |
HBOS | Statistical-based | 95.9 | 94.9 | 93.5 | 93.0 |
One-class SVM | Classification-based | 98.3 | 98.1 | 99.0 | 98.4 |
IF | Classification-based | 99.6 | 99.5 | 99.2 | 99.4 |
COF, connectivity-based outlier factor; DBSCAN, density-based spatial clustering of applications with noise; EM, expectation maximization; HBOS, histogram-based outlier score; ICA, independent component analysis; IDS, intrusion detection systems; IF, isolation forest; KNN, K-nearest neighbor; KPCA, kernel principal component analysis; LOF, local outlier factor; NAD, network anomaly detection; ODIN, outlier detection using indegree number; PCA, principal component analysis; SVM, support vector machine; UL, unsupervised learning.
Among density-based approaches, LOF surpasses COF, especially in scenarios requiring the detection of local anomalies. However, COF is recommended when no additional information about the nature of the anomalies in the evaluated dataset is available. Among dimensionality-reduction techniques, KPCA and ICA outperformed PCA across all datasets, with KPCA yielding the best results. This superiority can be attributed to the ability of KPCA and ICA to capture higher-order information in the original inputs, unlike PCA. HBOS, a statistical method that assumes feature independence, delivers competitive results and is well suited to large datasets because of its fast computation time, making it a valuable option for AD in such scenarios.
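The feature-independence assumption is what makes HBOS fast: each feature gets its own histogram, and a record's score is the sum of the per-feature log inverse bin heights. The sketch below assumes equal-width bins and heights normalized so the tallest bin equals 1; the function name, the bin count, and the toy rows are illustrative choices rather than the configuration used in the experiments.

```python
import math


def hbos_scores(rows, n_bins=5):
    """Histogram-based outlier score: higher means rarer across features."""
    scores = [0.0] * len(rows)
    for f in range(len(rows[0])):
        column = [r[f] for r in rows]
        lo, hi = min(column), max(column)
        width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant features
        # Assign each value to an equal-width bin; the maximum goes in the last bin.
        bins = [min(int((v - lo) / width), n_bins - 1) for v in column]
        counts = [0] * n_bins
        for b in bins:
            counts[b] += 1
        peak = max(counts)
        for i, b in enumerate(bins):
            # Height normalized to 1 for the tallest bin, so its contribution is 0.
            scores[i] += math.log(peak / counts[b])
    return scores


# Five similar records and one that is unusual in both features.
rows = [
    (1.0, 10.0), (1.1, 10.5), (0.9, 9.5), (1.2, 10.2), (1.0, 9.8),
    (5.0, 30.0),
]
scores = hbos_scores(rows)
print(scores)
```

Because every feature is processed in a single pass and no pairwise distances are computed, the cost grows linearly with the number of records and features. The price is that anomalies visible only in feature combinations can be missed.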
The study also observed that algorithms belonging to the classification family consistently outperformed algorithms from other families on all datasets. IF stood out as the top-performing algorithm, achieving scores of 99.9, 99.7, and 99.6 on the NSL-KDD, UNSW-NB15, and CIC-IDS2017 datasets, respectively. The findings of this study have practical implications for real-world NAD systems. IF is recommended for critical infrastructure networks, while ODIN is suitable for healthcare systems or financial institutions where minimizing false negatives is critical. Clustering-based methods such as DBSCAN and density-based techniques such as LOF can be employed in network monitoring systems to identify unusual patterns or behaviors that may indicate security breaches or system malfunctions. Finally, dimensionality-reduction techniques such as KPCA and ICA can reduce computational overhead and facilitate real-time anomaly detection in dynamic network environments.
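For readers unfamiliar with IF, the following is a simplified sketch of its core idea: records that random axis-aligned splits isolate after only a few cuts receive scores close to 1. It is not the implementation used in the experiments. In particular, each tree here is grown on all points rather than on a random subsample, and the function names, tree count, seed, and toy data are assumptions.

```python
import math
import random

EULER_GAMMA = 0.5772156649


def average_path_length(n):
    """Expected path length of an unsuccessful binary-search-tree lookup among n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Recursively split on a random feature at a random value."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(point, node, depth=0):
    """Depth at which the point lands, adjusted for unsplit leaves."""
    if "size" in node:
        return depth + average_path_length(node["size"])
    child = node["left"] if point[node["dim"]] < node["split"] else node["right"]
    return path_length(point, child, depth + 1)


def isolation_scores(points, n_trees=100, seed=0):
    """Anomaly score in (0, 1]; shorter average paths give higher scores."""
    rng = random.Random(seed)
    max_depth = math.ceil(math.log2(len(points)))
    trees = [build_tree(points, 0, max_depth, rng) for _ in range(n_trees)]
    norm = average_path_length(len(points))
    scores = []
    for p in points:
        mean_depth = sum(path_length(p, t) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_depth / norm))
    return scores


# A 5x5 grid of inliers in the unit square plus one distant record.
inliers = [(x / 4, y / 4) for x in range(5) for y in range(5)]
points = inliers + [(8.0, 8.0)]
scores = isolation_scores(points)
print(scores.index(max(scores)), round(scores[-1], 2))
```

Production implementations typically grow each tree on a small random subsample, which keeps training cost nearly independent of dataset size and reduces the masking effect of dense anomaly clusters.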
By understanding the practical applications of these UL methods, researchers can tailor their selection and implementation strategies to address specific security challenges and operational requirements. The study's findings reveal the superior performance of ODIN, DBSCAN, LOF, KPCA, ICA, and IF. These methods can be employed to develop more effective NAD systems for various industries, preventing potential cyber threats, enhancing network security, and improving operational efficiency.
This section discusses various unexplored research areas and significant prospects for using UL in network intrusion/anomaly detection.
This paper provides an in-depth analysis of UL techniques for anomaly detection (AD), highlighting recent developments and identifying key research limitations. The study demonstrates the effectiveness of these approaches in real-world network security scenarios by evaluating 13 UL methods across benchmark intrusion detection datasets. IF, ODIN, DBSCAN, LOF, KPCA, and ICA are among the methods evaluated, and they have versatile applications in anomaly detection across sectors such as critical infrastructure networks, healthcare, finance, and network monitoring.
The paper offers several recommendations to guide future research on advancing UL methods. The first is to address the interpretability challenges inherent in DL algorithms, particularly in the context of NAD, by integrating explainable AI techniques such as attention mechanisms or back-propagation methods into UL models. This integration would enhance transparency and facilitate informed decision-making in security operations. The second is to explore ensemble techniques for inter-dataset evaluation, which hold promise for assessing the generalization capabilities of UL methods across diverse datasets. The third is to fuse self-taught learning (STL) with UL as a compelling avenue for enhancing anomaly detection capabilities in network security. Leveraging STL's adaptive learning mechanisms can augment the ability of UL models to discern complex patterns and anomalies, bolstering real-time threat detection and response capabilities.
The paper also suggests strategies such as data augmentation, transfer learning, and feature engineering to address practical constraints that hinder the successful deployment of UL, such as data scarcity and semantic gaps between model outputs and operational interpretations. Furthermore, model explainability techniques such as feature importance analysis, SHAP, LIME, and Integrated Gradients can be combined with human-in-the-loop approaches involving expert feedback and domain knowledge integration. Together, these can enhance interpretability and align UL models more closely with operational requirements, fostering the development of more robust and actionable anomaly detection solutions. By heeding these recommendations, researchers can navigate today's challenges more effectively and pave the way for advancements in UL-based NAD.