A Holistic review and performance evaluation of unsupervised learning methods for network anomaly detection

With the rapid growth of the Internet, massive data are generated daily. With evolving hacking tools and methods, robust security measures are crucial. Businesses have suffered significant financial losses due to security attacks and network intrusions [1, 2]. To address these challenges, network anomaly detection (NAD) and intrusion detection systems (IDSs) safeguard network systems from unauthorized access and infiltration activities. An IDS is a device that monitors and detects attacks or intrusions in a computer or network [3]. It provides alerts to administrators for corrective actions. Traditional detection systems were manual and could not keep up with the increasing data volume and speed. Misuse detection techniques were introduced but were limited to known attacks. Anomaly detection (AD) techniques have replaced traditional signature-based schemes, working with unlabeled data to detect unknown attacks. This eliminates the need for costly labeled traffic generation and allows for identifying new attack versions [4].

AD techniques capture deviations from normal behavior, detecting unknown attacks without prior knowledge [5]. However, they can have a high false positive rate (FPR) due to overfitting. A false positive occurs when a normal situation is mistakenly classified as abnormal because its pattern was not recognized. Data mining techniques, particularly unsupervised learning (UL), overcome these limitations by using unlabeled data and detecting previously unseen attacks [3]. UL algorithms used in NAD improve the quality of data representations by eliminating unnecessary or redundant variables before the categorization process, resulting in more informative and accurate representations of the original data. While unsupervised NAD techniques have received less attention, their combination with supervised learning has shown promising results in research. Unsupervised and hybrid techniques have recently gained popularity for effective AD systems.

This paper emphasizes UL's significance in enhancing the security of NAD systems. Notably, UL offers several benefits, including:

UL is analogous to how humans learn to reason from their own knowledge, which brings it closer to genuine AI.

UL is effective at extracting meaningful information from data.

UL works with data that are not classified or categorized, which makes it more practical to deploy in real environments.

Input data are not always available in a structured format that conforms to output data in the real world. So, UL is needed to derive patterns from unstructured data.

AD schemes have gained increased attention due to the rise of zero-day attacks [5]. Researchers have extensively explored supervised and unsupervised approaches to effectively identify intrusions and anomalies since the early days of networking and the Internet [6]. Classification and clustering methods, such as support vector machine (SVM), K-nearest neighbor (KNN), and k-means, are commonly employed for modeling normal network behavior. Clustering and outlier detection play a significant role in UL schemes for NAD [5, 6, 7, 8]. Promising techniques for high-dimensional datasets include density and grid-based clustering [9, 10]. Hybrid approaches like k-means with decision trees [11] and genetic algorithms (GAs) [12] are effective in detecting network attacks. Self-organizing maps (SOMs) aid in data visualization and dimensionality reduction [13]. Recent developments in artificial neural networks (ANNs), particularly deep learning (DL) architectures, have sparked a revolution [14, 15, 16].

Therefore, this paper offers an in-depth analysis of NAD using unsupervised approaches, highlighting recent developments while identifying key research limitations. The evaluation of 13 UL methods across benchmark intrusion detection datasets aims to assess their effectiveness and provide valuable insights for future research and development in NAD systems. Figure 1 depicts the results of a bibliometric analysis conducted on AD literature. Subfigure (a) illustrates the publication trend of bibliometric papers from 2010 to 2023. Subfigure (b) highlights the data-retrieval process, focusing on publications from reputable publishers such as Springer, IEEE, and Elsevier, which collectively provided most of the analyzed data.

Figure 1: Bibliometric analysis of AD literature. (a) Chronological distribution of bibliometric papers. (b) Percentage distribution of publication types. AD, anomaly detection.

Survey timeline

This section reviews and evaluates the performance of UL methods for detecting network anomalies and intrusions. It highlights significant contributions from selected survey papers published between 2000 and 2023. The surveys are categorized into three time intervals for evaluation.

2000–2007

Various studies have explored unsupervised AD algorithms for IDS. In 2000, one study compared cluster-based estimation, KNN, and one-class SVM using the KDD Cup 99 dataset [17]. In another analysis in 2003, the authors evaluated AD approaches on the DARPA98 dataset, showing promising results with local outlier factor (LOF) and unsupervised SVM [18]. In 2007, the authors reviewed different clustering schemes and found that K-harmonic means performed well in speed, while population-based stochastic techniques were suitable for cluster quality [19]. Additionally, in a study in 2007, seven distinct IDS techniques for HTTP anomalies were tested and compared under simulated and real-world settings. Token-based approaches combined with heuristics were found to exhibit better discrimination and generalization [20].

2008–2015

In 2009, the authors presented literature studies to review machine learning (ML) methods for IDS, including single, hybrid, and ensemble classifiers [21]. In 2012, the authors reviewed and compared different clustering techniques based on parameters such as data type, cluster type, complexity, dataset, and measures, along with their advantages and disadvantages [22]. In another study in 2014, the authors presented an overview and comparison of four novelty detection techniques. The experimental outcomes indicated that the KNN novelty detection scheme performed better than the other methods [23]. Moreover, in 2015, the authors provided a comprehensive review of the 19 most popular clustering algorithms, with a detailed analysis of traditional and modern approaches, to emphasize the importance of clustering as an essential data-analysis technique [24].

2016–2023

In 2016, a comprehensive assessment of 19 unsupervised AD techniques was conducted on 10 datasets, showcasing their practical significance across diverse applications [1]. Another study in 2016 compared four AD algorithms, highlighting the superior performance of one-class SVM in smart city wireless sensor networks [2]. In 2018, a survey focused on comparing feature techniques and bridging the gap between IDS and forensic investigation [3]. A 2019 study evaluated unsupervised AD algorithms against unified attack models and addressed research questions related to performance. A detailed review in 2019 explored UL methods in various networking domains, discussing recent advances, challenges, and future directions [4]. A 2020 study gave a general introduction to the ML algorithms applied to IDS and also ran experiments on the KDD-CUP dataset to evaluate their performance [5]. In 2021, the authors presented a comprehensive survey on IDS with special attention to feature-selection methods, different types of datasets, challenges, and possible future directions [6].

A survey conducted in 2021 categorized anomaly detection algorithms into seven distinct categories based on their working mechanisms [7]. These categories include statistical models, density-based methods, distance-based techniques, clustering approaches, isolation methods, ensemble methods, and subspace techniques. For each category, the survey provides insights into the time complexity of the algorithms and outlines their general advantages and disadvantages. Furthermore, a detailed and systematic survey of clustering methods was published in 2022. It gives an in-depth overview of state-of-the-art clustering methods, their performance, open issues, and various application areas [8]. A comprehensive overview of UL techniques for AD was conducted in 2022, emphasizing their utility in scenarios with scarce labeled data [9]. It highlights the challenges associated with traditional outlier detection methods and underscores the potential of unsupervised approaches in handling raw data effectively. Another survey was conducted in 2022 to delve into unsupervised anomaly localization (AL) using DL for industrial image inspection [10]. With its in-depth technical insights, the survey is a useful resource for researchers venturing into industrial AL and seeking to extend its applications across diverse domains.

In a 2023 survey, outlier detection strategies from seven families were analyzed, revealing that consensus-based strategies derived from DL outperformed standalone UL methods [11]. Another comprehensive study in 2023 explored anomaly detection techniques within sensor networks and the internet of things (IoT) [12]. It unveiled a significant focus on detecting malicious activities, spanning various ML and DL approaches, with a notable surge in interest between 2017 and 2022. The study showcased a significant adoption of DL and ML techniques, notably MLPs and graph-based methods, across various anomaly detection applications, reflecting the evolving landscape of IoT security measures and the need to address emerging threats comprehensively. A review conducted in 2023 focused on different clustering categories within semisupervised and UL approaches [13]. Furthermore, another survey conducted in 2023 delved into automatic log file analysis for early incident detection, particularly focusing on self-learning anomaly detection techniques [14]. These methods, predominantly leveraging DL neural networks, demonstrated superior performance over traditional approaches and addressed issues with unstable data formats. A comparison of existing anomaly detection surveys with our study is presented in Table 1.

Table 1:

A relative comparison of this paper with existing surveys/review papers.

References	Discussion	Focus	1	2	3	4	5	6
[15]	Focus on literature review and performance evaluation of UL methods.	IDS	≈	×	×	×	✓	×
[16]	It provides a comparative analysis of various unsupervised NAD approaches and investigates standard metrics suitable for many connections.	AD in networks	≈	✓	×	×	✓	×
[17]	Presents a detailed review of different clustering methods and their strengths and weaknesses.	_	≈	✓	×	×	×	×
[18]	Presents a framework for evaluating anomaly detection methods for HTTP.	HTTP AD	≈	≈	×	×	✓	×
[19]	An overview of ML techniques for intrusion detection between 2000 and 2007 is presented.	IDS	✓	×	×	≈	×	×
[20]	It provides an overview and comparison of various clustering techniques, pros, and cons.	AD	≈	≈	×	×	×	×
[21]	Performance evaluation of four novelty detection algorithms is carried out.	Novelty detection	✓	✓	×	×	✓	✓
[22]	Presents a detailed review of popular traditional and modern clustering approaches.	_	≈	≈	×	×	×	×
[1]	Nineteen widely employed unsupervised techniques are evaluated to provide an understanding of their performance in various domains.	AD	≈	✓	×	×	✓	×
[2]	The proposed work focuses on a comparative evaluation of four UL algorithms.	Smart city wireless networks	≈	✓	×	×	✓	×
[3]	A survey on ML techniques focusing on unsupervised and hybrid IDS is presented.	IDS	✓	✓	×	✓	×	✓
[4]	The proposed work aims to provide an experimental comparison of unsupervised methods employed for NAD against five intrusion detection datasets.	Anomaly-based IDS	≈	✓	×	✓	✓	×
[23]	A comprehensive survey of UL techniques in networking is presented.	Anomaly-based in networking	≈	✓	✓	×	×	✓
[5]	Review and compare ML algorithms for IDS.	IDS	✓	≈	×	×	✓	×
[6]	Presents a detailed review of state-of-the-art IDS methods, datasets, performance measures, and research challenges.	IDS	✓	≈	×	✓	✓	✓
[8]	An in-depth review of state-of-the-art clustering methods is presented.	-	≈	✓	✓	×	×	✓
[9]	A comprehensive overview of UL techniques for AD is presented, emphasizing their utility in scenarios with scarce labeled data.	AD in industrial applications	✓	✓	✓	×	×	✓
[11]	Review and compare outlier detection techniques from seven different families.	Outlier detection	≈	≈	×	×	✓	×
[12]	AD techniques are explored comprehensively to address the detection of emerging threats.	IoT and sensor data	✓	≈	✓	×	×	✓
[13]	In-depth review of various unsupervised and semisupervised clustering methods.	-	✓	✓	≈	×	✓	×
[14]	Focus on log file analysis for early incident detection, particularly emphasizing self-learning AD techniques.	AD	✓	≈	✓	×	×	×
Our survey	The primary focus is on studying UL NAD methods while considering recent advances.	NAD	✓	✓	✓	✓	✓	✓

DL, deep learning; IDS, intrusion detection systems; IoT, internet of things; ML, machine learning; NAD, network anomaly detection; UL, unsupervised learning.

Note: 1: Review of ML and DL-based NAD approaches; 2: Unsupervised methods; 3: Recent advances; 4: Comparative analysis of network datasets; 5: Experimental analysis; 6: Open issues/future directions; Notations: ✓: Included; ×: Not included; ≈: Partially included.

From Table 1, existing literature offers surveys on ML techniques for NAD and other domains but lacks a comprehensive assessment of UL techniques specifically for NAD systems. This paper fills that gap by comparing UL techniques based on relevant parameters, as mentioned in Table 1. It also includes recent advances in unsupervised NAD and empirical evaluations on benchmark datasets to analyze performance across conventional and modern datasets.

Motivation

The motivation for this paper stems from the need to address the shortcomings observed in existing research on UL approaches in NAD systems. The paper aims to fill various gaps identified in previous studies by conducting a comprehensive survey. These gaps include the need for more thorough comparative analyses that consider the characteristics of different algorithm classes, the limited generalizability of findings due to analyses on limited datasets, and insufficient emphasis on recent network datasets in evaluations. Moreover, previous reviews require more coverage of recent advances in NAD systems and performance evaluations on benchmark datasets such as NSL-KDD, UNSW-NB15, and CIC-IDS2017. To bridge these gaps, this paper evaluates 13 UL algorithms on popular benchmark datasets, providing a benchmark for anomaly-based IDS. Furthermore, the analysis conducted in this paper highlights the strengths and weaknesses of UL algorithms across varied dataset environments, offering valuable insights for further research and development in the field of NAD systems.

Contribution

The key contributions of this paper are as follows:

It discusses the state-of-the-art unsupervised AD methods and reviews their utilization for detecting network attacks.

It presents contemporary coverage of recent trends and innovations in the respective field.

This paper investigates the experimental comparison of 13 selected unsupervised AD methods on three popular benchmark datasets. The analysis of results is also presented.

The findings of the experiments provide a basis for evaluating the performance of algorithms across different families of AD techniques with respect to network datasets and for forecasting the efficacy of the chosen algorithm under varying parameter settings. They can be used to assist in the selection of a suitable UL algorithm for a given task.

Finally, the paper outlines future work, highlighting potential pitfalls and research gaps in the literature.

Organization

The remainder of this paper is organized as follows. Section 2 introduces 13 selected UL-based methods from the pool of 6 classes of AD. This section also presents an unsupervised NAD model. Section 3 is the core section; it discusses the literature in detail and proposes a classification of UL approaches into distinct classes. It also presents recent works to highlight the latest developments and innovations. Section 4 performs the experimental evaluation of 13 selected UL algorithms on 3 benchmark datasets. In Section 5, research challenges and future directions are detailed. Finally, the paper is concluded in Section 6. The roadmap of the paper is depicted in Figure 2.

II. UL-Based Techniques

The following section discusses the various UL methods that are used for AD. A generic framework of the NAD model employing K-means clustering is also presented. The focus of this section is to provide an in-depth discussion of six different families of AD and their types.

Selection of the algorithms

The literature review evaluates several UL algorithms based on their performance in NAD systems. A taxonomy of 13 unsupervised algorithms is proposed from the pool of 6 families of AD techniques, as presented in Figure 3, based on the following criteria:

The algorithms should have previous practical applications, primarily in intrusion detection/NAD.

The collection of algorithms should include the following classes of AD techniques: clustering, dimensionality reduction, statistical, neighbor-based, density-based, and classification-based to develop an effective prediction model.

Unsupervised dimensionality reduction algorithms are also included in 13 selected methods to facilitate non-redundant and relevant features in the dataset and support the effective deployment of UL methods in real-world data with higher dimensions. The categorization of 13 UL methods selected from 6 families of AD is presented in Figure 3.

Figure 3: Taxonomy of 13 selected UL methods for anomaly detection from 6 families. UL, unsupervised learning.

The proposed taxonomy of UL methods from the six families of AD techniques is illustrated in Figure 3. This taxonomy aids in developing AD models for detecting network anomalies and intrusions. The taxonomy of AD techniques is discussed below and empirically evaluated in Section 4, providing insights into their performance across diverse network datasets.

Neighbor based

KNN [24]: It is a neighbor-based technique that focuses on detecting outliers. The KNNs of a given data point are determined by comparing it to the entire dataset and selecting the items with the most comparable feature values. The data point is considered abnormal if its nearest neighbor (NN) was previously categorized as abnormal.

Outlier detection using indegree number (ODIN) [25]: It derives from the KNN algorithm, which analyzes the entire dataset to find the feature distances to the specified point [26]. ODIN identifies the NNs of every point and uses them to build the KNN graph. Unlike KNN, ODIN classifies as anomalies the data points that have only a few incoming edges (a low indegree) in the KNN graph.
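The indegree idea behind ODIN can be illustrated with a minimal sketch. It assumes Euclidean distance and a fixed indegree cut-off; the function names and the `threshold` parameter are hypothetical choices for illustration and are not taken from [25].

```python
import math

def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors

def odin_indegree(points, k):
    """Count, for every point, how many other points list it among their k NNs."""
    indegree = [0] * len(points)
    for nbrs in knn_indices(points, k):
        for j in nbrs:
            indegree[j] += 1
    return indegree

def odin_outliers(points, k, threshold):
    """Flag points whose indegree is at most `threshold` (assumed cut-off)."""
    return [i for i, d in enumerate(odin_indegree(points, k)) if d <= threshold]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
print(odin_outliers(points, k=2, threshold=0))
```

In this toy example the isolated point at (10, 10) is nobody's nearest neighbor, so its indegree is zero and it is the only point flagged.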

Density based

LOF [27]: It utilizes the concept of local density to identify anomalous points/outliers in the datasets. First, the NNs are calculated for each data point. Next, the local reachability density (LRD) is calculated using the computed neighborhood. The LOF score is determined by comparing the LRD of a data point to the LRDs of its previously computed NNs to find outliers [28]. A data point is more likely to be an outlier if its LOF value is high.
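The three LOF steps (neighborhood, LRD, score) can be sketched as follows. This is a simplified illustration, not the reference implementation of [27]: it assumes Euclidean distance, exactly k neighbors per point with ties broken by index, and no duplicate points (duplicates would make a reachability sum zero).

```python
import math

def lof_scores(points, k):
    """Local outlier factor of every point (simplified sketch)."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Step 1: k nearest neighbors of each point, excluding the point itself.
    knn = [sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
           for i in range(n)]
    # k-distance: distance from a point to its k-th nearest neighbor.
    k_dist = [dist[i][knn[i][-1]] for i in range(n)]

    def reach_dist(i, j):
        # Reachability distance of point i with respect to its neighbor j.
        return max(k_dist[j], dist[i][j])

    # Step 2: LRD is the inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in knn[i]) for i in range(n)]
    # Step 3: LOF is the mean ratio of the neighbors' LRD to the point's own LRD.
    return [sum(lrd[j] for j in knn[i]) / (k * lrd[i]) for i in range(n)]

points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
for p, s in zip(points, lof_scores(points, k=2)):
    print(p, round(s, 2))
```

Points inside the dense square obtain a score close to 1, while the isolated point obtains a score well above 1, which matches the rule that a high LOF value indicates an outlier.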

Connectivity-based outlier factor (COF) [29]: COF is an improved version of LOF. It works by ascribing a degree of outlierness to each data point. This degree of anomaly is called the COF of the data point. A data point with a high COF value has a high likelihood of being an outlier.

Clustering based

K-means clustering [30]: A popular clustering technique that divides data points into k clusters by feature values. The k cluster centroids are initialized randomly [26]. Then, each data record is assigned to the cluster with the nearest centroid, and the centroid of each modified cluster is recalculated. This procedure stops when the centroids no longer change. Data points with comparable feature values are clustered using a distance measure. The final step involves calculating the score of each data point within a cluster as the distance to its centroid [31]. Data points distant from their cluster centers are flagged as anomalies.
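A minimal sketch of this procedure is given below, assuming Euclidean distance and points stored as tuples. The `seed` and `max_iter` parameters and the function names are illustrative assumptions rather than part of [30] or [31].

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd iteration: assign to nearest centroid, then recompute centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization from the data
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:  # centroids no longer change
            break
        centroids = new_centroids
    return centroids, labels

def anomaly_scores(points, centroids, labels):
    """Score each point by the distance to the centroid of its own cluster."""
    return [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (13, 13)]
centroids, labels = kmeans(points, k=2)
scores = anomaly_scores(points, centroids, labels)
print(max(zip(scores, points)))
```

A threshold on the score (not shown here) would then separate flagged anomalies from normal points; how that threshold is chosen is left open in this sketch.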

Expectation maximization (EM) [32]: It is based on the Gaussian mixture model (GMM). The GMM approach improves on single-density estimation by modeling the density of the sample data with multiple Gaussian probability density functions that together describe the data distribution. EM and K-means are comparable in that both refine the model through an iterative process to discover the optimal configuration. However, K-means assigns data points using the Euclidean distance, whereas EM employs statistical methods [32].

Density-based spatial clustering of applications with noise (DBSCAN) [33]: It is a density-based clustering technique built on the assumption that clusters are high-density regions separated by low-density regions [34]. Data points that are “densely grouped” are aggregated into a single cluster. It can discover clusters in huge spatial datasets by analyzing the local density of the data points [35]. Each object that does not belong to any cluster is considered noise. A key strength of DBSCAN clustering is its robustness against outliers.
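The density-based grouping can be sketched in a few lines. The sketch below assumes Euclidean distance and the two usual DBSCAN parameters, here named `eps` (neighborhood radius) and `min_pts` (minimum neighborhood size, counting the point itself); both names and the `NOISE` label are illustrative assumptions.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return a cluster label per point; NOISE marks points outside every cluster."""
    n = len(points)
    labels = [None] * n  # None means "not visited yet"

    def region(i):
        # All points within eps of point i (including i itself).
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        neighbors = region(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region(j)
            if len(j_neighbors) >= min_pts:  # j is a core point: keep expanding
                queue.extend(j_neighbors)
    return labels

points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11), (5, 20)]
print(dbscan(points, eps=1.5, min_pts=3))
```

For anomaly detection, the points labeled as noise are the natural candidates for anomalies.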

Dimensionality reduction

Principal component analysis (PCA) [36]: PCA is a dimension reduction strategy that speeds up the learning process with fewer computations. It is a popular unsupervised method for defining the mapping from high- to low-dimensional space. It uses an orthogonal transformation to transform observations of correlated features into a set of linearly uncorrelated features. PCA considers the variance of each feature, because a high variance indicates a clear separation between classes, and lowers the dimensionality by keeping the high-variance directions. It generates a small set of uncorrelated principal components ranked by variance (the first component has a greater variance than the second, and so on).

Kernel principal component analysis (KPCA) [37]: An extension of the PCA technique that allows nonlinear dimensionality reduction. If the data structure is nonlinear and cannot fit well in a linear subspace, KPCA enables us to generalize PCA to nonlinear datasets. It employs one of the kernel methods to map the nonlinear features onto a higher-dimensional feature space.

Independent component analysis (ICA) [38]: It is a computational method mainly used in signal processing and signal separation. It finds hidden components inside datasets of random variables, signals, or measurements. It aids in splitting multidimensional signals into discrete parts that are assumed to be non-Gaussian and independent of one another. In contrast to PCA, which maximizes the variance of the data points, ICA emphasizes independence, i.e., separate components [39].

Statistical based

Histogram-based outlier score (HBOS) [40]: A statistical method that creates a histogram for each feature of the dataset independently. Histograms are first constructed using the feature value distributions of all the available data points [41]. Subsequently, the anomaly score for each data point is determined by multiplying the inverse heights of the columns into which its features fall.
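A minimal sketch of this scoring rule follows. It assumes equal-width bins and relative frequencies as column heights, and it sums logarithms of the inverse heights, which gives the same ranking as multiplying them while avoiding very large products. The `n_bins` parameter and the function name are assumptions made for illustration.

```python
import math

def hbos_scores(rows, n_bins=5):
    """Log of the product of inverse histogram heights, one histogram per feature."""
    n = len(rows)
    log_scores = [0.0] * n
    for f in range(len(rows[0])):
        values = [row[f] for row in rows]
        lo, hi = min(values), max(values)
        width = (hi - lo) / n_bins or 1.0  # constant feature: everything in bin 0

        def bin_of(v):
            # Clamp so that the maximum value falls into the last bin.
            return min(int((v - lo) / width), n_bins - 1)

        counts = [0] * n_bins
        for v in values:
            counts[bin_of(v)] += 1
        for i, v in enumerate(values):
            height = counts[bin_of(v)] / n  # relative frequency of the column
            log_scores[i] += math.log(1.0 / height)
    return log_scores

rows = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (1.0, 1.05), (1.05, 1.0), (5.0, 5.0)]
print([round(s, 2) for s in hbos_scores(rows)])
```

The last row falls into sparsely populated columns on both features and therefore receives the highest score.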

Classification based

One-class SVM [42]: One-class SVM is a UL method for learning how to distinguish test samples of a specific class from those of other categories. It works on the fundamental principle of minimizing the hypersphere that encloses the instances of a single class in the training data, assuming that any sample outside the hypersphere is an outlier or outside the training data distribution.

Isolation forest (IF) [43]: IF works on the premise that anomalies are “few and distinct” data points. The data in an IF are sampled randomly and then processed using a tree structure with features picked randomly. Samples that end up deeper in the tree required more cuts to be separated and are therefore less likely to be anomalies. Conversely, samples that fall into shorter branches are more likely to be anomalous because the tree was able to separate them from the other observations more easily [44].
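The relation between path length and anomaly score can be sketched as below. This is a simplified illustration under several assumptions: numeric tuples as rows, at least two rows, a depth limit of ceil(log2(sample size)), and the usual normalization by the expected path length of an unsuccessful binary-search-tree lookup. The parameter names (`n_trees`, `sample_size`, `seed`) are illustrative and not taken from [43].

```python
import math
import random

EULER_GAMMA = 0.5772156649

def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random cut point."""
    if depth >= max_depth or len(rows) <= 1:
        return {"size": len(rows)}
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:  # cannot split on this feature; stop here for simplicity
        return {"size": len(rows)}
    cut = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < cut]
    right = [r for r in rows if r[feature] >= cut]
    return {"feature": feature, "cut": cut,
            "left": build_tree(left, depth + 1, max_depth, rng),
            "right": build_tree(right, depth + 1, max_depth, rng)}

def path_length(row, node, depth=0):
    """Depth at which `row` lands, plus a correction for unsplit leaf contents."""
    if "size" in node:
        return depth + c_factor(node["size"])
    branch = "left" if row[node["feature"]] < node["cut"] else "right"
    return path_length(row, node[branch], depth + 1)

def isolation_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Scores in (0, 1); shorter average paths give scores closer to 1."""
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(size))
    trees = [build_tree(rng.sample(rows, size), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(size)
    return [2.0 ** (-sum(path_length(r, t) for t in trees) / (n_trees * norm)) for r in rows]

rows = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)] + [(10.0, 10.0)]
scores = isolation_scores(rows)
print(round(scores[-1], 2), round(max(scores[:-1]), 2))
```

The distant point is usually isolated by the very first cut, so its average path is short and its score is the highest in the sample.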

Unsupervised NAD model

This section presents the NAD model based on UL, which mainly employs K-means clustering for grouping data instances.

Figure 4 displays the framework of the NAD model that utilizes UL. The framework provides a detailed methodology for the NAD model and consists of several phases, each serving a specific purpose to enhance the effectiveness of NAD systems. These phases include preprocessing, feature selection, grouping based on K-means clustering, and detection using a Random Forest classifier. The preprocessing phase ensures data integrity by converting categorical variables into numerical format for improved processing. The feature selection phase employs correlation-based feature selection (CFS) to optimize dataset dimensions while preserving the intrinsic meaning of features. The grouping phase involves K-means clustering, which uses similarity measures to create a comprehensive normal behavioral profile. UL based on clustering enhances exploratory data analysis, exposing underlying structures. In the detection phase, the framework employs a Random Forest classifier, a powerful decision-making tool, to accurately identify data instances. The framework's contribution extends beyond precise categorization; it reveals meaningful patterns within cohesive subgroups, adding depth to understanding network intrusions. A brief description of the unsupervised model is discussed below.

Figure 4: Framework of NAD based on UL [45]. NAD, network anomaly detection; UL, unsupervised learning.

The various stages are:

Preprocessing: The preprocessing phase is fundamental for ensuring the quality of the dataset. Cleaning data and transforming categorical variables into numerical form are crucial steps. Min–max normalization then scales the data variables to a small range, which enhances computational efficiency.

Feature selection: Feature selection is introduced to reduce the dimensionality of the dataset cost-effectively. The CFS method is chosen for its quick processing capabilities and low computational cost, retaining the physical meaning of features.

Grouping: The grouping phase is designed to build a normal behavioral profile using K-means clustering. This phase is pivotal for forming distinct clusters based on similarity measures. The iterative nature of K-means clustering involves centroid initialization, distance estimation, and cluster optimization, all of which are vital for convergence and meaningful pattern extraction. The process begins by initializing “K” centroids, a choice that influences convergence and the final cluster assignments. Distance estimation calculates distances between data points and centroids, guiding data point assignments based on similarity. Cluster optimization recomputes centroids iteratively until the changes are minimal or a predefined iteration limit is reached. Convergence assurance is crucial for stable and reliable results. At the same time, meaningful patterns emerge as the centroids dynamically adjust based on data point assignments, capturing significant dataset structures.

Detection: In the final detection phase, the framework adopts a Random Forest classifier, a choice rooted in its reported excellent accuracy in the literature. The application of Random Forest enables the categorization of data instances into normal or attack categories. Key metrics such as accuracy, recall, and precision are employed to evaluate the model's performance.
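Two of the stages above lend themselves to a short sketch: the preprocessing transformations and the evaluation metrics of the detection phase. The integer encoding of categories is only one possible way to convert categorical variables into numerical form and is an assumption here, as are the function names and the label strings "attack" and "normal".

```python
def encode_categories(column):
    """Map each distinct category to an integer code, in order of first appearance."""
    codes = {}
    return [codes.setdefault(value, len(codes)) for value in column]

def min_max_normalize(column):
    """Scale numeric values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]

def detection_metrics(y_true, y_pred, positive="attack"):
    """Accuracy, precision, and recall with `positive` as the attack class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = len(y_true) - tp - fp - fn
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }

print(encode_categories(["tcp", "udp", "tcp", "icmp"]))
print(min_max_normalize([2, 4, 6]))
print(detection_metrics(
    ["attack", "attack", "normal", "normal", "normal"],
    ["attack", "normal", "normal", "attack", "normal"],
))
```

In the metrics example, one of the two attacks is detected and one normal instance is misclassified as an attack, so precision and recall are both 0.5 while accuracy is 0.6.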

III. Literature Survey

In this section, various literature works employed for network intrusion detection using UL are discussed in detail. The preliminary studies categorize various UL approaches into six classes, providing a comprehensive framework for organizing and analyzing this field's diverse methods.

UL techniques

This study aims to classify the existing literature on UL approaches into different categories to improve understanding and facilitate effective comparisons. The identified classes include clustering, SOM, dimensionality reduction, classification-based techniques, hybrid methods, and DL, as depicted in Figure 5. By organizing the diverse range of approaches into these distinct classes, this classification framework facilitates meaningful comparisons and analysis, aiding researchers and practitioners in selecting the most suitable approaches for their specific tasks and advancing the field of UL.

Clustering-based

An unsupervised approach based on density and grid-based clustering was proposed in 2005 [46]. It effectively classified a high-dimensional dataset into a collection of clusters in which the points not associated with any cluster are flagged as abnormal. The presented strategy achieved comparable outcomes; still, the FPR was high [46]. Another advanced method based on fuzzy rough C-means clustering was introduced in further research [47, 48]. K-means clustering is a well-known methodology for finding anomalies, outperforming prior unsupervised algorithms in accuracy [49]. Later, in 2012, a new variant was introduced that merged K-means clustering with the C4.5 decision tree approach to generate more effective outcomes than previous methods [50]. Density-based clustering methods excel at identifying clusters of arbitrary shapes, while grid-based clustering schemes offer fast processing.

Leung and Leckie [46] proposed the merging of adaptive intervals approach to spatial data (MAFIA). This novel mechanism combines density-based and grid-based clustering to discover high-dimensional input space clusters. Gujral et al. [51] employed spectral clustering to identify scattered anomalies and form representative clusters. A sparse symmetric matrix was utilized with a KNN to address scalability in IDS. In another work, Ni et al. [52] proposed a two-fold framework for NAD in high-dimensional data. The framework uses a clustering technique to approximate cluster centers and the mutual information coefficient to remove redundant and irrelevant features. Compared with other unsupervised approaches, the experimental results showed enhanced accuracy. To group normal data into clusters, Bhuyan et al. [53] presented an effective outlier-based technique employing a variation of k-means clustering. Anomalies are identified as the points whose outlier scores exceed a threshold.

Syarif et al. [54] compared misuse detection and AD algorithms using four classifiers (Naïve Bayes, rule induction, NN, and decision tree) for misuse detection and five clustering algorithms (k-means, k-medoids, EM clustering, and distance-based outlier detection) for AD. Evaluation on the DARPA98 and NSL-KDD datasets showed that detecting unknown attacks remains challenging for misuse detection. Rule induction achieved 99.58% accuracy with a 0.40% FPR for known attacks, while the decision tree classifier performed poorly with 63.97% accuracy and a 17.90% FPR. In contrast, the distance-based outlier detection algorithm achieved 80.15% accuracy in detecting unknown intrusions in the AD module. Bhuyan et al. [55] proposed Tree-based Subspace Clustering (TreeCLUS), a UL approach for detecting network anomalies with a minimized FPR. It utilizes cluster stability analysis to identify effective clusters and a multi-objective scheme for cluster labeling.

In another work, Prasad et al. [56] proposed an unsupervised feature selection technique for IDS. This technique selects the most informative features by calculating feature scores using entropy and computing the feature histogram. The study also introduced a novel cluster center initialization approach to address the limitations of conventional clustering techniques. The approach computed semi-identical instances, created sets, and avoided outliers as initial cluster centers. Micro-clusters were formed based on spatial distance, and similar micro-clusters were merged into macro-clusters. The presented cluster center initialization approach outperformed basic clustering methods, requiring fewer iterations to obtain final clusters and improving the accuracy of the detection model.
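A histogram-based entropy score of the kind mentioned above can be sketched in a few lines. The exact scoring rule of [56] is not reproduced here; the equal-width binning, the Shannon entropy formula, and the choice to rank higher-entropy features first are assumptions made for illustration.

```python
import math

def histogram(values, bins):
    """Equal-width histogram counts; a constant feature collapses to a single bin."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [len(values)]
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        counts[min(int((v - lo) / width), bins - 1)] += 1
    return counts

def entropy(counts):
    """Shannon entropy (in bits) of a histogram."""
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c)

def feature_scores(rows, bins=4):
    """One entropy score per feature (column)."""
    return [entropy(histogram(list(col), bins)) for col in zip(*rows)]

# Hypothetical records with three features: constant, binary, and evenly spread.
rows = [(7, 0, 0), (7, 0, 1), (7, 1, 2), (7, 1, 3),
        (7, 0, 0), (7, 0, 1), (7, 1, 2), (7, 1, 3)]
scores = feature_scores(rows)
ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
print(scores, ranking)
```

The constant feature scores zero and would be dropped first, since it cannot separate any records.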

UNADA [57] is a distributed computing framework-based scheme designed to enhance the detection performance of unsupervised NAD algorithms. It leverages a density-based grid algorithm deployed on a large cluster of servers to process high traffic volumes efficiently. DBSCAN's limitations in handling high-dimensional data are addressed by introducing subspace clustering and evidence accumulation techniques for parallel processing across multiple servers. Despite its advantages, UNADA's time complexity hampers its suitability for real-time and scalable anomaly detection. In another work [58], a real-time UL anomaly detection scheme for streaming data is proposed. Hierarchical temporal memory (HTM) detects spatial and temporal anomalies in noisy environments. The strategy adapts well to changing data streams while minimizing false positives. However, there is potential for further accuracy improvement by adopting ensemble-based approaches. Additionally, real-time anomaly detection techniques for multivariate data streams remain to be explored.

a.ii

SOM

SOMs are frequently applied as unsupervised tools for identifying malicious and abnormal network behavior. The characteristic of SOMs is that they can automatically organize and infer patterns from a range of inputs and then assess whether a piece of new information conforms to the deduced pattern or not, hence identifying anomalous inputs [6, 59, 60]. Landress [61] proposed a fusion model for UL in intrusion detection to address false positives. The model combines K-means clustering, J48 decision tree, and SOM using the KDDCUP99 dataset. Clustering is performed with k-means, grouping attacks into normal, Neptune, smurf, and ipsweep categories. J48 decision tree is then used for feature selection, reducing false positives by selecting optimal features. SOM is applied to visualize the data and categorize abnormal behavior.

Huang and Huang [62] employed GHSOM to analyze anomalous traffic behavior based on node density and samples. The network traffic data are analyzed via visual representation of attack patterns and their hierarchical relationships. In the detection phase, GHSOM is combined with SVM, which provides classification rules to classify network traffic as normal or abnormal.

a.iii

Dimensionality reduction

PCA is commonly used for dimensionality reduction in IDS [63, 64], but ICA is proposed as an alternative for non-Gaussian data [65]. Pattewar and Sonawane [66] present a feature extraction approach based on PCA and KPCA. In the first approach, PCA is utilized for extracting features. In the second approach, KPCA and Bayesian methods are employed to eliminate the noisy features while retaining the critical ones, which results in better accuracy and computational efficiency than PCA.
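One common way to use PCA for anomaly detection, beyond feature extraction, is to score a record by how poorly it is reconstructed from the leading principal components. The sketch below illustrates that idea only; it is not the method of [66]. It keeps a single component found by power iteration, and the toy data and starting vector are assumptions (power iteration would stall if the starting vector were orthogonal to the leading component).

```python
import math

def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def covariance(rows, mu):
    """Sample covariance matrix of the rows."""
    d, n = len(mu), len(rows)
    centered = [[x - m for x, m in zip(r, mu)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_component(cov, iters=200):
    """Leading eigenvector of the covariance matrix via power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def reconstruction_error(x, mu, v):
    """Distance between x and its projection onto the line mu + t * v."""
    c = [a - m for a, m in zip(x, mu)]
    proj = sum(a * b for a, b in zip(c, v))
    residual = [a - proj * b for a, b in zip(c, v)]
    return math.sqrt(sum(r * r for r in residual))

# Hypothetical normal records whose two features are strongly correlated.
normal = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.1)]
mu = mean_vector(normal)
v = top_component(covariance(normal, mu))

normal_errors = [reconstruction_error(x, mu, v) for x in normal]
anomaly_error = reconstruction_error((3.0, 0.0), mu, v)
print(max(normal_errors), anomaly_error)
```

Records that follow the learned correlation reconstruct almost perfectly, whereas the record that breaks it has a much larger error.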

In another work proposed by Kuang et al. [67], hybridization of the KPCA, GA, and SVM model is employed. In the proposed scheme, KPCA is applied to elicit the core features and the NRBF kernel is employed to lessen the training time of SVM. At the same time, GA optimizes the parameters of SVM to avoid overfitting. The experimental findings reveal that the proposed approach shows better generalization performance and a higher accuracy rate than the SVM classifier. In the work of Elkhadir et al. [68], a comparative analysis of PCA and KPCA is performed on the KDDCUP99 dataset. Experiments reveal that when the number of NN is between one and four, a KPCA with a quadratic kernel offers a higher detection rate (DR) than a PCA, notably for identifying DOS and PROBE attacks.

a.iv

Classification-based

Unlike attack records, normal connection records are often readily available in real-world applications. As a result, one-class SVM trained on a dataset consisting of normal connection records can effectively differentiate between normal traffic and various attacks. Amer et al. [69] report that one-class SVM is susceptible to data outliers and suggest two modifications: robust one-class SVM and ETA one-class SVM. The central concept for robust one-class SVM is to minimize the mean square error (MSE) for dealing with outliers by utilizing the class mean as the averaged information. In contrast, ETA one-class SVM employs an outlier suppression mechanism. The experiments indicated that the enhanced one-class SVM performed better than the existing UL AD methods on two of four datasets.

Nguyen et al. [70] further improve the performance of one-class SVM by introducing the nested one-class SVM. A radial basis function (RBF), or Gaussian, kernel is commonly used to kernelize one-class SVM. However, selecting the Gaussian kernel hyperparameter based on the availability of attack data is impractical. The authors attempted to solve the problem of kernel parameter optimization by proposing data-driven hyperparameter optimization. In the experiments, the presented methodology achieved higher detection accuracy and a lower false alarm rate (FAR) than the original one-class SVM.

Verkerken [71] conducted a study on the generalization ability of UL algorithms in NAD. The four UL methods used for evaluation were PCA, IF, autoencoder (AE), and one-class SVM. The models were trained on one dataset and evaluated on another to assess their inter-dataset generalization. Experimental results showed that one-class SVM achieved the best AUROC scores, followed by the AE and PCA. However, when evaluated on different datasets, there was a significant drop in performance for all models, indicating a lack of generalization. The inter-dataset evaluation scheme highlighted the low generalization strength of the models. The study suggests that further research is needed to improve the adaptability and generalization of UL models for real-time detection in practical applications.
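The AUROC used in such evaluations can be computed directly from anomaly scores and ground-truth labels. The sketch below uses the pairwise ranking definition; the two score lists are invented purely to illustrate an intra-dataset versus inter-dataset comparison and are not results from [71].

```python
def auroc(scores, labels):
    """Probability that a random attack (label 1) scores higher than a random
    normal record (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [0, 0, 1, 1]

# Hypothetical anomaly scores from a model tested on the dataset it was trained on...
intra_scores = [0.1, 0.2, 0.8, 0.9]
# ...and from the same model tested on a different dataset.
inter_scores = [0.1, 0.7, 0.4, 0.9]

intra_auc = auroc(intra_scores, labels)
inter_auc = auroc(inter_scores, labels)
print(intra_auc, inter_auc)  # 1.0 0.75
```

Because AUROC depends only on the ranking of scores, it needs no decision threshold, which makes it convenient for comparing detectors across datasets.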

a.v

Hybrid

Zoppi et al. [26] conducted an extensive study investigating the relationship between attack families, anomaly classes, and detection methods. Their experiments on various UL algorithms, including ODIN, K-means, LOF, FastABOD, HBOS, and one-class SVM, showed that FastABOD is effective for point anomalies, K-means performs well for collective anomalies, and HBOS is lightweight and efficient for limited computing resources. Additionally, the analysis of HBOS in Ref. [72] found that dynamic bins are particularly useful for identifying R2L and U2R attacks. NN-based approaches are commonly used and exhibit strong performance due to their unsupervised nature and outlier detection criteria. However, they may face challenges with high-dimensional data and incur computational complexity when dealing with large datasets. In contrast, SVMs maximize accuracy on the training data while preserving the capacity to correctly classify subsequent test data, even on high-dimensional datasets, with one-class SVMs focusing on discovering decision boundaries with maximum distance from the origin [73]. Therefore, combining NN-based approaches and SVM enhances classification performance with reasonable computational cost in high-dimensional data [73].
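The lightweight nature of HBOS is easy to see in code: it builds one histogram per feature and sums the log of the inverse bin heights. The sketch below uses static equal-width bins, not the dynamic bins analyzed in [72]; the bin count, the `floor` value for empty or out-of-range bins, and the toy records are assumptions.

```python
import math

def fit_hbos(rows, bins=5):
    """Build one normalized equal-width histogram per feature."""
    model = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        width = (hi - lo) / bins or 1.0      # guard against constant features
        counts = [0] * bins
        for v in col:
            counts[min(math.floor((v - lo) / width), bins - 1)] += 1
        peak = max(counts)
        model.append((lo, hi, width, [c / peak for c in counts]))
    return model

def hbos_score(x, model, floor=1e-3):
    """Sum over features of log(1 / bin height); higher means more anomalous."""
    score = 0.0
    for v, (lo, hi, width, heights) in zip(x, model):
        if lo <= v <= hi:
            h = heights[min(math.floor((v - lo) / width), len(heights) - 1)]
        else:
            h = 0.0                           # value never seen in training range
        score += math.log(1.0 / max(h, floor))
    return score

# Hypothetical training records with two features.
train = [(1.0, 10.0), (1.2, 11.0), (0.9, 10.5), (1.1, 10.2),
         (1.0, 10.8), (1.3, 10.1), (0.8, 10.9), (1.05, 10.4)]
model = fit_hbos(train)

normal_score = hbos_score(train[0], model)
anomaly_score = hbos_score((5.0, 50.0), model)
print(normal_score, anomaly_score)
```

Fitting and scoring are both linear in the number of records and features, which is why the method suits limited computing resources. The price is that features are treated independently, so anomalies that only appear in feature combinations can be missed.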

Falcão et al. [4] compared various UL algorithms for NAD. Classification algorithms were found to exhibit the highest accuracy among the methods considered. Angle-based algorithms showed robustness in parameter selection. However, detecting anomalies with nonrepeatable and unstable behavior remained challenging, and higher detection scores were observed in datasets with rare anomaly events. In the literature, many proposed NAD schemes rely on traditional distance metrics such as Euclidean or Mahalanobis distance, which may not adequately capture the heterogeneous nature of network features.

To overcome this limitation, Aliakbarisani et al. [74] proposed a linear metric learning approach incorporating side information and UL techniques to extract similarity or dissimilarity constraints. Their approach formulates an optimization problem using the constraint trace ratio and utilizes Laplacian Eigenmap for preserving local neighborhood information. Experimental results on Kyoto 2006+ and NSL-KDD datasets demonstrated improved accuracy and reduced FPRs compared to Euclidean-based distance metrics.

Preprocessing steps are essential in NAD. One common technique is using LOF to remove outliers from the training set. Network anomalies are then detected by estimating the LOF value of new data relative to the preprocessed normal dataset [75]. Additionally, Ding et al. [76] proposed hybridizing the similarity matrix scheme and LOF as preprocessing steps to identify outliers by considering data objects with lower similarity than others. This approach enhances outlier detection adaptability, especially for datasets with irregular shapes.
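LOF-based cleaning of a training set can be sketched as follows. This is a textbook LOF computation written for illustration, not the pipeline of [75] or [76]; the neighborhood size `k`, the cut-off of 1.5, and the toy points are assumptions, and the code assumes there are no duplicate points (which would give a zero reachability sum).

```python
import math

def knn(points, i, k):
    """Indices of the k nearest neighbors of points[i], excluding i itself."""
    others = sorted((j for j in range(len(points)) if j != i),
                    key=lambda j: math.dist(points[i], points[j]))
    return others[:k]

def lof_scores(points, k):
    """Local outlier factor of every point; values near 1 indicate inliers."""
    neighbors = [knn(points, i, k) for i in range(len(points))]
    # k-distance: distance to the k-th nearest neighbor.
    k_dist = [math.dist(points[i], points[nbrs[-1]]) for i, nbrs in enumerate(neighbors)]

    def local_reachability_density(i):
        reach = [max(k_dist[o], math.dist(points[i], points[o])) for o in neighbors[i]]
        return len(reach) / sum(reach)

    lrd = [local_reachability_density(i) for i in range(len(points))]
    return [sum(lrd[o] for o in neighbors[i]) / (k * lrd[i]) for i in range(len(points))]

# Hypothetical training set: a regular 3 x 3 grid plus one stray record.
points = [(x, y) for x in range(3) for y in range(3)] + [(10, 10)]
scores = lof_scores(points, k=3)

# Keep only records whose density is comparable to that of their neighbors.
cleaned = [p for p, s in zip(points, scores) if s <= 1.5]
print(len(points), len(cleaned))
```

Only the stray record is dropped here, because its local density is far lower than that of its nearest neighbors.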

a.vi

With the rapid growth of the IoT, attackers are exploiting vulnerabilities in insecure devices like IP cameras to launch devastating Distributed Denial of Service (DDoS) attacks [77]. Traditional defense methods that rely on predefined features extracted from predefined flows are often ineffective in promptly detecting and blocking malicious flows. To address this, Hwang et al. [83] proposed a new scheme for automatically profiling traffic patterns directly from raw traffic. By examining the initial packets and utilizing a convolutional neural network (CNN), the method generates features from raw traffic, enabling auto-learning. An unsupervised AE is then used to create profiles of benign traffic, effectively filtering out abnormal traffic. Experimental results showed a promising outcome, with nearly 100% accuracy and low false negative (0.02%) and false positive (0.83%) rates achieved by the proposed method.

Kabir and Luo [78] evaluated the performance of traditional and DL-based unsupervised methods for network flow anomaly detection. The study compared SOM, K-means, deep autoencoding GMM (DAGMM), and adversarially learned anomaly detection (ALAD) algorithms. Experimental results on the KDDCUP99 dataset showed that DAGMM achieved the lowest FPR (0.9%), while SOM had the highest accuracy (99.9%). SOM and k-means were found to be easy to implement with stable performance. ALAD performed well in detecting minority attacks because it generates adversarial samples.

In another study, Truong-Huu et al. [79] investigated the deployment of generative adversarial networks (GANs) for NAD. They utilized the UL scheme, which does not require prior label information during training. The concepts of traffic sessions and traffic aggregation techniques were introduced to extract useful traffic features and learn traffic behaviors. The experiments conducted on UNSW-NB15, CIC-IDS2017, and synthetic datasets demonstrated that GAN outperformed existing anomaly detection approaches.

Sovilj et al. [80] addressed the lack of descriptive analysis in intrusion detection models by introducing an attentional model analysis for DL models. This analysis provides explanations and predictions, aiding network operators in understanding potential threats. The authors conducted a comparative analysis of deep neural learning architectures for IDS in sequential data streams. Experimental results on synthetic and CIC-IDS2017 datasets showed that the RNN variant outperformed the nonsequential deep AE for anomaly detection. However, the performance of the attentional model could be further improved by incorporating attentional components to detect anomalous input data segments.

Another study by Carrera et al. [81] addresses the critical need for detecting zero-day cyber-attacks, which exploit undisclosed vulnerabilities, posing substantial technical and economic risks to enterprises and technology-dependent systems. Through benchmarking three different approaches, such as deep autoencoding with GMM and IF, the proposed work demonstrates enhanced accuracy with IF. Furthermore, employing the SHAP methodology elucidates crucial features for attack classification, as validated through experiments conducted on diverse datasets, including KDD99, NSL-KDD, and CIC-IDS2017.

Sáez-de-Cámara et al. [82] address IoT security challenges by proposing a federated learning (FL) architecture for unsupervised network intrusion detection. Leveraging FL mitigates network overhead and data isolation issues, enhanced by an unsupervised device clustering algorithm. Evaluation on a diverse testbed showcases successful device grouping, although clustering quality may diminish with insufficient local training data. Notably, FL-based models exhibit faster convergence and low FPRs, outperforming ML-based alternatives like Kitsune, signaling promise for real-world deployment. However, challenges remain in adapting the approach to high-mobility IoT settings and addressing compromised devices in adversarial scenarios, warranting further exploration.

The comparison analysis of UL approaches employed for network anomaly/intrusion detection is summarized in Table 2, providing an overview of the findings and insights regarding the performance and effectiveness of different methods.

Table 2:

Relative comparison of UL approaches for network anomaly/intrusion detection.

Authors	Methodology	Algorithm	Dataset	Input data	DoS	Probe	R2L	U2R	Real-time detection
[83]	Unsupervised approach for outlier detection using subspace clustering and evidence accumulation.	DBSCAN	METROSEC	Continuous	95	95	85	85	✓
[84]	Tree-based subspace clustering approach for high-dimensional datasets; includes cluster stability analysis.	Tree-based subspace clustering (TCLUS)	KDDCUP99, TUIDS	Mixed	99	96	86	66
[85]	Multiclustering scheme incorporating subspace clustering and evidence accumulation to overcome knowledge-based approach limitations.	DBSCAN	KDDCUP99, METROSEC	N/A	-	-	-	-	✓
[86]	Particle swarm optimization clustering strategy based on map-reduce methodology for parallelization in large-scale networks.	PSO clustering	KDDCUP99	Mixed	Max AUC: 96.3
[87]	Novel strategy for automatic tuning and optimization of detection model parameters in a real network environment.	Clustering and one-class SVM	KDDCUP99 and Kyoto University	Continuous	-	-	-	-	✓
[88]	K-means clustering to generate training subsets; neuro-fuzzy and radial-basis SVM for classification.	K-means, SVM, neuro-fuzzy neural network	KDDCUP99	N/A	98.8	97.31	97.5	97.5
[89]	Innate-immune strategy via UL for categorizing normal and abnormal profiles.	DBSCAN	KDDCUP99	N/A	FPR: 0.008, TNR: 0.991, ACC: 77.1, Recall: 0.589				✓
[49]	Novel approach using cluster centers and NNs to transform the dataset into a one-dimensional feature set.	Clustering, KNN	KDDCUP99	N/A	99.68	87.61	3.85	57.02
[90]	Nature-inspired meta-heuristic approach to optimize the optimum path forest clustering algorithm.	Optimum path forest clustering	ISCX, KDDCUP99, NSL-KDD	N/A	Purity measure: ISCX: 96.3, KDDCUP99: 71.66, NSL-KDD: 99.8
[91]	UL technique for real-time detection of fast and complex zero-day attacks.	DBSCAN	DARPA, ISCX	N/A	ACC: 98.39%, Recall: 100%, Precision: 98.12%, FP: 3.61				✓
[92]	Modified optimum path forest algorithm to enhance IDS performance, particularly in detecting less frequent attacks.	K-means, modified optimum path forest	NSL-KDD	N/A	96.89	85.92	77.98	81.13
[93]	Hierarchical agglomerative clustering applied to SOM network to lower computational cost and sensitivity.	Hierarchical agglomerative clustering, SOM	NSL-KDD	Mixed	DR: 96.66%, ACC: 83.46, Precision: 75, FPR: 0.279, FNR: 0.033

ACC, accuracy; AUC, area under curve; DBSCAN, density-based spatial clustering of applications with noise; DoS, denial of service; DR, detection rate; FAR, false alarm rate; FNR, false negative rate; FPR, false positive rate; IDS, intrusion detection systems; KNN, K-nearest neighbor; NN, nearest neighbor; SOM, self-organizing map; SVM, support vector machine; TPR, true positive rate; UL, unsupervised learning.

Table 2 offers an overview of network anomaly and intrusion detection strategies. It delineates different methodologies and algorithms, ranging from subspace clustering to tree-based clustering and optimization techniques, each evaluated on different datasets. The performance metrics of these approaches vary significantly: some methods achieve high accuracy and DRs, showcasing their efficacy in identifying anomalous activities within network traffic. However, a notable observation is that few of these methodologies emphasize real-time detection capability. Real-time detection is imperative for promptly identifying and mitigating security threats, yet not all approaches address it adequately.

Recent advances in NAD using UL

This section delves into the most recent developments in anomaly-based network intrusion detection systems (NIDS) that play a critical role in protecting networks against ever-evolving cyber threats. Table 3 provides a comprehensive overview of recent research efforts to improve network security. It meticulously documents the contributions of various studies, highlighting innovative methodologies, algorithms, and techniques employed in modern IDSs.

Table 3:

Recent developments in unsupervised-based network intrusion/anomaly detection.

Authors/year	Methodology	Algorithm/techniques used	Datasets	Attack class	Metrics (%)	Limitations/future work
Amoli et al. (2015) [91]	Real-time intrusion detection using adaptive thresholds.	DBSCAN	DARPA, ISCX	DoS, DDoS, POD, SMURF, Mail-bomb, botnet, port scanning-port sweep	Precision: 98.12, ACC: 98.39, FPR: 3.61	Future work: Detecting complex attacks, distinguishing flash crowds and DDoS for better clarity.
Zhang et al. (2016) [42]	Utilizing One-class SVM for network intrusion identification.	One-class SVM	NSL-KDD	DoS, Probe, U2R, R2L	Precision: 99.3, Recall: 91.61, F-value: 95.18	Low DR for minority-class attacks like U2R and R2L.
Landress (2016) [61]	Employing K-means clustering, J48, decision tree, and SOM to reduce false positives.	K-means, J48, decision tree, self-organizing map	KDDCUP99	DoS, Probe, U2R, R2L	ACC: 98.92	Limitation: Increased computational complexity with SOM; explore efficient hybrid techniques for real-time processing.
He et al. (2017) [94]	Exploiting SDN for effective anomaly detection.	DBSCAN	KDDCUP99	DoS, Probe, U2R, R2L	ACC: 94.5	Future work: Focus on real-time packet clustering for timely detection.
Ariafar and Kiani (2017) [95]	Optimizing clusters using GA, K-means, and decision trees.	K-means, decision tree, GA	NSL-KDD	DoS, Probe, U2R, R2L	DR: 99.1, FAR: 1.8	Evaluate on modern-day datasets like CIC-IDS2017 and CSE-CICIDS2018.
Bigdeli et al. (2018) [96]	Incremental cluster updates with spectral and density-based clustering.	Spectral and density-based clustering	KDDCUP99, NSL-KDD, Darpa98, IUSTsip, DataSetMe	DoS, Probe, U2R, R2L	DR: 94%, FAR: 4%	Introduce concept drift while merging clusters.
Almi'Ani et al. (2018) [97]	Hierarchical agglomerative clustering using k-means for reduced training time.	Hierarchical agglomerative clustering	NSL-KDD	DoS, Probe, U2R, R2L	DR: 96.66, ACC: 83.46, Precision: 75, FPR: 0.279, FNR: 0.033	Low accuracy due to low sensitivity toward normal behavior; investigate fuzzy C-means clustering for accuracy enhancement.
Zhou et al. (2019) [98]	Hybrid technique using KPCA and ELM.	KPCA, ELM	KDDCUP99	DoS, Probe, U2R, R2L	DR: DoS: 98.96, Probe: 98.54, R2L: 94.72, U2R: 36.54, Acc: 98.18, FAR: 2.38	The DR of U2R is quite low. To optimize the results of ELM, evolutionary algorithms need to be applied.
Choi et al. (2019) [99]	Extracting key features using AE.	AE	NSL-KDD	DoS, Probe, U2R, R2L	ACC: 91.70	Improve U2R DR; apply evolutionary algorithms to optimize ELM results.
Aliakbarisani et al. (2019) [74]	Formulates a constraint trace ratio optimization problem for Laplacian Eigenmap strategy.	Constraint trace optimization, PCA, LDA	NSL-KDD, Kyoto 2006+	DoS, Probe, U2R, R2L	ACC: 97.84, F-score: 0.878, FPR: 0.001	Train and test with different datasets; explore ensemble strategies based on UL models for enhanced performance.
Paulauskas and Baskys (2019) [72]	Employing HBOS mechanism for identifying rare attacks like U2R and R2L.	HBOS	NSL-KDD	DoS, Probe, U2R, R2L	F-measure: 87	Investigate online learning metrics for newly discovered attacks.
Hwang et al. (2020) [100]	CNN and AE for extracting raw features from network traffic.	CNN, AE	USTC-TFC 2016, Mirai-RGU, Mirai-CCU	SYN flood, UDP flood, ACK flood, and HTTP flood, Mirai C&C	ACC: 99.77, Precision: 99.93, Recall: 99.17, F1 measure: 99.55, FNR: 0.02, FPR: 0.83	Improve performance rate of majority classes.
Zavrak and Iskefiyeli (2020) [101]	Deploys flow-based IDS with One-class SVM as anomaly detector.	AE, VAE, One-Class SVM	CIC-IDS2017	DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heart-bleed	AUC: VAE: 75.96, AE: 73.98, One-Class SVM: 66.36	Consider flow-based attributes collected at specified time intervals to improve DR and reduce FAR.
Truong-Huu et al. (2020) [79]	Uses GAN strategy to extract useful features and proposes a traffic aggregation technique.	GAN	UNSW-NB15, CIC-IDS2017	Backdoors, Exploits, Worms, Shellcode, generic, DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heart-bleed	UNSW-NB15: Precision: 0.84, Recall: 0.85, F1-score: 0.85, AUPRC: 0.8831 CIC-IDS2017: Prec: 0.8260, Recall: 0.8268, F1-score: 0.8264, AUROC: 0.9529, AUPRC: 0.8271	Investigate multi-class classification approach for identifying different types of attacks.
Prasad et al. (2020) [56]	Proposes a novel cluster center initialization approach to overcome shortcomings of conventional clustering techniques.	K-means clustering, cluster center initialization	CIC-IDS2017	DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heart-bleed	DR: 88, FR: 88.5, Precision: 88, F-measure: 0.531, ACC: 88.6	Address limitations like manual pre-processing and time/ space complexity in MANET deployment.
Megantara and Ahmad (2021) [102]	Utilizes feature importance and data-reduction techniques to improve the prediction performance of NAD.	Decision tree, LOF	NSL-KDD, UNSW-NB15	DoS, Probe, U2R, R2L Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, worms	NSL-KDD: Acc: 99.73 UNSW-NB15: 91.86	The size of the LOF cluster can be optimized and the threshold value for handling outliers can be improved.
Liao et al. (2021) [103]	Presents an ensemble of UL schemes based on a novel weighting scheme.	AE, GANs	UNSW-NB15, CIC-IDS 2017	Backdoors, Exploits, Worms, Shellcode, generic, DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heart-bleed	UNSW-NB15: Precision: 97.9, Recall: 92.4, F1-score: 94.9 CIC-IDS2017: Precision: 83.8, Recall: 84, F1-score: 83.5	Address suboptimal results for specific attack families, explore optimization for better performance.
Verkerken et al. (2022) [71]	Proposes an inter-dataset evaluation approach for ensemble UL algorithms.	PCA, IF, AE, One-class SVM	CIC-IDS 2017, CSE-IDS 2017	DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heartbleed	ACC: PCA: 90.9, IF: 91.1, AE: 99.9, One-class SVM: 98.9 AUROC: PCA: 93.73, IF: 95.8, AE: 97.75, One-class SVM: 94.20	Consider employing supervised learning approaches for generalization strength validation.
Singh and Jang-Jaccard (2022) [104]	Proposes a unified AE architecture based on CNN and LSTM to examine spatiotemporal correlations in traffic data.	MSCNN, LSTM-based AE	NSL-KDD, UNSW-NB15, CICDDoS2019	Normal, Attack (Binary class)	ACC: 97.10, Precision: 95.9, Recall: 96.4, F-score: 96.0 (average)	Fine-tune hyperparameters for enhanced model performance using optimization techniques.
Wang et al. (2022) [105]	Proposes an ensemble of UL algorithms to address challenges in processing overhead and poor detection performance for unseen threats.	AE, IF	CSE-CICIDS2018, MQTT-IOT-IDS2020	Sparta SSH brute force, DoS attacks-SlowHTTPTest, DoS-Hulk, DDoS attack-LOIC-UDP, DDoS attacks-LOIC-HTTP, Brute force-XSS, Brute force-web, FTP-brute force	Average ACC: 96.43, Average Recall: 95.95, Average F-score: 96.02	Improve DR of specific attacks like “DoS-Hulk” in the proposed work.
De C. Bertoli et al. (2022) [106]	Presents an NIDS for generalized detection performance in heterogeneous networks.	Deep AE, FL, Energy Flow Classifier	UNSW-NB15, CSE-CICIDS2018, BoT-IoT, ToN-IoT	Normal, Attack (Binary class)	BoT-IoT: ACC: 93, Recall: 93, Precision: 99, F-score: 95 ToN-IoT: ACC: 85, Recall: 85, Precision: 87, F-score: 77, UNSWNB15: ACC: 97, Recall: 99, Precision: 57, F-score: 73 CSE-CICIDS2018: ACC: 98, Recall: 88, Precision: 92, F-score: 90	Deploy the proposed approach in distributed NIDS for robustness evaluation.
Eren et al. (2023) [107]	Proposes a novel tensor decomposition method to enhance the detection of unseen attacks in network anomaly and intrusion detection, leveraging tensor factorization for improved generalization and identification of evolving threats.	Unsupervised DL, Tensor factorization algorithm	Neris-botnet, Spam e-mail	-	ROC-AUC, PR-AUC Average ROC-AUC: 0.9661, Average PR-AUC: 0.9152	Using a tensor decomposition algorithm with latent factors enables enhanced initialization of the tensor factorization algorithm for improved performance.
Lan et al. (2023) [108]	Presents a novel framework for unsupervised intrusion detection that transfers knowledge from known attacks to detect new ones using a hierarchical attention-based triplet network and unsupervised domain adaptation.	Attention network, unsupervised domain adaption	ISCX-2012, UNSW-NB15, CIC-IDS2017, CTU-13	HTTP DoS, Brute Force SSH, PortScan, Brute Force, Dos slow loris, Web attack XSS, Neris, Rbot, Fuzzers, Generic, Shellcode, Exploits	Average ACC: 96.38, Average DR: 94.16	Utilize tensor decomposition algorithm with latent factors for enhanced initialization and performance.
Boppana and Bagade (2023) [109]	Synergy of GANs and AE for unsupervised intrusion detection in MQTT networks.	AE, GAN	MQTT-IoT-IDS2020	Normal, Attack (Binary class)	F1-score: 97	Evaluate generalizability to diverse network architectures in future work.

ACC, accuracy; AE, autoencoders; AUC, area under curve; CNN, convolutional neural network; DBSCAN, density-based spatial clustering of applications with noise; DDoS, distributed denial of service; DL, deep learning; DoS, denial of service; DR, detection rate; ELM, extreme learning machine; FAR, false alarm rate; FL, federated learning; FNR, false negative rate; FPR, false positive rate; GA, genetic algorithm; GAN, generative adversarial network; HBOS, histogram-based outlier score; IDS, intrusion detection systems; IF, isolation forest; IoT, internet of things; KPCA, kernel principal component analysis; LOF, local outlier factor; LSTM, long short-term memory; MSCNN, multi-scale convolution neural network; NAD, network anomaly detection; NIDS, network intrusion detection system; PCA, principal component analysis; PR-AUC, area under the precision-recall curve; ROC, receiver operating characteristic; ROC-AUC, area under the ROC curve; SOM, self-organizing map; SVM, support vector machine; TPR, true positive rate; UL, unsupervised learning; VAE, variational autoencoders.

Table 3 provides an overview of recent developments in unsupervised network intrusion/anomaly detection. Various techniques and algorithms, such as DBSCAN, one-class SVM, K-means clustering, AE, GANs, and tensor factorization, among others, have been discussed. The studies have utilized different datasets and evaluated the performance based on validation metrics such as accuracy, precision, recall, and F-score. In terms of recent developments, some notable advancements include the utilization of hybrid techniques to reduce FPRs, the application of evolutionary algorithms to optimize detection results, the use of DL models such as CNN and long short-term memory (LSTM) for spatiotemporal correlation analysis, and the exploration of ensemble approaches to improve detection performance.

Moreover, researchers have incorporated feature importance and data-reduction techniques to enhance model efficiency. The development of inter-dataset evaluation methods has further facilitated robust performance assessment across diverse datasets. However, open challenges remain, including low DRs for minority-class attacks, the lack of real-time packet clustering, and the computational complexity of certain approaches. Additionally, ongoing research is aimed at exploring multi-class classification techniques to differentiate between various attack types and at enhancing the generalizability of models across different datasets and network architectures.
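For reference, the validation metrics reported in Tables 2 and 3 all derive from the four confusion-matrix counts. The sketch below shows the standard definitions, treating attacks as the positive class; the function name and the example counts are hypothetical.

```python
def detection_metrics(tp, fp, tn, fn):
    """Standard binary detection metrics, with attacks as the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                 # also called detection rate (DR) or TPR
    return {
        "ACC": (tp + tn) / (tp + fp + tn + fn),
        "Precision": precision,
        "Recall": recall,
        "F1": 2 * precision * recall / (precision + recall),
        "FPR": fp / (fp + tn),              # also called false alarm rate (FAR)
        "FNR": fn / (fn + tp),
    }

# Hypothetical outcome: 100 attack records and 100 normal records.
metrics = detection_metrics(tp=90, fp=5, tn=95, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Accuracy alone can be misleading on imbalanced traffic, where attacks are rare. This is why DR and FPR are usually reported together, and why low DRs on minority classes such as U2R and R2L can coexist with high overall accuracy.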

IV.

Experimental Analyses

In this section, an experimental evaluation of 13 selected UL algorithms from a pool of 6 families is conducted to analyze the performance of UL-based AD methods on network datasets.

Selection of the datasets

Researchers have created synthetic datasets used as benchmarks to examine new ML-based intrusion detection models and approaches. Synthetic datasets are generated in a limited, controlled environment and include benign and malicious traffic. These datasets have distinct features, usually presented in a flowchart-like structure. Some benchmark datasets include DARPA [110] (1998), KDD'99 [111] (1999), DEFCON (2000) [112], LBNL [113] (2004), Kyoto [114] (2006), NSL-KDD [115] (2009), CDX [116] (2009), ISCX 2012 [117] (2012), and CAIDA [118] (2013–17). Recently, datasets like UNSW-NB15 [119] (2015), CIDDS-001 [120] (2017), CIDDS-002 [120] (2017), CICIDS 2017 [121] (2017), CSE-CICIDS2018 [122] (2018), CICDDoS2019 [123] (2019), BoT-IoT [124] (2019), and IoT-23 [125] (2020) have emerged as pivotal contributors. However, it is essential to discern their specific characteristics, particularly when comparing the most recent ones. While recent datasets like CICDDoS2019, BoT-IoT, and IoT-23 focus on specific DDoS and IoT attacks, CIC-IDS2017 and CSE-CICIDS2018 cover a broader range of attacks.

To develop and evaluate intrusion detection frameworks, critical dataset characteristics include a diversity of attacks, anonymity, available protocols, complete network traffic capture, complete network interaction capture, defined network configuration, feature set, labeled data samples, heterogeneity, and metadata. CIC-IDS-2017 and CSE-CIC-IDS-2018 align with these characteristics and use profiles to generate well-ordered datasets. These datasets offer comprehensive insights into attacks, application models, network devices, and protocols. Although both datasets align with essential characteristics, CSE-CICIDS2018 is larger in size and presents challenges in processing due to the sheer volume of data instances in each file.

Amidst these considerations, the case for using the CIC-IDS-2017 dataset to evaluate the UL algorithms becomes apparent. Despite KDD99 and NSL-KDD being commonly used, their outdated content raises concerns. NSL-KDD, while addressing some KDD99 issues, may not perfectly represent real networks. Nevertheless, its reasonable number of records facilitates comprehensive experiments. Moreover, UNSW-NB15 is popular and widely employed due to its inclusion of diverse novel attacks. In this paper, we selected NSL-KDD, UNSW-NB15, and CIC-IDS2017 to evaluate UL algorithms based on their popularity, extensive usage, and consistent results. Table 4 provides a comparative analysis of the selected three benchmark datasets with various other datasets.

Table 4:

Comparative analysis of network datasets.

Dataset	Data-source	Year	1	2	3	4	5	6	7
DARPA (1998)	MIT Lincoln Laboratory	1998	Emulated	Benchmark	✓	7,000,000	✓	DoS, Probe, U2R, R2L	41
KDD CUP 99	University of California	1999	Emulated	Benchmark	✓	4,900,000	✓	DoS, Probe, U2R, R2L	41
DEFCON	N/A	2000	Real	Benchmark	N/A	N/A	✓	N/A	Flag traces
LBNL	Lawrence Berkeley National Laboratory	2004	N/A	Benchmark	✓	>100 hr	×	Malicious traces	Internet traces
Kyoto	Song et al. (2006) [114]	2006	Real	Benchmark	✓	N/A	✓	Normal and attack sessions	24
CAIDA	Jonker et al. (2017) [118]	2008–2017	Real	Benchmark	✓	Huge	×	DDoS	20
CDX	United States Military Academy	2009	Real	Real-life	N/A	5771	✓	Buffer overflow	5
NSL-KDD	Tavallaee et al. (2009) [126]	2009	Emulated	Benchmark	✓	148,517	✓	DoS, Probe, U2R, R2L	41
ISCX 2012	Shiravi et al. (2012) [117]	2012	Real	Real-life	✓	2,450,324	✓	DoS, DDoS, Bruteforce, Infiltration	IP flows
UNSW-NB15	Moustafa and Slay (2015) [119]	2015	Emulated	Benchmark	✓	2,540,044	✓	Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, Worms	49
CIDDS-001	Ring et al. (2017) [120]	2017	Emulated	Benchmark	✓	31,959,267	✓	Ping scanning, Port scanning, Brute force, and DoS	14
CIDDS-002	Ring et al. (2017) [120]	2017	Emulated	Benchmark	✓	16,161,183	✓	Ping scanning, Port scanning, Brute force, and DoS	14
CIC-IDS2017	Sharafaldin et al. (2018) [121]	2017	Emulated	Benchmark	✓	2,830,743	✓	DDoS, DoS, Botnet, BruteForce, Infiltration, WebAttack, Port scan	80
CSE-CICIDS2018	Bharati and Tamane (2020) [122]	2018	Emulated	Benchmark	✓	16,232,943	✓	DDoS, DoS, Botnet, BruteForce, Infiltration, WebAttack, Port scan	80
CICDDoS 2019	Sharafaldin et al. (2019) [123]	2019	Emulated	Benchmark	✓	32,925	✓	DDoS_DNS, DDoS_LDAP, DDoS_MSSQL, DDoS_NetBIOS, DDoS_NTP, DDoS_SNMP, DDoS_SSDP, DDoS_SYN, DDoS_TFTP, DDoS_UDP, DDoS_UDP-Lag, DDoS_WebDDoS	76
BoT-IoT	Koroniotis et al. (2019) [124]	2019	Real	Real-life	✓	73,360,900	✓	DoS, DDoS, Reconnaissance, Theft	29
IoT-23	N/A	2020	Real	Real-life	✓	N/A	✓	Mirai, Torii, Hide and Seek, Muhstik, Hakai, IRCBot, Hajime, Trojan, Kenjiro, Okiru, Gagfyt	21

DDoS, distributed denial of service; DoS, denial of service; IDS, intrusion detection systems; IoT, internet of things; IRCBot, internet relay chat botnet.

Note: 1: Authenticity; 2: Type; 3: Publicly available; 4: No. of instances; 5: Labeled; 6: Attack type; 7: Features.

The following considerations underpin the dataset selection process:

NSL-KDD:

Upgraded variant of KDD99.

Designed to reduce classifier bias.

Unbiased classification and consistent results.

No duplicate records, ensuring balance across attack groups.

UNSW-NB15:

Developed in response to evolving network attack behavior.

Contemporary substitute for NSL-KDD.

Proximity in generation processes and functionality.

Addresses shortcomings of KDD-99, enhancing applicability in modern NIDS.

CIC-IDS2017:

Meticulously crafted with a focus on essential dataset characteristics.

Well-ordered format using profiles.

In-depth understanding of attacks, application models, network devices, and protocols.

Alignment with critical features, making it a comprehensive resource for evaluation.

Table 4 presents a comprehensive overview of various datasets used for intrusion detection, delineating crucial aspects such as data source, year, emulation status, benchmark or real-life nature, dataset size, attack classes covered, and associated publications. NSL-KDD and UNSW-NB15 datasets, both emulated benchmarks, serve as valuable resources for assessing IDS. NSL-KDD, established in 2009, comprises over 148,000 instances and covers a spectrum of attack classes, including DoS, Probe, U2R, and R2L. Meanwhile, UNSW-NB15, introduced in 2015, boasts a larger dataset size exceeding 2.5 million instances and encompasses a broader array of attack types, rendering it suitable for evaluating modern intrusion detection techniques.

However, among these datasets, CIC-IDS 2017 emerges as a standout choice. Despite being emulated, it offers an extensive and realistic portrayal of contemporary cybersecurity threats. With over 2.8 million instances, it spans various sophisticated attacks, including DDoS, DoS, Botnet, BruteForce, Infiltration, Web Attack, and Port Scan. This diversity is crucial for evaluating the efficacy of IDS in real-world scenarios. Contrasting it with older datasets like DARPA (1998) and KDD CUP 99 shows that the field has advanced significantly. While the DARPA and KDD datasets were pioneering benchmarks, they lack the scalability, diversity, and realism offered by more recent datasets like CIC-IDS 2017. In contrast with datasets like ISCX 2012 and CAIDA, CIC-IDS 2017 provides more extensive coverage of attack types and larger dataset sizes, making it better suited for evaluating the robustness and generalizability of IDS.

The newer datasets, such as CSE-CICIDS2018, maintain relevance in contemporary intrusion detection research. Notably, CSE-CICIDS2018 is larger than its predecessors, which presents challenges in data processing owing to the sheer volume of data instances in each file. On the other hand, recent datasets like CICDDoS2019, BoT-IoT, and IoT-23 focus on specific types of attacks prevalent in modern cyber threats, such as DDoS and IoT-based attacks. By contrast, CIC-IDS2017 offers a broader spectrum of attack scenarios, covering a wide range of intrusion types beyond DDoS and IoT attacks, from network reconnaissance to web-based attacks.

Selection of metrics

Performance metrics are used to compare the effectiveness of NAD systems and to evaluate the experimental findings of various approaches. They are obtained from the confusion matrix, which tabulates predicted against actual classes for normal and abnormal data. The terms TP (true positive) and TN (true negative) refer to accurately predicted conditions, whereas FP (false positive) and FN (false negative) refer to incorrectly classified conditions. TPs and TNs are correctly categorized attacks and normal records, respectively, while FPs are normal records incorrectly classified as attacks and FNs are attacks incorrectly classified as normal records [127, 128, 129]. The following metrics are selected based on those frequently applied in survey studies.

Table 5:

Hardware and software specifications.

Components	Specifications
Operating system	Windows 11
System type	64-bit operating system, x64-based processor
Processor	11th Gen Intel® Core™ i5-1135G7 @ 2.40 GHz 2.42 GHz
RAM	8 GB
Python	3.7
Datasets	NSL-KDD, UNSW-NB15, CICIDS2017

1. Accuracy: It is defined as the ratio of correctly classified instances (TP + TN) to the total number of instances present in the dataset. $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$

2. Precision: It is defined as the ratio of correctly identified positive instances (TP) to the total number of instances classified as positive. $\text{Precision} = \frac{TP}{TP + FP}$

3. Recall: It is calculated as the ratio of correctly identified positive samples (TP) to the total number of positive samples. $\text{Recall} = \frac{TP}{TP + FN}$

4. F1-score: It is the harmonic mean of the model's precision and recall, providing information on both aspects of its performance. Thus, a high F1-score indicates that both precision and recall are high. $\text{F1-score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
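The four definitions above can be illustrated with a minimal Python sketch. It assumes binary labels in which 1 denotes an attack (the positive class) and 0 a normal record; the function names and the toy label vectors are hypothetical and are not taken from the experiments reported in this paper.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = attack, 0 = normal)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def confusion_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall and F1-score from the four counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical toy labels, chosen only for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(y_true, y_pred)
scores = confusion_metrics(*counts)
print(counts, scores)
```

The guards against zero denominators are a design choice of this sketch: a detector that flags nothing has no defined precision, and returning 0.0 keeps the comparison across algorithms well defined.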

Results and discussion

The experiments were conducted using the hardware and software specifications given in Table 5. The three datasets are preprocessed by transforming nominal variables into numerical ones, where necessary, to increase the number of usable features without altering the dataset's semantics. Parameter tuning is used to determine the optimal configuration for each algorithm. Tuning is carried out by running each algorithm with several different parameter combinations; the results are then compared to one another to select the best possible combination.
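The two preprocessing steps described above, nominal-to-numerical conversion and exhaustive comparison of parameter combinations, might be sketched as follows. This is an illustrative outline only: one-hot encoding is assumed as the conversion scheme (the paper does not state which one was used), and the record fields, parameter names, and `toy_evaluate` objective are hypothetical placeholders.

```python
from itertools import product


def one_hot_encode(records, nominal_keys):
    """Replace each nominal field with 0/1 indicator columns, one per category."""
    categories = {k: sorted({r[k] for r in records}) for k in nominal_keys}
    encoded = []
    for r in records:
        row = {k: v for k, v in r.items() if k not in nominal_keys}
        for k in nominal_keys:
            for c in categories[k]:
                row[f"{k}={c}"] = 1 if r[k] == c else 0
        encoded.append(row)
    return encoded


def grid_search(param_grid, evaluate):
    """Evaluate every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical toy records; the values are made up for illustration.
records = [
    {"duration": 0, "protocol_type": "tcp"},
    {"duration": 2, "protocol_type": "udp"},
    {"duration": 1, "protocol_type": "icmp"},
]
encoded = one_hot_encode(records, ["protocol_type"])


def toy_evaluate(params):
    """Placeholder objective; a real run would fit a detector and return, e.g., its F1-score."""
    return -abs(params["k"] - 5) - abs(params["threshold"] - 0.9)


best_params, best_score = grid_search(
    {"k": [3, 5, 7], "threshold": [0.8, 0.9, 0.95]}, toy_evaluate
)
print(encoded[0], best_params)
```

In practice `evaluate` would wrap the training and scoring of one of the 13 UL methods, so the cost of the search grows with the product of the grid sizes.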

This study extensively evaluates 13 UL methods on three benchmark intrusion detection datasets: NSL-KDD, UNSW-NB15, and CIC-IDS2017. The goal is to assess the performance of these algorithms and establish their effectiveness compared to existing research. These UL algorithms have been implemented for AD, as referenced in [130, 131, 132, 133, 134], and tested on both traditional and modern datasets to ensure a thorough analysis. Additionally, unsupervised dimensionality-reduction techniques from the six families of AD techniques are employed to tackle the challenge of high-dimensional real-world data. This allows the representation of data in reduced dimensions.

A decision tree classifier is utilized for result prediction to gauge the performance of dimensionality reduction approaches. For detailed insights into the results, Tables 6–8 present the performance of the 13 selected UL methods for NAD on the three benchmark datasets. These findings provide valuable evidence for the efficacy of these algorithms in AD.

The performance results presented in Tables 6–8 highlight the notable performance of the selected UL methods, particularly on the NSL-KDD dataset. The absence of redundant connections and the even distribution of data across attack categories and normal connections in the NSL-KDD dataset contribute significantly to these results. As shown in Figures 6–8, the study indicates that ODIN outperforms KNN in detecting anomalies by utilizing an “indegree score” based on the KNN graph. Among clustering-based techniques, DBSCAN was more effective than K-means and EM. The study observed that nearest-neighbor-based approaches usually outperformed clustering methods, but clustering-based approaches were more computationally efficient for processing large datasets in near real-time environments.
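To make the “indegree score” concrete, the following is a minimal sketch of the ODIN idea: build the KNN graph, count how many other points list each point among their k nearest neighbors, and flag points whose indegree does not exceed a threshold. Euclidean distance, the threshold value, and the toy 2-D points are assumptions made for illustration; they are not the configuration used in the experiments.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_indices(points, k):
    """For each point, return the indices of its k nearest neighbors."""
    neighbors = []
    for i, p in enumerate(points):
        dists = sorted((euclidean(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def odin_indegree(points, k):
    """Indegree of each point in the KNN graph (how often it is someone's neighbor)."""
    indegree = [0] * len(points)
    for nbrs in knn_indices(points, k):
        for j in nbrs:
            indegree[j] += 1
    return indegree


def odin_outliers(points, k, threshold):
    """Indices of points whose indegree is at most the threshold."""
    return [i for i, d in enumerate(odin_indegree(points, k)) if d <= threshold]


# Hypothetical toy data: a tight cluster of four points plus one distant point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
indegree = odin_indegree(points, k=2)
outliers = odin_outliers(points, k=2, threshold=0)
print(indegree, outliers)
```

The distant point still has k neighbors of its own, but no other point selects it as a neighbor, so its indegree is zero. This brute-force version is quadratic in the number of points; a spatial index would be needed for datasets of the size listed in Table 4.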

Figure 6: Comparison of UL methods on the NSL-KDD dataset. UL, unsupervised learning.

Figure 7: Comparison of UL methods on the UNSW-NB15 dataset. UL, unsupervised learning.

Figure 8: Comparison of UL methods on the CIC-IDS2017 dataset. IDS, intrusion detection systems; UL, unsupervised learning.

Table 6:

Results of 13 selected UL methods for NAD on the NSL-KDD dataset.

Algorithms	Family	Accuracy	Precision	Recall	F1-score
K-NN	Neighbor-based	98.4	98.0	97.9	98.5
ODIN	Neighbor-based	99.4	99.0	98.6	98.8
LOF	Density-based	97.0	96.8	95.3	95.6
COF	Density-based	94.0	94.5	96.1	95.1
K-means	Clustering-based	94	92.46	92.3	93
DBSCAN	Clustering-based	94.5	94.0	92.9	94.0
EM	Clustering-based	95.0	93	92.9	94.0
PCA	Dimensionality-reduction	97.8	97.0	96.7	97.0
KPCA	Dimensionality-reduction	99.2	99.0	98.9	98.6
ICA	Dimensionality-reduction	98.0	97.5	97.8	98.1
HBOS	Statistical-based	94.4	93.9	95.8	96.0
One-class SVM	Classification-based	99.4	99.2	99.2	99.5
IF	Classification-based	99.9	99.8	99.6	99.7

COF, connectivity-based outlier factor; DBSCAN, density-based spatial clustering of applications with noise; EM, expectation maximization; HBOS, histogram-based outlier score; ICA, independent component analysis; IF, isolation forest; KNN, K-nearest neighbor; KPCA, kernel principal component analysis; LOF, local outlier factor; NAD, network anomaly detection; ODIN, outlier detection using indegree number; PCA, principal component analysis; SVM, support vector machine; UL, unsupervised learning.

Table 7:

Results of 13 selected UL methods for NAD on the UNSW-NB15 dataset.

Algorithms	Family	Accuracy	Precision	Recall	F1-score
K-NN	Distance-based	98.2	97.8	97.6	97.7
ODIN	Distance-based	99.2	98.9	98.5	98.6
LOF	Density-based	96.9	95.8	95.1	95.0
COF	Density-based	93.8	94.2	95.9	94.7
K-means	Clustering-based	93.7	92.0	92.1	92.9
DBSCAN	Clustering-based	94.2	93.8	92.7	93.5
EM	Clustering-based	95.5	92.6	92.3	93.4
PCA	Dimensionality-reduction	97.2	96.9	97.3	97.1
KPCA	Dimensionality-reduction	99.1	98.5	98.5	98.7
ICA	Dimensionality-reduction	97.7	97.4	97.1	97.3
HBOS	Statistical-based	92.3	91.9	94.6	94.8
One-class SVM	Classification-based	99.2	99.0	99.2	99.4
IF	Classification-based	99.7	99.5	99.3	99.6

Table 8:

Results of 13 selected UL methods for NAD on the CIC-IDS2017 dataset.

Algorithms	Family	Accuracy	Precision	Recall	F1-score
K-NN	Distance-based	96.4	96.7	96.0	97.0
ODIN	Distance-based	97.6	97.4	97.5	98.0
LOF	Density-based	96	95.3	93.6	94.8
COF	Density-based	93.3	93.2	95	93.7
K-means	Clustering-based	90.2	87.4	88.6	89.5
DBSCAN	Clustering-based	93.7	92.7	90.6	92.3
EM	Clustering-based	94.3	93.2	91.2	92.4
PCA	Dimensionality-reduction	96.3	95.4	94.9	95.1
KPCA	Dimensionality-reduction	97.3	97.2	96.9	97
ICA	Dimensionality-reduction	96.3	96.2	96	96.3
HBOS	Statistical-based	95.9	94.9	93.5	93.0
One-class SVM	Classification-based	98.3	98.1	99	98.4
IF	Classification-based	99.6	99.5	99.2	99.4

COF, connectivity-based outlier factor; DBSCAN, density-based spatial clustering of applications with noise; EM, expectation maximization; HBOS, histogram-based outlier score; ICA, independent component analysis; IDS, intrusion detection systems; IF, isolation forest; KNN, K-nearest neighbor; KPCA, kernel principal component analysis; LOF, local outlier factor; NAD, network anomaly detection; ODIN, outlier detection using indegree number; PCA, principal component analysis; SVM, support vector machine; UL, unsupervised learning.

Among density-based approaches, LOF surpasses COF, especially in scenarios requiring the detection of local anomalies. However, COF is recommended when additional information about the nature of anomalies in the evaluated dataset is unavailable. Among dimensionality-reduction techniques, KPCA and ICA outperformed PCA across all datasets, with KPCA yielding the best results. This superiority can be attributed to the ability of KPCA and ICA to capture higher-order information in the original inputs, unlike PCA. HBOS, a statistical method that assumes feature independence, achieves competitive results and is well suited to large datasets due to its fast computation time. It serves as a valuable option for AD in such scenarios.
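The reason HBOS is fast follows from its feature-independence assumption: each feature gets its own one-dimensional histogram, and a record's score is simply the sum of per-feature terms. A minimal sketch with equal-width bins is given below; the bin count and the toy rows are illustrative assumptions, and the function name is hypothetical.

```python
import math


def hbos_scores(rows, n_bins=5):
    """Histogram-based outlier score: sum over features of log(1 / normalized bin height)."""
    n_features = len(rows[0])
    scores = [0.0] * len(rows)
    for f in range(n_features):
        values = [r[f] for r in rows]
        lo, hi = min(values), max(values)
        # Equal-width bins; a constant feature falls back to a width of 1.0.
        width = (hi - lo) / n_bins or 1.0
        bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
        counts = [0] * n_bins
        for b in bins:
            counts[b] += 1
        peak = max(counts)
        for i, b in enumerate(bins):
            height = counts[b] / peak  # normalized so the tallest bin equals 1
            scores[i] += math.log(1.0 / height)
    return scores


# Hypothetical toy data: five similar records and one that deviates in both features.
rows = [(1.0, 10.0), (1.1, 10.2), (0.9, 9.9), (1.0, 10.1), (1.2, 10.0), (9.0, 50.0)]
scores = hbos_scores(rows)
most_anomalous = scores.index(max(scores))
print(scores, most_anomalous)
```

Each feature is processed in a single pass, so the cost grows linearly with the number of records and features. The trade-off is that anomalies visible only in combinations of features are not captured.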

The study also observed that algorithms belonging to the classification family consistently outperformed algorithms from other families on all datasets. IF stood out as the top-performing algorithm, achieving accuracies of 99.9, 99.7, and 99.6 on the NSL-KDD, UNSW-NB15, and CIC-IDS2017 datasets, respectively. The findings of this study have practical implications for real-world NAD systems. IF is recommended for critical infrastructure networks, while ODIN is suitable for healthcare systems or financial institutions where minimizing false negatives is critical. Clustering-based methods such as DBSCAN and density-based techniques like LOF can be employed in network monitoring systems to identify unusual patterns or behaviors that may indicate security breaches or system malfunctions. Finally, dimensionality-reduction techniques like KPCA and ICA can reduce computational overhead and facilitate real-time anomaly detection in dynamic network environments.

By understanding the practical applications of these UL methods, researchers can tailor their selection and implementation strategies to address specific security challenges and operational requirements. The study's findings reveal the superior performance of ODIN, DBSCAN, LOF, KPCA, ICA, and IF. They can be employed to develop more effective NAD systems for various industries, preventing potential cyber threats, enhancing network security, and improving operational efficiency.

Open Challenges and Future Directions

This section discusses various unexplored research areas and significant prospects for using UL in network intrusion/anomaly detection.

Improving SOM algorithm: The SOM algorithm suffers from high time complexity, which has prompted its combination with methods like k-means [135] to enhance computational efficiency. However, various approaches can still be explored to further optimize performance in terms of speed and precision [136, 137]. Beyond time complexity, the algorithm exhibits higher FPRs when confronted with symmetric data, an observation that calls for a thorough exploration of potential solutions. A forward-looking strategy involves the integration of hybrid technologies, notably those enriched with topographic measures [138]. These measures are expected to yield refined growing rules, laying the groundwork for more nuanced and accurate processing, especially on symmetric datasets [139].

Inter-dataset evaluation: Ensemble techniques that use UL methods are widely used in NAD to improve system performance. However, it is a challenge to assess the ability of NAD models to perform well across different datasets when relying only on individual assessments. Therefore, it is essential to explore inter-dataset evaluation strategies to determine the true generalization ability of unsupervised methods and enhance adaptability for future developments. To achieve this, a thorough investigation is required to align methodologies with the ever-changing landscape of diverse datasets and ensure the development of resilient and adaptable NAD systems.

Explanation facility: DL models are effective at identifying network anomalies, but they lack the ability to provide clear explanations to end-users. Although attentional models offer insights, there is room for improvement in result comprehension and explanatory clarity [80]. To address this, future research should investigate various attentional components. Moreover, adopting the backward-pass-through-the-network approach, well known in image processing [140, 141], could help reveal the complex relationship between input features and the final reconstruction values. This exploration has the potential to yield more coherent and meaningful explanations.

Self-taught learning (STL): Applying STL with UL is a potent strategy for enhancing intrusion and anomaly detection in network security. This method strategically addresses the challenge of limited labeled data by leveraging STL's adaptive learning capabilities. STL autonomously discerns intricate patterns and anomalies within the network, fostering real-time anomaly recognition and prompt threat response. The model's dynamic evolution ensures continuous adaptation to the evolving threat landscape, enhancing its effectiveness. The fusion of STL and UL is a promising framework for advancing network security and provides a learning-centric approach with significant potential for future research. This approach can strengthen network defense mechanisms against evolving threats.

Interpretability problems with some unsupervised ML algorithms: Addressing interpretability challenges in unsupervised ML algorithms, particularly deep neural networks, is a pivotal research focus. While contributing to high accuracy, the inherent complexity of these algorithms often hinders their interpretability, posing challenges for comprehension and decision-making in operational networks [75]. To tackle this issue, ongoing efforts by researchers aim to enhance the transparency and explainability of neural network models [142]. Future research in this domain should delve into developing methodologies and frameworks that augment the interpretability of artificial intelligence and ML models, ensuring their effective integration into operational networks [141].

Lack of functional success in intrusion detection: Exploring the practical application of UL in network intrusion detection reveals noteworthy challenges hindering its functional success in operational contexts. Despite substantial academic research and practical implementations in diverse fields, the adoption of ML solutions, specifically UL, for network intrusion detection remains limited in real-world scenarios [143]. This constraint is attributed to various factors: (a) insufficient training data availability, (b) a semantic gap between the obtained results and their operational interpretation, (c) the substantial costs associated with errors in the detection process, and (d) inherent challenges in conducting accurate performance evaluations. Addressing these challenges is crucial for realizing the full potential of UL in network security and establishing its efficacy in operational environments.

Conclusion

This paper provides an in-depth analysis of UL anomaly detection (AD) techniques, highlighting recent developments while identifying key research limitations. The study demonstrates the effectiveness of these approaches in real-world network security scenarios by evaluating 13 UL methods across benchmark intrusion detection datasets. IF, ODIN, DBSCAN, LOF, KPCA, and ICA are among the methods evaluated, and they have versatile applications in anomaly detection in sectors such as critical infrastructure networks, healthcare, finance, and network monitoring.

The paper suggests several recommendations to guide future research endeavors for advancements of UL methods. One of the recommendations is to address the interpretability challenges inherent in DL algorithms, particularly in the context of NAD, by integrating explainable AI techniques such as attention mechanisms or back-propagation methods into UL models. This integration would enhance transparency and facilitate informed decision-making in security operations. Another recommendation is to explore ensemble techniques for inter-dataset evaluation, which hold promise for assessing UL methods' generalization capabilities across diverse datasets. In addition, the paper suggests the fusion of self-taught learning with UL as a compelling avenue for enhancing anomaly detection capabilities in network security. Leveraging STL's adaptive learning mechanisms can augment UL models' ability to discern complex patterns and anomalies, bolstering real-time threat detection and response capabilities.

The paper suggests strategies like data augmentation, transfer learning, and feature engineering to address practical constraints hindering UL's functional success, such as data scarcity and semantic gaps between model outputs and operational interpretations. Furthermore, the integration of model explainability techniques such as feature importance analysis, SHAP, LIME, and Integrated Gradients, alongside human-in-the-loop approaches involving expert feedback and domain knowledge integration, can enhance interpretability and align UL models more closely with operational requirements, thereby fostering the development of more robust and actionable anomaly detection solutions. By heeding these recommendations, researchers can navigate today's challenges more effectively and pave the way for advancements in UL-based NAD.

eISSN: 1178-5608
Language: English

Publication timeframe: Volume Open
Journal Subjects: Engineering, Introductions and Overviews, other

A Holistic review and performance evaluation of unsupervised learning methods for network anomaly detection

Article Category: Article

Published Online: May 19, 2024

Page range: -

Received: Nov 24, 2023

DOI: https://doi.org/10.2478/ijssis-2024-0016

Keywords
unsupervised learning, machine learning, unsupervised anomaly detection techniques, network anomaly detection, intrusion detection

© 2024 Niharika Sharma et al., published by Sciendo

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
