Figure 1:

Bibliometric analysis of AD literature (A) Chronological distribution of bibliometric papers (B) Percentage distribution of publication types. AD, anomaly detection.

Figure 2:

Roadmap of the paper.

Figure 3:

Taxonomy of 13 selected UL methods for anomaly detection from 6 families. UL, unsupervised learning.

Figure 4:

Framework of NAD based on UL [45]. NAD, network anomaly detection; UL, unsupervised learning.

Figure 5:

Categorization of UL techniques [23]. UL, unsupervised learning.

Figure 6:

Comparison of UL methods on the NSL-KDD dataset. UL, unsupervised learning.

Figure 7:

Comparison of UL methods on the UNSW-NB15 dataset. UL, unsupervised learning.

Figure 8:

Comparison of UL methods on the CIC-IDS2017 dataset. IDS, intrusion detection systems; UL, unsupervised learning.

A relative comparison of this paper with existing surveys/review papers.

References Discussion Focus 1 2 3 4 5 6
[15] Focuses on literature review and performance evaluation of UL methods. IDS × × × ×
[16] It provides a comparative analysis of various unsupervised NAD approaches and investigates standard metrics suitable for many connections. AD in networks × × ×
[17] Presents a detailed review of different clustering methods and their strengths and weaknesses. - × × × ×
[18] Presents a framework for evaluating anomaly detection methods for HTTP. HTTP AD × × ×
[19] An overview of ML techniques for intrusion detection between 2000 and 2007 is presented. IDS × × × ×
[20] It provides an overview and comparison of various clustering techniques, along with their pros and cons. AD × × × ×
[21] Performance evaluation of four novelty detection algorithms is carried out. Novelty detection × ×
[22] Presents a detailed review of popular traditional and modern clustering approaches. - × × × ×
[1] Nineteen widely employed unsupervised techniques are evaluated to provide an understanding of their performance in various domains. AD × × ×
[2] The proposed work focuses on a comparative evaluation of four UL algorithms. Smart city wireless networks × × ×
[3] A survey on ML techniques focusing on unsupervised and hybrid IDS is presented. IDS × ×
[4] The proposed work aims to provide an experimental comparison of unsupervised methods employed for NAD against five intrusion detection datasets. Anomaly-based IDS × ×
[23] A comprehensive survey of UL techniques in networking is presented. AD in networking × ×
[5] Review and compare ML algorithms for IDS. IDS × × ×
[6] Presents a detailed review of state-of-the-art IDS methods, datasets, performance measures, and research challenges. IDS ×
[8] An in-depth review of state-of-the-art clustering methods is presented. - × ×
[9] A comprehensive overview of UL techniques for AD is presented, emphasizing their utility in scenarios with scarce labeled data. AD in industrial applications × ×
[11] Review and compare outlier detection techniques from seven different families. Outlier detection × × ×
[12] AD techniques are explored comprehensively to address the detection of emerging threats. IoT and sensor data × ×
[13] In-depth review of various unsupervised and semisupervised clustering methods. - × ×
[14] Focuses on log file analysis for early incident detection, particularly emphasizing self-learning AD techniques. AD × × ×
Our survey The primary focus is on studying UL NAD methods while considering recent advances. NAD

Relative comparison of UL approaches for network anomaly/intrusion detection.

Authors Methodology Algorithm Dataset Input data DoS Probe R2L U2R Real-time detection
[83] Unsupervised approach for outlier detection using subspace clustering and evidence accumulation. DBSCAN METROSEC Continuous 95 95 85 85
[84] Tree-based subspace clustering approach for high-dimensional datasets; includes cluster stability analysis. Tree-based subspace clustering (TCLUS) KDDCUP99, TUIDS Mixed 99 96 86 66
[85] Multiclustering scheme incorporating subspace clustering and evidence accumulation to overcome knowledge-based approach limitations. DBSCAN KDDCUP99, METROSEC N/A - - - -
[86] Particle swarm optimization clustering strategy based on map-reduce methodology for parallelization in large-scale networks. PSO clustering KDDCUP99 Mixed Max AUC: 96.3
[87] Novel strategy for automatic tuning and optimization of detection model parameters in a real network environment. Clustering and one-class SVM KDDCUP99 and Kyoto University Continuous - - - -
[88] K-means clustering to generate training subsets; neuro-fuzzy and radial-basis SVM for classification. K-means, SVM, neuro-fuzzy neural network KDDCUP99 N/A 98.8 97.31 97.5 97.5
[89] Innate-immune strategy via UL for categorizing normal and abnormal profiles. DBSCAN KDDCUP99 N/A FPR: 0.008, TNR: 0.991, ACC: 77.1, Recall: 0.589
[49] Novel approach using cluster centers and nearest neighbors to transform the dataset into a one-dimensional feature set. Clustering, KNN KDDCUP99 N/A 99.68 87.61 3.85 57.02
[90] Nature-inspired meta-heuristic approach to optimize the optimum path forest clustering algorithm. Optimum path forest clustering ISCX, KDDCUP99, NSL-KDD N/A Purity measure: ISCX: 96.3, KDDCUP99: 71.66, NSL-KDD: 99.8
[91] UL technique for real-time detection of fast and complex zero-day attacks. DBSCAN DARPA, ISCX N/A ACC: 98.39%, Recall: 100%, Precision: 98.12%, FPR: 3.61%
[92] Modified optimum path forest algorithm to enhance IDS performance, particularly in detecting less frequent attacks. K-means, modified optimum path forest NSL-KDD N/A 96.89 85.92 77.98 81.13
[93] Hierarchical agglomerative clustering applied to SOM network to lower computational cost and sensitivity. Hierarchical agglomerative clustering, SOM NSL-KDD Mixed DR: 96.66%, ACC: 83.46, Precision: 75, FPR: 0.279, FNR: 0.033
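
Most of the clustering-based approaches in this table share one core idea: records that do not join any dense cluster are flagged as anomalous. The sketch below illustrates that idea with scikit-learn's DBSCAN on synthetic flow-like features; the data, eps, and min_samples values are illustrative assumptions, not parameters taken from the cited works.

```python
# Minimal sketch of a clustering-based detector: DBSCAN's noise points
# (label -1) are flagged as anomalies. Synthetic data and parameter
# values are illustrative assumptions only.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 4))      # stand-in for benign flow features
attacks = rng.uniform(5.0, 10.0, size=(25, 4))    # stand-in for scattered anomalous flows
X = StandardScaler().fit_transform(np.vstack([normal, attacks]))

labels = DBSCAN(eps=1.0, min_samples=10).fit_predict(X)
anomaly_mask = labels == -1                       # noise points -> anomalies
print(f"flagged {anomaly_mask.sum()} of {len(X)} records as anomalous")
```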

Comparative analysis of network datasets.

Dataset Data-source Year 1 2 3 4 5 6 7
DARPA (1998) MIT Lincoln Laboratory 1998 Emulated Benchmark 7,000,000 DoS, Probe, U2R, R2L 41
KDD CUP 99 University of California 1999 Emulated Benchmark 4,900,000 DoS, Probe, U2R, R2L 41
DEFCON N/A 2000 Real Benchmark N/A N/A N/A Flag traces
LBNL Lawrence Berkeley National Laboratory 2004 N/A Benchmark >100 hr × Malicious traces Internet traces
Kyoto Song et al. (2006) [114] 2006 Real Benchmark N/A Normal and attack sessions 24
CAIDA Jonker et al. (2017) [118] 2008–2017 Real Benchmark Huge × DDoS 20
CDX United States Military Academy 2009 Real Real-life N/A 5771 Buffer overflow 5
NSL-KDD Tavallaee et al. (2009) [126] 2009 Emulated Benchmark 148,517 DoS, Probe, U2R, R2L 41
ISCX 2012 Shiravi et al. (2012) [117] 2012 Real Real-life 2,450,324 DoS, DDoS, Brute force, Infiltration IP flows
UNSW-NB15 Moustafa and Slay (2015) [119] 2015 Emulated Benchmark 2,540,044 Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, Worms 49
CIDDS-001 Ring et al. (2017) [120] 2017 Emulated Benchmark 31,959,267 Ping scanning, Port scanning, Brute force, and DoS 14
CIDDS-002 Ring et al. (2017) [120] 2017 Emulated Benchmark 16,161,183 Ping scanning, Port scanning, Brute force, and DoS 14
CIC-IDS2017 Sharafaldin et al. (2018) [121] 2017 Emulated Benchmark 2,830,743 DDoS, DoS, Botnet, Brute force, Infiltration, Web attack, Port scan 80
CSE-CICIDS2018 Bharati and Tamane (2020) [122] 2018 Emulated Benchmark 16,232,943 DDoS, DoS, Botnet, Brute force, Infiltration, Web attack, Port scan 80
CICDDoS 2019 Sharafaldin et al. (2019) [123] 2019 Emulated Benchmark 32,925 DDoS_DNS, DDoS_LDAP, DDoS_MSSQL, DDoS_NetBIOS, DDoS_NTP, DDoS_SNMP, DDoS_SSDP, DDoS_SYN, DDoS_TFTP, DDoS_UDP, DDoS_UDP-Lag, DDoS_WebDDoS 76
BoT-IoT Koroniotis et al. (2019) [124] 2019 Real Real-life 73,360,900 DoS, DDoS, Reconnaissance, Theft 29
IoT-23 N/A 2020 Real Real-life N/A Mirai, Torii, Hide and Seek, Muhstik, Hakai, IRCBot, Hajime, Trojan, Kenjiro, Okiru, Gagfyt 21
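
Whichever benchmark from this table is used, UL methods require the same basic preparation: categorical fields are encoded, numeric fields are scaled, and the attack labels are collapsed into a binary normal/anomaly target that is used only for evaluation. The following sketch illustrates this for an NSL-KDD-style file; the file path, column positions, and encoder choices are assumptions based on the commonly distributed form of that dataset, not the exact pre-processing pipeline used in this survey.

```python
# Minimal preprocessing sketch for an NSL-KDD-style benchmark:
# one-hot encode the categorical fields, scale the numeric ones, and
# collapse the attack labels into a binary normal/anomaly target.
# The file path and column positions are placeholder assumptions.
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

df = pd.read_csv("KDDTrain+.txt", header=None)            # hypothetical local copy
df.columns = [f"f{i}" for i in range(df.shape[1] - 2)] + ["label", "difficulty"]

categorical = ["f1", "f2", "f3"]                          # protocol_type, service, flag
numeric = [c for c in df.columns if c not in categorical + ["label", "difficulty"]]

pre = ColumnTransformer([
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ("num", StandardScaler(), numeric),
])
X = pre.fit_transform(df)                                 # feature matrix for the UL methods
y = (df["label"] != "normal").astype(int)                 # 1 = anomaly, 0 = normal (evaluation only)
```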

Recent developments in unsupervised-based network intrusion/anomaly detection.

Authors/year Methodology Algorithm/techniques used Datasets Attack class Metrics (%) Limitations/future work
Amoli et al. (2015) [91] Real-time intrusion detection using adaptive thresholds. DBSCAN DARPA, ISCX DoS, DDoS, POD, SMURF, Mail-bomb, botnet, port scanning-port sweep Precision: 98.12, ACC: 98.39, FPR: 3.61 Future work: Detecting complex attacks, distinguishing flash crowds and DDoS for better clarity.
Zhang et al. (2016) [42] Utilizing One-class SVM for network intrusion identification. One-class SVM NSL-KDD DoS, Probe, U2R, R2L Precision: 99.3, Recall: 91.61, F-value: 95.18 Low DR for minority-class attacks like U2R and R2L.
Landress (2016) [61] Employing K-means clustering, the J48 decision tree, and SOM to reduce false positives. K-means, J48 decision tree, self-organizing map KDDCUP99 DoS, Probe, U2R, R2L ACC: 98.92 Limitation: Increased computational complexity with SOM; explore efficient hybrid techniques for real-time processing.
He et al. (2017) [94] Exploiting SDN for effective anomaly detection. DBSCAN KDDCUP99 DoS, Probe, U2R, R2L ACC: 94.5 Future work: Focus on real-time packet clustering for timely detection.
Ariafar and Kiani (2017) [95] Optimizing clusters using GA, K-means, and decision trees. K-means, decision tree, GA NSL-KDD DoS, Probe, U2R, R2L DR: 99.1, FAR: 1.8 Evaluate on modern-day datasets like CIC-IDS2017 and CSE-CICIDS2018.
Bigdeli et al. (2018) [96] Incremental cluster updates with spectral and density-based clustering. Spectral and density-based clustering KDDCUP99, NSL-KDD, Darpa98, IUSTsip, DataSetMe DoS, Probe, U2R, R2L DR: 94%, FAR: 4% Introduce concept drift while merging clusters.
Almi'Ani et al. (2018) [97] Hierarchical agglomerative clustering using k-means for reduced training time. Hierarchical agglomerative clustering NSL-KDD DoS, Probe, U2R, R2L DR: 96.66, ACC: 83.46, Precision: 75, FPR: 0.279, FNR: 0.033 Low accuracy due to low sensitivity toward normal behavior; investigate fuzzy C-means clustering for accuracy enhancement.
Zhou et al. (2019) [98] Hybrid technique using KPCA and ELM. KPCA, ELM KDDCUP99 DoS, Probe, U2R, R2L DR: DoS: 98.96, Probe: 98.54, R2L: 94.72, U2R: 36.54, Acc: 98.18, FAR: 2.38 The DR of U2R is quite low. To optimize the results of ELM, evolutionary algorithms need to be applied.
Choi et al. (2019) [99] Extracting key features using AE. AE NSL-KDD DoS, Probe, U2R, R2L ACC: 91.70 Improve U2R DR; apply evolutionary algorithms to optimize ELM results.
Aliakbarisani et al. (2019) [74] Formulates a constraint trace ratio optimization problem for Laplacian Eigenmap strategy. Constraint trace optimization, PCA, LDA NSL-KDD, Kyoto 2006+ DoS, Probe, U2R, R2L ACC: 97.84, F-score: 0.878, FPR: 0.001 Train and test with different datasets; explore ensemble strategies based on UL models for enhanced performance.
Paulauskas and Baskys (2019) [72] Employing HBOS mechanism for identifying rare attacks like U2R and R2L. HBOS NSL-KDD DoS, Probe, U2R, R2L F-measure: 87 Investigate online learning metrics for newly discovered attacks.
Hwang et al. (2020) [100] CNN and AE for extracting raw features from network traffic. CNN, AE USTC-TFC 2016, Mirai-RGU, Mirai-CCU SYN flood, UDP flood, ACK flood, and HTTP flood, Mirai C&C ACC: 99.77, Precision: 99.93, Recall: 99.17, F1 measure: 99.55, FNR: 0.02, FPR: 0.83 Improve performance rate of majority classes.
Zavrak and Iskefiyeli (2020) [101] Deploys flow-based IDS with One-class SVM as anomaly detector. AE, VAE, One-class SVM CIC-IDS2017 DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heartbleed AUC: VAE: 75.96, AE: 73.98, One-class SVM: 66.36 Consider flow-based attributes collected at specified time intervals to improve DR and reduce FAR.
Truong-Huu et al. (2020) [79] Uses GAN strategy to extract useful features and proposes a traffic aggregation technique. GAN UNSW-NB15, CIC-IDS2017 Backdoors, Exploits, Worms, Shellcode, Generic, DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heartbleed UNSW-NB15: Precision: 0.84, Recall: 0.85, F1-score: 0.85, AUPRC: 0.8831; CIC-IDS2017: Precision: 0.8260, Recall: 0.8268, F1-score: 0.8264, AUROC: 0.9529, AUPRC: 0.8271 Investigate multi-class classification approach for identifying different types of attacks.
Prasad et al. (2020) [56] Proposes a novel cluster center initialization approach to overcome shortcomings of conventional clustering techniques. K-means clustering, cluster center initialization CIC-IDS2017 DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heartbleed DR: 88, FR: 88.5, Precision: 88, F-measure: 0.531, ACC: 88.6 Address limitations like manual pre-processing and time/space complexity in MANET deployment.
Megantara and Ahmad (2021) [102] Utilizes feature importance and data-reduction techniques to improve the prediction performance of NAD. Decision tree, LOF NSL-KDD, UNSW-NB15 DoS, Probe, U2R, R2L; Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, Worms NSL-KDD: ACC: 99.73; UNSW-NB15: ACC: 91.86 The size of the LOF cluster can be optimized and the threshold value for handling outliers can be improved.
Liao et al. (2021) [103] Presents an ensemble of UL schemes based on a novel weighting scheme. AE, GANs UNSW-NB15, CIC-IDS2017 Backdoors, Exploits, Worms, Shellcode, Generic, DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heartbleed UNSW-NB15: Precision: 97.9, Recall: 92.4, F1-score: 94.9; CIC-IDS2017: Precision: 83.8, Recall: 84, F1-score: 83.5 Address suboptimal results for specific attack families; explore optimization for better performance.
Verkerken et al. (2022) [71] Proposes an inter-dataset evaluation approach for ensemble UL algorithms. PCA, IF, AE, One-class SVM CIC-IDS2017, CSE-CICIDS2018 DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heartbleed ACC: PCA: 90.9, IF: 91.1, AE: 99.9, One-class SVM: 98.9; AUROC: PCA: 93.73, IF: 95.8, AE: 97.75, One-class SVM: 94.20 Consider employing supervised learning approaches for generalization strength validation.
Singh and Jang-Jaccard (2022) [104] Proposes a unified AE architecture based on CNN and LSTM to examine spatiotemporal correlations in traffic data. MSCNN, LSTM-based AE NSL-KDD, UNSW-NB15, CICDDoS2019 Normal, Attack (Binary class) ACC: 97.10, Precision: 95.9, Recall: 96.4, F-score: 96.0 (average) Fine-tune hyperparameters for enhanced model performance using optimization techniques.
Wang et al. (2022) [105] Proposes an ensemble of UL algorithms to address challenges in processing overhead and poor detection performance for unseen threats. AE, IF CSE-CICIDS2018, MQTT-IOT-IDS2020 Sparta SSH brute force, DoS attacks-SlowHTTPTest, DoS-Hulk, DDoS attack-LOIC-UDP, DDoS attacks-LOIC-HTTP, Brute force-XSS, Brute force-web, FTP-brute force Average ACC: 96.43, Average Recall: 95.95, Average F-score: 96.02 Improve DR of specific attacks like “DoS-Hulk” in the proposed work.
De C. Bertoli et al. (2022) [106] Presents an NIDS for generalized detection performance in heterogeneous networks. Deep AE, FL, Energy Flow Classifier UNSW-NB15, CSE-CICIDS2018, BoT-IoT, ToN-IoT Normal, Attack (Binary class) BoT-IoT: ACC: 93, Recall: 93, Precision: 99, F-score: 95; ToN-IoT: ACC: 85, Recall: 85, Precision: 87, F-score: 77; UNSW-NB15: ACC: 97, Recall: 99, Precision: 57, F-score: 73; CSE-CICIDS2018: ACC: 98, Recall: 88, Precision: 92, F-score: 90 Deploy the proposed approach in distributed NIDS for robustness evaluation.
Eren et al. (2023) [107] Proposes a novel tensor decomposition method to enhance the detection of unseen attacks in network anomaly and intrusion detection, leveraging tensor factorization for improved generalization and identification of evolving threats. Unsupervised DL, Tensor factorization algorithm Neris-botnet, Spam e-mail - Average ROC-AUC: 0.9661, Average PR-AUC: 0.9152 Using a tensor decomposition algorithm with latent factors enables enhanced initialization of the tensor factorization algorithm for improved performance.
Lan et al. (2023) [108] Presents a novel framework for unsupervised intrusion detection that transfers knowledge from known attacks to detect new ones using a hierarchical attention-based triplet network and unsupervised domain adaptation. Attention network, unsupervised domain adaptation ISCX-2012, UNSW-NB15, CIC-IDS2017, CTU-13 HTTP DoS, Brute Force SSH, PortScan, Brute Force, DoS slow loris, Web attack XSS, Neris, Rbot, Fuzzers, Generic, Shellcode, Exploits Average ACC: 96.38, Average DR: 94.16 Utilize tensor decomposition algorithm with latent factors for enhanced initialization and performance.
Boppana and Bagade (2023) [109] Synergy of GANs and AE for unsupervised intrusion detection in MQTT networks. AE, GAN MQTT-IoT-IDS2020 Normal, Attack (Binary class) F1-score: 97 Evaluate generalizability to diverse network architectures in future work.
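
Several of the recent approaches above (e.g., [99], [101], [103], [105]) rely on an autoencoder whose reconstruction error separates benign from anomalous traffic. The sketch below shows that generic idea in Keras; the layer sizes, training settings, and the 95th-percentile threshold are illustrative assumptions rather than the architectures reported in the cited papers.

```python
# Minimal sketch of reconstruction-error anomaly detection with an
# autoencoder: train on (mostly) benign traffic and flag records whose
# reconstruction error exceeds a percentile threshold. Architecture and
# threshold are illustrative assumptions only.
import numpy as np
import tensorflow as tf

def build_autoencoder(n_features: int) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(8, activation="relu"),       # bottleneck
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(n_features, activation="linear"),
    ])
    model.compile(optimizer="adam", loss="mse")
    return model

def detect(ae: tf.keras.Model, X_train: np.ndarray, X_test: np.ndarray) -> np.ndarray:
    """Return 1 for records whose reconstruction error exceeds the training 95th percentile."""
    ae.fit(X_train, X_train, epochs=20, batch_size=256, verbose=0)
    train_err = np.mean((ae.predict(X_train, verbose=0) - X_train) ** 2, axis=1)
    test_err = np.mean((ae.predict(X_test, verbose=0) - X_test) ** 2, axis=1)
    threshold = np.percentile(train_err, 95)                # assumed cut-off
    return (test_err > threshold).astype(int)               # 1 = anomaly
```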

Results of 13 selected UL methods for NAD on the CIC-IDS2017 dataset.

Algorithms Family Accuracy Precision Recall F1-score
K-NN Distance-based 96.4 96.7 96.0 97.0
ODIN Distance-based 97.6 97.4 97.5 98.0
LOF Density-based 96 95.3 93.6 94.8
COF Density-based 93.3 93.2 95 93.7
K-means Clustering-based 90.2 87.4 88.6 89.5
DBSCAN Clustering-based 93.7 92.7 90.6 92.3
EM Clustering-based 94.3 93.2 91.2 92.4
PCA Dimensionality-reduction 96.3 95.4 94.9 95.1
KPCA Dimensionality-reduction 97.3 97.2 96.9 97
ICA Dimensionality-reduction 96.3 96.2 96 96.3
HBOS Statistical-based 95.9 94.9 93.5 93.0
One-class SVM Classification-based 98.3 98.1 99 98.4
IF Classification-based 99.6 99.5 99.2 99.4

Hardware and software specifications.

Components Specifications
Operating system Windows 11
System type 64-bit operating system, x64-based processor
Processor 11th Gen Intel® Core™ i5-1135G7 @ 2.40 GHz (2.42 GHz)
RAM 8 GB
Python 3.7
Datasets NSL-KDD, UNSW-NB15, CIC-IDS2017
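
Under this environment, each of the 13 selected methods can be driven through the same unsupervised fit/predict/score loop used to produce the tables that follow. The sketch below indicates how such a loop could look using the PyOD library and scikit-learn metrics; PyOD, the subset of detectors shown, and their default parameters are assumptions made for illustration, not the exact experimental code of this survey.

```python
# Minimal sketch of a common evaluation loop for unsupervised detectors
# from several of the families in the taxonomy. Detector choice and
# default parameters are illustrative assumptions.
from pyod.models.knn import KNN          # distance-based
from pyod.models.lof import LOF          # density-based
from pyod.models.hbos import HBOS        # statistical-based
from pyod.models.iforest import IForest  # classification-based (isolation forest)
from pyod.models.ocsvm import OCSVM      # classification-based (one-class SVM)
from pyod.models.pca import PCA          # dimensionality-reduction-based
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

detectors = {
    "K-NN": KNN(), "LOF": LOF(), "HBOS": HBOS(),
    "IF": IForest(), "One-class SVM": OCSVM(), "PCA": PCA(),
}

def evaluate(X_train, X_test, y_test):
    """Fit each detector on unlabeled training data and score it on the labeled test split."""
    rows = []
    for name, clf in detectors.items():
        clf.fit(X_train)                 # labels are never used for fitting
        y_pred = clf.predict(X_test)     # 0 = normal, 1 = anomaly (PyOD convention)
        rows.append((name,
                     accuracy_score(y_test, y_pred),
                     precision_score(y_test, y_pred),
                     recall_score(y_test, y_pred),
                     f1_score(y_test, y_pred)))
    return rows
```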

Results of 13 selected UL methods for NAD on the NSL-KDD dataset.

Algorithms Family Accuracy Precision Recall F1-score
K-NN Distance-based 98.4 98.0 97.9 98.5
ODIN Distance-based 99.4 99.0 98.6 98.8
LOF Density-based 97.0 96.8 95.3 95.6
COF Density-based 94.0 94.5 96.1 95.1
K-means Clustering-based 94 92.46 92.3 93
DBSCAN Clustering-based 94.5 94.0 92.9 94.0
EM Clustering-based 95.0 93 92.9 94.0
PCA Dimensionality-reduction 97.8 97.0 96.7 97.0
KPCA Dimensionality-reduction 99.2 99.0 98.9 98.6
ICA Dimensionality-reduction 98.0 97.5 97.8 98.1
HBOS Statistical-based 94.4 93.9 95.8 96.0
One-class SVM Classification-based 99.4 99.2 99.2 99.5
IF Classification-based 99.9 99.8 99.6 99.7

Results of 13 selected UL methods for NAD on the UNSW-NB15 dataset.

Algorithms Family Accuracy Precision Recall F1-score
K-NN Distance-based 98.2 97.8 97.6 97.7
ODIN Distance-based 99.2 98.9 98.5 98.6
LOF Density-based 96.9 95.8 95.1 95.0
COF Density-based 93.8 94.2 95.9 94.7
K-means Clustering-based 93.7 92.0 92.1 92.9
DBSCAN Clustering-based 94.2 93.8 92.7 93.5
EM Clustering-based 95.5 92.6 92.3 93.4
PCA Dimensionality-reduction 97.2 96.9 97.3 97.1
KPCA Dimensionality-reduction 99.1 98.5 98.5 98.7
ICA Dimensionality-reduction 97.7 97.4 97.1 97.3
HBOS Statistical-based 92.3 91.9 94.6 94.8
One-class SVM Classification-based 99.2 99.0 99.2 99.4
IF Classification-based 99.7 99.5 99.3 99.6
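
For reference, the accuracy columns of the three result tables can be re-plotted as a grouped bar chart in the style of Figures 6 to 8. The sketch below hard-codes the published accuracy values from those tables; the chart layout is an illustrative choice, not a reproduction of the original figures.

```python
# Minimal sketch of a grouped bar chart comparing the accuracy of the
# 13 UL methods across the three datasets, using the values reported in
# the result tables above.
import numpy as np
import matplotlib.pyplot as plt

methods = ["K-NN", "ODIN", "LOF", "COF", "K-means", "DBSCAN", "EM",
           "PCA", "KPCA", "ICA", "HBOS", "One-class SVM", "IF"]
accuracy = {
    "NSL-KDD":     [98.4, 99.4, 97.0, 94.0, 94.0, 94.5, 95.0, 97.8, 99.2, 98.0, 94.4, 99.4, 99.9],
    "UNSW-NB15":   [98.2, 99.2, 96.9, 93.8, 93.7, 94.2, 95.5, 97.2, 99.1, 97.7, 92.3, 99.2, 99.7],
    "CIC-IDS2017": [96.4, 97.6, 96.0, 93.3, 90.2, 93.7, 94.3, 96.3, 97.3, 96.3, 95.9, 98.3, 99.6],
}

x = np.arange(len(methods))
width = 0.25
fig, ax = plt.subplots(figsize=(12, 4))
for i, (dataset, vals) in enumerate(accuracy.items()):
    ax.bar(x + (i - 1) * width, vals, width, label=dataset)   # one bar group per method
ax.set_xticks(x)
ax.set_xticklabels(methods, rotation=45, ha="right")
ax.set_ylabel("Accuracy (%)")
ax.set_ylim(85, 100)
ax.legend()
fig.tight_layout()
plt.show()
```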