[ | Focuses on a literature review and performance evaluation of UL methods. | IDS | ≈ | × | × | × | ✓ | × |
[ | Provides a comparative analysis of various unsupervised NAD approaches and investigates standard metrics suitable for many connections. | AD in networks | ≈ | ✓ | × | × | ✓ | × |
[ | Presents a detailed review of different clustering methods and their strengths and weaknesses. | - | ≈ | ✓ | × | × | × | × |
[ | Presents a framework for evaluating anomaly detection methods for HTTP. | HTTP AD | ≈ | ≈ | × | × | ✓ | × |
[ | Presents an overview of ML techniques for intrusion detection between 2000 and 2007. | IDS | ✓ | × | × | ≈ | × | × |
[ | Provides an overview and comparison of various clustering techniques, with their pros and cons. | AD | ≈ | ≈ | × | × | × | × |
[ | Carries out a performance evaluation of four novelty detection algorithms. | Novelty detection | ✓ | ✓ | × | × | ✓ | ✓ |
[ | Presents a detailed review of popular traditional and modern clustering approaches. | - | ≈ | ≈ | × | × | × | × |
[ | Evaluates nineteen widely employed unsupervised techniques to provide an understanding of their performance in various domains. | AD | ≈ | ✓ | × | × | ✓ | × |
[ | Focuses on a comparative evaluation of four UL algorithms. | Smart city wireless networks | ≈ | ✓ | × | × | ✓ | × |
[ | Presents a survey of ML techniques focusing on unsupervised and hybrid IDS. | IDS | ✓ | ✓ | × | ✓ | × | ✓ |
[ | Provides an experimental comparison of unsupervised methods employed for NAD against five intrusion detection datasets. | Anomaly-based IDS | ≈ | ✓ | × | ✓ | ✓ | × |
[ | Presents a comprehensive survey of UL techniques in networking. | Anomaly-based in networking | ≈ | ✓ | ✓ | × | × | ✓ |
[ | Reviews and compares ML algorithms for IDS. | IDS | ✓ | ≈ | × | × | ✓ | × |
[ | Presents a detailed review of state-of-the-art IDS methods, datasets, performance measures, and research challenges. | IDS | ✓ | ≈ | × | ✓ | ✓ | ✓ |
[ | Presents an in-depth review of state-of-the-art clustering methods. | - | ≈ | ✓ | ✓ | × | × | ✓ |
[ | Presents a comprehensive overview of UL techniques for AD, emphasizing their utility in scenarios with scarce labeled data. | AD in industrial applications | ✓ | ✓ | ✓ | × | × | ✓ |
[ | Reviews and compares outlier detection techniques from seven different families. | Outlier detection | ≈ | ≈ | × | × | ✓ | × |
[ | Explores AD techniques comprehensively to address the detection of emerging threats. | IoT and sensor data | ✓ | ≈ | ✓ | × | × | ✓ |
[ | Presents an in-depth review of various unsupervised and semisupervised clustering methods. | - | ✓ | ✓ | ≈ | × | ✓ | × |
[ | Focuses on log file analysis for early incident detection, particularly emphasizing self-learning AD techniques. | AD | ✓ | ≈ | ✓ | × | × | × |
Our survey | The primary focus is on studying UL NAD methods while considering recent advances. | NAD | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
[ | Unsupervised approach for outlier detection using subspace clustering and evidence accumulation. | DBSCAN | METROSEC | Continuous | 95 | 95 | 85 | 85 | ✓ |
[ | Tree-based subspace clustering approach for high-dimensional datasets; includes cluster stability analysis. | Tree-based subspace clustering (TCLUS) | KDDCUP99, TUIDS | Mixed | 99 | 96 | 86 | 66 | |
[ | Multiclustering scheme incorporating subspace clustering and evidence accumulation to overcome knowledge-based approach limitations. | DBSCAN | KDDCUP99, METROSEC | N/A | - | - | - | - | ✓ |
[ | Particle swarm optimization clustering strategy based on map-reduce methodology for parallelization in large-scale networks. | PSO clustering | KDDCUP99 | Mixed | Max AUC: 96.3 | ||||
[ | Novel strategy for automatic tuning and optimization of detection model parameters in a real network environment. | Clustering and one-class SVM | KDDCUP99 and Kyoto University | Continuous | - | - | - | - | ✓ |
[ | K-means clustering to generate training subsets; neuro-fuzzy and radial-basis SVM for classification. | K-means, SVM, neuro-fuzzy neural network | KDDCUP99 | N/A | 98.8 | 97.31 | 97.5 | 97.5 | |
[ | Innate-immune strategy via UL for categorizing normal and abnormal profiles. | DBSCAN | KDDCUP99 | N/A | FPR: 0.008, TNR: 0.991, ACC: 77.1, Recall: 0.589 | ✓ | |||
[ | Novel approach using cluster centers and NNs to transform the dataset into a one-dimensional feature set. | Clustering, KNN | KDDCUP99 | N/A | 99.68 | 87.61 | 3.85 | 57.02 | |
[ | Nature-inspired meta-heuristic approach to optimize the optimum path forest clustering algorithm. | Optimum path forest clustering | ISCX, KDDCUP99, NSL-KDD | N/A | Purity measure: ISCX: 96.3, KDDCUP99: 71.66, NSL-KDD: 99.8 | ||||
[ | UL technique for real-time detection of fast and complex zero-day attacks. | DBSCAN | DARPA, ISCX | N/A | ACC: 98.39%, Recall: 100%, Precision: 98.12%, FP: 3.61 | ✓ | |||
[ | Modified optimum path forest algorithm to enhance IDS performance, particularly in detecting less frequent attacks. | K-means, modified optimum path forest | NSL-KDD | N/A | 96.89 | 85.92 | 77.98 | 81.13 | |
[ | Hierarchical agglomerative clustering applied to SOM network to lower computational cost and sensitivity. | Hierarchical agglomerative clustering, SOM | NSL-KDD | Mixed | DR: 96.66%, ACC: 83.46, Precision: 75, FPR: 0.279, FNR: 0.033 |
DARPA (1998) | MIT Lincoln Laboratory | 1998 | Emulated | Benchmark | ✓ | 7,000,000 | ✓ | DoS, Probe, U2R, R2L | 41 |
KDD CUP 99 | University of California | 1999 | Emulated | Benchmark | ✓ | 4,900,000 | ✓ | DoS, Probe, U2R, R2L | 41 |
DEFCON | N/A | 2000 | Real | Benchmark | N/A | N/A | ✓ | N/A | Flag traces |
LBNL | Lawrence Berkeley National Laboratory | 2004 | N/A | Benchmark | ✓ | >100 hr | × | Malicious traces | Internet traces |
Kyoto | Song et al. (2006) | 2006 | Real | Benchmark | ✓ | N/A | ✓ | Normal and attack sessions | 24 |
CAIDA | Jonker et al. (2017) | 2008–2017 | Real | Benchmark | ✓ | Huge | × | DDoS | 20 |
CDX | United States Military Academy | 2009 | Real | Real-life | N/A | 5771 | ✓ | Buffer overflow | 5 |
NSL-KDD | Tavallaee et al. (2009) | 2009 | Emulated | Benchmark | ✓ | 148,517 | ✓ | DoS, Probe, U2R, R2L | 41 |
ISCX 2012 | Shiravi et al. (2012) | 2012 | Real | Real-life | ✓ | 2,450,324 | ✓ | DoS, DDoS, Bruteforce, Infiltration | IP flows |
UNSW-NB15 | Moustafa and Slay (2015) | 2015 | Emulated | Benchmark | ✓ | 2,540,044 | ✓ | Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, Worms | 49 |
CIDDS-001 | Ring et al. (2017) | 2017 | Emulated | Benchmark | ✓ | 31,959,267 | ✓ | Ping scanning, Port scanning, Brute force, and DoS | 14 |
CIDDS-002 | Ring et al. (2017) | 2017 | Emulated | Benchmark | ✓ | 16,161,183 | ✓ | Ping scanning, Port scanning, Brute force, and DoS | 14 |
CIC-IDS2017 | Sharafaldin et al. (2018) | 2017 | Emulated | Benchmark | ✓ | 2,830,743 | ✓ | DDoS, DoS, Botnet, BruteForce, Infiltration, WebAttack, Port scan | 80 |
CSE-CICIDS2018 | Bharati and Tamane (2020) | 2018 | Emulated | Benchmark | ✓ | 16,232,943 | ✓ | DDoS, DoS, Botnet, BruteForce, Infiltration, WebAttack, Port scan | 80 |
CICDDoS 2019 | Sharafaldin et al. (2019) | 2019 | Emulated | Benchmark | ✓ | 32,925 | ✓ | DDoS_DNS, DDoS_LDAP, DDoS_MSSQL, DDoS_NetBIOS, DDoS_NTP, DDoS_SNMP, DDoS_SSDP, DDoS_SYN, DDoS_TFTP, DDoS_UDP, DDoS_UDP-Lag, DDoS_WebDDoS | 76 |
BoT-IoT | Koroniotis et al. (2019) | 2019 | Real | Real-life | ✓ | 73,360,900 | ✓ | DoS, DDoS, Reconnaissance, Theft | 29 |
IoT-23 | N/A | 2020 | Real | Real-life | ✓ | N/A | ✓ | Mirai, Torii, Hide and Seek, Muhstik, Hakai, IRCBot, Hajime, Trojan, Kenjiro, Okiru, Gagfyt | 21 |
Amoli et al. (2015) | Real-time intrusion detection using adaptive thresholds. | DBSCAN | DARPA, ISCX | DoS, DDoS, POD, SMURF, Mail-bomb, botnet, port scanning-port sweep | Precision: 98.12, ACC: 98.39, FPR: 3.61 | Future work: Detecting complex attacks, distinguishing flash crowds and DDoS for better clarity. |
Zhang et al. (2016) | Utilizing One-class SVM for network intrusion identification. | One-class SVM | NSL-KDD | DoS, Probe, U2R, R2L | Precision: 99.3, Recall: 91.61, F-value: 95.18 | Low DR for minority-class attacks like U2R and R2L. |
Landress (2016) | Employing K-means clustering, J48, decision tree, and SOM to reduce false positives. | K-means, J48, decision tree, self-organizing map | KDDCUP99 | DoS, Probe, U2R, R2L | ACC: 98.92 | Limitation: Increased computational complexity with SOM; explore efficient hybrid techniques for real-time processing. |
He et al. (2017) | Exploiting SDN for effective anomaly detection. | DBSCAN | KDDCUP99 | DoS, Probe, U2R, R2L | ACC: 94.5 | Future work: Focus on real-time packet clustering for timely detection. |
Ariafar and Kiani (2017) | Optimizing clusters using GA, K-means, and decision trees. | K-means, decision tree, GA | NSL-KDD | DoS, Probe, U2R, R2L | DR: 99.1, FAR: 1.8 | Evaluate on modern-day datasets like CIC-IDS 201 and CSE-CICIDS2018. |
Bigdeli et al. (2018) | Incremental cluster updates with spectral and density-based clustering. | Spectral and density-based clustering | KDDCUP99, NSL-KDD, DARPA98, IUSTsip, DataSetMe | DoS, Probe, U2R, R2L | DR: 94%, FAR: 4% | Introduce concept drift while merging clusters. |
Almi'Ani et al. (2018) | Hierarchical agglomerative clustering using k-means for reduced training time. | Hierarchical agglomerative clustering | NSL-KDD | DoS, Probe, U2R, R2L | DR: 96.66, ACC: 83.46, Precision: 75, FPR: 0.279, FNR: 0.033 | Low accuracy due to low sensitivity toward normal behavior; investigate fuzzy C-means clustering for accuracy enhancement. |
Zhou et al. (2019) | Hybrid technique using KPCA and ELM. | KPCA, ELM | KDDCUP99 | DoS, Probe, U2R, R2L | DR: DoS: 98.96, Probe: 98.54, R2L: 94.72, U2R: 36.54; ACC: 98.18, FAR: 2.38 | The DR of U2R is quite low. To optimize the results of ELM, evolutionary algorithms need to be applied. |
Choi et al. (2019) | Extracting key features using AE. | AE | NSL-KDD | DoS, Probe, U2R, R2L | ACC: 91.70 | Improve U2R DR; apply evolutionary algorithms to optimize ELM results. |
Aliakbarisani et al. (2019) | Formulates a constraint trace ratio optimization problem for Laplacian Eigenmap strategy. | Constraint trace optimization, PCA, LDA | NSL-KDD, Kyoto 2006+ | DoS, Probe, U2R, R2L | ACC: 97.84, F-score: 0.878, FPR: 0.001 | Train and test with different datasets; explore ensemble strategies based on UL models for enhanced performance. |
Paulauskas and Baskys (2019) | Employing HBOS mechanism for identifying rare attacks like U2R and R2L. | HBOS | NSL-KDD | DoS, Probe, U2R, R2L | F-measure: 87 | Investigate online learning metrics for newly discovered attacks. |
Hwang et al. (2020) | CNN and AE for extracting raw features from network traffic. | CNN, AE | USTC-TFC 2016, Mirai-RGU, Mirai-CCU | SYN flood, UDP flood, ACK flood, and HTTP flood, Mirai C&C | ACC: 99.77, Precision: 99.93, Recall: 99.17, F1 measure: 99.55, FNR: 0.02, FPR: 0.83 | Improve performance rate of majority classes. |
Zavrak and Iskefiyeli (2020) | Deploys flow-based IDS with One-class SVM as anomaly detector. | AE, VAE, One-Class SVM | CIC-IDS2017 | DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heart-bleed | AUC: VAE: 75.96, AE: 73.98, One-Class SVM: 66.36 | Consider flow-based attributes collected at specified time intervals to improve DR and reduce FAR. |
Truong-Huu et al. (2020) | Uses GAN strategy to extract useful features and proposes a traffic aggregation technique. | GAN | UNSW-NB15, CIC-IDS2017 | Backdoors, Exploits, Worms, Shellcode, generic, DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heart-bleed | UNSW-NB15: Precision: 0.84, Recall: 0.85, F1-score: 0.85, AUPRC: 0.8831; CIC-IDS2017: Precision: 0.8260, Recall: 0.8268, F1-score: 0.8264, AUROC: 0.9529, AUPRC: 0.8271 | Investigate multi-class classification approach for identifying different types of attacks. |
Prasad et al. (2020) | Proposes a novel cluster center initialization approach to overcome shortcomings of conventional clustering techniques. | K-means clustering, cluster center initialization | CIC-IDS2017 | DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heart-bleed | DR: 88, FR: 88.5, Precision: 88, F-measure: 0.531, ACC: 88.6 | Address limitations like manual pre-processing and time/space complexity in MANET deployment. |
Megantara and Ahmad (2021) | Utilizes feature importance and data-reduction techniques to improve the prediction performance of NAD. | Decision tree, LOF | NSL-KDD, UNSW-NB15 | DoS, Probe, U2R, R2L; Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, Worms | NSL-KDD: ACC: 99.73; UNSW-NB15: 91.86 | The size of the LOF cluster can be optimized and the threshold value for handling outliers can be improved. |
Liao et al. (2021) | Presents an ensemble of UL schemes based on a novel weighting scheme. | AE, GANs | UNSW-NB15, CIC-IDS 2017 | Backdoors, Exploits, Worms, Shellcode, generic, DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heart-bleed | UNSW-NB15: Precision: 97.9, Recall: 92.4, F1-score: 94.9; CIC-IDS2017: Precision: 83.8, Recall: 84, F1-score: 83.5 | Address suboptimal results for specific attack families; explore optimization for better performance. |
Verkerken et al. (2022) | Proposes an inter-dataset evaluation approach for ensemble UL algorithms. | PCA, IF, AE, One-class SVM | CIC-IDS 2017, CSE-IDS 2017 | DoS slow loris, DoS slow HTTP-test, DoS Hulk, DoS golden eye, Heartbleed | ACC: PCA: 90.9, IF: 91.1, AE: 99.9, One-class SVM: 98.9; AUROC: PCA: 93.73, IF: 95.8, AE: 97.75, One-class SVM: 94.20 | Consider employing supervised learning approaches for generalization strength validation. |
Singh and Jang-Jaccard (2022) | Proposes a unified AE architecture based on CNN and LSTM to examine spatiotemporal correlations in traffic data. | MSCNN, LSTM-based AE | NSL-KDD, UNSW-NB15, CICDDoS2019 | Normal, Attack (Binary class) | ACC: 97.10, Precision: 95.9, Recall: 96.4, F-score: 96.0 (average) | Fine-tune hyperparameters for enhanced model performance using optimization techniques. |
Wang et al. (2022) | Proposes an ensemble of UL algorithms to address challenges in processing overhead and poor detection performance for unseen threats. | AE, IF | CSE-CICIDS2018, MQTT-IOT-IDS2020 | Sparta SSH brute force, DoS attacks-SlowHTTPTest, DoS-Hulk, DDoS attack-LOIC-UDP, DDoS attacks-LOIC-HTTP, Brute force-XSS, Brute force-web, FTP-brute force | Average ACC: 96.43, Average Recall: 95.95, Average F-score: 96.02 | Improve DR of specific attacks like “DoS-Hulk” in the proposed work. |
De C. Bertoli et al. (2022) | Presents an NIDS for generalized detection performance in heterogeneous networks. | Deep AE, FL, Energy Flow Classifier | UNSW-NB15, CSE-CICIDS2018, BoT-IoT, ToN-IoT | Normal, Attack (Binary class) | BoT-IoT: ACC: 93, Recall: 93, Precision: 99, F-score: 95; ToN-IoT: ACC: 85, Recall: 85, Precision: 87, F-score: 77; UNSW-NB15: ACC: 97, Recall: 99, Precision: 57, F-score: 73; CSE-CICIDS2018: ACC: 98, Recall: 88, Precision: 92, F-score: 90 | Deploy the proposed approach in distributed NIDS for robustness evaluation. |
Eren et al. (2023) | Proposes a novel tensor decomposition method to enhance the detection of unseen attacks in network anomaly and intrusion detection, leveraging tensor factorization for improved generalization and identification of evolving threats. | Unsupervised DL, Tensor factorization algorithm | Neris-botnet, Spam e-mail | - | Average ROC-AUC: 0.9661, Average PR-AUC: 0.9152 | Using a tensor decomposition algorithm with latent factors enables enhanced initialization of the tensor factorization algorithm for improved performance. |
Lan et al. (2023) | Presents a novel framework for unsupervised intrusion detection that transfers knowledge from known attacks to detect new ones using a hierarchical attention-based triplet network and unsupervised domain adaptation. | Attention network, unsupervised domain adaptation | ISCX-2012, UNSW-NB15, CIC-IDS2017, CTU-13 | HTTP DoS, Brute Force SSH, PortScan, Brute Force, DoS slow loris, Web attack XSS, Neris, Rbot, Fuzzers, Generic, Shellcode, Exploits | Average ACC: 96.38, Average DR: 94.16 | Utilize tensor decomposition algorithm with latent factors for enhanced initialization and performance. |
Boppana and Bagade (2023) | Synergy of GANs and AE for unsupervised intrusion detection in MQTT networks. | AE, GAN | MQTT-IoT-IDS2020 | Normal, Attack (Binary class) | F1-score: 97 | Evaluate generalizability to diverse network architectures in future work. |
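
The rows above report results as ACC, precision, recall (also given as DR), FPR (often reported as FAR), FNR, and F-measure. As a minimal sketch of how these quantities are conventionally derived from binary confusion-matrix counts, the following assumes the standard textbook definitions with "attack" as the positive class; the surveyed papers may use variants, and the names `Confusion` and `metrics` are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Binary confusion-matrix counts, with 'attack' as the positive class."""
    tp: int  # attacks flagged as attacks
    fp: int  # normal traffic flagged as attacks
    tn: int  # normal traffic passed as normal
    fn: int  # attacks that were missed


def _ratio(num, den):
    # Return 0.0 instead of raising when a denominator is zero.
    return num / den if den else 0.0


def metrics(c):
    """Compute the usual detection metrics as fractions in [0, 1]."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "ACC": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "Precision": precision,
        "Recall": recall,                   # assumed equal to detection rate (DR)
        "FPR": _ratio(c.fp, c.fp + c.tn),   # assumed equal to false alarm rate (FAR)
        "FNR": _ratio(c.fn, c.fn + c.tp),
        "F1": _ratio(2 * precision * recall, precision + recall),
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from any surveyed paper.
    for name, value in metrics(Confusion(tp=90, fp=10, tn=890, fn=10)).items():
        print(f"{name}: {value:.4f}")
```

Multiplying each fraction by 100 gives the percentage form used in most rows of the table. Note that a high ACC can coexist with a low recall on rare classes such as U2R, which is why several rows list per-class DR separately.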
K-NN | Distance-based | 96.4 | 96.7 | 96.0 | 97.0 |
ODIN | Distance-based | 97.6 | 97.4 | 97.5 | 98.0 |
LOF | Density-based | 96 | 95.3 | 93.6 | 94.8 |
COF | Density-based | 93.3 | 93.2 | 95 | 93.7 |
K-means | Clustering-based | 90.2 | 87.4 | 88.6 | 89.5 |
DBSCAN | Clustering-based | 93.7 | 92.7 | 90.6 | 92.3 |
EM | Clustering-based | 94.3 | 93.2 | 91.2 | 92.4 |
PCA | Dimensionality-reduction | 96.3 | 95.4 | 94.9 | 95.1 |
KPCA | Dimensionality-reduction | 97.3 | 97.2 | 96.9 | 97 |
ICA | Dimensionality-reduction | 96.3 | 96.2 | 96 | 96.3 |
HBOS | Statistical-based | 95.9 | 94.9 | 93.5 | 93.0 |
One-class SVM | Classification-based | 98.3 | 98.1 | 99 | 98.4 |
IF | Classification-based | 99.6 | 99.5 | 99.2 | 99.4 |
Operating system | Windows 11 |
System type | 64-bit operating system, x64-based processor |
Processor | 11th Gen Intel® Core™ i5-1135G7 @ 2.40 GHz 2.42 GHz |
RAM | 8 GB |
Python | 3.7 |
Datasets | NSL-KDD, UNSW-NB15, CICIDS2017 |
K-NN | Neighbor-based | 98.4 | 98.0 | 97.9 | 98.5 |
ODIN | Neighbor-based | 99.4 | 99.0 | 98.6 | 98.8 |
LOF | Density-based | 97.0 | 96.8 | 95.3 | 95.6 |
COF | Density-based | 94.0 | 94.5 | 96.1 | 95.1 |
K-means | Clustering-based | 94 | 92.46 | 92.3 | 93 |
DBSCAN | Clustering-based | 94.5 | 94.0 | 92.9 | 94.0 |
EM | Clustering-based | 95.0 | 93 | 92.9 | 94.0 |
PCA | Dimensionality-reduction | 97.8 | 97.0 | 96.7 | 97.0 |
KPCA | Dimensionality-reduction | 99.2 | 99.0 | 98.9 | 98.6 |
ICA | Dimensionality-reduction | 98.0 | 97.5 | 97.8 | 98.1 |
HBOS | Statistical-based | 94.4 | 93.9 | 95.8 | 96.0 |
One-class SVM | Classification-based | 99.4 | 99.2 | 99.2 | 99.5 |
IF | Classification-based | 99.9 | 99.8 | 99.6 | 99.7 |
K-NN | Distance-based | 98.2 | 97.8 | 97.6 | 97.7 |
ODIN | Distance-based | 99.2 | 98.9 | 98.5 | 98.6 |
LOF | Density-based | 96.9 | 95.8 | 95.1 | 95.0 |
COF | Density-based | 93.8 | 94.2 | 95.9 | 94.7 |
K-means | Clustering-based | 93.7 | 92.0 | 92.1 | 92.9 |
DBSCAN | Clustering-based | 94.2 | 93.8 | 92.7 | 93.5 |
EM | Clustering-based | 95.5 | 92.6 | 92.3 | 93.4 |
PCA | Dimensionality-reduction | 97.2 | 96.9 | 97.3 | 97.1 |
KPCA | Dimensionality-reduction | 99.1 | 98.5 | 98.5 | 98.7 |
ICA | Dimensionality-reduction | 97.7 | 97.4 | 97.1 | 97.3 |
HBOS | Statistical-based | 92.3 | 91.9 | 94.6 | 94.8 |
One-class SVM | Classification-based | 99.2 | 99.0 | 99.2 | 99.4 |
IF | Classification-based | 99.7 | 99.5 | 99.3 | 99.6 |
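
The comparison tables list K-NN among the distance-based (neighbor-based) detectors. As a minimal illustration of that family, the sketch below assumes one common formulation in which a point's anomaly score is its Euclidean distance to its k-th nearest neighbor; this is not necessarily the configuration evaluated above, and `knn_scores`, `flag_outliers`, and the threshold value are hypothetical names and choices.

```python
import math


def _euclidean(p, q):
    # Plain Euclidean distance between two equal-length numeric tuples.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def knn_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    Larger scores mean the point lies farther from its neighbors and is
    therefore more likely to be an anomaly. Brute force, O(n^2) distances.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    scores = []
    for i, p in enumerate(points):
        # Distances to every other point, sorted ascending.
        dists = sorted(_euclidean(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def flag_outliers(points, k, threshold):
    """Flag points whose k-NN distance score exceeds a chosen threshold."""
    return [score > threshold for score in knn_scores(points, k)]


if __name__ == "__main__":
    # Four tightly grouped points and one distant point (toy data).
    data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
    print(knn_scores(data, k=2))
    print(flag_outliers(data, k=2, threshold=5.0))
```

In practice the threshold is usually chosen from the score distribution (for example, a high percentile) rather than fixed by hand, and features are scaled first so that no single attribute dominates the distance.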