Predictive IoT-AI Model for Cyber Threat Detection in Smart Healthcare Environments
Article category: Article
Publication date: 15 Jun 2024
Page range: 17 - 31
Received: 02 Feb 2024
Accepted: 11 May 2024
DOI: https://doi.org/10.2478/jsiot-2024-0002
Keywords
© 2023 Hemalatha K.L et al., published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
The rapid incorporation of Internet of Things (IoT) technologies in smart healthcare scenarios has revolutionized patient care, enabling real-time monitoring and efficient healthcare delivery. However, the interconnected nature of IoT devices has introduced critical security vulnerabilities, such as unauthorized access, data breaches, and advanced persistent threats [1,2,3]. The security of IoT ecosystems is of paramount importance, particularly in healthcare, where patient data is sensitive and critical. As such, the demand for advanced cyber threat detection mechanisms has grown significantly. In this study, we propose a Predictive IoT-AI Model for Cyber Threat Detection, which leverages the NSL-KDD dataset and a hybrid Deep Learning (DL) architecture combining CNN and GRU. CNNs are utilised to capture spatial characteristics from network traffic data, while GRUs capture temporal dependencies and sequence patterns, enabling the detection of complex and evolving cyber threats [4]. The hybrid CNN-GRU model strikes a robust balance between accuracy and computational efficiency, making it well-suited for real-time deployment in IoT-enabled healthcare systems.
Traditional approaches to cyber threat detection often rely on static, rule-based systems, which struggle to adapt to the dynamic nature of IoT environments and evolving attack patterns. These systems are resource-intensive and ill-equipped to handle the diverse and constrained nature of IoT devices. The proposed predictive model addresses these challenges by combining the strengths of CNN and GRU architectures to provide a dynamic, adaptive, and scalable solution for cyber threat detection [5,6,7]. Despite its strengths, the performance of deep learning-based approaches in IoT environments can be influenced by several factors, including device constraints, network conditions, and the complexity of attack patterns. Our hybrid CNN-GRU architecture is designed to overcome these limitations by optimizing computational efficiency and reducing false positives. This ensures the secure handling of sensitive healthcare data and enhances the resilience of IoT-based smart healthcare systems [8].
The proposed workflow of the Predictive IoT-AI Model comprises data preprocessing (cleaning and transforming network traffic data from the NSL-KDD dataset), feature extraction (using CNN layers to capture spatial patterns), sequence modelling (employing GRU layers to detect temporal relationships and evolving attack trends), and threat classification (categorizing the various types of cyber threats to facilitate proactive mitigation strategies) [9,10,11,12,13]. The CNN-GRU hybrid architecture leverages the unique advantages of each component: CNNs extract robust spatial features, while GRUs provide sensitivity to sequential dependencies, enabling the model to adapt to subtle variations in attack patterns. This ensures high detection accuracy, even for sophisticated and zero-day threats.
The following segments explore the mathematical foundations of the CNN-GRU architecture, the preprocessing pipeline for the NSL-KDD dataset, and the experimental validation of the algorithm’s effectiveness. The results demonstrate the proposed algorithm’s superior accuracy, scalability, and computational efficiency. This research contributes to the advancement of IoT security by addressing the unique constraints of smart healthcare environments and offering a robust solution for cyber threat detection.
The research proposes a novel hybrid methodology integrating CNN and GRU to enhance the recognition of cyber threats in IoT-based smart healthcare environments.
The proposed model is rigorously evaluated using the NSL-KDD dataset and compared against traditional and state-of-the-art DL frameworks, showcasing significant enhancements in detection accuracy and computational effectiveness.
Extensive experiments are conducted with detailed performance evaluations utilising metrics such as accuracy, precision, recall, and F1-score, demonstrating the method’s robustness and suitability for real-time IoT cybersecurity applications.
The organisation of the paper is structured as follows: Section-2 explores the relevant works and literature, highlighting various methodologies for cyber threat detection in IoT-based healthcare systems. Section-3 delves into the preliminary concepts of CNN and GRU, explaining their integration into the proposed hybrid architecture for threat detection. Section-4 provides a comprehensive description of the NSL-KDD dataset, outlines the preprocessing steps, and presents the experimental setup along with an in-depth analysis of the results obtained using the proposed model. Lastly, Section-5 wraps up the research with an overview of the results and explores potential directions for further enhancing IoT security in smart healthcare environments.
Liu et al. (2023) [14] proposed a DL approach for detecting cyberattacks in IoT healthcare environments, utilizing advanced CNN and GRU architectures. Their model efficiently detects malicious activities and anomalies in real-time by leveraging spatial and temporal features. The system demonstrated high detection accuracy, making it suitable for IoT-based healthcare networks. However, its computational demands may present challenges when implemented on resource-constrained devices, requiring careful optimization for real-time deployment. The study emphasized the model’s adaptability in handling diverse and complex IoT network traffic. Their approach showcased the potential for early threat detection to mitigate risks in sensitive healthcare operations. Liu et al. also conducted extensive experiments to validate the robustness of the model against emerging cyber threats. They highlighted the importance of integrating such systems into broader IoT security frameworks. Despite its computational demands, the study suggests avenues for future research in lightweight model designs.
Chen et al. (2023) [15] designed a CNN-GRU-based intrusion detection system for smart healthcare IoT networks, combining convolutional layers for feature extraction and GRU layers for sequence learning. This hybrid approach enhanced detection accuracy and reduced false positives relative to existing models. The model was effective in identifying cyber threats in real time, making it a valuable solution for healthcare IoT networks. However, the deep learning model’s resource consumption might hinder its implementation on low-power IoT devices, and further optimization is necessary for broader adoption. The research provided a detailed analysis of the dataset used, ensuring its relevance to real-world healthcare applications. The proposed system was validated on multiple attack scenarios, demonstrating its versatility. Chen et al. addressed the potential for integration with edge computing for real-time processing. Their results highlighted the importance of balancing performance with computational resource constraints. Future work suggested incorporating federated learning for enhanced privacy-preserving capabilities.
Zhou et al. (2023) [16] introduced an efficient CNN-GRU-based intrusion detection system designed specifically for IoT-enabled healthcare environments. Their approach achieved rapid detection of cybersecurity threats and minimized false alarms by effectively combining the benefits of CNN and GRU architectures. The system was shown to perform well in handling complex IoT data traffic patterns. However, the high processing power required for real-time implementation may limit its feasibility on devices with limited computational resources, requiring optimization techniques to balance performance and efficiency. The study explored the scalability of the system in large-scale IoT networks. Zhou et al. provided insights into the robustness of the model under adversarial conditions. The researchers proposed lightweight encryption techniques as a complementary security measure. They also suggested model compression techniques for improving deployment on low-resource devices. The findings reinforced the role of hybrid architectures in addressing evolving cybersecurity challenges.
Wang et al. (2022) [17] proposed a network intrusion detection model built upon CNN and GRU, aimed at improving the security of IoT systems. The model’s hybrid approach allowed for effective detection of both spatial and temporal features in IoT traffic, demonstrating a notable improvement in detecting cyber threats. However, like other deep learning-based systems, its high computational cost may present challenges when deploying on devices with limited resources. Optimizations may be needed to achieve a better balance between detection accuracy and real-time performance. The authors provided a thorough comparison with other state-of-the-art methods, highlighting the superior performance of their model. Their approach included innovative preprocessing techniques to enhance data quality for training. Wang et al. also emphasized the role of fine-tuning hyperparameters to optimize detection performance. The study included an analysis of scalability in dynamic IoT environments. Recommendations for future work focused on edge-based deployment strategies to address resource constraints.
Chen et al. (2022) [18] introduced a hybrid deep learning approach for intrusion detection in IoT networks, combining CNN and RNN architectures to effectively detect and classify malicious activities. The model showed superior performance in identifying threats in IoT environments, with reduced false positives compared to traditional methods. However, the model’s computational complexity may pose challenges for its deployment on devices with limited resources, requiring optimization for efficiency in real-time applications. The research highlighted the ability of hybrid models in enhancing the detection of zero-day attacks. Chen et al. utilized advanced feature selection methods to improve model efficiency. Their research highlighted the importance of robust training datasets for better generalization. The authors proposed using distributed processing systems to overcome computational limitations. The study also explored adaptive learning mechanisms to keep the model updated with evolving threat landscapes.
Sun et al. (2022) [19] focused on enhancing IoT security in healthcare using deep learning techniques. Their approach utilized CNNs and RNNs to detect and prevent cyber threats in healthcare IoT networks. The model demonstrated excellent performance in identifying anomalies and improving the overall security posture of healthcare systems. Despite its strengths, the computational demands of the model might hinder its deployment on low-power devices, and further optimization would be needed to ensure its scalability in real-time applications.
Wang et al. (2022) [20] developed an intrusion detection system for IoT-based healthcare networks using CNN-GRU models. Their hybrid model was capable of accurately detecting cybersecurity threats by integrating the spatial feature extraction abilities of CNN with the temporal sequence modelling power of GRU. While the system showed improved detection performance, its heavy computational load may limit its practical use on resource-constrained IoT devices, and performance improvements through optimizations are needed for broader deployment. The study included a detailed analysis of hyperparameter tuning to achieve optimal results. Wang et al. validated their system on real-world IoT traffic datasets to ensure practical applicability. They proposed using model pruning techniques to reduce computational overhead. The authors also explored the potential for integrating their system with cloud-based monitoring platforms. The findings highlighted the need for continuous model updates to address new threat vectors.
Yang et al. (2022) [21] proposed a hybrid DL framework for cyber threat identification in smart healthcare environments, combining CNNs for feature extraction and GRUs for sequence analysis. The model effectively identified and mitigated potential cyber threats in real-time, achieving high accuracy. However, the computational intensity of deep learning models may pose challenges when deploying them on devices with limited processing power, requiring optimization to ensure efficient and scalable deployment in IoT healthcare networks. The study explored the use of lightweight alternatives to improve deployment feasibility. Their framework incorporated innovative techniques to enhance the interpretability of predictions. The researchers recommended edge-computing integration for real-time threat detection. Future work emphasized the need for collaboration with domain experts to refine the model further.

Hybrid Architecture for the Recommended Approach
This paper utilised the NSL-KDD dataset to assess the performance of the recommended cyber threat identification model. NSL-KDD is a refined version of the KDD Cup 1999 dataset, widely used for intrusion detection systems (IDS). It comprises network traffic data with labelled examples of both normal and attack traffic and is divided into two main subsets: a training dataset and a test dataset. Each instance includes 41 features, such as connection type, protocol type, service, and various statistics related to the network traffic, and the attack traffic is categorized into four attack categories: DoS (Denial of Service), Probe (probing attacks), R2L (Remote to Local), and U2R (User to Root). The NSL-KDD dataset includes 125,973 samples in the training dataset and 22,544 samples in the test dataset.
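For illustration only, the dataset can be loaded with pandas roughly as in the sketch below; the file names (KDDTrain+.txt, KDDTest+.txt) and the placeholder column names are assumptions based on the publicly distributed version of NSL-KDD, not details reported in this paper.

```python
import pandas as pd

# Assumed file names from the public NSL-KDD distribution; adjust paths as needed.
# The files ship without a header row: 41 features, a label, and a difficulty score.
# Only a few representative feature names are spelled out; the rest are placeholders.
columns = (
    ["duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes"]
    + [f"feature_{i}" for i in range(7, 42)]
    + ["label", "difficulty"]
)

train_df = pd.read_csv("KDDTrain+.txt", header=None, names=columns)
test_df = pd.read_csv("KDDTest+.txt", header=None, names=columns)

print(train_df.shape)  # expected: (125973, 43)
print(test_df.shape)   # expected: (22544, 43)
```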
Figure 2 represents the distribution of different protocol types (e.g., TCP, UDP, ICMP) present in the NSL-KDD dataset. Figure 3 showcases the proportion of normal and attacked instances in the dataset. It provides a clear comparison of benign traffic versus malicious activities, emphasizing the imbalance or diversity in the dataset.

Visualization of the Protocol types of the Dataset

Visualization of the Normal and attacked flags
Preprocessing encompasses organizing the unprocessed data, handling missing values, and normalizing features. These steps enhance data quality and ensure compatibility with machine learning models.
Categorical features like protocol, service, and flag were encoded utilising one-hot encoding. It converts categorical variables into binary vectors, which allows machine learning (ML) algorithms to understand them as numerical data.
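A minimal sketch of this encoding step is given below, assuming pandas is used and that the categorical columns follow the names in the public NSL-KDD distribution; it is illustrative rather than the exact code used in this study.

```python
import pandas as pd

# One-hot encode the categorical columns; every category value becomes a binary indicator.
categorical_cols = ["protocol_type", "service", "flag"]
train_encoded = pd.get_dummies(train_df, columns=categorical_cols)
test_encoded = pd.get_dummies(test_df, columns=categorical_cols)

# Align the test set to the training columns so both share the same feature space.
test_encoded = test_encoded.reindex(columns=train_encoded.columns, fill_value=0)
```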
Some features required manual categorization or recoding. For example, attack types were labeled as normal or categorized into different attack types, including DoS (Denial of Service), Probe, R2L (Remote to Local), and U2R (User to Root). This manual labelling ensures that the dataset is properly classified for training classification models.
After preprocessing, the dataset was divided into training and validation subsets. The training subset was utilized to train the algorithm, while the validation subset was employed to assess its efficacy. The split ratio was configured at 80% for training and 20% for validation.
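The labelling and splitting steps could be sketched as follows; the attack-name-to-category mapping is abbreviated, and the use of LabelEncoder, stratification, and the random seed are illustrative assumptions rather than settings reported in the paper.

```python
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Abbreviated mapping from raw NSL-KDD attack names to the four attack categories
# (the full mapping covers every attack name present in the dataset).
attack_map = {
    "normal": "normal",
    "neptune": "DoS", "smurf": "DoS", "back": "DoS",
    "satan": "Probe", "ipsweep": "Probe", "portsweep": "Probe",
    "guess_passwd": "R2L", "warezclient": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R",
}
category = train_df["label"].map(attack_map).fillna("unknown")

X = train_encoded.drop(columns=["label", "difficulty"]).values
y = LabelEncoder().fit_transform(category)   # integer class labels

# 80% training / 20% validation, stratified to preserve class proportions.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
```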
The CNN is a powerful deep learning architecture tailored for automatically and effectively learning spatial patterns and relationships from input data. For this study, the CNN was adapted to analyze the NSL-KDD dataset, where each row of data represents a network traffic instance containing a combination of numerical and categorical features indicative of normal or malicious activities.
To process the dataset through the CNN, the raw features were preprocessed and transformed into a format compatible with convolutional operations. This transformation ensured that the model could interpret the features as spatially relevant inputs. Each instance was structured as a fixed-size numerical array, enabling the CNN to identify local patterns and relationships in the data.
The CNN begins by applying convolutional layers, which use multiple filters (kernels) to perform localized operations on the input data. These filters slide across the input, performing dot-product operations to extract local features. These features might represent patterns such as correlations between certain network attributes or anomalies in traffic behaviour. Convolutional layers capture spatial hierarchies in the data, enabling the model to detect both low-level and high-level features.
After each convolution operation, a Rectified Linear Unit (ReLU) activation function is applied to introduce non-linearity. This ensures that the network can learn complex forms and relationships within the data.
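For reference, ReLU is the standard element-wise function

$$\mathrm{ReLU}(x) = \max(0, x),$$

which passes positive activations through unchanged and suppresses negative ones to zero.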
After the convolutional layers, pooling layers (usually max pooling) are employed to downsample the feature maps, decreasing their dimensions while retaining the most important features. Pooling helps in attaining spatial invariance, reducing computation, and preventing overfitting.
The features extracted by the above layers are flattened and fed into fully connected layers. These layers act as a high-level classifier, combining the learned spatial patterns to assign probabilities to different output classes (e.g., normal or attack traffic).
To enhance the generalization of the CNN and reduce overfitting, dropout layers were incorporated. Dropout randomly sets a portion of a layer’s output neurons to zero during training, compelling the model to learn resilient pattern features.
The final output layer produces the classification result, indicating whether a network instance is benign (normal) or a specific type of attack. The CNN’s design ensures that it captures intricate patterns in network traffic, effectively distinguishing between normal and malicious activity.

CNN Architecture
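A minimal Keras sketch of a CNN branch of this kind is given below, treating each preprocessed record as a one-dimensional feature vector; the layer widths, kernel sizes, dropout rate, and optimizer are illustrative assumptions rather than the exact configuration used in this paper.

```python
from tensorflow.keras import layers, models

num_features = X_train.shape[1]   # preprocessed feature count
num_classes = 5                   # normal + DoS, Probe, R2L, U2R

cnn = models.Sequential([
    # Each record is reshaped to (num_features, 1) so 1-D convolutions slide over it.
    layers.Input(shape=(num_features, 1)),
    layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Conv1D(128, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                              # regularization against overfitting
    layers.Dense(num_classes, activation="softmax"),  # output layer
])
# Labels are assumed to be integer-encoded (e.g., via sklearn's LabelEncoder).
cnn.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
```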
The GRU, a type of Recurrent Neural Network (RNN), was utilised to capture sequential dependencies in the data. The GRU model is particularly well-suited to handling time-series data, as its gating mechanisms allow the model to “remember” previous states. This was beneficial for detecting attacks that emerge from sequential patterns of network traffic.
The GRU model’s architecture includes:
GRU layers: To process temporal relationships between consecutive traffic instances.
Fully connected layers: For the final classification after learning sequential patterns.
The reset gate controls how much of the previous hidden state should be ignored. It determines the degree to which the previous state is relevant for the current time step:

$$r_t = \sigma\!\left(W_r x_t + U_r h_{t-1} + b_r\right)$$

where $r_t$ is the reset gate at time step $t$, $\sigma$ is the sigmoid activation function, $x_t$ is the current input, $h_{t-1}$ is the previous hidden state, and $W_r$, $U_r$, and $b_r$ are learnable parameters. If $r_t$ is close to 0, the model effectively “resets” and ignores the previous hidden state; if $r_t$ is close to 1, it retains the previous hidden state.

The update gate controls how much of the previous hidden state is maintained. It decides how much of the candidate hidden state is blended with the previous hidden state to form the current hidden state:

$$z_t = \sigma\!\left(W_z x_t + U_z h_{t-1} + b_z\right)$$
$$\tilde{h}_t = \tanh\!\left(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\right)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

where $z_t$ is the update gate at time step $t$, $\tilde{h}_t$ is the candidate hidden state, $\odot$ denotes element-wise multiplication, and $W_z$, $U_z$, $b_z$, $W_h$, $U_h$, and $b_h$ are learnable parameters. The update gate thus determines how much new information replaces the old hidden state at each step.

Architecture for GRU
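To make the gating mechanics concrete, the following toy NumPy sketch evaluates a single GRU step for arbitrary small dimensions; the weights are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 4, 3   # arbitrary toy dimensions

def p(shape):
    """Random toy parameter matrix or vector."""
    return rng.normal(size=shape)

W_r, U_r, b_r = p((hidden_dim, input_dim)), p((hidden_dim, hidden_dim)), np.zeros(hidden_dim)
W_z, U_z, b_z = p((hidden_dim, input_dim)), p((hidden_dim, hidden_dim)), np.zeros(hidden_dim)
W_h, U_h, b_h = p((hidden_dim, input_dim)), p((hidden_dim, hidden_dim)), np.zeros(hidden_dim)

sigmoid = lambda a: 1.0 / (1.0 + np.exp(-a))

x_t = rng.normal(size=input_dim)   # current input
h_prev = np.zeros(hidden_dim)      # previous hidden state

r_t = sigmoid(W_r @ x_t + U_r @ h_prev + b_r)               # reset gate
z_t = sigmoid(W_z @ x_t + U_z @ h_prev + b_z)               # update gate
h_cand = np.tanh(W_h @ x_t + U_h @ (r_t * h_prev) + b_h)    # candidate state
h_t = (1 - z_t) * h_prev + z_t * h_cand                     # new hidden state
print(h_t)
```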
This hybrid model integrates the strengths of both CNN and GRU to handle both spatial and temporal dependencies in data. CNNs are effective at extracting spatial features from input data, including images or sequences. The CNN layers consist of convolutional operations that apply filters (kernels) to the input, followed by activation functions like ReLU and pooling layers to down-sample the spatial dimensions. The output from the CNN layers is a set of feature maps representing high-level spatial information.
After feature extraction by the CNN, the output is fed into the GRU layers, which are designed to capture the temporal dependencies in the data. The GRU utilizes gating mechanisms, namely the reset and update gates, to decide which information should be retained across time steps. This allows the model to handle sequential data efficiently.
The integration of CNN for spatial feature extraction and GRU for temporal modelling enables the hybrid model to process both types of dependencies in the input data. The final output layer of the model typically comprises a fully connected layer that performs classification or regression, depending on the specific task.
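A compact Keras sketch of such a CNN + GRU stack is shown below; the specific layer widths, the assumed feature count of 122 after one-hot encoding, and the treatment of each traffic record as a short 1-D sequence are illustrative assumptions, not the exact architecture reported in this paper.

```python
from tensorflow.keras import layers, models

num_features = 122   # assumed feature count after one-hot encoding
num_classes = 5      # normal + DoS, Probe, R2L, U2R

hybrid = models.Sequential([
    layers.Input(shape=(num_features, 1)),
    # CNN block: extract local spatial patterns from each traffic record.
    layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling1D(pool_size=2),
    # GRU block: model dependencies along the down-sampled feature sequence.
    layers.GRU(64),
    layers.Dropout(0.5),
    # Fully connected classifier head.
    layers.Dense(num_classes, activation="softmax"),
])
hybrid.compile(optimizer="adam",
               loss="sparse_categorical_crossentropy",
               metrics=["accuracy"])
hybrid.summary()
```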
The hybrid CNN + GRU model offers several benefits, including the ability to handle both spatial and temporal patterns in tasks like video analysis, time-series forecasting, and anomaly detection. However, it also has some drawbacks. The model requires more computational resources due to its increased complexity compared to using CNNs or GRUs alone. Additionally, with a larger number of parameters, the model may be prone to overfitting if not properly regularized or if the dataset is small.
In summary, the CNN + GRU hybrid model is a powerful architecture for applications where both spatial and temporal features are crucial. It is particularly suitable for tasks that involve sequential or time-series information with spatial components, such as medical data analysis, video classification, or sensor data processing.
The model was developed using Python 3, with libraries including matplotlib, NumPy, pandas, Scikit-Learn, and seaborn used to evaluate the proposed model. The experimentation was carried out on a PC workstation with an i7 CPU at a 3.2 GHz operating frequency, 16 GB of RAM, and an NVIDIA Tesla GPU.
Standard classification metrics were employed to assess the effectiveness of the recommended CNN-GRU hybrid algorithm. Accuracy calculates the ratio of accurately categorized instances to the entire sample size of the dataset; it provides a general understanding of how effectively the algorithm distinguishes both legitimate and harmful network traffic. Precision quantifies the proportion of correctly identified attack instances out of all instances predicted as attacks; it indicates the model’s ability to avoid false alarms (i.e., false positives). Recall evaluates the fraction of true attack occurrences that were accurately recognized by the model, emphasizing the model’s capacity to identify all pertinent instances of attacks. The F1-score is the harmonic mean of precision and recall; it manages the trade-off between these two metrics and provides a single measure of the algorithm’s effectiveness, which is especially useful for imbalanced datasets.
Performance measures utilized for assessment
SL.NO | Evaluation Metrics | Mathematical Expression |
---|---|---|
01 | Accuracy | (TP + TN) / (TP + TN + FP + FN) |
02 | Recall | TP / (TP + FN) |
03 | Specificity | TN / (TN + FP) |
04 | Precision | TP / (TP + FP) |
05 | F1-Score | 2 × (Precision × Recall) / (Precision + Recall) |
Prediction cases are categorized into four types: The first is True Positive (TP), where a value is correctly recognised as true, and it is indeed true. The second is False Positive (FP), where a value is incorrectly recognised as true, despite being false. The third is False Negative (FN), where a value is true but is mistakenly recognised as negative. Finally, the fourth is True Negative (TN), where a value is negative and is accurately recognised as such.
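These quantities can be computed directly from the model’s predictions; a minimal Scikit-Learn sketch is given below, using toy label arrays in place of the actual validation-set outputs.

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

# Toy ground-truth and predicted labels standing in for validation-set outputs.
y_true = np.array([0, 0, 1, 2, 3, 4, 1, 0])
y_pred = np.array([0, 1, 1, 2, 3, 4, 1, 0])

print("Accuracy :", accuracy_score(y_true, y_pred))
# Weighted averaging accounts for class imbalance across the five categories.
print("Precision:", precision_score(y_true, y_pred, average="weighted"))
print("Recall   :", recall_score(y_true, y_pred, average="weighted"))
print("F1-score :", f1_score(y_true, y_pred, average="weighted"))
```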
Table 2 presents a comparative analysis of various DL algorithms based on key performance metrics, namely accuracy, precision, recall, and F1-score. The recommended hybrid approach outperforms the baseline models, including the Recurrent Neural Network (RNN), CNN, GRU, Long Short-Term Memory (LSTM), and the CNN+LSTM hybrid. With an accuracy of 98.2%, the proposed model exhibits exceptional results across all metrics, highlighting its robustness on the NSL-KDD dataset.
Comparative Analysis between the Different Models
Algorithms / Performance Metrics | ACCURACY (%) | PRECISION (%) | RECALL (%) | F1 SCORE (%) |
---|---|---|---|---|
RNN | 93.3 | 90.1 | 89.4 | 91 |
CNN | 94.3 | 92.1 | 91.4 | 91.7 |
GRU | 92.8 | 91.3 | 90.6 | 90.9 |
LSTM | 93 | 92.4 | 92 | 90.5 |
CNN+LSTM | 96.2 | 94.8 | 94.1 | 94.4 |
PROPOSED | 98.2 | 98.1 | 96 | 96.8 |
Figure 6 illustrates the comparative analysis of testing accuracy among the various deep learning models. It highlights the consistent superiority of the proposed hybrid CNN-GRU model over the other architectures, with a significant margin in accuracy.

Performance Metrics Visualization
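A comparison plot of this kind could be reproduced from the accuracy values in Table 2 roughly as follows; this is only an illustrative matplotlib sketch, not the script used to generate Figure 6.

```python
import matplotlib.pyplot as plt

models = ["RNN", "CNN", "GRU", "LSTM", "CNN+LSTM", "Proposed"]
accuracy = [93.3, 94.3, 92.8, 93.0, 96.2, 98.2]   # accuracy (%) from Table 2

plt.figure(figsize=(8, 4))
plt.bar(models, accuracy)
plt.ylabel("Testing accuracy (%)")
plt.ylim(85, 100)
plt.title("Comparative testing accuracy of the evaluated models")
plt.tight_layout()
plt.show()
```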
In the evolving landscape of IoT-enabled smart healthcare environments, safeguarding sensitive data and ensuring robust cyber threat detection are paramount to maintaining the integrity and reliability of critical systems. This paper presents a Predictive IoT-AI Model for Cyber Threat Detection, which leverages a hybrid CNN-GRU architecture to address the unique security challenges faced by IoT-based healthcare systems. By integrating the CNN and GRU algorithms, the proposed framework demonstrates high adaptability to dynamic and evolving cyber threats. The model operates in key phases, including data preprocessing, feature extraction, and threat classification, to deliver real-time and accurate cyber threat detection. Extensive experiments using the NSL-KDD dataset validate the efficiency of the approach, achieving notable enhancements in accuracy, precision, and recall compared to traditional approaches. The hybrid architecture ensures computational efficiency, making it feasible for deployment on resource-constrained IoT devices in healthcare environments.
The results highlight the performance of the recommended framework, attaining a commendable accuracy score of 98.2% and demonstrating its capability to balance robust security with low resource consumption. This research not only contributes to advancing cybersecurity in smart healthcare but also provides a scalable, efficient, and adaptive solution to address the ever-increasing threat landscape in IoT ecosystems.
The framework can be extended by incorporating advanced optimization techniques to further enhance model performance and adapt to emerging cyber-attack patterns. Additionally, integrating federated learning or edge-based computing strategies can improve scalability and privacy preservation, enabling secure deployment across diverse IoT healthcare infrastructures.