Application of Deep Neural Networks in Multi-Hop Wireless Sensor Network (WSN) Channel Optimization
Published online: 11 Apr 2025
Received: 04 Dec 2024
Accepted: 08 Mar 2025
DOI: https://doi.org/10.2478/amns-2025-0848
© 2025 Yiyang Chen, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Multi-hop wireless sensor networks (WSNs) play a critical role in modern communication systems, particularly in applications requiring extensive coverage and energy-efficient data transmission. However, optimizing channel allocation and routing in multi-hop WSNs remains a significant challenge due to dynamic network topologies, interference, and energy constraints. Traditional optimization methods, including heuristic and metaheuristic approaches, have been widely applied but often struggle with real-time adaptability and high-dimensional feature spaces. Recent advancements in deep learning have opened new possibilities for enhancing WSN channel optimization, leveraging data-driven techniques to improve network performance [1].
Hierarchical optimization techniques have been explored for clustering and multi-hop routing to enhance energy efficiency in underwater WSNs [2]. Deep learning-aided optimization has demonstrated its potential in short-packet multi-hop networks, improving overall performance through adaptive learning models [3]. Predictive models based on hop count theory have also been developed using deep learning, providing more accurate routing decisions in WSN environments [4]. The integration of deep neural networks (DNNs) in localization techniques has further improved the reliability of sensor networks in Internet of Things (IoT) applications [5]. Additionally, hybrid deep recurrent neural networks have been applied to optimal cluster-based routing, ensuring coverage-aware communication in large-scale WSNs [6].
Multi-hop routing strategies have been enhanced using intelligent next-hop selection mechanisms, improving network stability and efficiency in IoT-enabled WSNs [7]. Furthermore, multi-objective optimization algorithms have been employed to optimize multi-hop and multi-path routing, ensuring balanced energy consumption and robust data transmission [8]. Deep learning-based cooperative communication models have been proposed for underground WSNs, highlighting the benefits of data-driven approaches in challenging deployment environments [9]. Hybrid metaheuristic optimization techniques, such as chimp and hunger games search algorithms, have also been utilized to optimize multi-hop routing protocols in energy-efficient WSNs [10]. Similarly, search-and-rescue-based chaotic optimization methods have been applied to underwater WSNs, enabling more reliable multi-hop data transmission in extreme conditions [11].
Reinforcement learning has been increasingly adopted for multi-parameter joint optimization in dense multi-hop networks, improving decision-making in dynamic channel conditions [12]. Deep neural networks have also been leveraged for anchor-free localization in WSNs, eliminating the need for predefined reference points in multi-sink networks [13]. Hierarchical traffic offloading mechanisms have been proposed to ensure end-to-end reliability in multi-hop, multi-connection WSNs, addressing congestion and delay challenges [14]. Additionally, deep convolutional neural networks combined with metaheuristic algorithms have demonstrated improved energy-efficient routing in WSN-IoT integrated environments [15]. The use of CNN-based models for optimal cluster head selection further reinforces the role of deep learning in enhancing energy efficiency and prolonging network lifespan [16].
Optimization techniques such as the Gray Wolf optimizer have been integrated with distance vector-hop (DV-Hop) methods to improve node localization in WSNs [17]. Combined metaheuristic approaches, such as the Osprey-Chimp optimization algorithm, have been applied to cluster-based routing, achieving improved energy prediction with deep learning models [18]. Cognitive radio sensor networks have also benefited from deep learning techniques for optimizing network lifetime in industrial applications [19]. Machine learning-based approaches have been extensively reviewed for coverage optimization in WSNs, highlighting the growing relevance of artificial intelligence in sensor network optimization [20]. Additionally, relay selection strategies have been optimized for energy-efficient cooperative multi-hop image transmission, ensuring robust multimedia communication in WSNs [21].
Despite these advancements, optimizing channel allocation in multi-hop WSNs remains a complex problem due to varying interference levels, mobility patterns, and dynamic traffic conditions. Traditional optimization techniques often lack the flexibility needed to adapt to real-time network variations. This paper explores the application of deep neural networks to WSN channel optimization, integrating reinforcement learning and convolutional architectures to learn dynamic transmission policies. By leveraging deep learning-based feature extraction and predictive modeling, the proposed framework aims to improve throughput, energy efficiency, and network stability under diverse operational conditions. Through experimental validation, we demonstrate the superiority of the proposed approach over conventional optimization methods, highlighting its potential for real-time, intelligent WSN channel optimization.
This section presents the proposed deep neural network (DNN)-based approach for optimizing channel allocation and routing in multi-hop wireless sensor networks (WSNs). The method integrates reinforcement learning (RL) with convolutional and recurrent neural networks to dynamically adjust transmission policies based on real-time network conditions. We describe the overall framework, mathematical formulations, and optimization process, followed by illustrative figures.
The proposed approach consists of three main components: (i) feature extraction using convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, (ii) reinforcement learning-based decision-making for adaptive routing and channel allocation, and (iii) an optimization mechanism for energy-efficient network performance. Figure 1 illustrates the overall workflow.

Figure 1. Proposed deep learning-based channel and routing optimization framework. Feature extraction captures spatial-temporal variations, reinforcement learning optimizes policy decisions, and final routing and channel selection are dynamically adjusted.
To effectively capture spatial and temporal variations in network state, we employ a hybrid deep learning architecture combining convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. CNNs extract spatial dependencies among sensor nodes, while LSTMs model the temporal evolution of network conditions, enabling the system to adapt dynamically to changes in channel quality, interference, and traffic load.
At each time step $t$, the network state is represented as a multi-channel tensor $X_t$, in which each channel encodes one network attribute across the sensor nodes, such as link quality, interference level, traffic load, or residual energy.

The CNN module applies multiple convolutional layers to extract spatial patterns from $X_t$:
$$F_t = \sigma\left(W_c * X_t + b_c\right),$$
where $*$ denotes convolution, $W_c$ and $b_c$ are learnable parameters, and $\sigma(\cdot)$ is a nonlinear activation function.

To enhance feature learning, batch normalization and max pooling operations are applied:
$$\tilde{F}_t = \operatorname{MaxPool}\left(\operatorname{BN}(F_t)\right).$$
This ensures stable training and reduces computational complexity.

The extracted spatial features $\tilde{F}_t$ are fed into an LSTM network that models their evolution over consecutive time steps.

The LSTM outputs a context-aware embedding $h_t$ that summarizes both current and historical network conditions.

The final feature representation $z_t$ concatenates the spatial and temporal embeddings, $z_t = [\tilde{F}_t;\, h_t]$.

This hybrid feature embedding is then passed to the reinforcement learning agent for adaptive routing and channel allocation.
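For concreteness, the following is a minimal PyTorch sketch of a CNN-LSTM extractor of the kind described above; the four-channel state tensor, layer widths, history length, and embedding size are illustrative assumptions rather than values taken from the paper.

```python
# Illustrative CNN-LSTM feature extractor (PyTorch). The 4-channel state
# tensor, layer widths, history length, and embedding size are assumptions
# made for this sketch, not values taken from the paper.
import torch
import torch.nn as nn

class CNNLSTMExtractor(nn.Module):
    def __init__(self, in_channels=4, spatial_dim=32, temporal_dim=64):
        super().__init__()
        # Spatial branch: convolution -> batch norm -> ReLU -> max pooling
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, spatial_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(spatial_dim),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),       # (B*T, spatial_dim, 1, 1)
        )
        # Temporal branch: LSTM over a short history of spatial features
        self.lstm = nn.LSTM(spatial_dim, temporal_dim, batch_first=True)

    def forward(self, x):
        # x: (batch, time, channels, H, W) -- a window of recent state tensors
        b, t, c, h, w = x.shape
        f = self.cnn(x.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(f)
        # z_t = [spatial features of the latest step ; temporal embedding]
        return torch.cat([f[:, -1, :], h_n[-1]], dim=1)

# Example: 8 samples, 5 time steps, 4 state channels on a 20x20 node grid
z = CNNLSTMExtractor()(torch.randn(8, 5, 4, 20, 20))
print(z.shape)  # torch.Size([8, 96])
```

The adaptive pooling in the spatial branch keeps the extractor independent of the grid size, so the same module can be reused as the deployed topology changes.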
To dynamically optimize routing and channel allocation in multi-hop wireless sensor networks (WSNs), we formulate the problem as a reinforcement learning (RL) task. The RL agent continuously learns from the network environment and selects optimal transmission policies to maximize network efficiency while minimizing interference and energy consumption. The decision-making process is modeled as a Markov Decision Process (MDP), where the RL agent interacts with the environment through iterative exploration and exploitation.
The optimization problem is represented as a tuple $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$, where $\mathcal{S}$ is the state space, $\mathcal{A}$ the action space, $P$ the state-transition probability, $R$ the reward function, and $\gamma$ the discount factor.
The state $s_t \in \mathcal{S}$ captures the current network conditions, including channel occupancy, interference levels, traffic load, and the residual energy of the nodes.
The state representation is obtained from the CNN-LSTM feature extraction module, which provides a spatiotemporal embedding of the network topology.
At each time step, the RL agent selects an action $a_t \in \mathcal{A}$, corresponding to the choice of next-hop node and transmission channel for the current packet.
The agent learns an adaptive routing and channel allocation policy, balancing data transmission efficiency and network longevity.
The agent's objective is to maximize a cumulative reward function that considers throughput, energy efficiency, and latency:
$$r_t = \alpha \cdot \mathrm{Thr}_t - \beta \cdot E_t - \lambda \cdot D_t,$$
where $\mathrm{Thr}_t$, $E_t$, and $D_t$ denote the achieved throughput, energy consumption, and end-to-end delay at time $t$, and $\alpha$, $\beta$, $\lambda$ are weighting coefficients.
A penalty term is subtracted from the reward if the selected action leads to excessive congestion or energy depletion in critical nodes:
$$r_t \leftarrow r_t - \rho,$$
where $\rho > 0$ is the penalty magnitude.
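As an illustration, a hypothetical reward-shaping function consistent with the formulation above might look as follows; the weights, thresholds, and penalty magnitude are assumptions chosen only for the example.

```python
# Hypothetical reward shaping consistent with the formulation above. The
# weights (alpha, beta, lam), thresholds, and penalty magnitude are
# illustrative assumptions, not the paper's tuned values.
def step_reward(throughput_mbps, energy_mj, latency_ms,
                queue_utilization, min_residual_energy,
                alpha=1.0, beta=0.5, lam=0.1,
                congestion_threshold=0.9, energy_threshold=0.05, penalty=5.0):
    # Weighted combination of throughput (reward), energy and delay (costs)
    r = alpha * throughput_mbps - beta * energy_mj - lam * latency_ms
    # Penalize actions that overload a relay queue or drain a critical node
    if queue_utilization > congestion_threshold or min_residual_energy < energy_threshold:
        r -= penalty
    return r

print(step_reward(6.5, 0.12, 18.0, queue_utilization=0.4, min_residual_energy=0.6))  # 4.64
```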
The RL agent updates its policy using Q-learning, where the action-value function $Q(s_t, a_t)$ is updated as
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta \left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right],$$
with learning rate $\eta$ and discount factor $\gamma$.
The policy $\pi$ selects actions greedily with respect to the learned values, $\pi(s_t) = \arg\max_{a} Q(s_t, a)$, while an $\epsilon$-greedy strategy is used during training to balance exploration and exploitation.
To enhance learning stability and prevent catastrophic forgetting, experience replay is employed. A replay buffer stores past experiences $(s_t, a_t, r_t, s_{t+1})$, from which mini-batches are sampled to update the action-value function.
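The following is a minimal sketch of the Q-learning update with experience replay, assuming a tabular value function over hashable state keys for brevity; in the proposed framework the table would be replaced by the CNN-LSTM-based value network.

```python
# Minimal Q-learning update with experience replay. A tabular Q-function over
# hashable state keys is used here for brevity; the proposed framework would
# replace the table with the CNN-LSTM-based value network.
import random
from collections import defaultdict, deque

GAMMA, ETA = 0.95, 0.1            # assumed discount factor and learning rate
Q = defaultdict(float)            # Q[(state, action)]
replay = deque(maxlen=10_000)     # stores (s, a, r, s_next, candidate_actions)

def store(s, a, r, s_next, candidate_actions):
    replay.append((s, a, r, s_next, candidate_actions))

def learn(batch_size=32):
    if len(replay) < batch_size:
        return
    # Sample a mini-batch of past transitions and apply the TD update
    for s, a, r, s_next, candidates in random.sample(list(replay), batch_size):
        target = r + GAMMA * max(Q[(s_next, a2)] for a2 in candidates)
        Q[(s, a)] += ETA * (target - Q[(s, a)])

# Toy usage: one transition; actions are (next-hop, channel) pairs
store("s0", ("hop_3", "ch_2"), 4.6, "s1", [("hop_1", "ch_1"), ("hop_3", "ch_2")])
learn(batch_size=1)
print(Q[("s0", ("hop_3", "ch_2"))])   # 0.46
```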
Figure 2 illustrates the reinforcement learning-based routing and channel allocation process.

Figure 2. Reinforcement learning-based policy optimization process. The RL agent iteratively updates policies based on observed network states and reward feedback.
The reinforcement learning-based approach enables dynamic and adaptive optimization of routing and channel allocation in multi-hop WSNs. By leveraging Q-learning with experience replay, the model continuously improves its decision-making policy, leading to enhanced network throughput, energy efficiency, and reduced congestion. The next section presents experimental evaluations demonstrating the effectiveness of the proposed method.
The RL-based optimization updates the policy parameters according to the Q-learning rule given above, and the learned policy is periodically fine-tuned using experience replay and policy-gradient updates.
The proposed method integrates deep learning with reinforcement learning for optimizing channel allocation and routing in multi-hop WSNs. CNN and LSTM networks extract spatial-temporal features from network states, while an RL agent dynamically learns optimal transmission policies. The approach balances throughput, energy efficiency, and latency through a carefully designed reward function and policy update mechanism. Experimental validation of this method will be presented in the following section.
This section presents a series of experiments designed to evaluate the effectiveness of the proposed deep reinforcement learning-based approach for optimizing channel allocation and routing in multi-hop wireless sensor networks (WSNs). The experiments assess the model’s performance in terms of (i) network throughput and latency, (ii) energy efficiency, and (iii) robustness against interference and dynamic traffic variations. The evaluation compares the proposed method with baseline techniques to demonstrate its advantages in real-world deployment scenarios.
To simulate a realistic multi-hop WSN, we consider a network topology of 100 sensor nodes randomly deployed over a 500 m × 500 m area.
The proposed deep reinforcement learning (DNN-RL) model is implemented using a hybrid architecture combining convolutional neural networks (CNNs) and long short-term memory (LSTM) networks for feature extraction. The RL agent is trained using the Q-learning algorithm with an experience replay buffer of $10^4$ state-action pairs and a fixed learning rate.
For comparison, two baseline methods are used:
- Q-RL: a traditional reinforcement learning approach based on tabular Q-learning without deep feature extraction.
- Heuristic Method (HM): a conventional routing and channel allocation strategy that selects the shortest path with the least congested channel.
Each experiment is repeated 10 times with different random seeds to ensure statistical reliability.
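A simple sketch of how such a deployment can be generated and repeated across seeds is given below; the communication range and the unit-disk connectivity model are assumptions made purely for illustration.

```python
# Sketch of the simulated deployment described above: 100 nodes placed
# uniformly at random over a square area, repeated across random seeds. The
# communication range and unit-disk connectivity model are assumptions.
import random

def random_topology(n_nodes=100, side=500.0, comm_range=80.0, seed=0):
    rng = random.Random(seed)
    nodes = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_nodes)]
    # Two nodes can communicate directly if they lie within comm_range
    links = {i: [j for j in range(n_nodes)
                 if j != i and (nodes[i][0] - nodes[j][0]) ** 2
                             + (nodes[i][1] - nodes[j][1]) ** 2 <= comm_range ** 2]
             for i in range(n_nodes)}
    return nodes, links

for seed in range(10):                        # 10 runs with different seeds
    _, links = random_topology(seed=seed)
    avg_degree = sum(len(neigh) for neigh in links.values()) / len(links)
    print(f"seed {seed}: average node degree = {avg_degree:.1f}")
```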
This experiment measures the network throughput and end-to-end latency under different traffic loads. Throughput is defined as the number of successfully delivered packets per second, while latency represents the average time required for a packet to reach the sink node.
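Assuming a per-packet delivery log with send time, sink arrival time, and a delivered flag (a hypothetical format, not the simulator's actual output), the two metrics can be computed as follows.

```python
# Computing the two metrics from a per-packet log. The log format (send time,
# sink arrival time, delivered flag) is a hypothetical illustration, not the
# simulator's actual output.
def throughput_and_latency(packet_log, duration_s):
    delivered = [p for p in packet_log if p["delivered"]]
    throughput = len(delivered) / duration_s                     # packets per second
    latency = (sum(p["recv_t"] - p["send_t"] for p in delivered) / len(delivered)
               if delivered else float("nan"))                   # mean end-to-end delay
    return throughput, latency

log = [{"send_t": 0.00, "recv_t": 0.05, "delivered": True},
       {"send_t": 0.10, "recv_t": 0.18, "delivered": True},
       {"send_t": 0.20, "recv_t": None, "delivered": False}]
print(throughput_and_latency(log, duration_s=1.0))   # approx. (2.0, 0.065)
```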
Figure 3 illustrates the network throughput for increasing packet arrival rates. The proposed DNN-RL model consistently achieves higher throughput compared to the baseline methods. At moderate traffic loads (30 packets/sec), DNN-RL achieves a throughput of approximately 6.5 Mbps, which is 12% higher than Q-RL and 30% higher than HM. Under high traffic conditions (80 packets/sec), the performance gap widens as DNN-RL efficiently adapts to congestion by dynamically adjusting channel allocation.

Figure 3. Comparison of network throughput for different methods. The proposed DNN-RL model maintains higher throughput under increasing network loads.
The results of Experiment 1 indicate that the proposed DNN-RL approach significantly improves network throughput and reduces latency compared to traditional methods. As shown in Figure 3, the throughput of the DNN-RL model remains consistently higher across different network loads. When the traffic load reaches 50 packets per second, DNN-RL achieves a throughput of approximately 9.1 Mbps, which is 11% higher than Q-RL (8.2 Mbps) and 25% higher than HM (7.3 Mbps). The performance gap becomes more pronounced under higher loads, where the adaptive channel selection of DNN-RL prevents congestion and maximizes available bandwidth.
Latency measurements further validate the efficiency of the proposed method. Under high traffic conditions, DNN-RL reduces the average packet delay by 20% compared to Q-RL and 35% compared to HM. This improvement is attributed to the model’s ability to dynamically reallocate channels and reroute packets in response to congestion hotspots. In contrast, the heuristic method follows a static shortest-path routing approach, leading to bottlenecks in high-load scenarios.
Overall, these results demonstrate that deep reinforcement learning effectively optimizes multi-hop WSN performance by adapting to real-time network conditions. By learning optimal transmission policies, the proposed approach maintains high throughput and minimizes latency, ensuring reliable data delivery in dynamic environments.
Energy efficiency is a critical factor in WSNs, as sensor nodes typically operate on limited battery power. This experiment evaluates the energy consumption per successfully delivered packet and estimates network lifetime under different optimization strategies. The goal is to determine how effectively the proposed DNN-RL method reduces energy usage compared to traditional approaches.
Table 1 presents the average energy consumption per packet and the estimated network lifetime for each method. The results indicate that the proposed DNN-RL model achieves the lowest energy consumption per packet, consuming only 0.12 mJ, which is 20% lower than Q-RL and roughly 43% lower than the heuristic method (HM). The prolonged network lifetime observed for DNN-RL confirms its effectiveness in optimizing power-aware routing and channel allocation.
Table 1. Comparison of energy efficiency.

| Method | Energy per Packet (mJ) | Network Lifetime (hours) |
|---|---|---|
| DNN-RL (Proposed) | 0.12 | 145.3 |
| Q-RL (Baseline) | 0.15 | 132.1 |
| Heuristic Method (HM) | 0.21 | 98.7 |
To visualize the energy consumption trends, Figure 4 illustrates the total energy depletion over time for different methods. The heuristic method (HM) exhibits the steepest decline, indicating rapid battery depletion. In contrast, DNN-RL maintains a more gradual energy depletion rate, allowing for a significantly longer operational period.

Figure 4. Energy depletion over time for different optimization methods. The proposed DNN-RL approach maintains a more gradual depletion rate, ensuring extended network lifetime.
The experimental results demonstrate that the proposed deep reinforcement learning-based optimization significantly reduces energy consumption compared to traditional methods. The heuristic method (HM) leads to faster battery depletion due to its rigid routing decisions, which fail to adapt dynamically to network conditions. The Q-RL model achieves moderate improvements, but it lacks the deep feature extraction capabilities of DNN-RL, which enable more efficient power-aware routing and adaptive channel selection.
DNN-RL prolongs network lifetime by approximately 47% compared to HM and 10% compared to Q-RL. The gradual depletion rate in Figure 4 confirms that the reinforcement learning-based model effectively balances energy-efficient transmissions, optimizing power consumption across the network.
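These percentages follow directly from the lifetime values in Table 1:
$$\frac{145.3 - 98.7}{98.7} \approx 0.47, \qquad \frac{145.3 - 132.1}{132.1} \approx 0.10.$$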
These findings highlight the importance of deep learning in enhancing the sustainability of WSN deployments. By leveraging real-time decision-making, the proposed model ensures that energy resources are utilized optimally, preventing premature node failures and extending network longevity.
This experiment tests the model’s robustness against dynamically changing interference and varying traffic conditions. External interference sources are introduced randomly, and the packet delivery ratio (PDR) is monitored.
Experiment 3 evaluates the resilience of different optimization methods against dynamic interference and fluctuating traffic loads. Figure 5 illustrates that the proposed DNN-RL approach maintains a significantly higher packet delivery ratio (PDR) under increasing interference levels compared to Q-RL and HM. At moderate interference levels, DNN-RL achieves a PDR of approximately 94%, whereas Q-RL and HM degrade to 90% and 85%, respectively. As interference intensifies, the performance gap widens, with HM dropping below 65% PDR at the highest interference levels, while DNN-RL still maintains over 80% delivery reliability.

Figure 5. Packet delivery ratio (PDR) under varying interference levels. The proposed method maintains a higher PDR, demonstrating robustness against interference.
These findings suggest that deep reinforcement learning enhances the robustness of WSNs by dynamically adapting routing and channel selection policies. The traditional heuristic approach suffers significant performance degradation in highly dynamic environments because it lacks adaptability, leading to excessive packet loss and inefficient channel utilization. In contrast, the DNN-RL model continuously learns from changing network conditions, allowing it to mitigate the impact of interference and maintain stable network performance.
The results confirm that the proposed method is well-suited for real-world WSN deployments where environmental conditions and traffic loads are unpredictable. By leveraging real-time decision-making, DNN-RL ensures reliable communication even in challenging network conditions, making it a promising solution for mission-critical applications such as industrial monitoring and emergency response systems.
The experimental results demonstrate that the proposed deep reinforcement learning (DNN-RL) approach effectively enhances multi-hop wireless sensor network (WSN) performance across various key metrics, including throughput, latency, energy efficiency, and robustness against interference. The comparison with traditional heuristic and reinforcement learning-based approaches highlights the advantages of integrating deep learning for feature extraction and adaptive decision-making in dynamic network environments.
The first experiment showed that DNN-RL achieves significantly higher throughput and lower latency compared to Q-learning-based and heuristic methods. This improvement stems from the model’s ability to dynamically allocate channels and optimize routing paths based on real-time network conditions. Unlike the heuristic method, which follows a static routing strategy, the reinforcement learning agent continuously updates its policy based on observed network performance, leading to more efficient packet transmission. Additionally, the use of CNN-LSTM feature extraction enables the agent to capture both spatial and temporal variations in network traffic, allowing for better congestion management. The lower latency observed in DNN-RL confirms that it successfully mitigates transmission delays by dynamically rerouting packets away from congested areas.
In terms of energy efficiency, Experiment 2 revealed that DNN-RL significantly reduces energy consumption per packet compared to the baseline methods. The reinforcement learning framework optimizes power-aware routing decisions by balancing energy distribution across sensor nodes, preventing early depletion of critical network resources. The results indicate that the proposed approach extends network lifetime by approximately 47% compared to heuristic methods and 10% compared to Q-learning. This efficiency is primarily attributed to the model’s ability to learn energy-efficient transmission policies, reducing unnecessary retransmissions and idle listening, which are major contributors to energy wastage in WSNs. The visualization of energy depletion over time further confirms that the proposed model distributes power consumption more evenly across the network, reducing the likelihood of network partitioning due to node failures.
The third experiment evaluated the robustness of DNN-RL in the presence of dynamic interference and fluctuating traffic conditions. The results show that the proposed model maintains a higher packet delivery ratio (PDR) than both Q-RL and heuristic methods, particularly under high interference levels. This resilience is due to the model’s ability to adaptively reassign channels and reroute packets to minimize packet collisions and signal degradation. The heuristic method, which does not incorporate real-time interference feedback, suffers from significant performance degradation as interference increases, leading to a PDR drop below 65%. In contrast, the reinforcement learning model successfully learns optimal channel selection policies that maximize delivery reliability, ensuring stable communication even in harsh network environments.
The advantages of the proposed method stem from its integration of deep learning and reinforcement learning. Unlike traditional Q-learning approaches, which struggle with large state spaces and slow convergence, DNN-RL leverages CNN-LSTM architectures to extract meaningful features from raw network state data, reducing the complexity of the decision-making process. This results in faster policy convergence and more effective adaptation to changing network conditions. Furthermore, the experience replay mechanism ensures that past experiences are utilized efficiently, improving the model’s stability and generalization across different network scenarios.
Despite these advantages, the proposed method has several limitations. First, the computational complexity of training deep reinforcement learning models is significantly higher than that of traditional heuristic methods. The model requires substantial training data and computational resources to learn optimal policies, which may be a limiting factor in real-time, resource-constrained WSN applications. Second, while the CNN-LSTM architecture enhances feature extraction, it may not fully capture abrupt environmental changes, such as sudden node failures or extreme interference spikes. Future work could explore hybrid approaches that combine reinforcement learning with real-time anomaly detection mechanisms to further improve adaptability. Additionally, fine-tuning the reward function remains a challenge, as different application scenarios may require different trade-offs between throughput, energy efficiency, and latency.
Overall, the results confirm that deep reinforcement learning is a promising approach for optimizing multi-hop WSNs, offering significant improvements in network performance while maintaining energy efficiency and robustness. Future research directions could focus on reducing the computational cost of training by implementing distributed learning techniques, as well as extending the model to support real-time network reconfiguration in highly dynamic environments.
This paper proposed a deep reinforcement learning-based approach for optimizing channel allocation and routing in multi-hop wireless sensor networks (WSNs). By integrating convolutional and recurrent neural networks for feature extraction with reinforcement learning-based policy optimization, the proposed framework effectively adapts to dynamic network conditions, improving throughput, latency, energy efficiency, and robustness against interference.

Experimental results demonstrated that the DNN-RL model achieves significantly higher throughput and lower latency compared to traditional heuristic and Q-learning-based methods, particularly under high traffic loads. The energy efficiency analysis confirmed that the model reduces power consumption per packet by more than 40% relative to the heuristic baseline, leading to an extended network lifetime. Additionally, the proposed method demonstrated strong resilience to interference, maintaining a higher packet delivery ratio (PDR) even under severe network disruptions. These advantages stem from the model's ability to dynamically adjust transmission policies based on real-time network observations, enabling efficient and adaptive decision-making.

Despite its strengths, the computational complexity of training deep reinforcement learning models remains a challenge, and further research is needed to enhance real-time adaptability and reduce resource requirements. Future work could explore distributed training methods, hybrid optimization approaches, and real-time anomaly detection to further enhance model performance. Overall, this study highlights the potential of deep learning-based optimization techniques for enhancing WSN performance, paving the way for more intelligent and energy-efficient network management solutions.