Application of Deep Neural Networks in Multi-Hop Wireless Sensor Network (WSN) Channel Optimization
Published online: 11 Apr 2025
Received: 04 Dec 2024
Accepted: 08 Mar 2025
DOI: https://doi.org/10.2478/amns-2025-0848
© 2025 Yiyang Chen, published by Sciendo
This work is licensed under the Creative Commons Attribution 4.0 International License.
Multi-hop wireless sensor networks (WSNs) play a critical role in modern communication systems, particularly in applications requiring extensive coverage and energy-efficient data transmission. However, optimizing channel allocation and routing in multi-hop WSNs remains a significant challenge due to dynamic network topologies, interference, and energy constraints. Traditional optimization methods, including heuristic and metaheuristic approaches, have been widely applied but often struggle with real-time adaptability and high-dimensional feature spaces. Recent advancements in deep learning have opened new possibilities for enhancing WSN channel optimization, leveraging data-driven techniques to improve network performance [1].
Hierarchical optimization techniques have been explored for clustering and multi-hop routing to enhance energy efficiency in underwater WSNs [2]. Deep learning-aided optimization has demonstrated its potential in short-packet multi-hop networks, improving overall performance through adaptive learning models [3]. Predictive models based on hop count theory have also been developed using deep learning, providing more accurate routing decisions in WSN environments [4]. The integration of deep neural networks (DNNs) in localization techniques has further improved the reliability of sensor networks in Internet of Things (IoT) applications [5]. Additionally, hybrid deep recurrent neural networks have been applied to optimal cluster-based routing, ensuring coverage-aware communication in large-scale WSNs [6].
Multi-hop routing strategies have been enhanced using intelligent next-hop selection mechanisms, improving network stability and efficiency in IoT-enabled WSNs [7]. Furthermore, multi-objective optimization algorithms have been employed to optimize multi-hop and multi-path routing, ensuring balanced energy consumption and robust data transmission [8]. Deep learning-based cooperative communication models have been proposed for underground WSNs, highlighting the benefits of data-driven approaches in challenging deployment environments [9]. Hybrid metaheuristic optimization techniques, such as chimp and hunger games search algorithms, have also been utilized to optimize multi-hop routing protocols in energy-efficient WSNs [10]. Similarly, search-and-rescue-based chaotic optimization methods have been applied to underwater WSNs, enabling more reliable multi-hop data transmission in extreme conditions [11].
Reinforcement learning has been increasingly adopted for multi-parameter joint optimization in dense multi-hop networks, improving decision-making in dynamic channel conditions [12]. Deep neural networks have also been leveraged for anchor-free localization in WSNs, eliminating the need for predefined reference points in multi-sink networks [13]. Hierarchical traffic offloading mechanisms have been proposed to ensure end-to-end reliability in multi-hop, multi-connection WSNs, addressing congestion and delay challenges [14]. Additionally, deep convolutional neural networks combined with metaheuristic algorithms have demonstrated improved energy-efficient routing in WSN-IoT integrated environments [15]. The use of CNN-based models for optimal cluster head selection further reinforces the role of deep learning in enhancing energy efficiency and prolonging network lifespan [16].
Optimization techniques such as the Gray Wolf optimizer have been integrated with distance vector-hop (DV-Hop) methods to improve node localization in WSNs [17]. Combined metaheuristic approaches, such as the Osprey-Chimp optimization algorithm, have been applied to cluster-based routing, achieving improved energy prediction with deep learning models [18]. Cognitive radio sensor networks have also benefited from deep learning techniques for optimizing network lifetime in industrial applications [19]. Machine learning-based approaches have been extensively reviewed for coverage optimization in WSNs, highlighting the growing relevance of artificial intelligence in sensor network optimization [20]. Additionally, relay selection strategies have been optimized for energy-efficient cooperative multi-hop image transmission, ensuring robust multimedia communication in WSNs [21].
Despite these advancements, optimizing channel allocation in multi-hop WSNs remains a complex problem due to varying interference levels, mobility patterns, and dynamic traffic conditions. Traditional optimization techniques often lack the flexibility needed to adapt to real-time network variations. This paper explores the application of deep neural networks to WSN channel optimization, integrating reinforcement learning and convolutional architectures to learn dynamic transmission policies. By leveraging deep learning-based feature extraction and predictive modeling, the proposed framework aims to improve throughput, energy efficiency, and network stability under diverse operational conditions. Through experimental validation, we demonstrate the superiority of the proposed approach over conventional optimization methods, highlighting its potential for real-time, intelligent WSN channel optimization.
This section presents the proposed deep neural network (DNN)-based approach for optimizing channel allocation and routing in multi-hop wireless sensor networks (WSNs). The method integrates reinforcement learning (RL) with convolutional and recurrent neural networks to dynamically adjust transmission policies based on real-time network conditions. We describe the overall framework, mathematical formulations, and optimization process, followed by illustrative figures.
The proposed approach consists of three main components: (i) feature extraction using convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, (ii) reinforcement learning-based decision-making for adaptive routing and channel allocation, and (iii) an optimization mechanism for energy-efficient network performance. Figure 1 illustrates the overall workflow.

Figure 1. Proposed deep learning-based channel and routing optimization framework. Feature extraction captures spatial-temporal variations, reinforcement learning optimizes policy decisions, and final routing and channel selection are dynamically adjusted.
To effectively capture spatial and temporal variations in network state, we employ a hybrid deep learning architecture combining convolutional neural networks (CNNs) and long short-term memory (LSTM) networks. CNNs extract spatial dependencies among sensor nodes, while LSTMs model the temporal evolution of network conditions, enabling the system to adapt dynamically to changes in channel quality, interference, and traffic load.
At each time step $t$, the network state is represented as a multi-channel tensor $X_t$, in which each channel encodes one network attribute across the sensor nodes, such as link quality, interference level, traffic load, or residual energy.

The CNN module applies multiple convolutional layers to extract spatial patterns from $X_t$:
$$F_t = \sigma\left(W_c * X_t + b_c\right),$$
where $*$ denotes convolution, $W_c$ and $b_c$ are learnable parameters, and $\sigma(\cdot)$ is a nonlinear activation function.

To enhance feature learning, batch normalization and max pooling operations are applied:
$$\tilde{F}_t = \operatorname{MaxPool}\left(\operatorname{BN}(F_t)\right).$$
This ensures stable training and reduces computational complexity.

The extracted spatial features $\tilde{F}_t$ are fed into an LSTM network that models their evolution over consecutive time steps.

The LSTM outputs a context-aware embedding $h_t$ that summarizes both current and historical network conditions.

The final feature representation $z_t$ concatenates the spatial and temporal embeddings, $z_t = [\tilde{F}_t;\, h_t]$.

This hybrid feature embedding is then passed to the reinforcement learning agent for adaptive routing and channel allocation.
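For concreteness, the following is a minimal PyTorch sketch of a CNN-LSTM extractor of the kind described above; the four-channel state tensor, layer widths, history length, and embedding size are illustrative assumptions rather than values taken from the paper.

```python
# Illustrative CNN-LSTM feature extractor (PyTorch). The 4-channel state
# tensor, layer widths, history length, and embedding size are assumptions
# made for this sketch, not values taken from the paper.
import torch
import torch.nn as nn

class CNNLSTMExtractor(nn.Module):
    def __init__(self, in_channels=4, spatial_dim=32, temporal_dim=64):
        super().__init__()
        # Spatial branch: convolution -> batch norm -> ReLU -> max pooling
        self.cnn = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, spatial_dim, kernel_size=3, padding=1),
            nn.BatchNorm2d(spatial_dim),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),       # (B*T, spatial_dim, 1, 1)
        )
        # Temporal branch: LSTM over a short history of spatial features
        self.lstm = nn.LSTM(spatial_dim, temporal_dim, batch_first=True)

    def forward(self, x):
        # x: (batch, time, channels, H, W) -- a window of recent state tensors
        b, t, c, h, w = x.shape
        f = self.cnn(x.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.lstm(f)
        # z_t = [spatial features of the latest step ; temporal embedding]
        return torch.cat([f[:, -1, :], h_n[-1]], dim=1)

# Example: 8 samples, 5 time steps, 4 state channels on a 20x20 node grid
z = CNNLSTMExtractor()(torch.randn(8, 5, 4, 20, 20))
print(z.shape)  # torch.Size([8, 96])
```

The adaptive pooling in the spatial branch keeps the extractor independent of the grid size, so the same module can be reused as the deployed topology changes.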
To dynamically optimize routing and channel allocation in multi-hop wireless sensor networks (WSNs), we formulate the problem as a reinforcement learning (RL) task. The RL agent continuously learns from the network environment and selects optimal transmission policies to maximize network efficiency while minimizing interference and energy consumption. The decision-making process is modeled as a Markov Decision Process (MDP), where the RL agent interacts with the environment through iterative exploration and exploitation.
The optimization problem is represented as a tuple $(\mathcal{S}, \mathcal{A}, P, R, \gamma)$, where $\mathcal{S}$ is the state space, $\mathcal{A}$ the action space, $P$ the state-transition probability, $R$ the reward function, and $\gamma$ the discount factor.
The state $s_t \in \mathcal{S}$ captures the current network conditions, including channel occupancy, interference levels, traffic load, and the residual energy of the nodes.
The state representation is obtained from the CNN-LSTM feature extraction module, which provides a spatiotemporal embedding of the network topology.
At each time step, the RL agent selects an action $a_t \in \mathcal{A}$, corresponding to the choice of next-hop node and transmission channel for the current packet.
The agent learns an adaptive routing and channel allocation policy, balancing data transmission efficiency and network longevity.
The agent's objective is to maximize a cumulative reward function that considers throughput, energy efficiency, and latency:
$$r_t = \alpha \cdot \mathrm{Thr}_t - \beta \cdot E_t - \lambda \cdot D_t,$$
where $\mathrm{Thr}_t$, $E_t$, and $D_t$ denote the achieved throughput, energy consumption, and end-to-end delay at time $t$, and $\alpha$, $\beta$, $\lambda$ are weighting coefficients.
A penalty term is subtracted from the reward if the selected action leads to excessive congestion or energy depletion in critical nodes:
$$r_t \leftarrow r_t - \rho,$$
where $\rho > 0$ is the penalty magnitude.
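As an illustration, a hypothetical reward-shaping function consistent with the formulation above might look as follows; the weights, thresholds, and penalty magnitude are assumptions chosen only for the example.

```python
# Hypothetical reward shaping consistent with the formulation above. The
# weights (alpha, beta, lam), thresholds, and penalty magnitude are
# illustrative assumptions, not the paper's tuned values.
def step_reward(throughput_mbps, energy_mj, latency_ms,
                queue_utilization, min_residual_energy,
                alpha=1.0, beta=0.5, lam=0.1,
                congestion_threshold=0.9, energy_threshold=0.05, penalty=5.0):
    # Weighted combination of throughput (reward), energy and delay (costs)
    r = alpha * throughput_mbps - beta * energy_mj - lam * latency_ms
    # Penalize actions that overload a relay queue or drain a critical node
    if queue_utilization > congestion_threshold or min_residual_energy < energy_threshold:
        r -= penalty
    return r

print(step_reward(6.5, 0.12, 18.0, queue_utilization=0.4, min_residual_energy=0.6))  # 4.64
```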
The RL agent updates its policy using Q-learning, where the action-value function $Q(s_t, a_t)$ is updated as
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \eta \left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right],$$
with learning rate $\eta$ and discount factor $\gamma$.
The policy $\pi$ selects actions greedily with respect to the learned values, $\pi(s_t) = \arg\max_{a} Q(s_t, a)$, while an $\epsilon$-greedy strategy is used during training to balance exploration and exploitation.
To enhance learning stability and prevent catastrophic forgetting, experience replay is employed. A replay buffer stores past experiences $(s_t, a_t, r_t, s_{t+1})$, from which mini-batches are sampled to update the action-value function.
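The following is a minimal sketch of the Q-learning update with experience replay, assuming a tabular value function over hashable state keys for brevity; in the proposed framework the table would be replaced by the CNN-LSTM-based value network.

```python
# Minimal Q-learning update with experience replay. A tabular Q-function over
# hashable state keys is used here for brevity; the proposed framework would
# replace the table with the CNN-LSTM-based value network.
import random
from collections import defaultdict, deque

GAMMA, ETA = 0.95, 0.1            # assumed discount factor and learning rate
Q = defaultdict(float)            # Q[(state, action)]
replay = deque(maxlen=10_000)     # stores (s, a, r, s_next, candidate_actions)

def store(s, a, r, s_next, candidate_actions):
    replay.append((s, a, r, s_next, candidate_actions))

def learn(batch_size=32):
    if len(replay) < batch_size:
        return
    # Sample a mini-batch of past transitions and apply the TD update
    for s, a, r, s_next, candidates in random.sample(list(replay), batch_size):
        target = r + GAMMA * max(Q[(s_next, a2)] for a2 in candidates)
        Q[(s, a)] += ETA * (target - Q[(s, a)])

# Toy usage: one transition; actions are (next-hop, channel) pairs
store("s0", ("hop_3", "ch_2"), 4.6, "s1", [("hop_1", "ch_1"), ("hop_3", "ch_2")])
learn(batch_size=1)
print(Q[("s0", ("hop_3", "ch_2"))])   # 0.46
```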
Figure 2 illustrates the reinforcement learning-based routing and channel allocation process.

Figure 2. Reinforcement learning-based policy optimization process. The RL agent iteratively updates policies based on observed network states and reward feedback.
The reinforcement learning-based approach enables dynamic and adaptive optimization of routing and channel allocation in multi-hop WSNs. By leveraging Q-learning with experience replay, the model continuously improves its decision-making policy, leading to enhanced network throughput, energy efficiency, and reduced congestion. The next section presents experimental evaluations demonstrating the effectiveness of the proposed method.
The RL-based optimization updates the policy parameters according to the Q-learning rule given above, and the learned policy is periodically fine-tuned using experience replay and policy-gradient updates.
The proposed method integrates deep learning with reinforcement learning for optimizing channel allocation and routing in multi-hop WSNs. CNN and LSTM networks extract spatial-temporal features from network states, while an RL agent dynamically learns optimal transmission policies. The approach balances throughput, energy efficiency, and latency through a carefully designed reward function and policy update mechanism. Experimental validation of this method will be presented in the following section.
This section presents a series of experiments designed to evaluate the effectiveness of the proposed deep reinforcement learning-based approach for optimizing channel allocation and routing in multi-hop wireless sensor networks (WSNs). The experiments assess the model’s performance in terms of (i) network throughput and latency, (ii) energy efficiency, and (iii) robustness against interference and dynamic traffic variations. The evaluation compares the proposed method with baseline techniques to demonstrate its advantages in real-world deployment scenarios.
To simulate a realistic multi-hop WSN, we consider a network topology of 100 sensor nodes randomly deployed over a 500 m × 500 m area.
The proposed deep reinforcement learning (DNN-RL) model is implemented using a hybrid architecture combining convolutional neural networks (CNNs) and long short-term memory (LSTM) networks for feature extraction. The RL agent is trained using the Q-learning algorithm with an experience replay buffer of $10^4$ state-action pairs and a fixed learning rate.
For comparison, two baseline methods are used:
- Q-RL: a traditional reinforcement learning approach based on tabular Q-learning without deep feature extraction.
- Heuristic Method (HM): a conventional routing and channel allocation strategy that selects the shortest path with the least congested channel.
Each experiment is repeated 10 times with different random seeds to ensure statistical reliability.
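A simple sketch of how such a deployment can be generated and repeated across seeds is given below; the communication range and the unit-disk connectivity model are assumptions made purely for illustration.

```python
# Sketch of the simulated deployment described above: 100 nodes placed
# uniformly at random over a square area, repeated across random seeds. The
# communication range and unit-disk connectivity model are assumptions.
import random

def random_topology(n_nodes=100, side=500.0, comm_range=80.0, seed=0):
    rng = random.Random(seed)
    nodes = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(n_nodes)]
    # Two nodes can communicate directly if they lie within comm_range
    links = {i: [j for j in range(n_nodes)
                 if j != i and (nodes[i][0] - nodes[j][0]) ** 2
                             + (nodes[i][1] - nodes[j][1]) ** 2 <= comm_range ** 2]
             for i in range(n_nodes)}
    return nodes, links

for seed in range(10):                        # 10 runs with different seeds
    _, links = random_topology(seed=seed)
    avg_degree = sum(len(neigh) for neigh in links.values()) / len(links)
    print(f"seed {seed}: average node degree = {avg_degree:.1f}")
```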
This experiment measures the network throughput and end-to-end latency under different traffic loads. Throughput is defined as the number of successfully delivered packets per second, while latency represents the average time required for a packet to reach the sink node.
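Assuming a per-packet delivery log with send time, sink arrival time, and a delivered flag (a hypothetical format, not the simulator's actual output), the two metrics can be computed as follows.

```python
# Computing the two metrics from a per-packet log. The log format (send time,
# sink arrival time, delivered flag) is a hypothetical illustration, not the
# simulator's actual output.
def throughput_and_latency(packet_log, duration_s):
    delivered = [p for p in packet_log if p["delivered"]]
    throughput = len(delivered) / duration_s                     # packets per second
    latency = (sum(p["recv_t"] - p["send_t"] for p in delivered) / len(delivered)
               if delivered else float("nan"))                   # mean end-to-end delay
    return throughput, latency

log = [{"send_t": 0.00, "recv_t": 0.05, "delivered": True},
       {"send_t": 0.10, "recv_t": 0.18, "delivered": True},
       {"send_t": 0.20, "recv_t": None, "delivered": False}]
print(throughput_and_latency(log, duration_s=1.0))   # approx. (2.0, 0.065)
```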
Figure 3 illustrates the network throughput for increasing packet arrival rates. The proposed DNN-RL model consistently achieves higher throughput compared to the baseline methods. At moderate traffic loads (30 packets/sec), DNN-RL achieves a throughput of approximately 6.5 Mbps, which is 12% higher than Q-RL and 30% higher than HM. Under high traffic conditions (80 packets/sec), the performance gap widens as DNN-RL efficiently adapts to congestion by dynamically adjusting channel allocation.

Figure 3. Comparison of network throughput for different methods. The proposed DNN-RL model maintains higher throughput under increasing network loads.
The results of Experiment 1 indicate that the proposed DNN-RL approach significantly improves network throughput and reduces latency compared to traditional methods. As shown in Figure 3, the throughput of the DNN-RL model remains consistently higher across different network loads. When the traffic load reaches 50 packets per second, DNN-RL achieves a throughput of approximately 9.1 Mbps, which is 11% higher than Q-RL (8.2 Mbps) and 25% higher than HM (7.3 Mbps). The performance gap becomes more pronounced under higher loads, where the adaptive channel selection of DNN-RL prevents congestion and maximizes available bandwidth.
Latency measurements further validate the efficiency of the proposed method. Under high traffic conditions, DNN-RL reduces the average packet delay by 20% compared to Q-RL and 35% compared to HM. This improvement is attributed to the model’s ability to dynamically reallocate channels and reroute packets in response to congestion hotspots. In contrast, the heuristic method follows a static shortest-path routing approach, leading to bottlenecks in high-load scenarios.
Overall, these results demonstrate that deep reinforcement learning effectively optimizes multi-hop WSN performance by adapting to real-time network conditions. By learning optimal transmission policies, the proposed approach maintains high throughput and minimizes latency, ensuring reliable data delivery in dynamic environments.
Energy efficiency is a critical factor in WSNs, as sensor nodes typically operate on limited battery power. This experiment evaluates the energy consumption per successfully delivered packet and estimates network lifetime under different optimization strategies. The goal is to determine how effectively the proposed DNN-RL method reduces energy usage compared to traditional approaches.
Table 1 presents the average energy consumption per packet and the estimated network lifetime for each method. The results indicate that the proposed DNN-RL model achieves the lowest energy consumption per packet, consuming only 0.12 mJ, which is 20% lower than Q-RL and roughly 43% lower than the heuristic method (HM). The prolonged network lifetime observed for DNN-RL confirms its effectiveness in optimizing power-aware routing and channel allocation.
Table 1. Comparison of energy efficiency.

| Method | Energy per Packet (mJ) | Network Lifetime (hours) |
|---|---|---|
| DNN-RL (Proposed) | 0.12 | 145.3 |
| Q-RL (Baseline) | 0.15 | 132.1 |
| Heuristic Method (HM) | 0.21 | 98.7 |
To visualize the energy consumption trends, Figure 4 illustrates the total energy depletion over time for different methods. The heuristic method (HM) exhibits the steepest decline, indicating rapid battery depletion. In contrast, DNN-RL maintains a more gradual energy depletion rate, allowing for a significantly longer operational period.

Figure 4. Energy depletion over time for different optimization methods. The proposed DNN-RL approach maintains a more gradual depletion rate, ensuring extended network lifetime.
The experimental results demonstrate that the proposed deep reinforcement learning-based optimization significantly reduces energy consumption compared to traditional methods. The heuristic method (HM) leads to faster battery depletion due to its rigid routing decisions, which fail to adapt dynamically to network conditions. The Q-RL model achieves moderate improvements, but it lacks the deep feature extraction capabilities of DNN-RL, which enable more efficient power-aware routing and adaptive channel selection.
DNN-RL prolongs network lifetime by approximately 47% compared to HM and 10% compared to Q-RL. The gradual depletion rate in Figure 4 confirms that the reinforcement learning-based model effectively balances energy-efficient transmissions, optimizing power consumption across the network.
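These percentages follow directly from the lifetime values in Table 1:
$$\frac{145.3 - 98.7}{98.7} \approx 0.47, \qquad \frac{145.3 - 132.1}{132.1} \approx 0.10.$$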
These findings highlight the importance of deep learning in enhancing the sustainability of WSN deployments. By leveraging real-time decision-making, the proposed model ensures that energy resources are utilized optimally, preventing premature node failures and extending network longevity.
This experiment tests the model’s robustness against dynamically changing interference and varying traffic conditions. External interference sources are introduced randomly, and the packet delivery ratio (PDR) is monitored.
Experiment 3 evaluates the resilience of different optimization methods against dynamic interference and fluctuating traffic loads. Figure 5 illustrates that the proposed DNN-RL approach maintains a significantly higher packet delivery ratio (PDR) under increasing interference levels compared to Q-RL and HM. At moderate interference levels, DNN-RL achieves a PDR of approximately 94%, whereas Q-RL and HM degrade to 90% and 85%, respectively. As interference intensifies, the performance gap widens, with HM dropping below 65% PDR at the highest interference levels, while DNN-RL still maintains over 80% delivery reliability.

Figure 5. Packet delivery ratio (PDR) under varying interference levels. The proposed method maintains a higher PDR, demonstrating robustness against interference.
These findings suggest that deep reinforcement learning enhances the robustness of WSNs by dynamically adapting routing and channel selection policies. The traditional heuristic approach suffers significant performance degradation in highly dynamic environments because it lacks adaptability, leading to excessive packet loss and inefficient channel utilization. In contrast, the DNN-RL model continuously learns from changing network conditions, allowing it to mitigate the impact of interference and maintain stable network performance.
The results confirm that the proposed method is well-suited for real-world WSN deployments where environmental conditions and traffic loads are unpredictable. By leveraging real-time decision-making, DNN-RL ensures reliable communication even in challenging network conditions, making it a promising solution for mission-critical applications such as industrial monitoring and emergency response systems.
The experimental results demonstrate that the proposed deep reinforcement learning (DNN-RL) approach effectively enhances multi-hop wireless sensor network (WSN) performance across various key metrics, including throughput, latency, energy efficiency, and robustness against interference. The comparison with traditional heuristic and reinforcement learning-based approaches highlights the advantages of integrating deep learning for feature extraction and adaptive decision-making in dynamic network environments.
The first experiment showed that DNN-RL achieves significantly higher throughput and lower latency compared to Q-learning-based and heuristic methods. This improvement stems from the model’s ability to dynamically allocate channels and optimize routing paths based on real-time network conditions. Unlike the heuristic method, which follows a static routing strategy, the reinforcement learning agent continuously updates its policy based on observed network performance, leading to more efficient packet transmission. Additionally, the use of CNN-LSTM feature extraction enables the agent to capture both spatial and temporal variations in network traffic, allowing for better congestion management. The lower latency observed in DNN-RL confirms that it successfully mitigates transmission delays by dynamically rerouting packets away from congested areas.
In terms of energy efficiency, Experiment 2 revealed that DNN-RL significantly reduces energy consumption per packet compared to the baseline methods. The reinforcement learning framework optimizes power-aware routing decisions by balancing energy distribution across sensor nodes, preventing early depletion of critical network resources. The results indicate that the proposed approach extends network lifetime by approximately 47% compared to heuristic methods and 10% compared to Q-learning. This efficiency is primarily attributed to the model’s ability to learn energy-efficient transmission policies, reducing unnecessary retransmissions and idle listening, which are major contributors to energy wastage in WSNs. The visualization of energy depletion over time further confirms that the proposed model distributes power consumption more evenly across the network, reducing the likelihood of network partitioning due to node failures.
The third experiment evaluated the robustness of DNN-RL in the presence of dynamic interference and fluctuating traffic conditions. The results show that the proposed model maintains a higher packet delivery ratio (PDR) than both Q-RL and heuristic methods, particularly under high interference levels. This resilience is due to the model’s ability to adaptively reassign channels and reroute packets to minimize packet collisions and signal degradation. The heuristic method, which does not incorporate real-time interference feedback, suffers from significant performance degradation as interference increases, leading to a PDR drop below 65%. In contrast, the reinforcement learning model successfully learns optimal channel selection policies that maximize delivery reliability, ensuring stable communication even in harsh network environments.
The advantages of the proposed method stem from its integration of deep learning and reinforcement learning. Unlike traditional Q-learning approaches, which struggle with large state spaces and slow convergence, DNN-RL leverages CNN-LSTM architectures to extract meaningful features from raw network state data, reducing the complexity of the decision-making process. This results in faster policy convergence and more effective adaptation to changing network conditions. Furthermore, the experience replay mechanism ensures that past experiences are utilized efficiently, improving the model’s stability and generalization across different network scenarios.
Despite these advantages, the proposed method has several limitations. First, the computational complexity of training deep reinforcement learning models is significantly higher than that of traditional heuristic methods. The model requires substantial training data and computational resources to learn optimal policies, which may be a limiting factor in real-time, resource-constrained WSN applications. Second, while the CNN-LSTM architecture enhances feature extraction, it may not fully capture abrupt environmental changes, such as sudden node failures or extreme interference spikes. Future work could explore hybrid approaches that combine reinforcement learning with real-time anomaly detection mechanisms to further improve adaptability. Additionally, fine-tuning the reward function remains a challenge, as different application scenarios may require different trade-offs between throughput, energy efficiency, and latency.
Overall, the results confirm that deep reinforcement learning is a promising approach for optimizing multi-hop WSNs, offering significant improvements in network performance while maintaining energy efficiency and robustness. Future research directions could focus on reducing the computational cost of training by implementing distributed learning techniques, as well as extending the model to support real-time network reconfiguration in highly dynamic environments.
This paper proposed a deep reinforcement learning-based approach for optimizing channel allocation and routing in multi-hop wireless sensor networks (WSNs). By integrating convolutional and recurrent neural networks for feature extraction with reinforcement learning-based policy optimization, the proposed framework effectively adapts to dynamic network conditions, improving throughput, latency, energy efficiency, and robustness against interference.

Experimental results demonstrated that the DNN-RL model achieves significantly higher throughput and lower latency compared to traditional heuristic and Q-learning-based methods, particularly under high traffic loads. The energy efficiency analysis confirmed that the model reduces power consumption per packet by more than 40% relative to the heuristic baseline, leading to an extended network lifetime. Additionally, the proposed method demonstrated strong resilience to interference, maintaining a higher packet delivery ratio (PDR) even under severe network disruptions. These advantages stem from the model's ability to dynamically adjust transmission policies based on real-time network observations, enabling efficient and adaptive decision-making.

Despite its strengths, the computational complexity of training deep reinforcement learning models remains a challenge, and further research is needed to enhance real-time adaptability and reduce resource requirements. Future work could explore distributed training methods, hybrid optimization approaches, and real-time anomaly detection to further enhance model performance. Overall, this study highlights the potential of deep learning-based optimization techniques for enhancing WSN performance, paving the way for more intelligent and energy-efficient network management solutions.