| Literature DB >> 35885188 |
Chengpeng Jiang, Ziyang Wang, Shuai Chen, Jinglin Li, Haoran Wang, Jinwei Xiang, Wendong Xiao.
Abstract
The breakthrough of wireless energy transmission (WET) technology has greatly advanced wireless rechargeable sensor networks (WRSNs). A promising way to overcome the energy constraint in WRSNs is mobile charging, in which a mobile charger (MC) is employed to charge sensors via WET. Recently, more and more studies have addressed mobile charging scheduling under dynamic charging environments, but they neglect the joint optimal design of charging sequence scheduling and charging ratio control (JSSRC). This paper proposes a novel attention-shared multi-agent actor-critic-based deep reinforcement learning approach for JSSRC (AMADRL-JSSRC). AMADRL-JSSRC employs two heterogeneous agents, a charging sequence scheduler and a charging ratio controller, each with an independent actor network and critic network, and designs a separate reward function for each agent based on the tour length and the number of dead sensors. AMADRL-JSSRC trains decentralized policies in a multi-agent environment using a centralized critic that shares an attention mechanism, selecting the relevant policy information for each agent at every charging decision. Simulation results demonstrate that the proposed AMADRL-JSSRC efficiently prolongs the network lifetime and reduces the number of dead sensors compared with the baseline algorithms.
Keywords: attention-shared; deep reinforcement learning; mobile charging; multi-agent; wireless rechargeable sensor network
Year: 2022 PMID: 35885188 PMCID: PMC9317597 DOI: 10.3390/e24070965
Source DB: PubMed Journal: Entropy (Basel) ISSN: 1099-4300 Impact factor: 2.738
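As a rough illustration of the architecture described in the abstract, the sketch below implements a centralized critic in which each agent's (observation, action) encoding attends over the other agent's encoding. All class names, sizes, and wiring are our assumptions, not the authors' code; PyTorch is assumed.

```python
# Sketch of an attention-shared centralized critic for two heterogeneous
# agents (charging sequence scheduler, charging ratio controller).
# Hypothetical names/shapes; not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionSharedCritic(nn.Module):
    def __init__(self, obs_dims, act_dims, hidden=64):
        super().__init__()
        # One encoder per agent for its (observation, action) pair.
        self.encoders = nn.ModuleList(
            nn.Linear(o + a, hidden) for o, a in zip(obs_dims, act_dims)
        )
        # Attention projections shared by all agents.
        self.query = nn.Linear(hidden, hidden, bias=False)
        self.key = nn.Linear(hidden, hidden, bias=False)
        self.value = nn.Linear(hidden, hidden, bias=False)
        # One Q head per agent, fed its own encoding plus attended context.
        self.q_heads = nn.ModuleList(nn.Linear(2 * hidden, 1) for _ in obs_dims)

    def forward(self, obs, acts):
        enc = [e(torch.cat([o, a], dim=-1))
               for e, o, a in zip(self.encoders, obs, acts)]
        qs = []
        for i, e_i in enumerate(enc):
            others = [e_j for j, e_j in enumerate(enc) if j != i]
            k = torch.stack([self.key(x) for x in others], dim=1)
            v = torch.stack([self.value(x) for x in others], dim=1)
            q = self.query(e_i).unsqueeze(1)
            # Each agent attends over the other agents' encodings.
            attn = F.softmax(q @ k.transpose(1, 2) / k.shape[-1] ** 0.5, dim=-1)
            context = (attn @ v).squeeze(1)
            qs.append(self.q_heads[i](torch.cat([e_i, context], dim=-1)))
        return qs  # one Q estimate per agent

# Toy sizes: scheduler sees 32-dim obs/10-way action, controller 16-dim/scalar.
critic = AttentionSharedCritic(obs_dims=[32, 16], act_dims=[10, 1])
q1, q2 = critic([torch.randn(4, 32), torch.randn(4, 16)],
                [torch.randn(4, 10), torch.randn(4, 1)])
```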
Performance Comparison of the Existing Approaches and the Proposed Approach.
| Approach | Ref. | Dynamic Change of the Sensor Energy Consumption | Charging Sequence Scheduling | Charging Ratio Control | Charging Sequence Scheduling and Charging Ratio Control Simultaneously |
|---|---|---|---|---|---|
| Off-line | [ ] | No | Yes | No | No |
| | [ ] | No | Yes | No | No |
| | [ ] | No | Yes | No | No |
| | [ ] | No | Yes | No | No |
| | [ ] | No | Yes | No | No |
| On-line | [ ] | Yes | Yes | No | No |
| | [ ] | Yes | Yes | No | No |
| | [ ] | Yes | Yes | No | No |
| | [ ] | Yes | Yes | Yes | No |
| RL | [ ] | No | Yes | No | No |
| | [ ] | No | Yes | No | No |
| | [ ] | No | Yes | No | No |
| | [ ] | Yes | Yes | No | No |
| Ours | | Yes | Yes | Yes | Yes |
Figure 1. An example WRSN with a mobile charger.
Abbreviations used in this paper.
| Abbreviation | Description |
|---|---|
| WRSN | Wireless rechargeable sensor network |
| MC | Mobile charger |
| BS | Base station |
| Dis | Total moving distance of the MC during the charging tour |
| JSSRC | Joint mobile charging sequence scheduling and charging ratio control problem |
| AMADRL | Attention-shared multi-agent actor–critic-based deep reinforcement learning |
| | State information of the MC |
| | State information of the network |
| ACRL | Actor–critic reinforcement learning |
| DP | Dynamic programming |
| NJNP | Nearest-job-next with preemption |
| TSCA | Temporal–spatial real-time charging scheduling algorithm |
Figure 2. A scheduling example of JSSRC.
State Space Update (columns: Time Step; Observation; Agent 1; Agent 2; Immediate Reward).
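Concretely, each charging decision advances the environment by one such row: Agent 1 picks the next sensor to visit, Agent 2 picks a charging ratio, and the residual energies and reward update. A toy sketch; all dynamics and names are illustrative rather than the paper's.

```python
# Toy sketch of one JSSRC decision step behind the table above: Agent 1
# chooses the next sensor, Agent 2 chooses a charging ratio, and the
# environment updates residual energies. All dynamics are illustrative.
import numpy as np

rng = np.random.default_rng(0)
N = 5
pos = rng.uniform(0.0, 20.0, size=(N, 2))      # sensor coordinates (m)
energy = rng.uniform(10.0, 20.0, size=N)       # initial residual energy (J)
consume = rng.uniform(0.05, 0.2, size=N)       # per-sensor drain (J/s)
CAPACITY, SPEED, CHARGE_RATE = 50.0, 0.1, 1.0  # J, m/s, J/s
mc_pos = np.zeros(2)

def step(target: int, ratio: float):
    """Travel to sensor `target` and charge it up to ratio * CAPACITY."""
    global mc_pos
    travel_time = float(np.linalg.norm(pos[target] - mc_pos)) / SPEED
    needed = max(ratio * CAPACITY - energy[target], 0.0)
    elapsed = travel_time + needed / CHARGE_RATE
    energy[:] = np.maximum(energy - consume * elapsed, 0.0)  # all sensors drain
    energy[target] = min(energy[target] + needed, CAPACITY)
    mc_pos = pos[target].copy()
    dead = int(np.sum(energy <= 0.0))
    return energy.copy(), dead

obs, dead = step(target=2, ratio=0.8)
print(dead, obs.round(2))
```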
Figure 3. The structure of the AMADRL-JSSRC algorithm.
The Parameters of the Simulation Settings.
| Parameter | Value |
|---|---|
| Network size | |
| Number of sensors | 50–200 |
| MC initial energy | 100 J |
| Moving speed of MC | 0.1 m/s |
| Energy consumed by MC per meter moved | 0.1 J/m |
| Charging speed of MC | 1 J/s |
| Charging ratio | |
| Energy capacity of sensor | 50 J |
| Threshold on the ratio of dead sensors | 0.5 |
| Initial residual energy of sensors | 10–20 J |
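For reproduction, the same settings can be collected into a single configuration object; entries whose values are not recoverable from the table above are left as None.

```python
# The simulation settings gathered into one config dict for reproduction.
# Entries whose values are not recoverable from the table are left as None.
SIM_CONFIG = {
    "num_sensors_range": (50, 200),
    "mc_initial_energy_J": 100.0,
    "mc_speed_m_per_s": 0.1,
    "mc_move_cost_J_per_m": 0.1,
    "mc_charge_rate_J_per_s": 1.0,
    "sensor_capacity_J": 50.0,
    "dead_sensor_ratio_threshold": 0.5,
    "initial_residual_energy_J": (10.0, 20.0),
    "network_size": None,          # value missing above
    "charging_ratio_range": None,  # value missing above
}
```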
Energy parameters of sensor.
| Parameter | Value |
|---|---|
| Distance-free energy consumption index | |
| Distance-related energy consumption index | |
| Energy consumption for receiving or transmitting | |
| Number of bits | |
| Signal attenuation coefficient | 4 |
| Per-second packet generation probability | 0.2–0.5 |
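These parameter names match the common first-order radio model, in which transmitting l bits over distance d costs l·e_elec + l·eps_amp·d^λ and receiving costs l·e_elec. A sketch with placeholder index values, since the paper's concrete values are not given above.

```python
# First-order radio energy model matching the parameter names above:
# transmitting l bits over distance d costs l*E_ELEC + l*EPS_AMP*d**LAMBDA,
# receiving costs l*E_ELEC. Index values are placeholders, not the paper's.
LAMBDA = 4            # signal attenuation coefficient (from the table)
E_ELEC = 50e-9        # J/bit, distance-free index (placeholder)
EPS_AMP = 0.0013e-12  # J/bit/m^4, distance-related index (placeholder)

def tx_energy(bits: int, d: float) -> float:
    """Energy to transmit `bits` over distance d meters."""
    return bits * E_ELEC + bits * EPS_AMP * d ** LAMBDA

def rx_energy(bits: int) -> float:
    """Energy to receive `bits`."""
    return bits * E_ELEC

print(tx_energy(4000, 30.0), rx_energy(4000))
```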
Key Parameters of the Training Stage.
| Parameter | Value |
|---|---|
| Size of experience replay buffer | |
| Size of mini-batch | 1024 |
| Actor learning rate | 5 × 10⁻⁴ (JSSRC50, JSSRC100) |
| Critic learning rate | 5 × 10⁻⁴ (JSSRC50, JSSRC100) |
| Number of parallel environments | 4 |
| Number of episodes | |
| Number of steps per episode | 100 |
| Number of critic updates | 4 |
| Number of policy updates | 4 |
| Number of multiple attention heads | 4 |
| Number of target updates | |
| Optimizer method | Adam |
| Reward discount | 0.9 |
| Reward coefficient | 0.5 |
| Penalty coefficient | 10 |
| Update rate of target parameters | 0.005 |
| Temperature parameter | 0.01 |
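Two of these hyper-parameters have standard roles worth making concrete: the reward discount (0.9) weights future rewards in the temporal-difference target, and the update rate of target parameters (0.005) governs Polyak averaging of the target networks. A minimal sketch assuming PyTorch; the function names are ours.

```python
# Standard roles of two listed hyper-parameters: the reward discount (0.9)
# in the TD target and the target update rate (0.005) in Polyak averaging.
import torch

GAMMA, TAU = 0.9, 0.005

def td_target(reward, next_q, done):
    # Bootstrapped critic target: r + gamma * Q'(s', a') for non-terminal steps.
    return reward + GAMMA * next_q * (1.0 - done)

@torch.no_grad()
def soft_update(target_net, online_net, tau=TAU):
    # target <- tau * online + (1 - tau) * target, applied after each update.
    for tp, op in zip(target_net.parameters(), online_net.parameters()):
        tp.mul_(1.0 - tau).add_(tau * op)
```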
Figure 4. Reward per episode.
Impact of the reward discount.
| Reward discount | 0.5 | 0.6 | 0.7 | 0.8 | 0.9 |
|---|---|---|---|---|---|
| Reward | −60.72 | −18.05 | 0.19 | 20.89 | 52.45 |
| Number of dead sensors | 25 | 18 | 13 | 8 | 5 |
| Moving distance (m) | 11.37 | 13.15 | 14.85 | 15.55 | 16.77 |
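The trend is consistent with how discounting compounds: with a larger discount, penalties far in the future retain more weight, so the agent avoids late sensor deaths even at the cost of a longer tour. A quick computation of the discounted return under each tested discount, using a purely illustrative reward stream:

```python
# Discounted return sum_t gamma^t * r_t for a single illustrative reward
# stream, evaluated at each tested discount value.
def discounted_return(rewards, gamma):
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

stream = [1.0] * 50 + [-10.0]  # steady reward with a late penalty (illustrative)
for gamma in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(gamma, round(discounted_return(stream, gamma), 2))
```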
Impact of the penalty coefficient.
| Penalty coefficient | 0 | 1 | 5 | 8 | 10 |
|---|---|---|---|---|---|
| Reward | 60.33 | 38.29 | 37.64 | 40.12 | 52.45 |
| Number of dead sensors | 20 | 15 | 11 | 7 | 5 |
| Moving distance (m) | 14.54 | 15.09 | 15.88 | 16.23 | 16.77 |
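The ablation suggests a per-step reward that trades off travel cost against a dead-sensor penalty scaled by the penalty coefficient. The functional form below is our assumption for illustration, not the paper's exact reward.

```python
# Assumed shape of a per-step reward trading travel cost against newly dead
# sensors; the paper's exact reward function may differ.
REWARD_COEF = 0.5    # reward coefficient (training-parameters table)
PENALTY_COEF = 10.0  # penalty coefficient ablated above

def step_reward(moved_m: float, newly_dead: int) -> float:
    return -REWARD_COEF * moved_m - PENALTY_COEF * newly_dead

print(step_reward(1.2, 0), step_reward(1.2, 2))
```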
The Results Based on Different Algorithms over the Test Set.
| Environment | Algorithm | Mean Length | Std | | Base Time | Extra Time |
|---|---|---|---|---|---|---|
| JSSRC50 | AMADRL-JSSRC | 13.918 | 0.802 | 3 | 100 | 0.905 |
| | ACRL | 13.878 | 0.798 | 4 | 100 | 0.788 |
| | GREEDY | 13.902 | 0.834 | 2 | 100 | 0.647 |
| | DP | 14.068 | 0.856 | 6 | 100 | 0.743 |
| | NJNP | 13.834 | 0.815 | 5 | 100 | 0.516 |
| | TSCA | 14.028 | 0.755 | 4 | 100 | 0.498 |
| JSSRC100 | AMADRL-JSSRC | 17.454 | 1.228 | 5 | 200 | 1.463 |
| | ACRL | 16.768 | 1.266 | 8 | 200 | 1.32 |
| | GREEDY | 18.233 | 1.445 | 13 | 200 | 1.38 |
| | DP | 18.088 | 1.328 | 13 | 200 | 1.12 |
| | NJNP | 16.891 | 1.306 | 12 | 200 | 0.995 |
| | TSCA | 17.718 | 1.205 | 11 | 200 | 0.936 |
| JSSRC200 | AMADRL-JSSRC | 36.769 | 1.813 | 8 | 300 | 1.828 |
| | ACRL | 36.126 | 1.998 | 12 | 300 | 1.482 |
| | GREEDY | 37.856 | 3.162 | 19 | 300 | 1.635 |
| | DP | 37.532 | 2.376 | 18 | 300 | 1.864 |
| | NJNP | 35.513 | 2.265 | 17 | 300 | 1.465 |
| | TSCA | 35.921 | 2.169 | 16 | 300 | 1.416 |
Figure 5. The impact of the capacity of the sensor on the average tour length.
Figure 6. The impact of the capacity of the sensor on the average number of dead sensors.
Figure 7. The impact of the capacity of the MC on the average tour length.
Figure 8. The impact of the capacity of the MC on the average number of dead sensors.
Figure 9. The lifetime of different algorithms on JSSRC50.
Figure 10. The lifetime of different algorithms on JSSRC100.
Figure 11. The lifetime of different algorithms on JSSRC200.