| Literature DB >> 30424017 |
Chaofeng Wang, Li Wei, Zhaohui Wang, Min Song, Nina Mahmoudian.
Abstract
This work studies online learning-based trajectory planning for multiple autonomous underwater vehicles (AUVs) to estimate a water parameter field of interest in the under-ice environment. A centralized system is considered, where several fixed access points on the ice layer serve as gateways for communications between the AUVs and a remote data fusion center. We model the water parameter field of interest as a Gaussian process with unknown hyper-parameters. The AUV trajectories for sampling are determined on an epoch-by-epoch basis. At the end of each epoch, the access points relay the observed field samples from all the AUVs to the fusion center, which computes the posterior distribution of the field via Gaussian process regression and estimates the field hyper-parameters. The optimal trajectories of all the AUVs in the next epoch are determined to maximize a long-term reward defined by the field uncertainty reduction and the AUV mobility cost, subject to the kinematics constraint, the communication constraint, and the sensing area constraint. We formulate the adaptive trajectory planning problem as a Markov decision process (MDP). A reinforcement learning-based online learning algorithm is designed to determine the optimal AUV trajectories in a constrained continuous space. Simulation results show that the proposed learning-based trajectory planning algorithm performs similarly to a benchmark method that assumes perfect knowledge of the field hyper-parameters.
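The fusion-center step described in the abstract — computing the posterior distribution of the field from the AUV samples via Gaussian process regression — can be illustrated with a minimal sketch. This is not the paper's implementation: the squared-exponential kernel, its hyper-parameters (`sigma_f`, `ell`), and the noise level are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X1, X2, sigma_f=1.0, ell=1.0):
    """Squared-exponential covariance between two sets of 2-D sample locations.

    sigma_f and ell stand in for the field hyper-parameters that the
    fusion center would estimate; the values here are placeholders.
    """
    d2 = np.sum((X1[:, None, :] - X2[None, :, :]) ** 2, axis=-1)
    return sigma_f**2 * np.exp(-0.5 * d2 / ell**2)

def gp_posterior(X_obs, y_obs, X_star, noise=1e-2, sigma_f=1.0, ell=1.0):
    """Posterior mean and variance of the field at query locations X_star,
    given noisy AUV observations (X_obs, y_obs)."""
    K = rbf_kernel(X_obs, X_obs, sigma_f, ell) + noise * np.eye(len(X_obs))
    K_s = rbf_kernel(X_obs, X_star, sigma_f, ell)
    K_ss = rbf_kernel(X_star, X_star, sigma_f, ell)
    # Cholesky solve for numerical stability instead of a direct inverse
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_obs))
    mu = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v**2, axis=0)
    return mu, var
```

The posterior variance `var` is what drives the uncertainty-reduction term of the reward: candidate trajectories that visit high-variance regions reduce it the most.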
Keywords: AUVs; adaptive trajectory planning; field estimation; reinforcement learning; under-ice exploration; underwater communication networks
Year: 2018 PMID: 30424017 PMCID: PMC6263807 DOI: 10.3390/s18113859
Source DB: PubMed Journal: Sensors (Basel) ISSN: 1424-8220 Impact factor: 3.576
Figure 1An illustration of the system layout with three autonomous underwater vehicles (AUVs) and four access points.
Figure 2System operation within an epoch. The AUV samples at the end of each time slot. There are K time slots within an epoch for AUV navigation and sampling.
Figure 3Neural network design in deep deterministic policy gradient (DDPG). (a) The forward structure of the actor network; (b) the forward structure of the critic network.
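The actor/critic forward structures shown in Figure 3 can be sketched with plain NumPy. This is a minimal illustration of the standard DDPG layout, not the paper's architecture: the layer widths, state/action dimensions, and the `max_turn` action bound are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, ACTION_DIM, HIDDEN = 6, 2, 16  # illustrative sizes, not the paper's

def mlp_init(sizes):
    """Initialize small weight matrices and zero biases for each layer."""
    return [(rng.normal(0.0, 0.1, (m, n)), np.zeros(n))
            for m, n in zip(sizes[:-1], sizes[1:])]

def forward(params, x, out_act=None):
    """Forward pass: ReLU hidden layers, optional output activation."""
    for i, (W, b) in enumerate(params):
        x = x @ W + b
        if i < len(params) - 1:
            x = np.maximum(x, 0.0)
    return out_act(x) if out_act else x

actor = mlp_init([STATE_DIM, HIDDEN, ACTION_DIM])
critic = mlp_init([STATE_DIM + ACTION_DIM, HIDDEN, 1])

def act(state, max_turn=1.0):
    # tanh squashes the output to [-1, 1]; scaling by max_turn keeps the
    # continuous action inside a feasible (kinematics-constrained) range
    return max_turn * forward(actor, state, out_act=np.tanh)

def q_value(state, action):
    # critic scores a (state, action) pair with a scalar Q-estimate
    return forward(critic, np.concatenate([state, action])).item()
```

The bounded `tanh` output is the usual way DDPG handles a constrained continuous action space, which matches the paper's setting of trajectory planning under kinematics constraints.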
Figure 4The true field and the estimated fields obtained by the three schemes. (a) True field; (b) Estimated field by Scheme 1; (c) Estimated field by Scheme 2; (d) Estimated field by Scheme 3.
Figure 5Trajectories of four AUVs obtained by three schemes. The squares indicate the positions of four access points and the initial deployment locations of four AUVs. The circles indicate the acoustic communication coverage of four access points. (a) Scheme 1; (b) Scheme 2; (c) Scheme 3.
Performance comparison of the three schemes.
| Metric | Scheme 1 | Scheme 2 | Scheme 3 |
|---|---|---|---|
| Total traveled distance (km) | 74.4 | 77.9 | 78.1 |
| Total traveled angle (rad) | 76.6 | 117.4 | 131.5 |
| Normalized mean square error (NMSE) | 0.17 | 0.26 | 1.35 |
Field estimation performance of Scheme 2 with different values of the epoch duration.
| Epoch Duration (minutes) | 30 | 40 | 50 |
|---|---|---|---|
| NMSE | 0.22 | 0.23 | 0.26 |