Warning: Undefined array key "mm" in /www/wwwroot/www.ai-bt.com/si.php on line 10 Deprecated: trim(): Passing null to parameter #1 ($string) of type string is deprecated in /www/wwwroot/www.ai-bt.com/si.php on line 10 Multi Pseudo Q-Learning-Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles.

Literature DB >> 30602426

Multi Pseudo Q-Learning-Based Deterministic Policy Gradient for Tracking Control of Autonomous Underwater Vehicles.

Wenjie Shi, Shiji Song, Cheng Wu, C L Philip Chen.

Abstract

This paper investigates trajectory tracking problem for a class of underactuated autonomous underwater vehicles (AUVs) with unknown dynamics and constrained inputs. Different from existing policy gradient methods which employ single actor critic but cannot realize satisfactory tracking control accuracy and stable learning, our proposed algorithm can achieve high-level tracking control accuracy of AUVs and stable learning by applying a hybrid actors-critics architecture, where multiple actors and critics are trained to learn a deterministic policy and action-value function, respectively. Specifically, for the critics, the expected absolute Bellman error-based updating rule is used to choose the worst critic to be updated in each time step. Subsequently, to calculate the loss function with more accurate target value for the chosen critic, Pseudo Q-learning, which uses subgreedy policy to replace the greedy policy in Q-learning, is developed for continuous action spaces, and Multi Pseudo Q-learning (MPQ) is proposed to reduce the overestimation of action-value function and to stabilize the learning. As for the actors, deterministic policy gradient is applied to update the weights, and the final learned policy is defined as the average of all actors to avoid large but bad updates. Moreover, the stability analysis of the learning is given qualitatively. The effectiveness and generality of the proposed MPQ-based deterministic policy gradient (MPQ-DPG) algorithm are verified by the application on AUV with two different reference trajectories. In addition, the results demonstrate high-level tracking control accuracy and stable learning of MPQ-DPG. Besides, the results also validate that increasing the number of the actors and critics will further improve the performance.

Year: 2018 PMID： 30602426 DOI： 10.1109/TNNLS.2018.2884797

Source DB: PubMed Journal: IEEE Trans Neural Netw Learn Syst ISSN： 2162-237X Impact factor: 10.451

Keyword Cloud
Cited

2 in total

1. Tracking Design of an Uncertain Autonomous Underwater Vehicle with Input Saturations by Adaptive Regression Matrix-Based Fixed-Time Control.

Authors: Hsiu-Ming Wu
Journal: Sensors (Basel) Date: 2022-04-28 Impact factor: 3.576

2. Research on Multi-AUVs Data Acquisition System of Underwater Acoustic Communication Network.

Authors: Chunxian Gao; Wenwen Hu; Keyu Chen
Journal: Sensors (Basel) Date: 2022-07-06 Impact factor: 3.847

2 in total