Off-Policy Interleaved Q-Learning: Optimal Control for Affine Nonlinear Discrete-Time Systems.


Jinna Li, Tianyou Chai, Frank L Lewis, Zhengtao Ding, Yi Jiang.

Abstract

In this paper, a novel off-policy interleaved Q-learning algorithm is presented for solving the optimal control problem of affine nonlinear discrete-time (DT) systems, using only data measured along the system trajectories. The affine nonlinear structure of the systems, their unknown dynamics, and the off-policy learning approach pose considerable challenges for approximating optimal controllers. To address them, the on-policy Q-learning method for optimal control of affine nonlinear DT systems is first reviewed, and its convergence is rigorously proven. The bias in the solution of the Q-function-based Bellman equation, caused by adding probing noise to the system to satisfy persistent excitation, is also analyzed for the on-policy Q-learning approach. Second, a behavior control policy is introduced and an off-policy Q-learning algorithm is proposed; its convergence and the absence of bias in the solution to the optimal control problem when probing noise is added to the system are investigated. Third, three neural networks are run by the interleaved Q-learning approach in the actor-critic framework. The result is a novel off-policy interleaved Q-learning algorithm, whose convergence is proven. Simulation results are given to verify the effectiveness of the proposed method.
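The off-policy idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a scalar linear system x_next = a*x + b*u with quadratic cost q*x^2 + r*u^2 (a simple special case of an affine system), a quadratic Q-function fitted by least squares in place of neural networks, and hypothetical values for a, b, q, r, the behavior gain, and the noise range. A behavior policy with probing noise generates the data once; the target gain K is then evaluated and improved from that same data, and the learner never uses a or b directly.

```python
import random


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for c in range(col, n + 1):
                M[i][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def features(x, u):
    # Assumed form: Q(x, u) = h_xx*x^2 + 2*h_xu*x*u + h_uu*u^2
    return [x * x, 2.0 * x * u, u * u]


def collect_data(a, b, steps, seed=0):
    """Simulate the behavior policy: a stabilizing gain plus probing noise."""
    rng = random.Random(seed)
    data, x = [], 1.0
    for _ in range(steps):
        u = -0.5 * x + rng.uniform(-1.0, 1.0)  # hypothetical behavior policy
        x_next = a * x + b * u
        data.append((x, u, x_next))
        x = x_next
    return data


def evaluate_policy(data, K, q, r):
    """Least-squares fit of Q for the target policy u = -K*x from behavior data."""
    A = [[0.0] * 3 for _ in range(3)]
    rhs = [0.0] * 3
    for x, u, x_next in data:
        phi = features(x, u)
        # The next action comes from the target policy, not the behavior policy.
        phi_next = features(x_next, -K * x_next)
        d = [p - pn for p, pn in zip(phi, phi_next)]
        cost = q * x * x + r * u * u
        for i in range(3):
            rhs[i] += d[i] * cost
            for j in range(3):
                A[i][j] += d[i] * d[j]
    return solve3(A, rhs)


def off_policy_q_learning(a=1.1, b=1.0, q=1.0, r=1.0, K0=0.5, iters=10):
    data = collect_data(a, b, steps=200)  # data is collected once and reused
    K = K0  # initial target gain, assumed stabilizing
    for _ in range(iters):
        h_xx, h_xu, h_uu = evaluate_policy(data, K, q, r)
        K = h_xu / h_uu  # greedy improvement: minimize Q(x, u) over u
    return K


if __name__ == "__main__":
    print("learned gain K:", off_policy_q_learning())
```

Because the Bellman equation used in `evaluate_policy` holds for any recorded action `u`, the probing noise in the behavior policy does not enter the fitted Q-function in this noise-free linear setting, which mirrors the no-bias property the abstract attributes to the off-policy scheme.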


Year: 2018 PMID: 30273155 DOI: 10.1109/TNNLS.2018.2861945

Source DB: PubMed Journal: IEEE Trans Neural Netw Learn Syst ISSN: 2162-237X Impact factor: 10.451

Cited

2 in total

1. Randomised controlled trial of WISENSE, a real-time quality improving system for monitoring blind spots during esophagogastroduodenoscopy.

Authors: Lianlian Wu; Jun Zhang; Wei Zhou; Ping An; Lei Shen; Jun Liu; Xiaoda Jiang; Xu Huang; Ganggang Mu; Xinyue Wan; Xiaoguang Lv; Juan Gao; Ning Cui; Shan Hu; Yiyun Chen; Xiao Hu; Jiangjie Li; Di Chen; Dexin Gong; Xinqi He; Qianshan Ding; Xiaoyun Zhu; Suqin Li; Xiao Wei; Xia Li; Xuemei Wang; Jie Zhou; Mengjiao Zhang; Hong Gang Yu
Journal: Gut Date: 2019-03-11 Impact factor: 23.059

2. Multi-Agent Reinforcement Learning via Adaptive Kalman Temporal Difference and Successor Representation.

Authors: Mohammad Salimibeni; Arash Mohammadi; Parvin Malekzadeh; Konstantinos N Plataniotis
Journal: Sensors (Basel) Date: 2022-02-11 Impact factor: 3.576
