Kernel-based least squares policy iteration for reinforcement learning.

Xin Xu, Dewen Hu, Xicheng Lu.

Abstract

In this paper, we present a kernel-based least squares policy iteration (KLSPI) algorithm for reinforcement learning (RL) in large or continuous state spaces, which can be used to realize adaptive feedback control of uncertain dynamic systems. With KLSPI, near-optimal control policies can be obtained without much a priori knowledge of the dynamic models of the controlled plants. In KLSPI, Mercer kernels are used in the policy evaluation step of a policy iteration process, where a new kernel-based least squares temporal-difference algorithm called KLSTD-Q is proposed for efficient policy evaluation. To keep KLSTD-Q solutions sparse and improve their generalization ability, a kernel sparsification procedure based on approximate linear dependency (ALD) is performed. Compared with previous work on approximate RL methods, KLSPI makes two advances that address the main difficulties of existing results. The first is a better convergence and (near-)optimality guarantee, obtained by using the KLSTD-Q algorithm for high-precision policy evaluation. The second is automatic feature selection through ALD-based kernel sparsification. The KLSPI algorithm therefore provides a general RL method with good generalization performance and a convergence guarantee for large-scale Markov decision problems (MDPs). Experimental results on a typical RL task, a stochastic chain problem, demonstrate that KLSPI consistently achieves better learning efficiency and policy quality than the previous least squares policy iteration (LSPI) algorithm. Furthermore, the KLSPI method was evaluated on two nonlinear feedback control problems: a ship heading control problem and the swing-up control of a double-link underactuated pendulum called the acrobot. Simulation results illustrate that the proposed method can optimize controller performance using little a priori information about uncertain dynamic systems. It is also demonstrated that KLSPI can be applied to online learning control by incorporating an initial controller to ensure online performance.
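
The ALD-based sparsification mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Gaussian kernel, its `width`, the threshold name `mu`, and the function names are all assumptions made for illustration. A sample joins the kernel dictionary only when its feature-space image cannot be approximated, within squared error `mu`, by a linear combination of the images of samples already in the dictionary.

```python
import numpy as np


def gaussian_kernel(x, y, width=1.0):
    """Gaussian (RBF) kernel, one common Mercer kernel; `width` is an assumed parameter."""
    diff = np.atleast_1d(np.asarray(x, dtype=float)) - np.atleast_1d(np.asarray(y, dtype=float))
    return float(np.exp(-np.dot(diff, diff) / (2.0 * width ** 2)))


def ald_sparsify(samples, kernel=gaussian_kernel, mu=1e-3):
    """Return the indices of the samples kept in the kernel dictionary.

    Hypothetical helper: for each new sample x, solve K c = k(x) over the
    current dictionary and compute the ALD residual
        delta = k(x, x) - k(x)^T c.
    The sample is added only if delta exceeds the threshold mu.
    """
    if len(samples) == 0:
        return []
    dictionary = [0]  # the first sample always enters the dictionary
    for i in range(1, len(samples)):
        x = samples[i]
        # Gram matrix of the current dictionary (recomputed for clarity;
        # an incremental inverse update would be cheaper).
        gram = np.array([[kernel(samples[a], samples[b]) for b in dictionary]
                         for a in dictionary])
        k_vec = np.array([kernel(samples[a], x) for a in dictionary])
        coeffs = np.linalg.solve(gram, k_vec)
        delta = kernel(x, x) - float(k_vec @ coeffs)
        if delta > mu:
            dictionary.append(i)
    return dictionary


# Repeated states are approximately linearly dependent and are skipped.
states = [[0.0], [0.0], [5.0], [5.0], [0.0]]
kept = ald_sparsify(states)
```

A larger `mu` yields a smaller dictionary and a coarser approximation, so in this sketch the threshold trades sparsity against accuracy.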

Year:  2007        PMID: 17668655     DOI: 10.1109/TNN.2007.899161

Source DB:  PubMed          Journal:  IEEE Trans Neural Netw        ISSN: 1045-9227


  4 in total

1.  Intelligent control of a sensor-actuator system via kernelized least-squares policy iteration.

Authors:  Bo Liu; Sanfeng Chen; Shuai Li; Yongsheng Liang
Journal:  Sensors (Basel)       Date:  2012-02-28       Impact factor: 3.576

2.  Kernel Recursive Least-Squares Temporal Difference Algorithms with Sparsification and Regularization.

Authors:  Chunyuan Zhang; Qingxin Zhu; Xinzheng Niu
Journal:  Comput Intell Neurosci       Date:  2016-06-29

3.  Research on UCAV Maneuvering Decision Method Based on Heuristic Reinforcement Learning.

Authors:  Wang Yuan; Zhang Xiwen; Zhou Rong; Tang Shangqin; Zhou Huan; Ding Wei
Journal:  Comput Intell Neurosci       Date:  2022-03-03

4.  Stochasticity, Nonlinear Value Functions, and Update Rules in Learning Aesthetic Biases.

Authors:  Norberto M Grzywacz
Journal:  Front Hum Neurosci       Date:  2021-05-10       Impact factor: 3.169
