
Policy Iteration Q-Learning for Data-Based Two-Player Zero-Sum Game of Linear Discrete-Time Systems.

Biao Luo, Yin Yang, Derong Liu.   

Abstract

In this article, the data-based two-player zero-sum game problem is considered for linear discrete-time systems. In theory, this problem reduces to solving the discrete-time game algebraic Riccati equation (DTGARE), which requires complete knowledge of the system dynamics. To avoid solving the DTGARE, the Q-function is introduced and a data-based policy iteration Q-learning (PIQL) algorithm is developed to learn the optimal Q-function from data collected from the real system. By writing the Q-function in a quadratic form, it is proved that the PIQL algorithm is equivalent to the Newton iteration method in the Banach space by using the Fréchet derivative. The convergence of the PIQL algorithm is then guaranteed by Kantorovich's theorem. To realize the PIQL algorithm, an off-policy learning scheme is proposed that uses real data rather than the system model. Finally, the efficiency of the developed data-based PIQL method is validated through simulation studies.
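The abstract gives no equations, so the following is only a minimal sketch of how an off-policy policy iteration Q-learning loop with a quadratic Q-function is commonly set up for a zero-sum linear-quadratic game; it is not the authors' implementation. Everything concrete in it is an assumption made for illustration: the scalar plant `A`, `B`, `E`, the stage cost x'x + u'u - gamma2 * w'w, the value of `gamma2`, the purely random behavior inputs, and the least-squares solve. The plant is used only to generate data; the learning loop sees only the recorded samples.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical scalar plant x_{k+1} = A x_k + B u_k + E w_k (data generation only).
A, B, E = np.array([[0.8]]), np.array([[1.0]]), np.array([[0.5]])
n, m, q = 1, 1, 1                      # state, control, disturbance dimensions
gamma2 = 4.0                           # assumed attenuation level (gamma squared)
Wt = np.diag([1.0, 1.0, -gamma2])      # assumed stage cost z' Wt z, z = [x; u; w]

def quad_features(z):
    # z' H z = sum over i <= j of c_ij * H_ij * z_i * z_j, c_ij = 1 on the diagonal, 2 off it
    i, j = np.triu_indices(len(z))
    return np.where(i == j, 1.0, 2.0) * z[i] * z[j]

def unpack(theta, size):
    # Rebuild the symmetric matrix H from its upper-triangular entries.
    H = np.zeros((size, size))
    H[np.triu_indices(size)] = theta
    return H + np.triu(H, 1).T

# Collect one data set with exploratory (behavior) inputs; it is reused at every iteration.
N = 200
X = np.zeros((N + 1, n))
U = rng.normal(size=(N, m))
Wd = rng.normal(size=(N, q))
for k in range(N):
    X[k + 1] = A @ X[k] + B @ U[k] + E @ Wd[k]

# Target policies u = -K x, w = -L x; zero gains are admissible here because |A| < 1.
K = np.zeros((m, n))
L = np.zeros((q, n))
for _ in range(15):
    Phi, r = [], []
    for k in range(N):
        z = np.concatenate([X[k], U[k], Wd[k]])          # recorded sample
        xn = X[k + 1]
        zn = np.concatenate([xn, -K @ xn, -L @ xn])      # target policies at next state
        Phi.append(quad_features(z) - quad_features(zn))
        r.append(z @ Wt @ z)
    # Policy evaluation: fit H so that z' H z - zn' H zn = stage cost on every sample.
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(r), rcond=None)
    H = unpack(theta, n + m + q)
    # Policy improvement: stationary point of z' H z in (u, w) for a given x.
    gains = np.linalg.solve(H[n:, n:], H[n:, :n])
    K, L = gains[:m], gains[m:]

print("K =", K, "L =", L)
```

Because the same recorded samples are reused for every target policy pair, the loop is off-policy in the sense described above. Whether such an iteration converges depends on the plant, the cost weights, and the initial gains; the values here were chosen so that it does.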

Year: 2021 | PMID: 32092032 | DOI: 10.1109/TCYB.2020.2970969

Source DB: PubMed | Journal: IEEE Trans Cybern | ISSN: 2168-2267 | Impact factor: 11.448


  1 in total

1.  Trajectory Tracking within a Hierarchical Primitive-Based Learning Approach.

Authors:  Mircea-Bogdan Radac
Journal: Entropy (Basel) | Date: 2022-06-28 | Impact factor: 2.738

