Continuous-time adaptive critics.

Thomas Hanselmann, Lyle Noakes, Anthony Zaknich.

Abstract

A continuous-time formulation of an adaptive critic design (ACD) is investigated. Connections are made to the discrete-time case, where backpropagation through time (BPTT) and real-time recurrent learning (RTRL) are prevalent. The practical benefits are that this framework fits well with plant descriptions given by differential equations and that any standard integration routine with adaptive step size performs adaptive sampling for free. A second-order actor adaptation using Newton's method is established for fast actor convergence with a general plant and critic. A fast critic update for concurrent actor-critic training is also introduced: it immediately applies the adjustments of the critic parameters induced by actor updates, keeping the Bellman optimality equation correct to first-order approximation after actor changes. Thus, critic and actor updates may be performed at the same time until a substantial error builds up in the Bellman optimality or temporal difference equation, at which point traditional critic training needs to be performed, after which another interval of concurrent actor-critic training may resume.
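The second-order actor adaptation mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's derivation: it assumes a hypothetical scalar linear plant dx/dt = a*x + b*u, a quadratic utility q*x^2 + r*u^2, and a fixed quadratic critic J(x) = p*x^2, and it applies Newton's method directly to the action u rather than to actor network parameters. All names and constants (`hamiltonian_derivs`, `newton_action`, `b`, `q`, `r`, `p`) are illustrative assumptions.

```python
# Toy Newton update of an action against a fixed critic (illustrative only).
# Assumed setting: plant dx/dt = a*x + b*u, utility U = q*x^2 + r*u^2,
# critic J(x) = p*x^2. The quantity minimized over u is
#   H(x, u) = U(x, u) + dJ/dx * (a*x + b*u).
# The a*x term does not depend on u, so it drops out of both derivatives.

def hamiltonian_derivs(x, u, b=1.0, r=0.5, p=0.8):
    """Return dH/du and d2H/du2 for the assumed toy problem."""
    dJdx = 2.0 * p * x            # gradient of the assumed critic
    grad = 2.0 * r * u + dJdx * b
    hess = 2.0 * r
    return grad, hess

def newton_action(x, u0=0.0, tol=1e-10, max_iter=20):
    """Iterate u <- u - (dH/du) / (d2H/du2) until the gradient vanishes."""
    u = u0
    for _ in range(max_iter):
        grad, hess = hamiltonian_derivs(x, u)
        if abs(grad) < tol:
            break
        u -= grad / hess
    return u

x = 2.0
u_star = newton_action(x)
print(u_star)
```

Because H is quadratic in u in this toy case, a single Newton step reaches the minimizer u = -p*b*x/r; for a general plant and critic several steps would typically be needed, and the Hessian would have to be positive for the step to be a descent direction.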

Year:  2007        PMID: 17526332     DOI: 10.1109/TNN.2006.889499

Source DB:  PubMed          Journal:  IEEE Trans Neural Netw        ISSN: 1045-9227


  1 in total

1.  Efficient Actor-Critic Algorithm with Hierarchical Model Learning and Planning.

Authors:  Shan Zhong; Quan Liu; QiMing Fu
Journal:  Comput Intell Neurosci       Date:  2016-10-03
