Reinforcement learning in continuous time and space.

K Doya.

Abstract

This article presents a reinforcement learning framework for continuous-time dynamical systems without a priori discretization of time, state, and action. Based on the Hamilton-Jacobi-Bellman (HJB) equation for infinite-horizon, discounted reward problems, we derive algorithms for estimating value functions and improving policies with the use of function approximators. Value function estimation is formulated as the minimization of a continuous-time form of the temporal difference (TD) error. Update methods based on backward Euler approximation and on exponential eligibility traces are derived, and their correspondences with the conventional residual gradient, TD(0), and TD(λ) algorithms are shown. For policy improvement, two methods are formulated: a continuous actor-critic method and a value-gradient-based greedy policy. As a special case of the latter, a nonlinear feedback control law using the value gradient and the model of the input gain is derived. Advantage updating, a model-free algorithm derived previously, is also formulated in the HJB-based framework. The performance of the proposed algorithms is first tested in a nonlinear control task of swinging a pendulum up with limited torque. The simulations show that (1) the continuous actor-critic method accomplishes the task in several times fewer trials than the conventional discrete actor-critic method; (2) among the continuous policy update methods, the value-gradient-based policy with a known or learned dynamic model performs several times better than the actor-critic method; and (3) a value function update using exponential eligibility traces is more efficient and stable than one based on Euler approximation. The algorithms are then tested in a higher-dimensional task, cart-pole swing-up, which is accomplished in several hundred trials using the value-gradient-based policy with a learned dynamic model.
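The backward-Euler TD update mentioned in the abstract can be illustrated with a minimal sketch. It assumes the continuous-time TD error takes the form delta(t) = r(t) - V(t)/tau + dV/dt, where tau is the discount time constant, and approximates dV/dt by (V(t) - V(t - dt))/dt. The toy system (dx/dt = -x with reward r = -x^2), the one-parameter approximator V(x) = w * x^2, the TD(0)-style update rule, and all function names and step sizes are hypothetical choices made for illustration; none of them are taken from the article.

```python
import math


def continuous_td_error(r, v, v_prev, dt, tau):
    """Backward-Euler estimate of delta(t) = r(t) - V(t)/tau + dV/dt."""
    return r - v / tau + (v - v_prev) / dt


def learn_value_weight(tau=1.0, dt=0.01, eta=0.1, episodes=50, horizon=1.0):
    """Fit V(x) = w * x**2 on the toy system dx/dt = -x, r(x) = -x**2."""
    w = 0.0
    steps = int(round(horizon / dt))
    for _ in range(episodes):
        x_prev = 1.0  # every episode restarts from x = 1 (illustrative choice)
        for _ in range(steps):
            x = x_prev * math.exp(-dt)  # exact solution of dx/dt = -x over dt
            r = -x * x
            delta = continuous_td_error(
                r, w * x * x, w * x_prev * x_prev, dt, tau
            )
            # TD(0)-style step: move w along dV/dw at the previous state.
            w += eta * delta * x_prev * x_prev
            x_prev = x
    return w


w = learn_value_weight()
# Closed-form value for this toy problem: V(x) = -x**2 / (1/tau + 2).
exact = -1.0 / (1.0 / 1.0 + 2.0)
print(f"learned w = {w:.4f}, exact w = {exact:.4f}")
```

In this toy setting the learned weight settles near the closed-form value, with a small bias that comes from the finite step dt in the Euler approximation.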

Year:  2000        PMID: 10636940     DOI: 10.1162/089976600300015961

Source DB:  PubMed          Journal:  Neural Comput        ISSN: 0899-7667            Impact factor:   2.026


  45 in total

Review 1.  Conditional visuo-motor learning and dimension reduction.

Authors:  Fadila Hadj-Bouziane; Hélène Frankowska; Martine Meunier; Pierre-Arnaud Coquelin; Driss Boussaoud
Journal:  Cogn Process       Date:  2006-01-28

2.  A neural-network reinforcement-learning model of domestic chicks that learn to localize the centre of closed arenas.

Authors:  Francesco Mannella; Gianluca Baldassarre
Journal:  Philos Trans R Soc Lond B Biol Sci       Date:  2007-03-29       Impact factor: 6.237

3.  Rapid decision threshold modulation by reward rate in a neural network.

Authors:  Patrick Simen; Jonathan D Cohen; Philip Holmes
Journal:  Neural Netw       Date:  2006-09-20

4.  Spatial generalization in operant learning: lessons from professional basketball.

Authors:  Tal Neiman; Yonatan Loewenstein
Journal:  PLoS Comput Biol       Date:  2014-05-22       Impact factor: 4.475

Review 5.  Creating the brain and interacting with the brain: an integrated approach to understanding the brain.

Authors:  Jun Morimoto; Mitsuo Kawato
Journal:  J R Soc Interface       Date:  2015-03-06       Impact factor: 4.118

6.  A computational model for optimal muscle activity considering muscle viscoelasticity in wrist movements.

Authors:  Hiroyuki Kambara; Duk Shin; Yasuharu Koike
Journal:  J Neurophysiol       Date:  2013-01-16       Impact factor: 2.714

7.  Addiction beyond pharmacological effects: The role of environment complexity and bounded rationality.

Authors:  Dimitri Ognibene; Vincenzo G Fiore; Xiaosi Gu
Journal:  Neural Netw       Date:  2019-05-08

8.  Application of the Actor-Critic Architecture to Functional Electrical Stimulation Control of a Human Arm.

Authors:  Philip Thomas; Michael Branicky; Antonie van den Bogert; Kathleen Jagodnik
Journal:  Proc Innov Appl Artif Intell Conf       Date:  2009

9.  Vocal exploration is locally regulated during song learning.

Authors:  Primoz Ravbar; Dina Lipkind; Lucas C Parra; Ofer Tchernichovski
Journal:  J Neurosci       Date:  2012-03-07       Impact factor: 6.167

10.  Reinforcement learning using a continuous time actor-critic framework with spiking neurons.

Authors:  Nicolas Frémaux; Henning Sprekeler; Wulfram Gerstner
Journal:  PLoS Comput Biol       Date:  2013-04-11       Impact factor: 4.475
