
Fast reinforcement learning with generalized policy updates.

André Barreto, Shaobo Hou, Diana Borsa, David Silver, Doina Precup.

Abstract

The combination of reinforcement learning with deep learning is a promising approach to tackle important sequential decision-making problems that are currently intractable. One obstacle to overcome is the amount of data needed by learning systems of this type. In this article, we propose to address this issue through a divide-and-conquer approach. We argue that complex decision problems can be naturally decomposed into multiple tasks that unfold in sequence or in parallel. By associating each task with a reward function, this problem decomposition can be seamlessly accommodated within the standard reinforcement-learning formalism. The specific way we do so is through a generalization of two fundamental operations in reinforcement learning: policy improvement and policy evaluation. The generalized versions of these operations allow one to leverage the solution of some tasks to speed up the solution of others. If the reward function of a task can be well approximated as a linear combination of the reward functions of tasks previously solved, we can reduce a reinforcement-learning problem to a simpler linear regression. When this is not the case, the agent can still exploit the task solutions by using them to interact with and learn about the environment. Both strategies considerably reduce the amount of data needed to solve a reinforcement-learning problem.
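The linear-regression case described in the abstract can be illustrated with a minimal sketch. It assumes the usual successor-features setup named in the keywords: rewards are approximately linear in some features, r ≈ φ · w, and each previously learned policy i comes with successor features ψ_i(s, a), so that its value on a new task is ψ_i(s, a) · w. The function names, array shapes, and toy numbers below are hypothetical and are not taken from the article.

```python
import numpy as np


def fit_task_weights(features, rewards):
    """Least-squares fit of w such that features @ w ~= rewards.

    features: (n_samples, d) array of observed reward features (assumed given).
    rewards:  (n_samples,) array of rewards observed on the new task.
    """
    w, *_ = np.linalg.lstsq(features, rewards, rcond=None)
    return w


def gpe(successor_features, w):
    """Generalized policy evaluation under the linear-reward assumption.

    successor_features: (n_policies, n_states, n_actions, d) array.
    Returns Q of shape (n_policies, n_states, n_actions), Q_i(s, a) = psi_i(s, a) . w.
    """
    return successor_features @ w


def gpi(q_values):
    """Generalized policy improvement.

    For each state, pick the action whose best value across all known
    policies is largest: argmax_a max_i Q_i(s, a).
    """
    return q_values.max(axis=0).argmax(axis=-1)


# Toy usage: 2 known policies, 1 state, 2 actions, 2 reward features.
psi = np.array([
    [[[2.0, 0.0], [0.0, 2.0]]],  # successor features of policy 0
    [[[1.0, 1.0], [0.0, 0.0]]],  # successor features of policy 1
])

# A few (feature, reward) samples from a new task whose weights are [1, -0.5].
phi = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
r = np.array([1.0, -0.5, 0.5])

w = fit_task_weights(phi, r)
print(gpi(gpe(psi, w)))                       # [0]
print(gpi(gpe(psi, np.array([-1.0, 1.0]))))   # [1]
```

In this sketch, only the weight vector w is estimated for the new task; the successor features are reused unchanged, which is one way to read the abstract's claim that the problem reduces to a regression.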

Keywords:  artificial intelligence; generalized policy evaluation; generalized policy improvement; reinforcement learning; successor features

Year:  2020        PMID: 32817541      PMCID: PMC7720214          DOI: 10.1073/pnas.1907370117

Source DB:  PubMed          Journal:  Proc Natl Acad Sci U S A        ISSN: 0027-8424            Impact factor:   11.205


