Fast reinforcement learning with generalized policy updates.

André Barreto¹, Shaobo Hou², Diana Borsa², David Silver², Doina Precup²,³.

Abstract

The combination of reinforcement learning with deep learning is a promising approach to tackle important sequential decision-making problems that are currently intractable. One obstacle to overcome is the amount of data needed by learning systems of this type. In this article, we propose to address this issue through a divide-and-conquer approach. We argue that complex decision problems can be naturally decomposed into multiple tasks that unfold in sequence or in parallel. By associating each task with a reward function, this problem decomposition can be seamlessly accommodated within the standard reinforcement-learning formalism. The specific way we do so is through a generalization of two fundamental operations in reinforcement learning: policy improvement and policy evaluation. The generalized versions of these operations allow one to leverage the solution of some tasks to speed up the solution of others. If the reward function of a task can be well approximated as a linear combination of the reward functions of tasks previously solved, we can reduce a reinforcement-learning problem to a simpler linear regression. When this is not the case, the agent can still exploit the task solutions by using them to interact with and learn about the environment. Both strategies considerably reduce the amount of data needed to solve a reinforcement-learning problem.
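The reduction the abstract describes can be sketched in a few lines, assuming rewards that are linear in features, r(s, a) = φ(s, a)·w. Successor features ψ^π(s, a), the expected discounted sum of φ under policy π, then give Q^π_w(s, a) = ψ^π(s, a)·w for any task vector w (generalized policy evaluation), and acting greedily with respect to max_i Q^{π_i}_w (generalized policy improvement) reuses policies from earlier tasks. When the new reward is such a linear combination, inferring w is an ordinary least-squares problem. Everything concrete below is a hypothetical illustration, not the authors' code: the 5-state chain, the two features, the vector `w_new`, and all function names. For brevity the successor features are computed from a known model by fixed-point iteration rather than learned from data.

```python
"""Toy sketch of successor features (SFs), GPE and GPI.

Assumptions for illustration only (not from the paper): a deterministic
5-state chain, two actions, two hand-picked features, and SFs computed
from a known model by fixed-point iteration instead of being learned.
"""
from typing import Callable, List, Sequence

Vector = List[float]
Policy = Callable[[int], int]

N_STATES = 5        # hypothetical chain with states 0..4
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.9         # discount factor


def step(s: int, a: int) -> int:
    """Deterministic transition, clipped at both ends of the chain."""
    return max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)


def phi(s: int, a: int) -> Vector:
    """Hypothetical features: did the transition land on the left or right end?"""
    s2 = step(s, a)
    return [float(s2 == 0), float(s2 == N_STATES - 1)]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(u, v))


def successor_features(policy: Policy, sweeps: int = 200) -> List[List[Vector]]:
    """psi[s][a] = phi(s, a) + GAMMA * psi[s'][policy(s')], iterated to a fixed point."""
    d = len(phi(0, 0))
    psi = [[[0.0] * d for _ in ACTIONS] for _ in range(N_STATES)]
    for _ in range(sweeps):
        # The right-hand side reads the previous psi; it is rebound only afterwards.
        psi = [[[f + GAMMA * p
                 for f, p in zip(phi(s, a), psi[step(s, a)][policy(step(s, a))])]
                for a in ACTIONS]
               for s in range(N_STATES)]
    return psi


def gpe(psi: List[List[Vector]], w: Vector, s: int, a: int) -> float:
    """Generalized policy evaluation: Q_w^pi(s, a) = psi^pi(s, a) . w, for any w."""
    return dot(psi[s][a], w)


def gpi_action(psis: Sequence[List[List[Vector]]], w: Vector, s: int) -> int:
    """Generalized policy improvement: argmax_a max_i Q_w^{pi_i}(s, a)."""
    return max(ACTIONS, key=lambda a: max(gpe(psi, w, s, a) for psi in psis))


def least_squares(X: Sequence[Vector], y: Sequence[float]) -> Vector:
    """Solve the normal equations (X^T X) w = X^T y by Gauss-Jordan elimination."""
    d = len(X[0])
    A = [[sum(x[i] * x[j] for x in X) for j in range(d)]
         + [sum(x[i] * t for x, t in zip(X, y))] for i in range(d)]
    for c in range(d):
        p = max(range(c, d), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(d):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [v - f * u for v, u in zip(A[r], A[c])]
    return [A[i][d] / A[i][i] for i in range(d)]


# Policies assumed to come from previously solved tasks w = [1, 0] and w = [0, 1].
psi_left = successor_features(lambda s: 0)
psi_right = successor_features(lambda s: 1)

# New task: observe rewards r = phi . w_new (w_new is hypothetical) and regress w.
w_new = [-1.0, 2.0]
pairs = [(s, a) for s in range(N_STATES) for a in ACTIONS]
X = [phi(s, a) for s, a in pairs]
w_hat = least_squares(X, [dot(x, w_new) for x in X])
gpi_policy = [gpi_action([psi_left, psi_right], w_hat, s) for s in range(N_STATES)]
print(w_hat, gpi_policy)
```

On these samples the least-squares fit recovers `w_new` exactly, and GPI combines the two old policies into one that moves right from every state, which is what a reward of 2 at the right end and -1 at the left end calls for. Passing `w = [1.0, 0.0]` to `gpi_action` instead yields a policy that moves left everywhere, without learning any new policy.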

Keywords: artificial intelligence; generalized policy evaluation; generalized policy improvement; reinforcement learning; successor features

Year: 2020 PMID: 32817541 PMCID: PMC7720214 DOI: 10.1073/pnas.1907370117

Source DB: PubMed Journal: Proc Natl Acad Sci U S A ISSN: 0027-8424 Impact factor: 11.205
