Parameter-exploring policy gradients.

Frank Sehnke, Christian Osendorfer, Thomas Rückstiess, Alex Graves, Jan Peters, Jürgen Schmidhuber.

Abstract

We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower-variance gradient estimates than those obtained by regular policy gradient methods. We show that for several complex control tasks, including robust standing with a humanoid robot, this method outperforms well-known algorithms from the fields of standard policy gradients, finite difference methods and population-based heuristics. We also show that the improvement is largest when the parameter samples are drawn symmetrically. Lastly, we analyse the importance of the individual components of our method by incrementally incorporating them into the other algorithms and measuring the gain in performance after each step. Copyright 2009 Elsevier Ltd. All rights reserved.
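
The idea of sampling in parameter space with symmetric perturbations can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `pgpe_symmetric`, the learning rates, the use of the batch mean as a baseline, the per-step clipping of `sigma`, and the quadratic toy reward are all assumptions chosen for illustration. Each parameter has a Gaussian search distribution with mean `mu` and standard deviation `sigma`; perturbations `eps` are evaluated in mirrored pairs `mu + eps` and `mu - eps`, and the two reward values drive the updates.

```python
import numpy as np

def pgpe_symmetric(reward_fn, dim, iterations=300, pairs=10,
                   lr_mu=0.2, lr_sigma=0.05, sigma_init=1.0, seed=0):
    """Illustrative parameter-space search with symmetric sampling.

    reward_fn maps a parameter vector to a scalar episode return.
    All hyperparameter defaults are hypothetical.
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)                 # mean of the search distribution
    sigma = np.full(dim, sigma_init)   # per-parameter standard deviation

    for _ in range(iterations):
        # Draw perturbations and evaluate each mirrored pair of parameters.
        eps = rng.normal(0.0, sigma, size=(pairs, dim))
        r_plus = np.array([reward_fn(mu + e) for e in eps])
        r_minus = np.array([reward_fn(mu - e) for e in eps])

        # Mean update: the reward difference of a pair weights its direction.
        grad_mu = (0.5 * (r_plus - r_minus)[:, None] * eps).mean(axis=0)

        # Sigma update: pair-average reward relative to a baseline.
        # Assumption: the baseline is just the mean over the current batch.
        pair_mean = 0.5 * (r_plus + r_minus)
        advantage = pair_mean - pair_mean.mean()
        grad_sigma = (advantage[:, None]
                      * (eps ** 2 - sigma ** 2) / sigma).mean(axis=0)

        mu = mu + lr_mu * grad_mu
        # Safeguard added for this sketch: limit the change of sigma per
        # step so that it stays positive.
        sigma = np.clip(sigma + lr_sigma * grad_sigma,
                        0.5 * sigma, 2.0 * sigma)

    return mu, sigma

if __name__ == "__main__":
    # Toy problem (hypothetical): maximise the negative squared distance.
    target = np.array([1.0, -2.0, 0.5])
    mu, sigma = pgpe_symmetric(lambda th: -np.sum((th - target) ** 2), dim=3)
    print("mu:", mu)
    print("sigma:", sigma)
```

In this sketch the mean update uses only the difference between the two mirrored evaluations, so any reward offset common to both cancels without needing a baseline. That cancellation is one plausible reading of why symmetric samples help, though the abstract itself reports only the empirical result.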

Year:  2009        PMID: 20061118     DOI: 10.1016/j.neunet.2009.12.004

Source DB:  PubMed          Journal:  Neural Netw        ISSN: 0893-6080


  9 in total

1.  Exploring Behaviors of Caterpillar-Like Soft Robots with a Central Pattern Generator-Based Controller and Reinforcement Learning.

Authors:  Matthew Ishige; Takuya Umedachi; Tadahiro Taniguchi; Yoshihiro Kawahara
Journal:  Soft Robot       Date:  2019-05-20       Impact factor: 8.071

2.  Learning of Sub-optimal Gait Controllers for Magnetic Walking Soft Millirobots.

Authors:  Utku Culha; Sinan O Demir; Sebastian Trimpe; Metin Sitti
Journal:  Robot Sci Syst       Date:  2020

3.  Task space adaptation via the learning of gait controllers of magnetic soft millirobots.

Authors:  Sinan O Demir; Utku Culha; Alp C Karacakol; Abdon Pena-Francesch; Sebastian Trimpe; Metin Sitti
Journal:  Int J Rob Res       Date:  2021-06-16       Impact factor: 4.703

4.  Learned graphical models for probabilistic planning provide a new class of movement primitives.

Authors:  Elmar A Rückert; Gerhard Neumann; Marc Toussaint; Wolfgang Maass
Journal:  Front Comput Neurosci       Date:  2013-01-02       Impact factor: 2.380

5.  Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis.

Authors:  Keyan Zahedi; Georg Martius; Nihat Ay
Journal:  Front Psychol       Date:  2013-11-04

6.  Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer.

Authors:  Jiexin Wang; Eiji Uchibe; Kenji Doya
Journal:  Front Neurorobot       Date:  2017-01-23       Impact factor: 2.650

7.  PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem.

Authors:  Jürgen Schmidhuber
Journal:  Front Psychol       Date:  2013-06-07

8.  Information driven self-organization of complex robotic behaviors.

Authors:  Georg Martius; Ralf Der; Nihat Ay
Journal:  PLoS One       Date:  2013-05-27       Impact factor: 3.240

9.  Learned parametrized dynamic movement primitives with shared synergies for controlling robotic and musculoskeletal systems.

Authors:  Elmar Rückert; Andrea d'Avella
Journal:  Front Comput Neurosci       Date:  2013-10-17       Impact factor: 2.380
