Parameter-exploring policy gradients.

Frank Sehnke, Christian Osendorfer, Thomas Rückstiess, Alex Graves, Jan Peters, Jürgen Schmidhuber.

Abstract

We present a model-free reinforcement learning method for partially observable Markov decision problems. Our method estimates a likelihood gradient by sampling directly in parameter space, which leads to lower-variance gradient estimates than those obtained by regular policy gradient methods. We show that for several complex control tasks, including robust standing with a humanoid robot, this method outperforms well-known algorithms from the fields of standard policy gradients, finite-difference methods and population-based heuristics. We also show that the improvement is largest when the parameter samples are drawn symmetrically. Lastly, we analyse the importance of the individual components of our method by incrementally incorporating them into the other algorithms and measuring the gain in performance after each step.

© 2009 Elsevier Ltd. All rights reserved.
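The core idea of the abstract, drawing whole parameter vectors from a Gaussian and, in the symmetric variant, evaluating each perturbation in both directions, can be sketched in a few lines. This is a minimal Python sketch under stated assumptions, not the authors' implementation. The mean and standard-deviation updates follow the symmetric-sampling scheme as we understand it: the mean moves along the perturbation `eps` scaled by half the reward difference, and each standard deviation is adapted using the averaged reward relative to a baseline. The function name `pgpe_symmetric`, the moving-average baseline, the learning rates, the lower bound on sigma and the toy reward are illustrative assumptions, not values from the paper. `reward(theta)` is a hypothetical black box that runs one episode with the deterministic parameter vector `theta` and returns its return.

```python
import random


def pgpe_symmetric(reward, dim, iterations=2000, alpha_mu=0.1,
                   alpha_sigma=0.01, sigma_init=0.5, seed=0):
    """Illustrative PGPE-style search with symmetric sampling (sketch only).

    `reward(theta)` is assumed to evaluate one episode with parameters `theta`.
    Learning rates, baseline and sigma floor are assumptions, not paper values.
    """
    rng = random.Random(seed)
    mu = [0.0] * dim             # mean of the Gaussian over policy parameters
    sigma = [sigma_init] * dim   # per-parameter standard deviation
    baseline = None              # moving-average reward baseline b (assumption)

    for _ in range(iterations):
        # Sample one perturbation in parameter space and evaluate mu + eps and mu - eps.
        eps = [rng.gauss(0.0, s) for s in sigma]
        r_plus = reward([m + e for m, e in zip(mu, eps)])
        r_minus = reward([m - e for m, e in zip(mu, eps)])
        r_avg = 0.5 * (r_plus + r_minus)
        if baseline is None:
            baseline = r_avg

        for i in range(dim):
            # Mean: step along eps, weighted by half the reward difference
            # (no baseline needed, it cancels in the symmetric difference).
            mu[i] += alpha_mu * eps[i] * (r_plus - r_minus) / 2.0
            # Std: move toward perturbation sizes whose average reward beats the baseline.
            sigma[i] += (alpha_sigma * (r_avg - baseline)
                         * (eps[i] ** 2 - sigma[i] ** 2) / sigma[i])
            sigma[i] = max(sigma[i], 1e-3)  # keep sigma positive (assumption)

        # Exponential moving-average baseline (assumption).
        baseline = 0.9 * baseline + 0.1 * r_avg

    return mu, sigma
```

For example, with the toy reward `lambda th: -sum((x - t) ** 2 for x, t in zip(th, [1.0, -2.0]))` and `dim=2`, `mu` should drift toward `[1.0, -2.0]` while `sigma` shrinks. Evaluating `eps` and `-eps` from the same draw means the mean update depends only on a reward difference, which is one way to see why symmetric sampling can reduce the variance of the gradient estimate.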

Year: 2009 PMID: 20061118 DOI: 10.1016/j.neunet.2009.12.004

Source DB: PubMed Journal: Neural Netw ISSN: 0893-6080

Cited by (9 in total)

1. Exploring Behaviors of Caterpillar-Like Soft Robots with a Central Pattern Generator-Based Controller and Reinforcement Learning.

Authors: Matthew Ishige; Takuya Umedachi; Tadahiro Taniguchi; Yoshihiro Kawahara
Journal: Soft Robot Date: 2019-05-20 Impact factor: 8.071

2. Learning of Sub-optimal Gait Controllers for Magnetic Walking Soft Millirobots.

Authors: Utku Culha; Sinan O Demir; Sebastian Trimpe; Metin Sitti
Journal: Robot Sci Syst Date: 2020

3. Task space adaptation via the learning of gait controllers of magnetic soft millirobots.

Authors: Sinan O Demir; Utku Culha; Alp C Karacakol; Abdon Pena-Francesch; Sebastian Trimpe; Metin Sitti
Journal: Int J Rob Res Date: 2021-06-16 Impact factor: 4.703

4. Learned graphical models for probabilistic planning provide a new class of movement primitives.

Authors: Elmar A Rückert; Gerhard Neumann; Marc Toussaint; Wolfgang Maass
Journal: Front Comput Neurosci Date: 2013-01-02 Impact factor: 2.380

5. Linear combination of one-step predictive information with an external reward in an episodic policy gradient setting: a critical analysis.

Authors: Keyan Zahedi; Georg Martius; Nihat Ay
Journal: Front Psychol Date: 2013-11-04

6. Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer.

Authors: Jiexin Wang; Eiji Uchibe; Kenji Doya
Journal: Front Neurorobot Date: 2017-01-23 Impact factor: 2.650

7. PowerPlay: Training an Increasingly General Problem Solver by Continually Searching for the Simplest Still Unsolvable Problem.

Authors: Jürgen Schmidhuber
Journal: Front Psychol Date: 2013-06-07

8. Information driven self-organization of complex robotic behaviors.

Authors: Georg Martius; Ralf Der; Nihat Ay
Journal: PLoS One Date: 2013-05-27 Impact factor: 3.240

9. Learned parametrized dynamic movement primitives with shared synergies for controlling robotic and musculoskeletal systems.

Authors: Elmar Rückert; Andrea d'Avella
Journal: Front Comput Neurosci Date: 2013-10-17 Impact factor: 2.380
