
Efficient sample reuse in policy gradients with parameter-based exploration.

Tingting Zhao, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, Masashi Sugiyama.

Abstract

The policy gradient approach is a flexible and powerful reinforcement learning method particularly for problems with continuous actions such as robot control. A common challenge is how to reduce the variance of policy gradient estimates for reliable policy updates. In this letter, we combine the following three ideas and give a highly effective policy gradient method: (1) policy gradients with parameter-based exploration, a recently proposed policy search method with low variance of gradient estimates; (2) an importance sampling technique, which allows us to reuse previously gathered data in a consistent way; and (3) an optimal baseline, which minimizes the variance of gradient estimates with their unbiasedness being maintained. For the proposed method, we give a theoretical analysis of the variance of gradient estimates and show its usefulness through extensive experiments.
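As a rough sketch of the formulation the abstract points to (the notation here is ours and may differ from the paper's): in policy gradients with parameter-based exploration, policy parameters \theta are drawn from a prior p(\theta \mid \rho) with hyperparameters \rho, and the gradient of the expected return J(\rho) is estimated from rollouts. When the N rollouts were collected under earlier hyperparameters \rho', importance weights w(\theta) = p(\theta \mid \rho) / p(\theta \mid \rho') allow the old data to be reused consistently:

\nabla_\rho J(\rho) \approx \frac{1}{N} \sum_{n=1}^{N} w(\theta_n) \bigl( R(\theta_n) - b \bigr) \, \nabla_\rho \log p(\theta_n \mid \rho),

where R(\theta_n) is the return of the n-th rollout and b is a baseline. Under the usual variance-minimization argument, the optimal baseline for this importance-weighted estimator takes the form

b^{*} = \frac{\mathbb{E}\bigl[ w(\theta)^{2} R(\theta) \, \| \nabla_\rho \log p(\theta \mid \rho) \|^{2} \bigr]}{\mathbb{E}\bigl[ w(\theta)^{2} \, \| \nabla_\rho \log p(\theta \mid \rho) \|^{2} \bigr]},

which keeps the gradient estimate unbiased while reducing its variance.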

Mesh:

Year: 2013    PMID: 23517103    DOI: 10.1162/NECO_a_00452

Source DB: PubMed    Journal: Neural Comput    ISSN: 0899-7667    Impact factor: 2.026


  1 in total

1.  Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer.

Authors: Jiexin Wang; Eiji Uchibe; Kenji Doya
Journal: Front Neurorobot    Date: 2017-01-23    Impact factor: 2.650

