Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration
Tingting Zhao, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, Masashi Sugiyama.
Abstract
The policy gradient approach is a flexible and powerful reinforcement learning method, particularly for problems with continuous actions such as robot control. A common challenge is how to reduce the variance of policy gradient estimates for reliable policy updates. In this letter, we combine the following three ideas and give a highly effective policy gradient method: (1) policy gradients with parameter-based exploration, a recently proposed policy search method with low variance of gradient estimates; (2) an importance sampling technique, which allows us to reuse previously gathered data in a consistent way; and (3) an optimal baseline, which minimizes the variance of gradient estimates with their unbiasedness being maintained. For the proposed method, we give a theoretical analysis of the variance of gradient estimates and show its usefulness through extensive experiments.
Year: 2013 PMID: 23517103 DOI: 10.1162/NECO_a_00452
Source DB: PubMed Journal: Neural Comput ISSN: 0899-7667 Impact factor: 2.026
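The three ingredients named in the abstract can be sketched together in a few lines: sample policy parameters from a Gaussian hyper-policy (PGPE), reweight old samples by an importance ratio, and subtract a variance-minimizing baseline. This is a minimal illustrative sketch, not the paper's exact estimator; the function names and the per-dimension baseline form are assumptions.

```python
import numpy as np

def gaussian_logpdf(theta, mean, std):
    # Log-density of an isotropic Gaussian hyper-policy p(theta | mean, std).
    return -0.5 * np.sum(((theta - mean) / std) ** 2
                         + np.log(2.0 * np.pi * std ** 2), axis=-1)

def iw_pgpe_gradient(thetas, returns, mean, std, smp_mean, smp_std):
    """Importance-weighted PGPE gradient w.r.t. the hyper-policy mean,
    with a per-dimension variance-minimizing baseline (illustrative sketch)."""
    # (2) Importance weights: target hyper-policy over the sampling one,
    # so previously gathered (theta, return) pairs are reused consistently.
    w = np.exp(gaussian_logpdf(thetas, mean, std)
               - gaussian_logpdf(thetas, smp_mean, smp_std))
    # (1) PGPE score function: gradient of log p(theta) w.r.t. the mean.
    score = (thetas - mean) / std ** 2                  # shape (N, d)
    # (3) Baseline b_i = E[w^2 R s_i^2] / E[w^2 s_i^2]: subtracting a
    # constant baseline keeps the estimator unbiased while reducing variance.
    g2 = (w ** 2)[:, None] * score ** 2
    b = (returns[:, None] * g2).sum(axis=0) / g2.sum(axis=0)
    # Baseline-corrected, importance-weighted Monte Carlo gradient estimate.
    return (w[:, None] * score * (returns[:, None] - b)).mean(axis=0)
```

With the sampling and target hyper-policies equal, the weights reduce to one and the estimator falls back to plain baseline-corrected PGPE; as the target drifts from the sampler, the weights reweight old rollouts instead of discarding them.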