Analysis and improvement of policy gradient estimation.

Tingting Zhao, Hirotaka Hachiya, Gang Niu, Masashi Sugiyama.

Abstract

Policy gradient is a useful model-free reinforcement learning approach, but its gradient estimates tend to be unstable. In this paper, we analyze and improve the stability of policy gradient methods. We first prove that, under a mild assumption, the variance of gradient estimates in the PGPE (policy gradients with parameter-based exploration) method is smaller than that of the classical REINFORCE method. We then derive the optimal baseline for PGPE, which further reduces the variance. We also show theoretically that PGPE with the optimal baseline is preferable to REINFORCE with the optimal baseline in terms of the variance of gradient estimates. Finally, we demonstrate the usefulness of the improved PGPE method through experiments.
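
The effect of a baseline on a parameter-based gradient estimate can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a Gaussian hyper-distribution over policy parameters with a fixed standard deviation `tau`, a hypothetical `toy_return` function standing in for the return of a rollout, and the standard variance-minimizing baseline for score-function estimators, b = E[R ||g||^2] / E[||g||^2], where g is the score of the sampled parameters. The abstract does not state the paper's formula, so that form is an assumption here. All function names are illustrative.

```python
import random


def score_mean(theta, eta, tau):
    """Gradient of log N(theta | eta, tau^2) with respect to the mean eta."""
    return [(t - e) / tau ** 2 for t, e in zip(theta, eta)]


def toy_return(theta):
    # Hypothetical stand-in for the return of one rollout under parameters theta.
    return 20.0 - sum((t - 1.0) ** 2 for t in theta)


def pgpe_gradient(eta, tau, n_samples, rng, use_baseline):
    """Estimate the gradient of the expected return with respect to eta."""
    samples = []
    for _ in range(n_samples):
        # Parameter-based exploration: sample policy parameters, not actions.
        theta = [rng.gauss(e, tau) for e in eta]
        samples.append((toy_return(theta), score_mean(theta, eta, tau)))

    baseline = 0.0
    if use_baseline:
        # Assumed baseline form: E[R ||g||^2] / E[||g||^2], estimated from
        # the same samples (which introduces a small finite-sample bias).
        num = sum(r * sum(g * g for g in s) for r, s in samples)
        den = sum(sum(g * g for g in s) for _, s in samples)
        baseline = num / den

    dim = len(eta)
    return [
        sum((r - baseline) * s[i] for r, s in samples) / n_samples
        for i in range(dim)
    ]


def estimator_variance(use_baseline, n_repeats=500, seed=0):
    """Empirical total variance of the gradient estimate over repeated runs."""
    rng = random.Random(seed)
    eta, tau = [0.0, 0.0], 1.0
    grads = [pgpe_gradient(eta, tau, 20, rng, use_baseline)
             for _ in range(n_repeats)]
    dim = len(eta)
    means = [sum(g[i] for g in grads) / n_repeats for i in range(dim)]
    return sum(
        sum((g[i] - means[i]) ** 2 for g in grads) / n_repeats
        for i in range(dim)
    )


if __name__ == "__main__":
    print("variance without baseline:", estimator_variance(False))
    print("variance with baseline:   ", estimator_variance(True))
```

On this toy problem the estimate with the baseline should show a markedly lower empirical variance, because the baseline removes the large constant offset in the return. The sketch does not reproduce the paper's comparison between PGPE and REINFORCE.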

Year: 2011 PMID: 22019189 DOI: 10.1016/j.neunet.2011.09.005

Source DB: PubMed Journal: Neural Netw ISSN: 0893-6080

Cited

1 in total

1. Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer.

Authors: Jiexin Wang; Eiji Uchibe; Kenji Doya
Journal: Front Neurorobot Date: 2017-01-23 Impact factor: 2.650