Efficient Sample Reuse in Policy Gradients with Parameter-Based Exploration
Tingting Zhao, Hirotaka Hachiya, Voot Tangkaratt, Jun Morimoto, Masashi Sugiyama.
Abstract
The policy gradient approach is a flexible and powerful reinforcement learning method, particularly for problems with continuous actions such as robot control. A common challenge is how to reduce the variance of policy gradient estimates for reliable policy updates. In this letter, we combine the following three ideas and give a highly effective policy gradient method: (1) policy gradients with parameter-based exploration, a recently proposed policy search method with low variance of gradient estimates; (2) an importance sampling technique, which allows us to reuse previously gathered data in a consistent way; and (3) an optimal baseline, which minimizes the variance of gradient estimates with their unbiasedness being maintained. For the proposed method, we give a theoretical analysis of the variance of gradient estimates and show its usefulness through extensive experiments.
Year: 2013 PMID: 23517103 DOI: 10.1162/NECO_a_00452
Source DB: PubMed Journal: Neural Comput ISSN: 0899-7667 Impact factor: 2.026
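The three ingredients named in the abstract can be sketched together in a few lines: sample policy parameters from a Gaussian hyper-policy (PGPE), reweight old samples by an importance ratio, and subtract a variance-minimizing baseline. This is a minimal illustrative sketch, not the paper's exact estimator; the function names and the per-dimension baseline form are assumptions.

```python
import numpy as np

def gaussian_logpdf(theta, mean, std):
    # Log-density of an isotropic Gaussian hyper-policy p(theta | mean, std).
    return -0.5 * np.sum(((theta - mean) / std) ** 2
                         + np.log(2.0 * np.pi * std ** 2), axis=-1)

def iw_pgpe_gradient(thetas, returns, mean, std, smp_mean, smp_std):
    """Importance-weighted PGPE gradient w.r.t. the hyper-policy mean,
    with a per-dimension variance-minimizing baseline (illustrative sketch)."""
    # (2) Importance weights: target hyper-policy over the sampling one,
    # so previously gathered (theta, return) pairs are reused consistently.
    w = np.exp(gaussian_logpdf(thetas, mean, std)
               - gaussian_logpdf(thetas, smp_mean, smp_std))
    # (1) PGPE score function: gradient of log p(theta) w.r.t. the mean.
    score = (thetas - mean) / std ** 2                  # shape (N, d)
    # (3) Baseline b_i = E[w^2 R s_i^2] / E[w^2 s_i^2]: subtracting a
    # constant baseline keeps the estimator unbiased while reducing variance.
    g2 = (w ** 2)[:, None] * score ** 2
    b = (returns[:, None] * g2).sum(axis=0) / g2.sum(axis=0)
    # Baseline-corrected, importance-weighted Monte Carlo gradient estimate.
    return (w[:, None] * score * (returns[:, None] - b)).mean(axis=0)
```

With the sampling and target hyper-policies equal, the weights reduce to one and the estimator falls back to plain baseline-corrected PGPE; as the target drifts from the sampler, the weights reweight old rollouts instead of discarding them.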