
Reward-weighted regression with sample reuse for direct policy search in reinforcement learning.

Hirotaka Hachiya, Jan Peters, Masashi Sugiyama.

Abstract

Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. However, policy search often requires a large number of samples to obtain a stable policy-update estimator, which is prohibitive when sampling is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R3), is demonstrated through robot learning experiments. (This letter is an extended version of our earlier conference paper: Hachiya, Peters, & Sugiyama, 2009.)
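The abstract's core building block, reward-weighted regression, fits policy parameters by weighted least squares in which each observed action is weighted by its reward, iterated EM-style. The sketch below illustrates that generic idea on a toy linear-Gaussian policy; the bandit-style setup, the reward function, and all names are illustrative assumptions, not the authors' R3 algorithm (which additionally reuses samples across iterations via importance weighting).

```python
import numpy as np

# Illustrative sketch of generic reward-weighted regression (RWR), not the
# paper's R3 method. Setup (assumed for demonstration): linear Gaussian
# policy a ~ N(theta * s, sigma^2), reward peaks when a = 2 * s, so the
# reward-maximizing gain is theta = 2.
rng = np.random.default_rng(0)

def rollout(theta, n=100, sigma=0.5):
    """Sample states, actions from the current policy, and rewards."""
    s = rng.uniform(-1.0, 1.0, size=n)           # states
    a = theta * s + sigma * rng.normal(size=n)   # actions from the policy
    r = np.exp(-(a - 2.0 * s) ** 2)              # reward, maximal at a = 2s
    return s, a, r

theta = 0.0
for _ in range(20):
    s, a, r = rollout(theta)
    # M-step: reward-weighted least squares fit of a ~ theta * s,
    # i.e. theta = sum(r * s * a) / sum(r * s^2)
    theta = np.sum(r * s * a) / np.sum(r * s * s)
```

Each iteration regresses actions on states with rewards as weights, so the policy mean is pulled toward high-reward actions; here `theta` converges toward the optimal gain of 2.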

Year: 2011    PMID: 21851281    DOI: 10.1162/NECO_a_00199

Source DB: PubMed    Journal: Neural Comput    ISSN: 0899-7667    Impact factor: 2.026


  1 in total

1.  Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer.

Authors:  Jiexin Wang; Eiji Uchibe; Kenji Doya
Journal:  Front Neurorobot       Date:  2017-01-23       Impact factor: 2.650

