
Reward-weighted regression with sample reuse for direct policy search in reinforcement learning.

Hirotaka Hachiya, Jan Peters, Masashi Sugiyama.

Abstract

Direct policy search is a promising reinforcement learning framework, in particular for controlling continuous, high-dimensional systems. However, policy search often requires a large number of samples to obtain a stable policy-update estimator, which is prohibitive when sampling is expensive. In this letter, we extend an expectation-maximization-based policy search method so that previously collected samples can be efficiently reused. The usefulness of the proposed method, reward-weighted regression with sample reuse (R3), is demonstrated through robot learning experiments. (This letter is an extended version of our earlier conference paper: Hachiya, Peters, & Sugiyama, 2009.)
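The abstract's core building block, reward-weighted regression, fits policy parameters by weighted least squares in which each observed action is weighted by its reward, iterated EM-style. The sketch below illustrates that generic idea on a toy linear-Gaussian policy; the bandit-style setup, the reward function, and all names are illustrative assumptions, not the authors' R3 algorithm (which additionally reuses samples across iterations via importance weighting).

```python
import numpy as np

# Illustrative sketch of generic reward-weighted regression (RWR), not the
# paper's R3 method. Setup (assumed for demonstration): linear Gaussian
# policy a ~ N(theta * s, sigma^2), reward peaks when a = 2 * s, so the
# reward-maximizing gain is theta = 2.
rng = np.random.default_rng(0)

def rollout(theta, n=100, sigma=0.5):
    """Sample states, actions from the current policy, and rewards."""
    s = rng.uniform(-1.0, 1.0, size=n)           # states
    a = theta * s + sigma * rng.normal(size=n)   # actions from the policy
    r = np.exp(-(a - 2.0 * s) ** 2)              # reward, maximal at a = 2s
    return s, a, r

theta = 0.0
for _ in range(20):
    s, a, r = rollout(theta)
    # M-step: reward-weighted least squares fit of a ~ theta * s,
    # i.e. theta = sum(r * s * a) / sum(r * s^2)
    theta = np.sum(r * s * a) / np.sum(r * s * s)
```

Each iteration regresses actions on states with rewards as weights, so the policy mean is pulled toward high-reward actions; here `theta` converges toward the optimal gain of 2.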

Year: 2011    PMID: 21851281    DOI: 10.1162/NECO_a_00199

Source DB: PubMed    Journal: Neural Comput    ISSN: 0899-7667    Impact factor: 2.026


  1 in total

1.  Adaptive Baseline Enhances EM-Based Policy Search: Validation in a View-Based Positioning Task of a Smartphone Balancer.

Authors:  Jiexin Wang; Eiji Uchibe; Kenji Doya
Journal:  Front Neurorobot       Date:  2017-01-23       Impact factor: 2.650

